Optimal size for hbase.hregion.memstore.flush.size and its impact
Hi all,

The default value of hbase.hregion.memstore.flush.size is 128 MB. Could anyone kindly explain what the impact would be if we increased this to a higher value, say 512 MB, 800 MB, or more?

We have a very write-heavy cluster. We also run periodic endpoint-coprocessor-based jobs, every 10 minutes, that operate on the data written in the last 10-15 minutes. We are trying to manage the memstore flush operations so that the hot data remains in the memstore for at least 30-40 minutes or longer, so that the job hits disk only every 3rd or 4th time it tries to operate on the hot data (it does a scan).

We have a region server heap size of 20 GB and have set:
hbase.regionserver.global.memstore.lowerLimit = .45
hbase.regionserver.global.memstore.upperLimit = .55

We observed that with hbase.hregion.memstore.flush.size=128MB, only 10% of the heap is utilized by the memstore before it flushes. At hbase.hregion.memstore.flush.size=512MB, we are able to increase the heap utilization by the memstore to 35%.

It would be very helpful for us to understand the implications of a higher hbase.hregion.memstore.flush.size for a long-running cluster. Thanks, Gautam
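A rough model (not an HBase API) of why heap utilization plateaus: with writes spread evenly, each region's memstore flushes independently once it reaches flush.size, so aggregate memstore usage levels off near regions * flush.size, well short of heap * upperLimit. A minimal sketch, assuming 16 actively written regions per region server (the thread does not state the region count):

public class MemstoreFlushMath {
    public static void main(String[] args) {
        long heapBytes = 20L * 1024 * 1024 * 1024;  // 20 GB RS heap, from the thread
        double upperLimit = 0.55;                   // hbase.regionserver.global.memstore.upperLimit
        long flushSize = 128L * 1024 * 1024;        // hbase.hregion.memstore.flush.size
        int activeRegions = 16;                     // assumption: regions taking writes per RS

        long globalCapMb = (long) (heapBytes * upperLimit) >> 20;  // ~11264 MB
        long plateauMb = (activeRegions * flushSize) >> 20;        // ~2048 MB
        System.out.printf("global cap: %d MB, plateau: %d MB (%.0f%% of heap)%n",
                globalCapMb, plateauMb, 100.0 * (activeRegions * flushSize) / heapBytes);
    }
}

With these assumed numbers the plateau is 2048 MB, i.e. 10% of the 20 GB heap, which matches the behavior observed at 128 MB; at 512 MB the same model gives 8192 MB (40%), close to the observed 35%.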
How to scan only the MemStore from an endpoint coprocessor
Hi all,

Here is our use case: we have a very write-heavy cluster, and we run periodic endpoint-coprocessor-based jobs, every 10 minutes, that operate on the data written in the last 10-15 minutes. Is there a way to query only the MemStore from the endpoint coprocessor?

The periodic job scans for data using a time range. We would like to implement a simple logic:
a. If the query time range is within the MemStore's TimeRangeTracker, query only the memstore.
b. If the end time of the query range is within the MemStore's TimeRangeTracker but the query start time is outside it (a memstore flush happened), query both the MemStore and the files.
c. If both the start time and end time of the query are outside the MemStore's TimeRangeTracker, query only the files.

The incoming data is time series, and we do not allow old data (out of sync with the clock) to come into the system (HBase).

Cloudera's distribution has a scanner, org.apache.hadoop.hbase.regionserver.InternalScan, that has methods like checkOnlyMemStore() and checkOnlyStoreFiles(). Is this available in trunk? Also, how do I access the MemStore for a column family in the endpoint coprocessor from the CoprocessorEnvironment?
Re: How to scan only the MemStore from an endpoint coprocessor
Thanks Vladimir. We will try this out soon. Regards, Gautam

On Mon, Jun 1, 2015 at 12:22 AM, Vladimir Rodionov vladrodio...@gmail.com wrote:
InternalScan has a constructor from a Scan object; see https://issues.apache.org/jira/browse/HBASE-12720. You can instantiate an InternalScan from a Scan, set checkOnlyMemStore, then open a RegionScanner, but the best approach is to cache data on write and run a regular RegionScanner over the memstore and block cache. best, -Vlad

On Sun, May 31, 2015 at 11:45 PM, Anoop John anoop.hb...@gmail.com wrote:
If your scan has a time range specified, HBase internally will check it against the time range of the files etc. and will avoid those that are clearly outside the time range you are interested in. You don't have to do anything for this; just make sure you set the TimeRange for your read. -Anoop-

On Mon, Jun 1, 2015 at 12:09 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote:
We have a postScannerOpen hook in the CP, but that may not give you direct access to know which internal scanners are on the memstore and which are on the store files. This is possible, but we may need to add some new hooks at the place where we explicitly add the internal scanners required for a scan. Still, a general question: are you sure that your data will be only in the memstore, and that the latest data would not have been flushed from the memstore to the HFiles by then? I see that your scenario is write-centric, so how can you guarantee your data will be in the memstore only? Though your time range may say it is the latest data (maybe 10 to 15 minutes old), you should be able to configure your memstore flushing in such a way that no flushes happen for the latest data in that 10-15 minute window. Just saying my thoughts here.

On Mon, Jun 1, 2015 at 11:46 AM, Gautam Borah gbo...@appdynamics.com wrote:
Hi all, here is our use case: we have a very write-heavy cluster, and we run periodic endpoint-coprocessor-based jobs, every 10 minutes, that operate on the data written in the last 10-15 minutes. Is there a way to query only the MemStore from the endpoint coprocessor? The periodic job scans for data using a time range. We would like to implement a simple logic:
a. If the query time range is within the MemStore's TimeRangeTracker, query only the memstore.
b. If the end time of the query range is within the MemStore's TimeRangeTracker but the query start time is outside it (a memstore flush happened), query both the MemStore and the files.
c. If both the start time and end time of the query are outside the MemStore's TimeRangeTracker, query only the files.
The incoming data is time series, and we do not allow old data (out of sync with the clock) to come into the system (HBase). Cloudera's distribution has a scanner, org.apache.hadoop.hbase.regionserver.InternalScan, that has methods like checkOnlyMemStore() and checkOnlyStoreFiles(). Is this available in trunk? Also, how do I access the MemStore for a column family in the endpoint coprocessor from the CoprocessorEnvironment?
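For concreteness, a minimal sketch of what Vladimir describes, assuming a build that includes HBASE-12720 (the InternalScan(Scan) constructor) and the 0.98-era coprocessor API; InternalScan is a private-audience internal class, so this is inherently version-sensitive:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.hbase.regionserver.InternalScan;
import org.apache.hadoop.hbase.regionserver.RegionScanner;

// Called from inside an endpoint coprocessor method.
public final class MemstoreOnlyScanner {
    public static List<Cell> scanMemstoreOnly(RegionCoprocessorEnvironment env, Scan scan)
            throws IOException {
        InternalScan iscan = new InternalScan(scan); // HBASE-12720 constructor
        iscan.checkOnlyMemStore();                   // skip store files entirely
        HRegion region = env.getRegion();            // the region hosting this endpoint
        List<Cell> results = new ArrayList<Cell>();
        RegionScanner scanner = region.getScanner(iscan);
        try {
            boolean moreRows;
            do {
                moreRows = scanner.next(results);    // appends one row's cells per call
            } while (moreRows);
        } finally {
            scanner.close();
        }
        return results;
    }
}

The three-way memstore/files/both decision from the original post would then pick between checkOnlyMemStore(), checkOnlyStoreFiles(), or a plain Scan with setTimeRange(), relying on HBase's own file pruning for case (c), as Anoop notes.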
Impact of using a higher hbase.hregion.memstore.flush.size=512MB
Hi all,

The default value of hbase.hregion.memstore.flush.size is 128 MB. Could anyone kindly explain what the impact would be if we increased this to a higher value, say 512 MB, 800 MB, or more?

We have a very write-heavy cluster. We also run periodic endpoint-coprocessor-based jobs, every 10 minutes, that operate on the data written in the last 10-15 minutes. We are trying to manage the memstore flush operations so that the hot data remains in the memstore for at least 30-40 minutes or longer, so that the job hits disk only every 3rd or 4th time it tries to operate on the hot data (it does a scan).

We have a region server heap size of 20 GB and have set:
hbase.regionserver.global.memstore.lowerLimit = .45
hbase.regionserver.global.memstore.upperLimit = .55

We observed that with hbase.hregion.memstore.flush.size=128MB, only 10% of the heap is utilized by the memstore before it flushes. At hbase.hregion.memstore.flush.size=512MB, we are able to increase the heap utilization by the memstore to 35%.

It would be very helpful for us to understand the implications of a higher hbase.hregion.memstore.flush.size for a long-running cluster. Thanks, Gautam
Re: Impact of using a higher hbase.hregion.memstore.flush.size=512MB
Hi Esteban,

Thanks for your response. hbase.rs.cacheblocksonwrite would be very useful for us. We have set hbase.regionserver.maxlogs appropriately to avoid flushes across memstores, and we have also set hbase.regionserver.optionalcacheflushinterval to 0 to disable periodic flushing; we do not write anything bypassing the WAL. We are running the cluster with conservative limits, so that if a region server crashes, the others can take the extra load without hitting the memstore flushing limits. We are running the cluster now at an 800 MB flush size, and the initial job runs are fine. We will run it for a couple of days and check the status. Thanks again. Gautam

On Wed, May 27, 2015 at 2:15 PM, Esteban Gutierrez este...@cloudera.com wrote:
Gautam, yes, you can increase the size of the memstore to values larger than 128 MB, but usually you go by increasing hbase.hregion.memstore.block.multiplier only. Depending on the version of HBase you are running, many things can happen: e.g., multiple memstores can be flushed at once, the memstores will be flushed if there are too many rows in memory (30 million) or if the store hasn't been flushed in an hour, the rate of the flushes can be tuned, and hitting the max number of HLogs can also trigger a flush. One problem with running large memstores is mostly how many regions you will have per RS; also, if an encoding and/or compression codec is being used, it might cause the flush to take longer, use more CPU resources, or push back clients because you haven't flushed some regions to disk. Based on the behavior you have described, the heap utilization sounds like you are not fully utilizing the memstores and you are below the lower limit, so depending on the version of HBase and available resources, you might want to use hbase.rs.cacheblocksonwrite instead to keep some of the hot data in the block cache. cheers, esteban. -- Cloudera, Inc.

On Wed, May 27, 2015 at 1:58 PM, Gautam Borah gbo...@appdynamics.com wrote:
Hi all, the default value of hbase.hregion.memstore.flush.size is 128 MB. Could anyone kindly explain what the impact would be if we increased this to a higher value, say 512 MB, 800 MB, or more? We have a very write-heavy cluster. We also run periodic endpoint-coprocessor-based jobs, every 10 minutes, that operate on the data written in the last 10-15 minutes. We are trying to manage the memstore flush operations so that the hot data remains in the memstore for at least 30-40 minutes or longer, so that the job hits disk only every 3rd or 4th time it tries to operate on the hot data (it does a scan). We have a region server heap size of 20 GB and have set hbase.regionserver.global.memstore.lowerLimit = .45 and hbase.regionserver.global.memstore.upperLimit = .55. We observed that with hbase.hregion.memstore.flush.size=128MB, only 10% of the heap is utilized by the memstore before it flushes. At hbase.hregion.memstore.flush.size=512MB, we are able to increase the heap utilization by the memstore to 35%. It would be very helpful for us to understand the implications of a higher hbase.hregion.memstore.flush.size for a long-running cluster. Thanks, Gautam
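For reference, the knobs discussed in this thread collected in one place, as a sketch; the values are illustrative (they mirror what Gautam reports trying), and in practice these live in hbase-site.xml rather than in code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public final class FlushTuningSketch {
    public static Configuration tunedConf() {
        Configuration conf = HBaseConfiguration.create();
        // Per-region flush threshold (the thread is experimenting with 800 MB):
        conf.setLong("hbase.hregion.memstore.flush.size", 800L * 1024 * 1024);
        // More WALs before log rolling forces flushes across memstores:
        conf.setInt("hbase.regionserver.maxlogs", 150);
        // 0 disables the periodic flusher (safe only if nothing skips the WAL):
        conf.setLong("hbase.regionserver.optionalcacheflushinterval", 0L);
        // Esteban's suggestion: populate the block cache as data is written:
        conf.setBoolean("hbase.rs.cacheblocksonwrite", true);
        return conf;
    }
}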
Re: HBase row ingestion ..
Thanks guys for responding!

Michael, I indeed should have elaborated on our current rowkey design. Re: hotspotting, we're doing exactly what you're suggesting, i.e. fanning out into buckets where the bucket location is a hash(message_unique_fields) (we use murmur3). So our write pattern is extremely even across the regions and region servers. We also pre-split our table into 480 buckets (a number based on our experience with the rate of change of the cluster size). So, no complaints about the relative load on regions. We've designed the rowkey per our use case and are pretty happy with it. I'm happy to keep the rowkey size the way it is, but I was concerned that we redundantly write that very rowkey for each column (which isn't really needed). This column qualifier optimization is over and above what we're already doing to scale on writes, and I was wondering if it could get us improvements on write times. But I could be wrong if that cost, of repeating the rowkey for each cell, is purely incurred on the RS side and doesn't affect the write call directly. Let me also point out we're on HBase 0.98.6 currently.

James, that talk is awesome sauce! Especially the way you guys analyzed your design with that lovely visualization. Any chance that's on a GitHub repo :-) ? It would be extremely useful for folks like us. Rowkey design has been the center of our attention for weeks/months on end, and a quicker feedback loop like this viz would really speed up that process.

Thanks again guys. All of this helps. -Gautam.

On Thu, Apr 30, 2015 at 7:35 AM, James Estes james.es...@gmail.com wrote:
Gautam, Michael makes a lot of good points, especially the importance of analyzing your use case when determining the row key design. We (Jive) did a talk at HBaseCon a couple of years back about our row key redesign to vastly improve performance. It also talks a little about the write path and has a (crude) visualization of the impact of the old and new row key designs. Your use case is likely different than ours was, but it may be helpful to hear our experience with row key design: http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-real-performance-gains-with-real-time-data.html James

On Apr 30, 2015, at 7:51 AM, Michael Segel michael_se...@hotmail.com wrote:
I wouldn't call storing attributes in separate columns a 'rigid schema'. You are correct that you could write your data as a CLOB/BLOB and store it in a single cell. The upside is that it's more efficient. The downside is that it's really an all-or-nothing fetch, and then you need to write the extra code to pull data from the Avro CLOB. (Which does fit your use case.) This is a normal pattern and gives HBase an extra dimension of storage. With respect to the row key... look at your main use case. The size of the row key may be a necessary evil in terms of getting the unique document (CLOB/BLOB). In terms of performance gains... you need to look at it this way: the cost of inserting a row is what it is. There will always be a cost for insertion, and there will always be a minimum rowkey size required by your use case. The next issue is whether you are 'hot spotting'. Note that I'm not talking about the initial start of loading into a table, but whether all of your data is going to the last region written because the rowkey is sequential. Here, you may look at hashing the rowkey (SHA-1 or SHA-2), which may shrink your row key (depending on your current rowkey length). The downside is that you will lose the ability to perform range scans. So if your access pattern is get() rather than scan(), this will work. Note too that I recommended SHA-1 or SHA-2 for the hash. MD5 works, and is faster, but there's a greater chance of a hash collision. SHA-1 has a mathematical chance of a collision depending on the data set, but I've never heard of anyone finding one. SHA-2 doesn't have that problem, but I don't know whether it's part of the core Java packages. Again, the upside is that you're going to get a fairly even distribution across your cluster. (Which you didn't describe; that too could be a factor in performance.) HTH

On Apr 29, 2015, at 8:03 PM, Gautam gautamkows...@gmail.com wrote:
Thanks for the quick response! Our read path is fairly straightforward and very deterministic. We always push down predicates at the rowkey level and read the row's full payload (we never do projection/filtering over CQs). So, in theory, I could expect a gain as large as the current overhead of [ 40 * sizeof(rowkey) ]? Curious to understand more about how much of that overhead is actually incurred over the network and how much on the RS side, at least to the extent it affects the put()/flush() calls. Let me know if there are particular parts of the code or documentation I should be looking at for this. Would like to learn about the memory/network
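A minimal sketch of the bucketing scheme Gautam describes (480 pre-split buckets, murmur3 over the message's unique fields), assuming Guava's murmur3 implementation and a 3-digit ASCII bucket prefix; the actual key layout in their table is not shown in the thread:

import java.nio.charset.Charset;
import com.google.common.hash.Hashing;

public final class BucketedRowKey {
    private static final int NUM_BUCKETS = 480;  // matches the pre-split count
    private static final Charset UTF8 = Charset.forName("UTF-8");

    public static byte[] rowKey(String messageUniqueFields, byte[] restOfKey) {
        int hash = Hashing.murmur3_32().hashString(messageUniqueFields, UTF8).asInt();
        int bucket = ((hash % NUM_BUCKETS) + NUM_BUCKETS) % NUM_BUCKETS; // non-negative
        byte[] prefix = String.format("%03d", bucket).getBytes(UTF8);

        // bucket prefix followed by the rest of the key keeps writes spread
        // evenly while preserving range scans within a bucket
        byte[] key = new byte[prefix.length + restOfKey.length];
        System.arraycopy(prefix, 0, key, 0, prefix.length);
        System.arraycopy(restOfKey, 0, key, prefix.length, restOfKey.length);
        return key;
    }
}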
Re: HBase row ingestion ..
.. I'd like to add that we have a very fat rowkey. - Thanks.

On Wed, Apr 29, 2015 at 5:30 PM, Gautam gautamkows...@gmail.com wrote:
Hello, we've been fighting some ingestion perf issues on HBase, and I have been looking at the write path in particular, trying to optimize it currently. We have around 40 column qualifiers (under a single CF) for each row. So I understand that each put(row) written into HBase would translate into 40 (rowkey, cq, ts) cells in HBase. If I switched to an Avro-object-based schema instead, there would be a single (rowkey, avro_cq, ts) cell per row (all fields shoved into a single Avro blob). The question is, would this approach really translate into any write-path perf benefits? Cheers, -Gautam.

-- If you really want something in this life, you have to work for it. Now, quiet! They're about to announce the lottery numbers...
HBase row ingestion ..
Hello, we've been fighting some ingestion perf issues on HBase, and I have been looking at the write path in particular, trying to optimize it currently. We have around 40 column qualifiers (under a single CF) for each row. So I understand that each put(row) written into HBase would translate into 40 (rowkey, cq, ts) cells in HBase. If I switched to an Avro-object-based schema instead, there would be a single (rowkey, avro_cq, ts) cell per row (all fields shoved into a single Avro blob). The question is, would this approach really translate into any write-path perf benefits? Cheers, -Gautam.
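To make the two layouts concrete, a sketch against the 0.98-era Put API; the family/qualifier names and the pre-serialized Avro blob are placeholders:

import java.util.Map;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public final class RowLayouts {
    private static final byte[] CF = Bytes.toBytes("d");

    // Current layout: one cell per attribute; the full rowkey travels
    // with every KeyValue sent to the region server.
    public static Put wideRow(byte[] rowkey, Map<String, byte[]> fields) {
        Put put = new Put(rowkey);
        for (Map.Entry<String, byte[]> e : fields.entrySet()) {
            put.add(CF, Bytes.toBytes(e.getKey()), e.getValue()); // ~40 cells
        }
        return put;
    }

    // Alternative: a single cell whose value is the Avro-encoded record,
    // so the rowkey is stored and shipped once per row.
    public static Put blobRow(byte[] rowkey, byte[] avroBlob) {
        Put put = new Put(rowkey);
        put.add(CF, Bytes.toBytes("avro"), avroBlob);
        return put;
    }
}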
Re: Increasing write throughput..
Thanks Anoop and Ted for the replies. This helped me understand HBase's write path a lot more. After going through the literature and your comments on what triggers memstore flushes, I did the following:
- Added 4 nodes (all 8+4 = 12 RSs have 48000M heap each)
- Changed hbase.regionserver.maxlogs = 150 (default 32)
- hbase.hregion.memstore.flush.size = 536870912 (as before)
- hbase.hstore.blockingStoreFiles = 120
- Merged tiny/empty regions and brought the region count for this table down to 30% (before: 1646, after merge: ~600)
- lowerLimit/upperLimit on memstore are defaults (0.38, 0.4), RS MAX_HEAP_SIZE = 48000M

Snapshot of the HMaster RSs with requests per sec: [1]. Snapshot of HBase tables: [2]. The region count per RS is now around 100 (evenly distributed), and so are the requests per sec. Based on the memstore size math, the flush size should now be 48000 * 0.4 / 100 = 192M? I still consistently see memstore flushes at ~128M; it barely ever goes above that number. Also uploaded the last 1000 lines of the RS log after the above settings + restart [3]. Here's the verbatim hbase-site.xml [4]. Cheers, -Gautam.

[1] - postimg.org/image/t2cxb18sh
[2] - postimg.org/image/v3zaz9571
[3] - pastebin.com/HXK4s8zR
[4] - pastebin.com/av9XxecY

On Sun, Nov 2, 2014 at 5:46 PM, Anoop John anoop.hb...@gmail.com wrote:
You have ~280 regions per RS, your memstore size is 40%, and the heap size is 48 GB. This means the heap available for memstores is 48 * 0.4 = 19.2 GB (I am considering just the upper watermark). If you consider all 280 regions, each with a 512 MB memstore, you would need a much larger heap. And your writes are distributed to all regions, right? So you will be seeing flushes because of global heap pressure. Increasing the Xmx and flush size alone won't help; you need to consider the number of regions and the write distribution. Once you tune this, the next step will be to tune the HLog and its rolling. That depends on your cell size as well. By default, when a WAL reaches 95% of the HDFS block size, we roll to a new HLog file, and by default when we reach 32 log files, we force flushes. FYI. -Anoop-

On Sat, Nov 1, 2014 at 10:54 PM, Ted Yu yuzhih...@gmail.com wrote:
Please read 9.7.7.2. MemStoreFlush under http://hbase.apache.org/book.html#regions.arch Cheers

On Fri, Oct 31, 2014 at 11:16 AM, Gautam Kowshik gautamkows...@gmail.com wrote:
- Sorry about the raw image upload; here's the TSDB snapshot: http://postimg.org/image/gq4nf96x9/
- HBase version 0.98.1 (CDH 5.1 distro)
- hbase-site pastebin: http://pastebin.com/fEctQ3im
- This table 'msg' has been pre-split with 240 regions, and writes are evenly distributed into 240 buckets (the bucket is a prefix to the row key). These regions are well spread across the 8 RSs, although over time these 240 have split and become 2440; each region server has ~280 regions.
- Last 500 lines of log from one RS: http://pastebin.com/8MwYMZPb
All - no hot regions from what I can tell. One of my main concerns was why, even after setting the memstore flush size to 512M, it is still flushing at 128M. Is there a setting I've missed? I'll try to get more details as I find them. Thanks and cheers, -Gautam.

On Oct 31, 2014, at 10:47 AM, Stack st...@duboce.net wrote:
What version of HBase (later versions have improvements in write throughput, especially with many writing threads)? Post a pastebin of a regionserver log in steady state if you don't mind. About how many writers are going into the server at a time? How many regions on the server? Are all being written to at the same rate, or do you have hotties? Thanks, St.Ack

On Fri, Oct 31, 2014 at 10:22 AM, Gautam gautamkows...@gmail.com wrote:
I'm trying to increase the write throughput of our HBase cluster. We're currently doing around 7500 messages per sec per node. I think we have room for improvement, especially since the heap is underutilized and the memstore size doesn't seem to fluctuate much between regular and peak ingestion loads. We mainly have one large table that we write most of the data to; other tables are mainly OpenTSDB and some relatively small summary tables. This table is read in batch once a day, but otherwise is mostly serving writes 99% of the time. This large table has 1 CF and gets flushed at around ~128M fairly regularly, like below:
{log} 2014-10-31 16:56:09,499 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.2 M/134459888, currentsize=879.5 K/900640 for region msg,00102014100515impression\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x002014100515040200049358\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x004138647301\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
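A rough model (not HBase code) of the WAL-driven ceiling Anoop describes, using the numbers from this thread and assuming 128 MB HDFS blocks; once the region server holds maxlogs WALs, the memstores pinned by the oldest WAL are force-flushed regardless of flush.size:

public class WalFlushCeiling {
    public static void main(String[] args) {
        double blockMb = 128;   // assumed HDFS block size
        double rollAt = 0.95;   // a WAL rolls at 95% of the block size
        int maxlogs = 150;      // hbase.regionserver.maxlogs, from this thread
        int regions = 100;      // regions per RS, from this thread

        double totalWalMb = maxlogs * blockMb * rollAt; // ~18240 MB of retained WAL
        double perRegionMb = totalWalMb / regions;      // ~182 MB unflushed per region
        System.out.printf("per-region ceiling before WAL pressure: ~%.0f MB%n", perRegionMb);
        // Note 48000 MB * 0.38 (the global lower limit) is also ~18240 MB, so at
        // these settings WAL pressure and global memstore pressure both bite near
        // ~180 MB per region -- consistent with flushes landing well below 512 MB.
    }
}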
Increasing write throughput..
I'm trying to increase the write throughput of our HBase cluster. We're currently doing around 7500 messages per sec per node. I think we have room for improvement, especially since the heap is underutilized and the memstore size doesn't seem to fluctuate much between regular and peak ingestion loads.

We mainly have one large table that we write most of the data to; other tables are mainly OpenTSDB and some relatively small summary tables. This table is read in batch once a day, but otherwise is mostly serving writes 99% of the time. This large table has 1 CF and gets flushed at around ~128M fairly regularly, like below:

{log} 2014-10-31 16:56:09,499 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.2 M/134459888, currentsize=879.5 K/900640 for region msg,00102014100515impression\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x002014100515040200049358\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x004138647301\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0002e5a329d2171149bcc1e83ed129312b\x00\x00\x00\x00,1413909604591.828e03c0475b699278256d4b5b9638a2. in 640ms, sequenceid=16861176169, compaction requested=true {log}

Here's a pastebin of my hbase-site: http://pastebin.com/fEctQ3im

What I've tried:
- Turned off major compactions; handling these manually.
- Bumped up heap Xmx from 24G to 48G.
- hbase.hregion.memstore.flush.size = 512M
- lowerLimit/upperLimit on memstore are defaults (0.38, 0.4), since the global heap has enough space to accommodate the default percentages.
- Currently running HBase 0.98.1 on an 8-node cluster that's scaled up to 128 GB RAM.

There hasn't been any appreciable increase in write perf; we're still hovering around the 7500-per-node write throughput number, and the flushes still seem to be happening at 128M (instead of the expected 512M). I've attached a snapshot of the memstore size vs. flushQueueLen: the block caches are utilizing the extra heap space, but not the memstore. The flush queue lengths have increased, which leads me to believe that it's flushing way too often without any increase in throughput. Please let me know where I should dig further. That's a long email, thanks for reading through :-) Cheers, -Gautam.
Re: Increasing write throughput..
- Sorry about the raw image upload; here's the TSDB snapshot: http://postimg.org/image/gq4nf96x9/
- HBase version 0.98.1 (CDH 5.1 distro)
- hbase-site pastebin: http://pastebin.com/fEctQ3im
- This table 'msg' has been pre-split with 240 regions, and writes are evenly distributed into 240 buckets (the bucket is a prefix to the row key). These regions are well spread across the 8 RSs, although over time these 240 have split and become 2440; each region server has ~280 regions.
- Last 500 lines of log from one RS: http://pastebin.com/8MwYMZPb

All - no hot regions from what I can tell. One of my main concerns was why, even after setting the memstore flush size to 512M, it is still flushing at 128M. Is there a setting I've missed? I'll try to get more details as I find them. Thanks and cheers, -Gautam.

On Oct 31, 2014, at 10:47 AM, Stack st...@duboce.net wrote:
What version of HBase (later versions have improvements in write throughput, especially with many writing threads)? Post a pastebin of a regionserver log in steady state if you don't mind. About how many writers are going into the server at a time? How many regions on the server? Are all being written to at the same rate, or do you have hotties? Thanks, St.Ack

On Fri, Oct 31, 2014 at 10:22 AM, Gautam gautamkows...@gmail.com wrote:
I'm trying to increase the write throughput of our HBase cluster. We're currently doing around 7500 messages per sec per node. I think we have room for improvement, especially since the heap is underutilized and the memstore size doesn't seem to fluctuate much between regular and peak ingestion loads. We mainly have one large table that we write most of the data to; other tables are mainly OpenTSDB and some relatively small summary tables. This table is read in batch once a day, but otherwise is mostly serving writes 99% of the time. This large table has 1 CF and gets flushed at around ~128M fairly regularly, like below:
{log} 2014-10-31 16:56:09,499 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.2 M/134459888, currentsize=879.5 K/900640 for region msg,00102014100515impression\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x002014100515040200049358\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x004138647301\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0002e5a329d2171149bcc1e83ed129312b\x00\x00\x00\x00,1413909604591.828e03c0475b699278256d4b5b9638a2. in 640ms, sequenceid=16861176169, compaction requested=true {log}
Here's a pastebin of my hbase-site: http://pastebin.com/fEctQ3im
What I've tried:
- Turned off major compactions; handling these manually.
- Bumped up heap Xmx from 24G to 48G.
- hbase.hregion.memstore.flush.size = 512M
- lowerLimit/upperLimit on memstore are defaults (0.38, 0.4), since the global heap has enough space to accommodate the default percentages.
- Currently running HBase 0.98.1 on an 8-node cluster that's scaled up to 128 GB RAM.
There hasn't been any appreciable increase in write perf; we're still hovering around the 7500-per-node write throughput number. The flushes still seem to be happening at 128M (instead of the expected 512M). I've attached a snapshot of the memstore size vs. flushQueueLen: the block caches are utilizing the extra heap space, but not the memstore. The flush queue lengths have increased, which leads me to believe that it's flushing way too often without any increase in throughput. Please let me know where I should dig further. That's a long email, thanks for reading through :-) Cheers, -Gautam.
Re: Copying data from 94 to 98 ..
Jerry, can you elaborate on what you mean by exporting the table to HDFS? I initially tried running the export on the src cluster (-copy-to hdfs://dest/hbase); it complains while trying to write the data to the dest cluster (due to the HDFS protocol version mismatch). Then I tried running the export on the dest cluster (-copy-from hftp://src/hbase).

On Mon, Sep 15, 2014 at 10:36 PM, Jerry He jerry...@gmail.com wrote:
While you continue with the snapshot approach, have you tried to Export the table in 0.94 to HDFS, and then Import the data from HDFS into 0.98?

On Sep 15, 2014 10:19 PM, Matteo Bertozzi theo.berto...@gmail.com wrote:
Can you post the full exception and the file path? Maybe there is a bug in looking up the reference file. It seems to not be able to find enough data in the file... Matteo

On Mon, Sep 15, 2014 at 10:08 PM, Gautam gautamkows...@gmail.com wrote:
Thanks for the reply Matteo. This is exactly what I did: I modified the source cluster's dir structure to mimic that of the 98 cluster. I even got as far as it trying to look through the reference files. I end up with this exception:
14/09/15 23:34:59 ERROR snapshot.ExportSnapshot: Snapshot export failed
java.io.IOException
at org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.loadRegionManifests(SnapshotManifestV1.java:145)
at org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:265)
at org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:119)
at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:125)
... ..
Caused by: java.io.IOException: read=-1, wanted=4
at org.apache.hadoop.hbase.io.Reference.read(Reference.java:175)
at org.apache.hadoop.hbase.regionserver.StoreFileInfo.<init>(StoreFileInfo.java:115)
at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java ..
This and Ted's reply about HBASE-7987 lead me to believe that the export tool from my distro is incapable of working around the region manifest file requirement. I'm now left with the option of downgrading my dest cluster to 94, copying the data, and then upgrading using the upgrade migration tool. Wanted to know if others have tried this or if there are other things I can do. If not, I'll just go ahead and do this :-) Cheers, -Gautam.

On Mon, Sep 15, 2014 at 8:10 PM, Matteo Bertozzi theo.berto...@gmail.com wrote:
94 and 98 differ in directory layout, so 98 is not able to read the 94 layout unless you run the migration tool, which basically moves all the data into a default namespace directory, e.g.
/hbase/table -> /hbase/data/default/table
/hbase/.archive/table -> /hbase/archive/default/table
Matteo

On Mon, Sep 15, 2014 at 6:17 PM, Gautam gautamkows...@gmail.com wrote:
Yep, looks like the CDH distro backports HBASE-7987. Having said that, is there a transition path for us or are we hosed :-) ? In general, what's the recommended way to achieve this? At this point I feel I'm going around the system to achieve what I want. If nothing else works with export snapshot, should I just downgrade to 94, export the snapshot, and then upgrade to 98? Is the upgrade migration path different from what export snapshot does (I'd imagine yes)? Cheers, -Gautam.

On Mon, Sep 15, 2014 at 5:14 PM, Ted Yu yuzhih...@gmail.com wrote:
bq. 98.1 on dest cluster
Looking at the history of SnapshotManifestV1, it came with HBASE-7987, which went into 0.99.0. Perhaps you're using a distro with HBASE-7987?

On Mon, Sep 15, 2014 at 4:58 PM, Gautam gautamkows...@gmail.com wrote:
Hello, I'm trying to copy data between HBase clusters on different versions. I am using:
/usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -chuser hbase -chgroup hadoop -snapshot msg_snapshot -mappers 50 -copy-from hftp://src-cluster:50070/hbase -copy-to hdfs://dest-cluster:8020/hbase
Until now, based on various tips from the mailing list, I have modified the source cluster data dir paths to mimic the 98 convention (archive, table data paths, etc.). This helped in jumping some roadblocks, but not all. This is what I see now:
14/09/15 23:34:59 ERROR snapshot.ExportSnapshot: Snapshot export failed
java.io.IOException
at org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.loadRegionManifests(SnapshotManifestV1.java:145
Re: Copying data from 94 to 98 ..
Yep, looks like the CDH distro backports HBASE-7987. Having said that, is there a transition path for us or are we hosed :-) ? In general, what's the recommended way to achieve this? At this point I feel I'm going around the system to achieve what I want. If nothing else works with export snapshot, should I just downgrade to 94, export the snapshot, and then upgrade to 98? Is the upgrade migration path different from what export snapshot does (I'd imagine yes)? Cheers, -Gautam.

On Mon, Sep 15, 2014 at 5:14 PM, Ted Yu yuzhih...@gmail.com wrote:
bq. 98.1 on dest cluster
Looking at the history of SnapshotManifestV1, it came with HBASE-7987, which went into 0.99.0. Perhaps you're using a distro with HBASE-7987?

On Mon, Sep 15, 2014 at 4:58 PM, Gautam gautamkows...@gmail.com wrote:
Hello, I'm trying to copy data between HBase clusters on different versions. I am using:
/usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -chuser hbase -chgroup hadoop -snapshot msg_snapshot -mappers 50 -copy-from hftp://src-cluster:50070/hbase -copy-to hdfs://dest-cluster:8020/hbase
Until now, based on various tips from the mailing list, I have modified the source cluster data dir paths to mimic the 98 convention (archive, table data paths, etc.). This helped in jumping some roadblocks, but not all. This is what I see now:
14/09/15 23:34:59 ERROR snapshot.ExportSnapshot: Snapshot export failed
java.io.IOException
at org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.loadRegionManifests(SnapshotManifestV1.java:145)
at org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:265)
at org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:119)
at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:125)
... ..
Caused by: java.io.IOException: read=-1, wanted=4
at org.apache.hadoop.hbase.io.Reference.read(Reference.java:175)
at org.apache.hadoop.hbase.regionserver.StoreFileInfo.<init>(StoreFileInfo.java:115)
at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java ..
It fails while trying to read a reference HFile. Is this something folks have done before, and/or is it possible to do? I'd really like to do this without having to upgrade my source cluster or downgrade my dest cluster. I'm using 94.6 on the source cluster and 98.1 on the dest cluster. Cheers, -Gautam.

-- If you really want something in this life, you have to work for it. Now, quiet! They're about to announce the lottery numbers...
Re: Copying data from 94 to 98 ..
Thanks for the reply Matteo. This is exactly what I did: I modified the source cluster's dir structure to mimic that of the 98 cluster. I even got as far as it trying to look through the reference files. I end up with this exception:
14/09/15 23:34:59 ERROR snapshot.ExportSnapshot: Snapshot export failed
java.io.IOException
at org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.loadRegionManifests(SnapshotManifestV1.java:145)
at org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:265)
at org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:119)
at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:125)
... ..
Caused by: java.io.IOException: read=-1, wanted=4
at org.apache.hadoop.hbase.io.Reference.read(Reference.java:175)
at org.apache.hadoop.hbase.regionserver.StoreFileInfo.<init>(StoreFileInfo.java:115)
at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java ..
This and Ted's reply about HBASE-7987 lead me to believe that the export tool from my distro is incapable of working around the region manifest file requirement. I'm now left with the option of downgrading my dest cluster to 94, copying the data, and then upgrading using the upgrade migration tool. Wanted to know if others have tried this or if there are other things I can do. If not, I'll just go ahead and do this :-) Cheers, -Gautam.

On Mon, Sep 15, 2014 at 8:10 PM, Matteo Bertozzi theo.berto...@gmail.com wrote:
94 and 98 differ in directory layout, so 98 is not able to read the 94 layout unless you run the migration tool, which basically moves all the data into a default namespace directory, e.g.
/hbase/table -> /hbase/data/default/table
/hbase/.archive/table -> /hbase/archive/default/table
Matteo

On Mon, Sep 15, 2014 at 6:17 PM, Gautam gautamkows...@gmail.com wrote:
Yep, looks like the CDH distro backports HBASE-7987. Having said that, is there a transition path for us or are we hosed :-) ? In general, what's the recommended way to achieve this? At this point I feel I'm going around the system to achieve what I want. If nothing else works with export snapshot, should I just downgrade to 94, export the snapshot, and then upgrade to 98? Is the upgrade migration path different from what export snapshot does (I'd imagine yes)? Cheers, -Gautam.

On Mon, Sep 15, 2014 at 5:14 PM, Ted Yu yuzhih...@gmail.com wrote:
bq. 98.1 on dest cluster
Looking at the history of SnapshotManifestV1, it came with HBASE-7987, which went into 0.99.0. Perhaps you're using a distro with HBASE-7987?

On Mon, Sep 15, 2014 at 4:58 PM, Gautam gautamkows...@gmail.com wrote:
Hello, I'm trying to copy data between HBase clusters on different versions. I am using:
/usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -chuser hbase -chgroup hadoop -snapshot msg_snapshot -mappers 50 -copy-from hftp://src-cluster:50070/hbase -copy-to hdfs://dest-cluster:8020/hbase
Until now, based on various tips from the mailing list, I have modified the source cluster data dir paths to mimic the 98 convention (archive, table data paths, etc.). This helped in jumping some roadblocks, but not all. This is what I see now:
14/09/15 23:34:59 ERROR snapshot.ExportSnapshot: Snapshot export failed
java.io.IOException
at org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.loadRegionManifests(SnapshotManifestV1.java:145)
at org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:265)
at org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:119)
at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:125)
... ..
Caused by: java.io.IOException: read=-1, wanted=4
at org.apache.hadoop.hbase.io.Reference.read(Reference.java:175)
at org.apache.hadoop.hbase.regionserver.StoreFileInfo.<init>(StoreFileInfo.java:115)
at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java ..
It fails while trying to read a reference HFile. Is this something folks have done before, and/or is it possible to do? I'd really like to do this without having to upgrade my source cluster or downgrade my dest cluster. I'm using 94.6 on the source cluster and 98.1 on the dest cluster. Cheers, -Gautam.

-- If you really want something in this life, you have to work for it. Now, quiet! They're about to announce the lottery numbers...
Re: Copying data from 94 to 98 ..
14/09/15 23:34:59 DEBUG snapshot.SnapshotManifestV1: Adding reference for file (4/4): hftp://master42.stg.com:50070/hbase/.hbase-snapshot/msg_snapshot/84f60fc2aa7e96df91e6289e6c19dc25/c/afe341e4149649578c5861e32494dbec
14/09/15 23:34:59 ERROR snapshot.ExportSnapshot: Snapshot export failed
java.io.IOException
at org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.loadRegionManifests(SnapshotManifestV1.java:145)
at org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:265)
at org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:119)
at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:125)
at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitReferencedFiles(SnapshotReferenceUtil.java:108)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.getSnapshotFiles(ExportSnapshot.java:479)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.access$200(ExportSnapshot.java:89)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportSnapshotInputFormat.getSplits(ExportSnapshot.java:600)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1107)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1124)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:178)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1023)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:976)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:976)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:582)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:612)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.runCopyJob(ExportSnapshot.java:751)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:905)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:975)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:979)
Caused by: java.io.IOException: read=-1, wanted=4
at org.apache.hadoop.hbase.io.Reference.read(Reference.java:175)
at org.apache.hadoop.hbase.regionserver.StoreFileInfo.<init>(StoreFileInfo.java:115)
at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java:204)
at org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.buildManifestFromDisk(SnapshotManifestV1.java:179)
at org.apache.hadoop.hbase.snapshot.SnapshotManifestV1$1.call(SnapshotManifestV1.java:131)
at org.apache.hadoop.hbase.snapshot.SnapshotManifestV1$1.call(SnapshotManifestV1.java:127)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

On Mon, Sep 15, 2014 at 10:18 PM, Matteo Bertozzi theo.berto...@gmail.com wrote:
Can you post the full exception and the file path? Maybe there is a bug in looking up the reference file. It seems to not be able to find enough data in the file... Matteo

On Mon, Sep 15, 2014 at 10:08 PM, Gautam gautamkows...@gmail.com wrote:
Thanks for the reply Matteo. This is exactly what I did: I modified the source cluster's dir structure to mimic that of the 98 cluster. I even got as far as it trying to look through the reference files. I end up with this exception:
14/09/15 23:34:59 ERROR snapshot.ExportSnapshot: Snapshot export failed
java.io.IOException
at org.apache.hadoop.hbase.snapshot.SnapshotManifestV1.loadRegionManifests(SnapshotManifestV1.java:145)
at org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:265)
at org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:119)
at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:125)
... ..
Caused by: java.io.IOException: read=-1, wanted=4
at org.apache.hadoop.hbase.io.Reference.read(Reference.java:175)
at org.apache.hadoop.hbase.regionserver.StoreFileInfo.<init>(StoreFileInfo.java:115)
at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java ..
This and Ted's reply about HBASE-7987 lead me to believe that the export tool from my distro is incapable of working around the region manifest file requirement. I'm now left with the option of downgrading my dest cluster to 94, copying the data, and then upgrading using the upgrade migration tool. Wanted to know if others have
HBase Scan/Snapshot Performance...
Hello, we've been using and loving HBase for a couple of months now. Our primary use case for HBase is writing events in stream to an online time series HBase table. Every so often we run medium to large batch scan MR jobs on sections (1 hour, 1 day, 1 week) of this same time series table. This online table is now showing spikes whenever these large batched read jobs are run: write throughput goes down while these sequential scans are running on the table.

We've been playing around with snapshots and are considering using them to take over the responsibility of running these scheduled hourly, daily, and weekly jobs so that the online table doesn't get affected. From preliminary tests, it looks like online snapshots take way too long: the snapshot job times out after 60 seconds. The time was spent flushing the memstores on all region servers (as expected), which seems to take too long, and from the RS logs it also seems this is done serially. Offline snapshots aren't an option, since we can't disable this table, which serves the event writing.

We're running HBase 0.94.6. We tried benchmarking snapshotting on a 9 TB table with 240 regions, 1 column family, and 4 region servers.

All in all, I'd like to ask whether things would improve if we upgraded to HBase 0.98+. Are there known benchmark numbers on expected snapshot performance for 0.94+ vs. 0.98+? In an ideal scenario we'd like these MR jobs to dynamically take a snapshot, run the job, and delete/re-use the snapshot based on freshness. At the least, we need the snapshot to be fresh up to the last hour. Also, from what I understand, in HBase scans are not consistent at the table level but are at the row level. Are there other ways I can query the online table without hurting the write throughput? Cheers, -Gautam.
Re: HBase Scan/Snapshot Performance...
Thanks for the replies.

Matteo, we've been running 94.6 since February, so sadly the prod cluster doesn't have this SKIP_FLUSH option right now. It would be great if there are options I could use right now until we upgrade to 98.

Ted, thanks for the JIRA. That is exactly what we intend to use for running the MR jobs over snapshots. I just wanted to know how easy/lightweight snapshotting can be before we set our eyes on moving the whole thing over. Cheers, -Gautam.

On Tue, Aug 12, 2014 at 3:24 PM, Ted Yu yuzhih...@gmail.com wrote:
Gautam: please take a look at this: HBASE-8369, MapReduce over snapshot files. Cheers

On Tue, Aug 12, 2014 at 3:11 PM, Matteo Bertozzi theo.berto...@gmail.com wrote:
There is HBASE-10935, included in 0.94.21, where you can specify skipping the memstore flush; the result will be the online version of an offline snapshot: snapshot 'sourceTable', 'snapshotName', {SKIP_FLUSH => true}

On Tue, Aug 12, 2014 at 10:58 PM, Gautam gautamkows...@gmail.com wrote:
Hello, we've been using and loving HBase for a couple of months now. Our primary use case for HBase is writing events in stream to an online time series HBase table. Every so often we run medium to large batch scan MR jobs on sections (1 hour, 1 day, 1 week) of this same time series table. This online table is now showing spikes whenever these large batched read jobs are run: write throughput goes down while these sequential scans are running on the table. We've been playing around with snapshots and are considering using them to take over the responsibility of running these scheduled hourly, daily, and weekly jobs so that the online table doesn't get affected. From preliminary tests, it looks like online snapshots take way too long: the snapshot job times out after 60 seconds. The time was spent flushing the memstores on all region servers (as expected), which seems to take too long, and from the RS logs it also seems this is done serially. Offline snapshots aren't an option, since we can't disable this table, which serves the event writing. We're running HBase 0.94.6. We tried benchmarking snapshotting on a 9 TB table with 240 regions, 1 column family, and 4 region servers. All in all, I'd like to ask whether things would improve if we upgraded to HBase 0.98+. Are there known benchmark numbers on expected snapshot performance for 0.94+ vs. 0.98+? In an ideal scenario we'd like these MR jobs to dynamically take a snapshot, run the job, and delete/re-use the snapshot based on freshness. At the least, we need the snapshot to be fresh up to the last hour. Also, from what I understand, in HBase scans are not consistent at the table level but are at the row level. Are there other ways I can query the online table without hurting the write throughput? Cheers, -Gautam.

-- If you really want something in this life, you have to work for it. Now, quiet! They're about to announce the lottery numbers...
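A minimal sketch of the HBASE-8369 route Ted points to, assuming 0.98+ and placeholder names; the job reads restored snapshot files from a scratch directory instead of going through the region servers:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SnapshotScanJob {
    static class HourMapper extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context) {
            // process one row of the snapshot here
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(HBaseConfiguration.create(), "hourly-snapshot-scan");
        Scan scan = new Scan();
        // scan only the last hour of the time series data
        scan.setTimeRange(System.currentTimeMillis() - 3600000L, System.currentTimeMillis());
        TableMapReduceUtil.initTableSnapshotMapperJob(
                "events_snapshot",                   // placeholder snapshot name
                scan, HourMapper.class,
                ImmutableBytesWritable.class, Result.class,
                job, true,
                new Path("/tmp/snapshot-restore"));  // scratch dir on the same filesystem
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}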
Re: how to move HBase table data between different HBase versions?
An earlier thread [1] talks about a similar problem. If the 0.96 cluster is fresh, you can copy the files across and upgrade.

[1] http://mail-archives.apache.org/mod_mbox/hbase-user/201311.mbox/%3ccaflnt_ofhg1xgvwygpauymt-m3ncujr9rdqopdi-ad0pzca...@mail.gmail.com%3E

On Wed, Jul 9, 2014 at 1:52 PM, ch huang justlo...@gmail.com wrote:
hi, maillist: I have two HBase environments, one 0.94 based on CDH 4.4 and another 0.96 based on CDH 5. I want to move the 0.94 table data to 0.96; how can I do it? I see the docs say HBase 0.96 changed a lot and is not compatible with 0.94.
best approach for write and immediate read use case
Hello all, I have a use case where I need to write 1 million to 10 million records periodically (at intervals of 1 to 10 minutes) into an HBase table. Once the insert is completed, these records are queried immediately from another program - multiple reads. So, this is one massive write followed by many reads.

I have two approaches for inserting these records into the HBase table:
- Use HTable or HTableMultiplexer to stream the data to the HBase table, or
- Write the data to HDFS as a sequence file (Avro in my case), run a map reduce job using HFileOutputFormat, and then load the output files into the HBase cluster. Something like:
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf); loader.doBulkLoad(new Path(outputDir), hTable);

In my use case, which approach would be better? If I use the HTable interface, would the inserted data be in the HBase cache (before flushing to the files) for immediate read queries? If I use the map reduce job to insert, would the data be loaded into the HBase cache immediately, or would only the output files be copied to the respective HBase table-specific directories? So, which approach is better for a write followed by immediate multiple read operations? Thanks, Gautam
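A minimal sketch of the streaming option against the 0.94-era client API (table and column names are placeholders). Data written this way lands in the memstore, so it is immediately readable; bulk-loaded HFiles skip the memstore and are served from disk until their blocks get cached:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class StreamingWriter {
    // Each record: a 20-byte key and a 40-byte value, per the follow-up mail.
    public static void write(Iterable<byte[][]> records) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "records");  // placeholder table name
        table.setAutoFlush(false);                   // buffer puts client-side
        table.setWriteBufferSize(8 * 1024 * 1024);   // ship to the RS every 8 MB
        try {
            for (byte[][] kv : records) {
                Put put = new Put(kv[0]);
                put.add(Bytes.toBytes("f"), Bytes.toBytes("v"), kv[1]);
                table.put(put);
            }
            table.flushCommits();                    // push the remaining buffer
        } finally {
            table.close();
        }
    }
}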
Re: best approach for write and immediate read use case
Hi, the average size of my records is 60 bytes - a 20-byte key and a 40-byte value - and the table has one column family. I have set up a cluster for testing: 1 master and 3 region servers, each with a heap size of 3 GB and a single CPU. I have pre-split the table into 30 regions. I do not have to keep the data forever; I could purge older records periodically. Thanks, Gautam

On Fri, Aug 23, 2013 at 3:20 AM, Ted Yu yuzhih...@gmail.com wrote:
Can you tell us the average size of your records and how much heap is given to the region servers? Thanks

On Aug 23, 2013, at 12:11 AM, Gautam Borah gautam.bo...@gmail.com wrote:
Hello all, I have a use case where I need to write 1 million to 10 million records periodically (at intervals of 1 to 10 minutes) into an HBase table. Once the insert is completed, these records are queried immediately from another program - multiple reads. So, this is one massive write followed by many reads. I have two approaches for inserting these records into the HBase table: use HTable or HTableMultiplexer to stream the data to the HBase table, or write the data to HDFS as a sequence file (Avro in my case), run a map reduce job using HFileOutputFormat, and then load the output files into the HBase cluster. Something like: LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf); loader.doBulkLoad(new Path(outputDir), hTable); In my use case, which approach would be better? If I use the HTable interface, would the inserted data be in the HBase cache (before flushing to the files) for immediate read queries? If I use the map reduce job to insert, would the data be loaded into the HBase cache immediately, or would only the output files be copied to the respective HBase table-specific directories? So, which approach is better for a write followed by immediate multiple read operations? Thanks, Gautam
Re: best approach for write and immediate read use case
Thanks Ted for your response, and for clarifying the behavior when using the HTable interface. What would be the behavior when inserting data using the map reduce job? Would the recently added records be in the memstore, or do I need to load them for read queries after the insert is done? Thanks, Gautam

On Fri, Aug 23, 2013 at 2:43 PM, Ted Yu yuzhih...@gmail.com wrote:
Assuming you are using 0.94, the default value for hbase.regionserver.global.memstore.lowerLimit is 0.35. Meaning, the memstore on each region server would be able to hold 3000M * 0.35 / 60 = 17.5 million records (roughly).
bq. If I use the HTable interface, would the inserted data be in the HBase cache, before flushing to the files, for immediate read queries?
Yes. Cheers

On Fri, Aug 23, 2013 at 12:01 PM, Gautam Borah gautam.bo...@gmail.com wrote:
Hi, the average size of my records is 60 bytes - a 20-byte key and a 40-byte value - and the table has one column family. I have set up a cluster for testing: 1 master and 3 region servers, each with a heap size of 3 GB and a single CPU. I have pre-split the table into 30 regions. I do not have to keep the data forever; I could purge older records periodically. Thanks, Gautam

On Fri, Aug 23, 2013 at 3:20 AM, Ted Yu yuzhih...@gmail.com wrote:
Can you tell us the average size of your records and how much heap is given to the region servers? Thanks

On Aug 23, 2013, at 12:11 AM, Gautam Borah gautam.bo...@gmail.com wrote:
Hello all, I have a use case where I need to write 1 million to 10 million records periodically (at intervals of 1 to 10 minutes) into an HBase table. Once the insert is completed, these records are queried immediately from another program - multiple reads. So, this is one massive write followed by many reads. I have two approaches for inserting these records into the HBase table: use HTable or HTableMultiplexer to stream the data to the HBase table, or write the data to HDFS as a sequence file (Avro in my case), run a map reduce job using HFileOutputFormat, and then load the output files into the HBase cluster. Something like: LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf); loader.doBulkLoad(new Path(outputDir), hTable); In my use case, which approach would be better? If I use the HTable interface, would the inserted data be in the HBase cache (before flushing to the files) for immediate read queries? If I use the map reduce job to insert, would the data be loaded into the HBase cache immediately, or would only the output files be copied to the respective HBase table-specific directories? So, which approach is better for a write followed by immediate multiple read operations? Thanks, Gautam