impact of using higher hbase.hregion.memstore.flush.size=512MB
Hi all,

The default value of hbase.hregion.memstore.flush.size is 128 MB. Could anyone kindly explain what the impact would be if we increased this to a higher value, such as 512 MB, 800 MB, or more?

We have a very write-heavy cluster. We also run periodic endpoint-coprocessor-based jobs every 10 minutes that operate on the data written in the last 10-15 minutes. We are trying to manage the memstore flush operations so that the hot data remains in the memstore for at least 30-40 minutes or longer, so that the job hits disk only every 3rd or 4th time it tries to operate on the hot data (it does scan).

We have a region server heap size of 20 GB and have set:

hbase.regionserver.global.memstore.lowerLimit = .45
hbase.regionserver.global.memstore.upperLimit = .55

We observed that with hbase.hregion.memstore.flush.size=128MB, only 10% of the heap is utilized by the memstore before it flushes. At hbase.hregion.memstore.flush.size=512MB, we are able to increase heap utilization by the memstore to 35%.

It would be very helpful for us to understand the implications of a higher hbase.hregion.memstore.flush.size for a long-running cluster.

Thanks,
Gautam
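For reference, the settings described above would live in hbase-site.xml. A minimal sketch (the values are taken from the description above, not a recommendation; flush.size is specified in bytes):

```xml
<!-- hbase-site.xml: sketch of the memstore settings described above -->
<configuration>
  <!-- Per-region memstore flush threshold (default 128 MB = 134217728 bytes) -->
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>536870912</value> <!-- 512 MB, the larger value being evaluated -->
  </property>
  <!-- Global memstore limits as fractions of the region server heap (20 GB here) -->
  <property>
    <name>hbase.regionserver.global.memstore.lowerLimit</name>
    <value>0.45</value>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.55</value>
  </property>
</configuration>
```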
Re: impact of using higher hbase.hregion.memstore.flush.size=512MB
Gautam,

Yes, you can increase the memstore flush size to values larger than 128 MB, but usually you go by increasing hbase.hregion.memstore.block.multiplier only. Depending on the version of HBase you are running, many things can happen: multiple memstores can be flushed at once, the memstores will be flushed if there are too many rows in memory (30 million) or if the store hasn't been flushed in an hour, the rate of the flushes can be tuned, and hitting the max number of HLogs can also trigger a flush.

One problem with running large memstores is mostly how many regions you will have per RS; also, if some encoding and/or compression codec is being used, it might cause the flush to take longer, use more CPU resources, or push back clients because you haven't flushed some regions to disk.

Based on the behavior you have described on heap utilization, it sounds like you are not fully utilizing the memstores and you are below the lower limit, so depending on the version of HBase and available resources you might want to use hbase.rs.cacheblocksonwrite instead to keep some of the hot data in the block cache.

cheers,
esteban.
--
Cloudera, Inc.

On Wed, May 27, 2015 at 1:58 PM, Gautam Borah gbo...@appdynamics.com wrote:
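To make the interaction between the per-region and global thresholds concrete, here is a back-of-the-envelope sketch in Python using the heap and flush-size figures from this thread. The block multiplier of 4 is an assumed default for the HBase versions of this era, not a value stated in the thread:

```python
# Rough arithmetic for the memstore limits discussed in this thread.
# All figures in MB; the block multiplier (4) is an assumed default.

heap_mb = 20 * 1024                  # 20 GB region server heap
flush_size_mb = 512                  # hbase.hregion.memstore.flush.size
block_multiplier = 4                 # hbase.hregion.memstore.block.multiplier (assumed)

# Per-region: writes to a region are blocked once its memstore
# reaches flush.size * multiplier.
per_region_block_mb = flush_size_mb * block_multiplier

# Global: forced flushes kick in when total memstore usage crosses
# the lower/upper limits (fractions of heap).
lower_limit_mb = heap_mb * 0.45
upper_limit_mb = heap_mb * 0.55

# How many regions can hold a completely full (unflushed) memstore
# before the global lower limit starts forcing flushes.
regions_before_global_flush = int(lower_limit_mb // flush_size_mb)

print(per_region_block_mb)           # 2048
print(lower_limit_mb, upper_limit_mb)  # 9216.0 11264.0
print(regions_before_global_flush)   # 18
```

This illustrates why a larger per-region flush size only helps up to a point: with 512 MB per region, roughly 18 fully loaded regions already reach the global lower limit on a 20 GB heap, after which flushes are forced regardless of the per-region setting.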
Re: impact of using higher hbase.hregion.memstore.flush.size=512MB
Hi Esteban,

Thanks for your response. hbase.rs.cacheblocksonwrite would be very useful for us.

We have set hbase.regionserver.maxlogs appropriately to avoid flushes across memstores. We have also set hbase.regionserver.optionalcacheflushinterval to 0 to disable periodic flushing; we do not write anything bypassing the WAL.

We are running the cluster with conservative limits, so that if a region server crashes, the others can take the extra load without hitting the memstore flushing limits.

We are running the cluster now at an 800 MB flush size, and the initial job runs are fine. We will run it for a couple of days and check the status.

Thanks again,
Gautam

On Wed, May 27, 2015 at 2:15 PM, Esteban Gutierrez este...@cloudera.com wrote:
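The 30-40 minute residency goal from the original question can be sanity-checked with simple arithmetic. A sketch in Python, where the per-region write rate is a hypothetical figure chosen for illustration, not one from this thread:

```python
# Estimate how long freshly written data stays in a region's memstore
# before the size-based flush fires. The write rate is hypothetical.

flush_size_mb = 800          # hbase.hregion.memstore.flush.size being tested
write_rate_mb_per_s = 0.4    # assumed sustained write rate into one region

seconds_to_flush = flush_size_mb / write_rate_mb_per_s
minutes_to_flush = seconds_to_flush / 60

print(f"~{minutes_to_flush:.1f} minutes before a size-triggered flush")
```

At an assumed ~0.4 MB/s per region, an 800 MB memstore fills in roughly 33 minutes, which lands inside the desired 30-40 minute window, provided none of the other triggers Esteban listed (HLog count, the one-hour periodic flush, the global limits) fires first.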