impact of using higher hbase.hregion.memstore.flush.size=512MB

2015-05-27 Thread Gautam Borah
Hi all,

The default value of hbase.hregion.memstore.flush.size is defined as 128 MB.
Could anyone kindly explain what the impact would be if we increased this to
a higher value such as 512 MB, 800 MB, or more?

We have a very write-heavy cluster. We also run periodic endpoint
coprocessor based jobs, every 10 minutes, that operate on the data written
in the last 10-15 minutes. We are trying to manage the memstore flush
operations so that the hot data remains in the memstore for at least 30-40
minutes or longer, so that the job hits disk only every 3rd or 4th time it
tries to operate on the hot data (it does a scan).

We have a region server heap size of 20 GB and have set:

hbase.regionserver.global.memstore.lowerLimit = .45

hbase.regionserver.global.memstore.upperLimit = .55
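
For reference, here is how these settings look in hbase-site.xml (a sketch
using the values from this thread; the flush size below is the 512 MB value
under discussion, not a recommendation):

  <property>
    <name>hbase.regionserver.global.memstore.lowerLimit</name>
    <value>0.45</value>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.55</value>
  </property>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <!-- bytes; the default is 134217728 (128 MB), 536870912 is 512 MB -->
    <value>536870912</value>
  </property>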

We observed that with hbase.hregion.memstore.flush.size=128MB, only 10% of
the heap is utilized by the memstores before they flush.

At hbase.hregion.memstore.flush.size=512MB, we are able to increase the
heap utilization by the memstores to 35%.
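
As a rough back-of-envelope check (the region count below is an assumption
for illustration, not a measured number):

  10% of 20 GB heap   ~ 2 GB  ~ 16 hot regions x 128 MB flush size
  16 regions x 512 MB ~ 8 GB  ~ 40% of heap as the new ceiling

With per-region flushing, aggregate memstore usage tops out at roughly
(number of hot regions) x (flush size), which can sit far below the global
lowerLimit.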

It would be very helpful for us to understand the implications of a higher
hbase.hregion.memstore.flush.size for a long-running cluster.

Thanks,

Gautam


Re: impact of using higher hbase.hregion.memstore.flush.size=512MB

2015-05-27 Thread Esteban Gutierrez
Gautam,

Yes, you can increase the memstore flush size to values larger than 128 MB,
but usually you would tune hbase.hregion.memstore.block.multiplier instead.
Depending on the version of HBase you are running, many things can happen:
multiple memstores can be flushed at once; a memstore will be flushed if it
holds too many rows in memory (30 million) or if the store hasn't been
flushed in an hour; the rate of the flushes can be tuned; and hitting the
max number of HLogs can also trigger a flush. One problem with running
large memstores is mostly how many regions you will have per RS; also, if
some encoding and/or compression codec is being used, flushes might take
longer or use more CPU, or clients might be pushed back because you haven't
flushed some regions to disk.
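
The multiplier is a per-region safety valve: updates to a region are
blocked once its memstore reaches multiplier x flush size. A sketch (the
value 4 is only an example; check the default for your HBase version):

  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <!-- with flush.size = 128 MB, a multiplier of 4 blocks updates to a
         region once its memstore reaches 512 MB -->
    <value>4</value>
  </property>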

Based on the behavior you have described for the heap utilization, it
sounds like you are not fully utilizing the memstores and you are below the
lower limit, so depending on the version of HBase and the available
resources you might want to use hbase.rs.cacheblocksonwrite instead to keep
some of the hot data in the block cache.
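
That flag is an ordinary hbase-site.xml setting, off by default; a minimal
sketch:

  <property>
    <name>hbase.rs.cacheblocksonwrite</name>
    <!-- cache HFile blocks as they are written during flushes and
         compactions, so freshly flushed hot data stays in the block cache -->
    <value>true</value>
  </property>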

cheers,
esteban.




--
Cloudera, Inc.



Re: impact of using higher hbase.hregion.memstore.flush.size=512MB

2015-05-27 Thread Gautam Borah
Hi Esteban,

Thanks for your response. hbase.rs.cacheblocksonwrite would be very useful
for us.

We have set hbase.regionserver.maxlogs appropriately to avoid flushes
across all memstores. We have also set
hbase.regionserver.optionalcacheflushinterval to 0 to disable periodic
flushing; we do not write anything bypassing the WAL.
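
For completeness, a sketch of those two settings in hbase-site.xml (the
maxlogs value below is a placeholder, not our actual setting):

  <property>
    <name>hbase.regionserver.maxlogs</name>
    <!-- placeholder; size it so maxlogs x WAL size comfortably exceeds the
         global memstore limit, or WAL pressure will force flushes -->
    <value>64</value>
  </property>
  <property>
    <name>hbase.regionserver.optionalcacheflushinterval</name>
    <!-- 0 disables the periodic flush (default 3600000 ms = 1 hour); safe
         for us only because we never write with the WAL disabled -->
    <value>0</value>
  </property>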

We are running the cluster with conservative limits, so that if a region
server crashes, the others can take the extra load without hitting the
memstore flushing limits.

We are running the cluster now at an 800 MB flush size, and the initial job
runs are fine. We will run it for a couple of days and check the status.

Thanks again.

Gautam



