[DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

Eshcar Hillel Wed, 05 Jul 2017 06:30:48 -0700

Hi All,
I opened a new Jira https://issues.apache.org/jira/browse/HBASE-18294 to 
discuss this question.
Flush decisions are taken at the region level and also at the region server 
level: there is the question of when to trigger a flush, and then which 
region/store to flush. Regions track both their data size (key-value size only) 
and their total heap occupancy (including index and additional metadata). One 
option (which was the past policy) is to trigger flushes and choose flush 
subjects based on regions' heap size; this gives sysadmins a better estimate 
of how many regions an RS can carry. Another option (which is the 
current policy) is to look at the data size; this gives a better estimate of 
the size of the files that are created by the flush.
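
To make the difference concrete, here is a minimal sketch of how the two metrics can lead to different decisions. It is not HBase code: the class FlushPolicySketch, the RegionSizes holder, the 128 MB threshold, and the sample sizes are all hypothetical and chosen only for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.ToLongFunction;

// Illustration only: none of these names are HBase classes or APIs.
final class FlushPolicySketch {

    /** Hypothetical per-region accounting of the two sizes discussed above. */
    static final class RegionSizes {
        final String name;
        final long dataSize; // key-value bytes only
        final long heapSize; // key-value bytes plus index and metadata overhead

        RegionSizes(String name, long dataSize, long heapSize) {
            this.name = name;
            this.dataSize = dataSize;
            this.heapSize = heapSize;
        }
    }

    /** "When to flush": true once the chosen metric reaches the threshold. */
    static boolean shouldFlush(RegionSizes region, long thresholdBytes,
                               ToLongFunction<RegionSizes> metric) {
        return metric.applyAsLong(region) >= thresholdBytes;
    }

    /** "Which region to flush": the region that is largest under the chosen metric. */
    static RegionSizes pickFlushSubject(List<RegionSizes> regions,
                                        ToLongFunction<RegionSizes> metric) {
        return regions.stream().max(Comparator.comparingLong(metric)).orElse(null);
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        final long threshold = 128 * mb; // assumed flush threshold

        // Many small cells: modest data size, large metadata overhead.
        RegionSizes smallCells = new RegionSizes("small-cells", 90 * mb, 160 * mb);
        // Few large cells: more data, little metadata overhead.
        RegionSizes largeCells = new RegionSizes("large-cells", 110 * mb, 120 * mb);
        List<RegionSizes> regions = List.of(smallCells, largeCells);

        // Heap-size policy: the small-cells region crosses the threshold.
        System.out.println("heap-based flush of small-cells: "
            + shouldFlush(smallCells, threshold, r -> r.heapSize));
        // Data-size policy: the same region stays below the threshold.
        System.out.println("data-based flush of small-cells: "
            + shouldFlush(smallCells, threshold, r -> r.dataSize));

        // The two policies also pick different flush subjects.
        System.out.println("heap-based subject: "
            + pickFlushSubject(regions, r -> r.heapSize).name);
        System.out.println("data-based subject: "
            + pickFlushSubject(regions, r -> r.dataSize).name);
    }
}
```

In this made-up case the heap-based policy flushes the region with many small cells and produces a file well under the threshold, while the data-based policy lets that region keep growing on the heap; that is the trade-off between predictable heap usage and predictable file size.
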
I see this as critical to HBase performance and usability, namely meeting 
users' expectations of the system, hence I would like to hear as many voices 
as possible. Please join the discussion in the Jira and let us know what you 
think.
Thanks,
Eshcar
