Hi All,
I opened a new Jira https://issues.apache.org/jira/browse/HBASE-18294 to
discuss this question.
Flush decisions are taken at the region level and also at the region server
level: there is the question of when to trigger a flush, and then of which
region/store to flush. Regions track both their data size (key-value size
only) and their total heap occupancy (including index and additional
metadata). One option (which was the past policy) is to trigger flushes and
choose flush subjects based on the region's heap size; this gives sysadmins a
better estimate of how many regions an RS can carry. Another option (which is
the current policy) is to look at the data size; this gives a better estimate
of the size of the files that are created by the flush.
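
To make the difference between the two options concrete, here is a minimal
sketch in Java. It is not HBase code: the class FlushPolicySketch, the
RegionSizes holder, the Metric enum and the numbers used below are all
hypothetical, and a real policy takes more factors into account. It only
shows that the same set of regions can yield different flush decisions
depending on which size is consulted.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; none of these names are HBase classes.
public class FlushPolicySketch {

    // Per-region accounting: key-value bytes vs. total heap occupancy.
    static final class RegionSizes {
        final String name;
        final long dataSize; // key-value size only
        final long heapSize; // data plus index and additional metadata

        RegionSizes(String name, long dataSize, long heapSize) {
            this.name = name;
            this.dataSize = dataSize;
            this.heapSize = heapSize;
        }
    }

    // Which of the two tracked sizes the policy consults.
    enum Metric { DATA_SIZE, HEAP_SIZE }

    static long measure(RegionSizes r, Metric m) {
        return m == Metric.DATA_SIZE ? r.dataSize : r.heapSize;
    }

    // Region level: trigger a flush once the chosen size reaches the threshold.
    static boolean shouldFlush(RegionSizes r, Metric m, long flushThreshold) {
        return measure(r, m) >= flushThreshold;
    }

    // Region server level: pick the region that is largest under the chosen size.
    static Optional<RegionSizes> pickFlushCandidate(List<RegionSizes> regions, Metric m) {
        return regions.stream().max(Comparator.comparingLong(r -> measure(r, m)));
    }
}
```

For example, take a region "a" with 100 units of data and 300 of heap (many
small cells, so high overhead) and a region "b" with 120 units of data and
200 of heap. Selecting by data size picks "b", while selecting by heap size
picks "a"; with a threshold of 128, "a" is flushed under the heap-size
policy but not under the data-size policy.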
I see this as critical to HBase performance and usability, namely meeting
users' expectations of the system, hence I would like to hear as many voices
as possible. Please join the discussion in the Jira and let us know what you
think.
Thanks,
Eshcar