Norbert Kalmár created HBASE-26340:
--------------------------------------
Summary: TableSplit returns false size under 1MB
Key: HBASE-26340
URL: https://issues.apache.org/jira/browse/HBASE-26340
Project: HBase
Issue Type: Bug
Reporter: Norbert Kalmár
We calculate region size in the mapreduce package by first getting the size in
MB and then multiplying it back to bytes:
https://github.com/apache/hbase/blob/39a20c528e2bf27cedf12734dbdb1b7b1e538076/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RegionSizeCalculator.java#L87
This yields a size of 0 until the region reaches at least 1MB. (It also has an
unwanted rounding effect on larger regions.)
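To illustrate the effect, here is a minimal standalone sketch. It does not use
the HBase API; the class name RegionSizeSketch and the method sizeViaMegabytes
are hypothetical and only model an MB-first calculation with plain arithmetic,
assuming the MB value is cut down to a whole number before being multiplied back.

```java
// Hypothetical illustration only: models "size in MB first, then multiply",
// not the actual RegionSizeCalculator code.
class RegionSizeSketch {
    static final long MEGABYTE = 1024L * 1024L;

    // Convert a byte count to MB, drop the fractional part, scale back to bytes.
    static long sizeViaMegabytes(long storeFileBytes) {
        double sizeInMb = (double) storeFileBytes / MEGABYTE;
        return ((long) sizeInMb) * MEGABYTE; // the cast discards anything below 1MB
    }

    public static void main(String[] args) {
        long[] samples = {0L, 512L * 1024L, MEGABYTE, MEGABYTE + MEGABYTE / 2};
        for (long bytes : samples) {
            System.out.println(bytes + " bytes -> "
                + sizeViaMegabytes(bytes) + " bytes reported");
        }
    }
}
```

In this model a 512KB region is reported as 0 bytes, and a 1.5MB region is
reported as exactly 1MB.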
Spark, for example, can be tuned to improve performance by eliminating 0-sized
regions. With this behavior, that also eliminates small regions which are not
actually empty. The Hadoop interface states that the size is returned in bytes,
and while this is technically true due to the multiplication, we multiply by 0
until 1MB is reached.
I'm not sure why we get the size in MB units rather than in bytes directly.
Should we fix this?