ruanhui created HBASE-28399:
-------------------------------
Summary: region size can be wrong from RegionSizeCalculator
Key: HBASE-28399
URL: https://issues.apache.org/jira/browse/HBASE-28399
Project: HBase
Issue Type: Bug
Components: mapreduce
Affects Versions: 3.0.0-beta-1
Reporter: ruanhui
Assignee: ruanhui
Fix For: 3.0.0-beta-2
The RegionSizeCalculator computes the region size in bytes as follows:
{code:java}
private static final long MEGABYTE = 1024L * 1024L;
long regionSizeBytes =
    ((long) regionLoad.getStoreFileSize().get(Size.Unit.MEGABYTE)) * MEGABYTE;
{code}
However, this conversion loses precision. For example, the result of
{code:java}
((long) new Size(1, Size.Unit.BYTE).get(Size.Unit.MEGABYTE)) * MEGABYTE
{code}
is 0. This produces a TableInputSplit with a length of 0 even though the split
actually contains a small amount of data. Such a split is then dropped entirely
if spark.hadoopRDD.ignoreEmptySplits is enabled.
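
One way to avoid the truncation (a sketch only, not necessarily the change that will land in 3.0.0-beta-2) is to convert the reported Size straight to bytes instead of going through a truncated whole-megabyte count:
{code:java}
// Sketch of a possible fix. Assumes the same regionLoad (RegionMetrics) value
// used in the snippet above; Size.get(Size.Unit.BYTE) returns the value
// converted to bytes as a double, so no megabyte rounding is involved.
long regionSizeBytes =
    (long) regionLoad.getStoreFileSize().get(Size.Unit.BYTE);
{code}
Whether this alone is sufficient depends on the resolution of the Size the region server reports; if the metric itself is only tracked in whole megabytes, sub-megabyte regions would still come out as 0 and a small lower bound on the split length might additionally be needed.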