ruanhui created HBASE-28399:
-------------------------------

             Summary: region size can be wrong from RegionSizeCalculator
                 Key: HBASE-28399
                 URL: https://issues.apache.org/jira/browse/HBASE-28399
             Project: HBase
          Issue Type: Bug
          Components: mapreduce
    Affects Versions: 3.0.0-beta-1
            Reporter: ruanhui
            Assignee: ruanhui
             Fix For: 3.0.0-beta-2


The RegionSizeCalculator computes the region size in bytes as follows:
{code:java}
private static final long MEGABYTE = 1024L * 1024L;
long regionSizeBytes =
  ((long) regionLoad.getStoreFileSize().get(Size.Unit.MEGABYTE)) * MEGABYTE; 
{code}
However, this conversion loses precision: the store file size in megabytes is truncated to a long before being scaled back to bytes, so each region can under-report by up to a megabyte, and any region smaller than 1 MB reports a size of 0. For example, the result of
{code:java}
((long) new Size(1, Size.Unit.BYTE).get(Size.Unit.MEGABYTE)) * MEGABYTE
{code}
is 0. This results in a TableInputSplit with a reported length of 0 even though the split actually contains a small amount of data.
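For reference, a minimal, self-contained sketch of the truncation, with one possible alternative that converts through Size.Unit.BYTE instead of Size.Unit.MEGABYTE (the class name is hypothetical and this is not necessarily the committed fix):
{code:java}
import org.apache.hadoop.hbase.Size;

public class RegionSizePrecisionDemo {
  private static final long MEGABYTE = 1024L * 1024L;

  public static void main(String[] args) {
    // A 1-byte store file size, as reported by RegionMetrics#getStoreFileSize().
    Size storeFileSize = new Size(1, Size.Unit.BYTE);

    // Current calculation: the cast to long truncates the fractional megabytes to 0.
    long truncated = ((long) storeFileSize.get(Size.Unit.MEGABYTE)) * MEGABYTE;
    System.out.println(truncated); // prints 0

    // Possible alternative: convert directly to bytes so sub-megabyte sizes survive.
    long exact = (long) storeFileSize.get(Size.Unit.BYTE);
    System.out.println(exact); // prints 1
  }
}
{code}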


Such a TableInputSplit is then ignored when spark.hadoopRDD.ignoreEmptySplits is enabled, so the data in that region is silently skipped.


