ruanhui created HBASE-28399:
-------------------------------

             Summary: region size can be wrong from RegionSizeCalculator
                 Key: HBASE-28399
                 URL: https://issues.apache.org/jira/browse/HBASE-28399
             Project: HBase
          Issue Type: Bug
          Components: mapreduce
    Affects Versions: 3.0.0-beta-1
            Reporter: ruanhui
            Assignee: ruanhui
             Fix For: 3.0.0-beta-2
The RegionSizeCalculator calculates the region byte size with the following code:
{code:java}
private static final long MEGABYTE = 1024L * 1024L;
long regionSizeBytes =
  ((long) regionLoad.getStoreFileSize().get(Size.Unit.MEGABYTE)) * MEGABYTE;
{code}
However, this conversion loses precision: the cast to long truncates everything below one megabyte. For example,
{code:java}
((long) new Size(1, Size.Unit.BYTE).get(Size.Unit.MEGABYTE)) * MEGABYTE
{code}
evaluates to 0. As a result, a region smaller than 1 MB produces a TableInputSplit with a length of 0, even though the split actually contains a small amount of data. Such a split will be ignored entirely if spark.hadoopRDD.ignoreEmptySplits is enabled.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
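The truncation described above can be reproduced without HBase itself. The sketch below models the conversion: a byte count is turned into a (double) megabyte value, and the cast back to long drops everything under 1 MB. The class and method names here are illustrative, not from HBase; the round-up variant is just one possible fix under that assumption.
{code:java}
// Minimal sketch of the precision bug, not the actual HBase code.
public class RegionSizeDemo {
  private static final long MEGABYTE = 1024L * 1024L;

  // Mirrors the buggy pattern: convert bytes to MB (as Size.get(MEGABYTE)
  // would), cast to long, multiply back. The cast truncates toward zero,
  // so any store file size below 1 MB collapses to 0 bytes.
  static long buggySize(long storeFileBytes) {
    double megabytes = (double) storeFileBytes / MEGABYTE;
    return ((long) megabytes) * MEGABYTE;
  }

  // One possible fix: round up after converting back to bytes, so a
  // non-empty region never reports a size of 0.
  static long roundedUpSize(long storeFileBytes) {
    double megabytes = (double) storeFileBytes / MEGABYTE;
    return (long) Math.ceil(megabytes * MEGABYTE);
  }

  public static void main(String[] args) {
    System.out.println(buggySize(1));     // prints 0: the split looks empty
    System.out.println(roundedUpSize(1)); // prints 1
  }
}
{code}
With buggySize, a 1-byte region reports length 0 and its TableInputSplit is dropped when spark.hadoopRDD.ignoreEmptySplits is on; roundedUpSize keeps it visible.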