[ https://issues.apache.org/jira/browse/HBASE-28399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ruanhui updated HBASE-28399:
----------------------------
    Priority: Major  (was: Minor)

> region size can be wrong from RegionSizeCalculator
> --------------------------------------------------
>
>                 Key: HBASE-28399
>                 URL: https://issues.apache.org/jira/browse/HBASE-28399
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 3.0.0-beta-1
>            Reporter: ruanhui
>            Assignee: ruanhui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.0.0-beta-2
>
>
> The RegionSizeCalculator calculates region byte size using the following
> method
> {code:java}
> private static final long MEGABYTE = 1024L * 1024L;
> long regionSizeBytes =
>   ((long) regionLoad.getStoreFileSize().get(Size.Unit.MEGABYTE)) * MEGABYTE;
> {code}
> However, this method will lose accuracy. For example, the result of
> {code:java}
> ((long) new Size(1, Size.Unit.BYTE).get(Size.Unit.MEGABYTE)) * MEGABYTE
> {code}
> is 0. This will result in a TableInputSplit with a length of 0, but in fact
> this TableInputSplit has a small amount of data.
>
> This TableInputSplit will be ignored if we enable
> spark.hadoopRDD.ignoreEmptySplits.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
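
The truncation described in the issue can be reproduced without HBase on the classpath. The following is a minimal sketch, not the project's code: plain `double` arithmetic stands in for `Size.get(Size.Unit.MEGABYTE)`, and the class and method names (`RegionSizeSketch`, `viaMegabytes`, `viaBytes`) are hypothetical. It assumes, as the issue's own example implies, that the megabyte value is a fractional number that is cast to `long` before being scaled back to bytes.

```java
// Standalone sketch: plain arithmetic stands in for HBase's Size class,
// so every name in this file is hypothetical.
final class RegionSizeSketch {
    private static final long MEGABYTE = 1024L * 1024L;

    // Mirrors the reported pattern: express the size in megabytes as a
    // double, truncate to long, then scale back up to bytes.
    static long viaMegabytes(long storeFileSizeBytes) {
        double megabytes = (double) storeFileSizeBytes / MEGABYTE;
        return ((long) megabytes) * MEGABYTE;
    }

    // Reads the size in bytes instead, so the cast to long drops no
    // fractional megabyte (exact for sizes below 2^53 bytes).
    static long viaBytes(long storeFileSizeBytes) {
        double bytes = (double) storeFileSizeBytes;
        return (long) bytes;
    }

    public static void main(String[] args) {
        long[] samples = {1L, MEGABYTE - 1, MEGABYTE, 5L * MEGABYTE / 2};
        for (long actual : samples) {
            System.out.printf("actual=%d viaMegabytes=%d viaBytes=%d%n",
                actual, viaMegabytes(actual), viaBytes(actual));
        }
    }
}
```

Any region smaller than one megabyte comes out as 0 under the megabyte round trip, and larger regions are rounded down to a whole number of megabytes. In HBase terms, one possible remedy would be to request `Size.Unit.BYTE` from `getStoreFileSize()` directly; whether the merged patch takes that exact form is not stated in this message.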