[jira] [Updated] (HBASE-28399) region size can be wrong from RegionSizeCalculator

ruanhui (Jira) Sun, 25 Feb 2024 19:38:05 -0800


     [ https://issues.apache.org/jira/browse/HBASE-28399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


ruanhui updated HBASE-28399:
----------------------------
    Priority: Major  (was: Minor)

> region size can be wrong from RegionSizeCalculator
> --------------------------------------------------
>
>                 Key: HBASE-28399
>                 URL: https://issues.apache.org/jira/browse/HBASE-28399
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 3.0.0-beta-1
>            Reporter: ruanhui
>            Assignee: ruanhui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.0.0-beta-2
>
>
> RegionSizeCalculator computes a region's size in bytes as follows:
> {code:java}
> private static final long MEGABYTE = 1024L * 1024L;
> long regionSizeBytes =
>   ((long) regionLoad.getStoreFileSize().get(Size.Unit.MEGABYTE)) * MEGABYTE; 
> {code}
> However, this conversion loses precision: the cast to long truncates any 
> fractional megabyte before scaling back to bytes. For example, the result of 
> {code:java}
> ((long) new Size(1, Size.Unit.BYTE).get(Size.Unit.MEGABYTE)) * MEGABYTE
> {code}
> is 0. This produces a TableInputSplit with a length of 0, even though the 
> split actually contains a small amount of data.
>
> Such a split is skipped when spark.hadoopRDD.ignoreEmptySplits is enabled, 
> so its data is never read.
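
The truncation described above can be reproduced without HBase. The following is a minimal sketch, not the project's code: the class and method names are hypothetical, and the `double megabytes` argument stands in for the value returned by `Size.get(Size.Unit.MEGABYTE)`. `scaleThenTruncate` is only one illustrative way to avoid the loss and is not necessarily the change made in the linked pull request.

```java
public class RegionSizeDemo {
    private static final long MEGABYTE = 1024L * 1024L;

    // Mirrors the reported computation: the cast to long drops the
    // fractional megabyte BEFORE the value is scaled back to bytes.
    static long truncateThenScale(double megabytes) {
        return ((long) megabytes) * MEGABYTE;
    }

    // Hypothetical alternative: scale to bytes first, then cast, so a
    // size smaller than one megabyte is not collapsed to zero.
    static long scaleThenTruncate(double megabytes) {
        return (long) (megabytes * MEGABYTE);
    }

    public static void main(String[] args) {
        // Sizes expressed in megabytes: 1 byte, 0.5 MB, 1.5 MB.
        double[] samples = {1.0 / MEGABYTE, 0.5, 1.5};
        for (double mb : samples) {
            System.out.println(mb + " MB -> truncateThenScale="
                + truncateThenScale(mb)
                + ", scaleThenTruncate=" + scaleThenTruncate(mb));
        }
    }
}
```

With the first method, every region holding less than 1 MB reports a size of 0, which is the condition that makes an input split look empty.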



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
