[ 
https://issues.apache.org/jira/browse/HDFS-16272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16272:
------------------------------
    Component/s:     (was: 3.1.1)

> Int overflow in computing safe length during EC block recovery
> --------------------------------------------------------------
>
>                 Key: HDFS-16272
>                 URL: https://issues.apache.org/jira/browse/HDFS-16272
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec, erasure-coding
>    Affects Versions: 3.3.0, 3.3.1
>         Environment: Cluster settings: EC RS-8-2-256k, Block Size 1GiB.
>            Reporter: daimin
>            Assignee: daimin
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.2.3, 3.3.2
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There is an int overflow in StripedBlockUtil#getSafeLength, which can 
> produce a negative or zero safe length:
> 1. A negative length fails the later >= 0 check and crashes the 
> BlockRecoveryWorker thread, which leaves the lease recovery operation unable 
> to finish.
> 2. A zero length passes the check and directly truncates the block to zero 
> bytes, leading to data loss.
> If you are using any of the default EC policies (3-2, 6-3, or 10-4) with the 
> default HDFS block size of 128MB, you are not affected by this issue. To be 
> affected, the product dataNumber * blockSize must exceed the Java int 
> maximum of 2,147,483,647.
> For example, 10-4 with 128MB blocks gives 10 * 134,217,728 = 1,342,177,280, 
> which is fine. However, 10-4 with 256MB blocks gives 2,684,354,560, which 
> overflows an int and causes the problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
