[ https://issues.apache.org/jira/browse/HDFS-16272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shilun Fan updated HDFS-16272:
------------------------------
    Component/s:     (was: 3.1.1)

> Int overflow in computing safe length during EC block recovery
> --------------------------------------------------------------
>
>                 Key: HDFS-16272
>                 URL: https://issues.apache.org/jira/browse/HDFS-16272
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec, erasure-coding
>    Affects Versions: 3.3.0, 3.3.1
>         Environment: Cluster settings: EC RS-8-2-256k, Block Size 1GiB.
>            Reporter: daimin
>            Assignee: daimin
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.2.3, 3.3.2
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There is an int overflow problem in StripedBlockUtil#getSafeLength, which
> can produce a negative or zero length:
> 1. A negative length fails the later >= 0 check, which crashes the
> BlockRecoveryWorker thread and leaves the lease recovery operation unable
> to finish.
> 2. A zero length passes the check and directly truncates the block size to
> zero, leading to data loss.
> If you are using any of the default EC policies (3-2, 6-3, or 10-4) with
> the default HDFS block size of 128 MB, you are not impacted by this issue.
> To be impacted, the EC dataNumber * blockSize product has to be larger than
> the Java max int of 2,147,483,647.
> For example, 10-4 with 128 MB blocks is 10 * 134,217,728 = 1,342,177,280,
> which is OK.
> However, 10-4 with 256 MB blocks is 10 * 268,435,456 = 2,684,354,560, which
> overflows the int and causes the problem.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
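A minimal Java sketch of the overflow pattern the report describes (this is an illustration, not the actual HDFS patch; the class and variable names are hypothetical): multiplying the EC data-unit count by the block size in 32-bit int arithmetic wraps around once the product exceeds Integer.MAX_VALUE, even if the result is then assigned to a long. Widening one operand to long before the multiplication avoids the wrap.

```java
// Hypothetical demo of the int-overflow pattern from HDFS-16272.
// Not the actual StripedBlockUtil#getSafeLength code.
public class SafeLengthOverflowDemo {
    public static void main(String[] args) {
        int dataUnits = 10;                 // RS-10-4 data number
        int blockSize = 256 * 1024 * 1024;  // 256 MiB block size

        // Buggy pattern: the int * int product wraps around BEFORE the
        // widening assignment to long, yielding a negative value.
        long buggy = dataUnits * blockSize;

        // Fixed pattern: widen one operand first so the multiplication
        // itself is carried out in 64-bit arithmetic.
        long fixed = (long) dataUnits * blockSize;

        System.out.println(buggy);  // -1610612736 (wrapped)
        System.out.println(fixed);  // 2684354560
    }
}
```

With the default 10-4 policy and 128 MB blocks the product (1,342,177,280) still fits in an int, which is why only larger-than-default configurations hit the bug.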