[ 
https://issues.apache.org/jira/browse/HDFS-16272?focusedWorklogId=665616&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-665616
 ]

ASF GitHub Bot logged work on HDFS-16272:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Oct/21 10:46
            Start Date: 14/Oct/21 10:46
    Worklog Time Spent: 10m 
      Work Description: cndaimin commented on a change in pull request #3548:
URL: https://github.com/apache/hadoop/pull/3548#discussion_r728860506



##########
File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java
##########
@@ -245,8 +245,7 @@ public static long getSafeLength(ErasureCodingPolicy ecPolicy,
     Arrays.sort(cpy);
     // full stripe is a stripe has at least dataBlkNum full cells.
     // lastFullStripeIdx is the index of the last full stripe.
-    int lastFullStripeIdx =
-        (int) (cpy[cpy.length - dataBlkNum] / cellSize);
+    long lastFullStripeIdx = cpy[cpy.length - dataBlkNum] / cellSize;

Review comment:
       Many thanks for the review! @sodonnel 
   1. My understanding of why the first one is not picked is that the EC 
background reconstruction procedure can compute the remaining 2 
blocks from the 3 good blocks (taking the RS-3-2 policy as an example).
   2. I did notice a TODO in the code comments of this method: "Include 
lastFullStripeIdx+1 stripe in safeLength, if there exists such a stripe (and it 
must be partial).", but I would rather not take that on in this change, since 
the work may take a while :(
   3. `cellIdxInBlk * cellSize * dataBlkNum` in `offsetInBlkToOffsetInBG` has 
the same problem; I am glad to fix that too.
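
A minimal sketch of point 3, not the actual Hadoop code: it assumes all three 
factors are declared as `int`, so the product is evaluated in 32-bit arithmetic 
before being widened to `long`. The class and method names are hypothetical; 
the values follow the RS-8-2-256k setting from the issue.

```java
// Hypothetical demo class; not part of Hadoop.
final class StripeOffsetOverflowDemo {
  // All-int product: evaluated in 32-bit arithmetic, then widened to long.
  static long fullStripeBytesInt(int cellIdxInBlk, int cellSize, int dataBlkNum) {
    return cellIdxInBlk * cellSize * dataBlkNum;
  }

  // Widening the first operand makes the whole product 64-bit.
  static long fullStripeBytesLong(int cellIdxInBlk, int cellSize, int dataBlkNum) {
    return (long) cellIdxInBlk * cellSize * dataBlkNum;
  }

  public static void main(String[] args) {
    int cellSize = 256 * 1024; // 256 KiB cells (RS-8-2-256k)
    int dataBlkNum = 8;        // 8 data blocks (RS-8-2-256k)
    int cellIdxInBlk = 1024;   // an offset of 256 MiB into an internal block
    System.out.println(fullStripeBytesInt(cellIdxInBlk, cellSize, dataBlkNum));  // -2147483648
    System.out.println(fullStripeBytesLong(cellIdxInBlk, cellSize, dataBlkNum)); // 2147483648
  }
}
```

Casting only the first operand is enough, because Java promotes the remaining 
operands to `long` as the expression is evaluated left to right.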




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 665616)
    Time Spent: 1h 10m  (was: 1h)

> Int overflow in computing safe length during EC block recovery
> --------------------------------------------------------------
>
>                 Key: HDFS-16272
>                 URL: https://issues.apache.org/jira/browse/HDFS-16272
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: 3.1.1
>    Affects Versions: 3.3.0, 3.3.1
>         Environment: Cluster settings: EC RS-8-2-256k, Block Size 1GiB.
>            Reporter: daimin
>            Assignee: daimin
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> There is an int overflow in StripedBlockUtil#getSafeLength, which 
> can produce a negative or zero length:
> 1. A negative length fails the later >=0 check and crashes the 
> BlockRecoveryWorker thread, which leaves the lease recovery operation unable 
> to finish.
> 2. A zero length passes the check and directly truncates the block 
> size to zero, which leads to data loss.
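
Both failure modes can be reproduced with the reported cluster settings 
(RS-8-2-256k). The following is a minimal sketch, not the actual 
StripedBlockUtil code: it assumes the safe length is obtained by multiplying 
an int stripe index by an int stripe size (dataBlkNum * cellSize). The class 
name, method names, and the `minDataBlockLen` parameter are hypothetical.

```java
// Hypothetical demo class; not part of Hadoop.
final class SafeLengthOverflowDemo {
  static final int CELL_SIZE = 256 * 1024; // 256 KiB cells, as in RS-8-2-256k
  static final int DATA_BLK_NUM = 8;       // 8 data blocks, as in RS-8-2-256k

  // Assumed shape of the bug: int * int wraps before being widened to long.
  static long safeLengthWithIntIndex(long minDataBlockLen) {
    int stripeSize = DATA_BLK_NUM * CELL_SIZE;
    int lastFullStripeIdx = (int) (minDataBlockLen / CELL_SIZE);
    return lastFullStripeIdx * stripeSize;
  }

  // With a long index, the multiplication is done in 64-bit arithmetic.
  static long safeLengthWithLongIndex(long minDataBlockLen) {
    int stripeSize = DATA_BLK_NUM * CELL_SIZE;
    long lastFullStripeIdx = minDataBlockLen / CELL_SIZE;
    return lastFullStripeIdx * stripeSize;
  }

  public static void main(String[] args) {
    long quarterGiB = 256L * 1024 * 1024; // 1024 full cells per block
    long oneGiB = 1024L * 1024 * 1024;    // 4096 full cells per block
    System.out.println(safeLengthWithIntIndex(quarterGiB));  // -2147483648 (fails the >=0 check)
    System.out.println(safeLengthWithIntIndex(oneGiB));      // 0 (would truncate to zero)
    System.out.println(safeLengthWithLongIndex(quarterGiB)); // 2147483648
    System.out.println(safeLengthWithLongIndex(oneGiB));     // 8589934592
  }
}
```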



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org