[ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163554#comment-16163554 ]
Íñigo Goiri commented on HDFS-11576:
------------------------------------

The unit tests seem to be the ones that were disabled in HDFS-12417, which was committed an hour ago.
This fix looks good to me.

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-11576
>                 URL: https://issues.apache.org/jira/browse/HDFS-11576
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs, namenode
>    Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>            Priority: Critical
>         Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch,
> HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch,
> HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch,
> HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch,
> HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is
> always longer than the heartbeat interval. Scenario:
> 1. DN sends a heartbeat
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. After the first recovery succeeds, DN calls commitBlockSynchronization
> on NN, which fails because X < X+1
> ...
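
The scenario quoted above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not HDFS code: `ToyNameNode`, `on_heartbeat`, `commit_block_synchronization` and `simulate` are hypothetical names, time is counted in abstract ticks, and the NameNode is assumed to issue a fresh recovery ID on every heartbeat and to accept a commit only if it carries the latest ID.

```python
from dataclasses import dataclass


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for the NN's view of one block under recovery."""
    recovery_id: int = 0

    def on_heartbeat(self) -> int:
        # Assumption: every heartbeat re-issues the recovery command
        # with a new, larger recovery ID (X, X+1, ...).
        self.recovery_id += 1
        return self.recovery_id

    def commit_block_synchronization(self, recovery_id: int) -> bool:
        # Assumption: a commit is accepted only for the latest recovery ID.
        return recovery_id == self.recovery_id


def simulate(heartbeat_interval: int, recovery_time: int, max_time: int):
    """Return the first tick at which a commit is accepted, or None."""
    nn = ToyNameNode()
    in_flight = []  # (finish_tick, recovery_id) for each recovery the DN started
    for tick in range(max_time + 1):
        # Recoveries that finish at this tick try to commit first.
        for finish_tick, rid in in_flight:
            if finish_tick == tick and nn.commit_block_synchronization(rid):
                return tick
        in_flight = [(f, r) for f, r in in_flight if f > tick]
        # Then the DN heartbeats and receives a new recovery command.
        if tick % heartbeat_interval == 0:
            rid = nn.on_heartbeat()
            in_flight.append((tick + recovery_time, rid))
    return None


if __name__ == "__main__":
    print(simulate(heartbeat_interval=3, recovery_time=2, max_time=100))
    print(simulate(heartbeat_interval=3, recovery_time=5, max_time=100))
```

In this model, a recovery that finishes within one heartbeat interval commits on the first attempt, while a recovery that always takes longer than the interval is superseded by a newer recovery ID before it can commit, so no commit is ever accepted.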