[jira] [Updated] (HDFS-11674) reserveSpaceForReplicas is not released if append request failed due to mirror down and replica recovered
[ https://issues.apache.org/jira/browse/HDFS-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-11674:
---------------------------------------
    Labels:   (was: release-blocker)

> reserveSpaceForReplicas is not released if append request failed due to
> mirror down and replica recovered
> -----------------------------------------------------------------------
>
>                 Key: HDFS-11674
>                 URL: https://issues.apache.org/jira/browse/HDFS-11674
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>            Priority: Critical
>             Fix For: 2.9.0, 2.7.4, 3.0.0-alpha3, 2.8.1
>
>         Attachments: HDFS-11674-01.patch, HDFS-11674-02.patch,
>                      HDFS-11674-03.patch, HDFS-11674-branch-2.7-03.patch
>
> Scenario:
> 1. 3-node cluster with
>    "dfs.client.block.write.replace-datanode-on-failure.policy" set to DEFAULT.
>    A block is written with x bytes of data.
> 2. One of the DataNodes, NOT the first DN, is down.
> 3. The client tries to append data to the block and fails, since one DN is down.
> 4. The client calls recoverLease() on the file.
> 5. Recovery succeeds.
> Issue:
> 1. DNs that the client was connected to before it encountered the mirror
>    being down will have reservedSpaceForReplicas incremented, but never
>    decremented.
> 2. So in the long run all of the DN's space ends up in
>    reservedSpaceForReplicas, resulting in OutOfSpace errors.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
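The leak in the scenario above can be sketched with a toy accounting model. This is illustrative only, not the actual Hadoop DataNode code; the class and method names are made up for the sketch. The point is the shape of the bug: space is reserved when the append pipeline is set up, the failure path (mirror DN down) exits without releasing it, and the fix is for replica recovery to release the reservation as well.

```java
// Toy model of the reservedSpaceForReplicas leak described in this issue.
// Names are illustrative, not the real Hadoop implementation.
public class ReservedSpaceSketch {
    // Total bytes a DataNode has reserved for in-flight (rbw) replicas.
    static long reservedSpaceForReplicas = 0;

    static void reserve(long bytes) { reservedSpaceForReplicas += bytes; }
    static void release(long bytes) { reservedSpaceForReplicas -= bytes; }

    // Buggy append path: space is reserved when the pipeline is created,
    // but when a mirror DN is down the pipeline fails and the reservation
    // is never released.
    static void appendBuggy(long bytes) {
        reserve(bytes);
        boolean mirrorDown = true; // a downstream DN in the pipeline is dead
        if (mirrorDown) {
            return; // failure path: reservation leaked
        }
        release(bytes); // normal path: released when the replica finalizes
    }

    // Fixed behavior: replica recovery also releases the reservation, so a
    // failed append followed by recoverLease() no longer leaks space.
    static void appendFixed(long bytes) {
        reserve(bytes);
        boolean mirrorDown = true;
        if (mirrorDown) {
            release(bytes); // recovery of the replica frees the reservation
            return;
        }
        release(bytes);
    }

    public static void main(String[] args) {
        appendBuggy(128);
        System.out.println("after buggy append: " + reservedSpaceForReplicas + " bytes still reserved");
        reservedSpaceForReplicas = 0;
        appendFixed(128);
        System.out.println("after fixed append: " + reservedSpaceForReplicas + " bytes still reserved");
    }
}
```

Each failed append leaks one reservation in the buggy version, which is why a long-running cluster eventually reports the volume as full even though the blocks were never written.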
[jira] [Updated] (HDFS-11674) reserveSpaceForReplicas is not released if append request failed due to mirror down and replica recovered
Vinayakumar B updated HDFS-11674:
---------------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 2.8.1
                   3.0.0-alpha3
                   2.7.4
                   2.9.0
           Status: Resolved  (was: Patch Available)

Thanks [~arpitagarwal] and [~brahmareddy] for the reviews. Committed to branch-2.7 as well.
[jira] [Updated] (HDFS-11674) reserveSpaceForReplicas is not released if append request failed due to mirror down and replica recovered
Vinayakumar B updated HDFS-11674:
---------------------------------
    Attachment: HDFS-11674-branch-2.7-03.patch
[jira] [Updated] (HDFS-11674) reserveSpaceForReplicas is not released if append request failed due to mirror down and replica recovered
Vinayakumar B updated HDFS-11674:
---------------------------------
    Attachment: HDFS-11674-03.patch

Attached the updated patch. There was a chance the test could time out when the stopped datanode was chosen as the primary for block recovery. The test now marks that node as dead just before recovery.
[jira] [Updated] (HDFS-11674) reserveSpaceForReplicas is not released if append request failed due to mirror down and replica recovered
Konstantin Shvachko updated HDFS-11674:
---------------------------------------
    Labels: release-blocker  (was: )
[jira] [Updated] (HDFS-11674) reserveSpaceForReplicas is not released if append request failed due to mirror down and replica recovered
Vinayakumar B updated HDFS-11674:
---------------------------------
    Attachment: HDFS-11674-02.patch

Updated the patch. Please review.
[jira] [Updated] (HDFS-11674) reserveSpaceForReplicas is not released if append request failed due to mirror down and replica recovered
Vinayakumar B updated HDFS-11674:
---------------------------------
    Target Version/s: 2.7.4, 2.8.1
              Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-11674) reserveSpaceForReplicas is not released if append request failed due to mirror down and replica recovered
Vinayakumar B updated HDFS-11674:
---------------------------------
    Priority: Critical  (was: Major)
[jira] [Updated] (HDFS-11674) reserveSpaceForReplicas is not released if append request failed due to mirror down and replica recovered
Vinayakumar B updated HDFS-11674:
---------------------------------
    Attachment: HDFS-11674-01.patch

Attached the patch. Please review.