[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-7548:
    Status: Open (was: Patch Available)

Cancelling the patch to address the findbugs warning.

Corrupt block reporting delayed until datablock scanner thread detects it
    Key: HDFS-7548
    URL: https://issues.apache.org/jira/browse/HDFS-7548
    Project: Hadoop HDFS
    Issue Type: Bug
    Affects Versions: 2.5.0
    Reporter: Rushabh S Shah
    Assignee: Rushabh S Shah
    Attachments: HDFS-7548-v2.patch, HDFS-7548-v3.patch, HDFS-7548-v4.patch, HDFS-7548.patch

When only one datanode holds a block and that block happens to be corrupt, the namenode keeps trying to replicate the block, but it reports the block as corrupt only when the datanode's data block scanner thread picks up the bad block.
Requesting an improvement in namenode reporting so that a corrupt replica is reported when there is only 1 replica and replication of that replica keeps failing with a checksum error.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-7548:
    Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-7548:
    Attachment: HDFS-7548-v5.patch

Attaching a new patch to address the findbugs warning. Added the synchronized keyword to the newly added method.
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HDFS-7548:
    Resolution: Fixed
    Fix Version/s: 2.7.0
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Thanks for working on the bug, Rushabh. I've committed this to trunk and branch-2.
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-7548:
    Status: Open (was: Patch Available)

Cancelling the patch to address Daryn's comment.
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-7548:
    Attachment: HDFS-7548-v4.patch

Attaching a new patch addressing Daryn's comments.
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-7548:
    Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-7548:
    Status: Open (was: Patch Available)

Talked offline with Daryn, who uncovered a potential bug, so I am cancelling the patch.
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-7548:
    Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-7548:
    Attachment: HDFS-7548-v3.patch

Attaching a new patch addressing Daryn's offline comments.
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-7548:
    Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-7548:
    Attachment: HDFS-7548-v2.patch

Attaching a new patch addressing Nathan's and Daryn's comments. Please review.
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-7548:
    Status: Open (was: Patch Available)

Thanks Daryn for the review. Cancelling the current patch to address Daryn's comments. Will update the patch shortly.
@Nathan: What action needs to be taken if a java.io.IOException: Input/output error occurs?
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-7548:
    Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-7548) Corrupt block reporting delayed until datablock scanner thread detects it
[ https://issues.apache.org/jira/browse/HDFS-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-7548:
    Attachment: HDFS-7548.patch

Whenever, in the write pipeline, the datanode detects a checksum error while transferring a block to the target node, that block is moved to the first position in blockInfoSet by setting its lastScanTime to 0. Because that data structure is sorted by lastScanTime, the BlockPoolSliceScanner picks this block first. This way, the corrupt block is scanned first and reported to the namenode.
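The idea in the comment on the first patch (HDFS-7548.patch), resetting lastScanTime to 0 so the block sorts to the front of a set ordered by lastScanTime, might be sketched roughly as follows. This is only an illustration, not the code from the attached patches: the class ScanQueueSketch, the nested BlockScanInfo stand-in, and the methods addBlock, markSuspect, and nextBlockToScan are hypothetical names, and only blockInfoSet and lastScanTime are taken from the comment. The methods are synchronized on the assumption that the transfer path and the scanner thread touch the set concurrently.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch; not the Hadoop BlockPoolSliceScanner.
class ScanQueueSketch {
    /** Per-block scan bookkeeping (a stand-in type, assumed for illustration). */
    static final class BlockScanInfo {
        final long blockId;
        long lastScanTime;

        BlockScanInfo(long blockId, long lastScanTime) {
            this.blockId = blockId;
            this.lastScanTime = lastScanTime;
        }
    }

    // Oldest scan first; ties are broken by block id so that two distinct
    // blocks never compare as equal inside the TreeSet.
    private final TreeSet<BlockScanInfo> blockInfoSet = new TreeSet<>(
        Comparator.<BlockScanInfo>comparingLong(b -> b.lastScanTime)
                  .thenComparingLong(b -> b.blockId));
    private final Map<Long, BlockScanInfo> blockMap = new HashMap<>();

    synchronized void addBlock(long blockId, long lastScanTime) {
        BlockScanInfo old = blockMap.remove(blockId);
        if (old != null) {
            blockInfoSet.remove(old);
        }
        BlockScanInfo info = new BlockScanInfo(blockId, lastScanTime);
        blockMap.put(blockId, info);
        blockInfoSet.add(info);
    }

    /** Called when a checksum error is seen while transferring the block. */
    synchronized void markSuspect(long blockId) {
        BlockScanInfo info = blockMap.get(blockId);
        if (info == null) {
            return; // unknown block: nothing to reprioritize
        }
        // Remove before changing the sort key; mutating an element that is
        // still inside a TreeSet would leave the set in an inconsistent order.
        blockInfoSet.remove(info);
        info.lastScanTime = 0L;
        blockInfoSet.add(info);
    }

    /** Returns the id of the block the scanner would pick next, or -1 if empty. */
    synchronized long nextBlockToScan() {
        return blockInfoSet.isEmpty() ? -1L : blockInfoSet.first().blockId;
    }

    public static void main(String[] args) {
        ScanQueueSketch queue = new ScanQueueSketch();
        queue.addBlock(1L, 100L);
        queue.addBlock(2L, 200L);
        queue.addBlock(3L, 300L);
        queue.markSuspect(3L);
        System.out.println(queue.nextBlockToScan());
    }
}
```

In this sketch the suspect block jumps ahead of blocks that were scanned long ago, so a corrupt sole replica would be verified on the scanner's next pick rather than after a full scan cycle.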