[jira] [Updated] (HDFS-10348) Namenode report bad block method doesn't check whether the block belongs to datanode before adding it to corrupt replicas map.

2019-09-24 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-10348:
-
Status: Open  (was: Patch Available)

> Namenode report bad block method doesn't check whether the block belongs to 
> datanode before adding it to corrupt replicas map.
> --
>
> Key: HDFS-10348
> URL: https://issues.apache.org/jira/browse/HDFS-10348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.2, 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Major
> Attachments: HDFS-10348-1.patch, HDFS-10348.003.patch, 
> HDFS-10348.patch
>
>
> Namenode (via report bad block nethod) doesn't check whether the block 
> belongs to the datanode before it adds to corrupt replicas map.
> In one of our cluster we found that there were 3 lingering corrupt blocks.
> It happened in the following order.
> 1. Two clients called getBlockLocations for a particular file.
> 2. Client C1 tried to open the file and encountered checksum error from   
> node N3 and it reported bad block (blk1) to the namenode.
> 3. Namenode added that node N3 and block blk1  to corrrupt replicas map   and 
> ask one of the good node (one of the 2 nodes) to replicate the block to 
> another node N4.
> 4. After receiving the block, N4 sends an IBR (with RECEIVED_BLOCK) to 
> namenode.
> 5. Namenode removed the block and node N3 from corrupt replicas map.
>It also removed N3's storage from triplets and queued an invalidate 
> request for N3.
> 6. In the mean time, Client C2 tries to open the file and the request went to 
> node N3.
>C2 also encountered the checksum exception and reported bad block to 
> namenode.
> 7. Namenode added the corrupt block blk1 and node N3 to the corrupt replicas 
> map without confirming whether node N3 has the block or not.
> After deleting the block, N3 sends an IBR (with DELETED) and the namenode 
> simply ignores the report since the N3's storage is no longer in the 
> triplets(from step 5)
> We took the node out of rotation, but still the block was present only in the 
> corruptReplciasMap. 
> Since on removing the node, we only goes through the block which are present 
> in the triplets for a given datanode.
> [~kshukla]'s patch fixed this bug via 
> https://issues.apache.org/jira/browse/HDFS-9958.
> But I think the following check should be made in the 
> BlockManager#markBlockAsCorrupt instead of 
> BlockManager#findAndMarkBlockAsCorrupt.
> {noformat}
> if (storage == null) {
>   storage = storedBlock.findStorageInfo(node);
> }
> if (storage == null) {
>   blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
>   blk, dn);
>   return;
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10348) Namenode report bad block method doesn't check whether the block belongs to datanode before adding it to corrupt replicas map.

2019-09-24 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-10348:
-
   Attachment: HDFS-10348.004.patch
Affects Version/s: (was: 2.7.0)
   Status: Patch Available  (was: Open)

> Namenode report bad block method doesn't check whether the block belongs to 
> datanode before adding it to corrupt replicas map.
> --
>
> Key: HDFS-10348
> URL: https://issues.apache.org/jira/browse/HDFS-10348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.2
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Major
> Attachments: HDFS-10348-1.patch, HDFS-10348.003.patch, 
> HDFS-10348.004.patch, HDFS-10348.patch
>
>
> Namenode (via report bad block nethod) doesn't check whether the block 
> belongs to the datanode before it adds to corrupt replicas map.
> In one of our cluster we found that there were 3 lingering corrupt blocks.
> It happened in the following order.
> 1. Two clients called getBlockLocations for a particular file.
> 2. Client C1 tried to open the file and encountered checksum error from   
> node N3 and it reported bad block (blk1) to the namenode.
> 3. Namenode added that node N3 and block blk1  to corrrupt replicas map   and 
> ask one of the good node (one of the 2 nodes) to replicate the block to 
> another node N4.
> 4. After receiving the block, N4 sends an IBR (with RECEIVED_BLOCK) to 
> namenode.
> 5. Namenode removed the block and node N3 from corrupt replicas map.
>It also removed N3's storage from triplets and queued an invalidate 
> request for N3.
> 6. In the mean time, Client C2 tries to open the file and the request went to 
> node N3.
>C2 also encountered the checksum exception and reported bad block to 
> namenode.
> 7. Namenode added the corrupt block blk1 and node N3 to the corrupt replicas 
> map without confirming whether node N3 has the block or not.
> After deleting the block, N3 sends an IBR (with DELETED) and the namenode 
> simply ignores the report since the N3's storage is no longer in the 
> triplets(from step 5)
> We took the node out of rotation, but still the block was present only in the 
> corruptReplciasMap. 
> Since on removing the node, we only goes through the block which are present 
> in the triplets for a given datanode.
> [~kshukla]'s patch fixed this bug via 
> https://issues.apache.org/jira/browse/HDFS-9958.
> But I think the following check should be made in the 
> BlockManager#markBlockAsCorrupt instead of 
> BlockManager#findAndMarkBlockAsCorrupt.
> {noformat}
> if (storage == null) {
>   storage = storedBlock.findStorageInfo(node);
> }
> if (storage == null) {
>   blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
>   blk, dn);
>   return;
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10348) Namenode report bad block method doesn't check whether the block belongs to datanode before adding it to corrupt replicas map.

2019-09-02 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-10348:
-
   Attachment: HDFS-10348.003.patch
Affects Version/s: 3.1.2
   Status: Patch Available  (was: Open)

> Namenode report bad block method doesn't check whether the block belongs to 
> datanode before adding it to corrupt replicas map.
> --
>
> Key: HDFS-10348
> URL: https://issues.apache.org/jira/browse/HDFS-10348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.2, 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Major
> Attachments: HDFS-10348-1.patch, HDFS-10348.003.patch, 
> HDFS-10348.patch
>
>
> Namenode (via report bad block nethod) doesn't check whether the block 
> belongs to the datanode before it adds to corrupt replicas map.
> In one of our cluster we found that there were 3 lingering corrupt blocks.
> It happened in the following order.
> 1. Two clients called getBlockLocations for a particular file.
> 2. Client C1 tried to open the file and encountered checksum error from   
> node N3 and it reported bad block (blk1) to the namenode.
> 3. Namenode added that node N3 and block blk1  to corrrupt replicas map   and 
> ask one of the good node (one of the 2 nodes) to replicate the block to 
> another node N4.
> 4. After receiving the block, N4 sends an IBR (with RECEIVED_BLOCK) to 
> namenode.
> 5. Namenode removed the block and node N3 from corrupt replicas map.
>It also removed N3's storage from triplets and queued an invalidate 
> request for N3.
> 6. In the mean time, Client C2 tries to open the file and the request went to 
> node N3.
>C2 also encountered the checksum exception and reported bad block to 
> namenode.
> 7. Namenode added the corrupt block blk1 and node N3 to the corrupt replicas 
> map without confirming whether node N3 has the block or not.
> After deleting the block, N3 sends an IBR (with DELETED) and the namenode 
> simply ignores the report since the N3's storage is no longer in the 
> triplets(from step 5)
> We took the node out of rotation, but still the block was present only in the 
> corruptReplciasMap. 
> Since on removing the node, we only goes through the block which are present 
> in the triplets for a given datanode.
> [~kshukla]'s patch fixed this bug via 
> https://issues.apache.org/jira/browse/HDFS-9958.
> But I think the following check should be made in the 
> BlockManager#markBlockAsCorrupt instead of 
> BlockManager#findAndMarkBlockAsCorrupt.
> {noformat}
> if (storage == null) {
>   storage = storedBlock.findStorageInfo(node);
> }
> if (storage == null) {
>   blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
>   blk, dn);
>   return;
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10348) Namenode report bad block method doesn't check whether the block belongs to datanode before adding it to corrupt replicas map.

2018-03-30 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-10348:
---
Target Version/s:   (was: 2.7.6)

> Namenode report bad block method doesn't check whether the block belongs to 
> datanode before adding it to corrupt replicas map.
> --
>
> Key: HDFS-10348
> URL: https://issues.apache.org/jira/browse/HDFS-10348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Major
> Attachments: HDFS-10348-1.patch, HDFS-10348.patch
>
>
> Namenode (via report bad block nethod) doesn't check whether the block 
> belongs to the datanode before it adds to corrupt replicas map.
> In one of our cluster we found that there were 3 lingering corrupt blocks.
> It happened in the following order.
> 1. Two clients called getBlockLocations for a particular file.
> 2. Client C1 tried to open the file and encountered checksum error from   
> node N3 and it reported bad block (blk1) to the namenode.
> 3. Namenode added that node N3 and block blk1  to corrrupt replicas map   and 
> ask one of the good node (one of the 2 nodes) to replicate the block to 
> another node N4.
> 4. After receiving the block, N4 sends an IBR (with RECEIVED_BLOCK) to 
> namenode.
> 5. Namenode removed the block and node N3 from corrupt replicas map.
>It also removed N3's storage from triplets and queued an invalidate 
> request for N3.
> 6. In the mean time, Client C2 tries to open the file and the request went to 
> node N3.
>C2 also encountered the checksum exception and reported bad block to 
> namenode.
> 7. Namenode added the corrupt block blk1 and node N3 to the corrupt replicas 
> map without confirming whether node N3 has the block or not.
> After deleting the block, N3 sends an IBR (with DELETED) and the namenode 
> simply ignores the report since the N3's storage is no longer in the 
> triplets(from step 5)
> We took the node out of rotation, but still the block was present only in the 
> corruptReplciasMap. 
> Since on removing the node, we only goes through the block which are present 
> in the triplets for a given datanode.
> [~kshukla]'s patch fixed this bug via 
> https://issues.apache.org/jira/browse/HDFS-9958.
> But I think the following check should be made in the 
> BlockManager#markBlockAsCorrupt instead of 
> BlockManager#findAndMarkBlockAsCorrupt.
> {noformat}
> if (storage == null) {
>   storage = storedBlock.findStorageInfo(node);
> }
> if (storage == null) {
>   blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
>   blk, dn);
>   return;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10348) Namenode report bad block method doesn't check whether the block belongs to datanode before adding it to corrupt replicas map.

2018-03-30 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-10348:
---
Status: Open  (was: Patch Available)

Canceling the patch as it is not applying anymore.

> Namenode report bad block method doesn't check whether the block belongs to 
> datanode before adding it to corrupt replicas map.
> --
>
> Key: HDFS-10348
> URL: https://issues.apache.org/jira/browse/HDFS-10348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Major
> Attachments: HDFS-10348-1.patch, HDFS-10348.patch
>
>
> Namenode (via report bad block nethod) doesn't check whether the block 
> belongs to the datanode before it adds to corrupt replicas map.
> In one of our cluster we found that there were 3 lingering corrupt blocks.
> It happened in the following order.
> 1. Two clients called getBlockLocations for a particular file.
> 2. Client C1 tried to open the file and encountered checksum error from   
> node N3 and it reported bad block (blk1) to the namenode.
> 3. Namenode added that node N3 and block blk1  to corrrupt replicas map   and 
> ask one of the good node (one of the 2 nodes) to replicate the block to 
> another node N4.
> 4. After receiving the block, N4 sends an IBR (with RECEIVED_BLOCK) to 
> namenode.
> 5. Namenode removed the block and node N3 from corrupt replicas map.
>It also removed N3's storage from triplets and queued an invalidate 
> request for N3.
> 6. In the mean time, Client C2 tries to open the file and the request went to 
> node N3.
>C2 also encountered the checksum exception and reported bad block to 
> namenode.
> 7. Namenode added the corrupt block blk1 and node N3 to the corrupt replicas 
> map without confirming whether node N3 has the block or not.
> After deleting the block, N3 sends an IBR (with DELETED) and the namenode 
> simply ignores the report since the N3's storage is no longer in the 
> triplets(from step 5)
> We took the node out of rotation, but still the block was present only in the 
> corruptReplciasMap. 
> Since on removing the node, we only goes through the block which are present 
> in the triplets for a given datanode.
> [~kshukla]'s patch fixed this bug via 
> https://issues.apache.org/jira/browse/HDFS-9958.
> But I think the following check should be made in the 
> BlockManager#markBlockAsCorrupt instead of 
> BlockManager#findAndMarkBlockAsCorrupt.
> {noformat}
> if (storage == null) {
>   storage = storedBlock.findStorageInfo(node);
> }
> if (storage == null) {
>   blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
>   blk, dn);
>   return;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10348) Namenode report bad block method doesn't check whether the block belongs to datanode before adding it to corrupt replicas map.

2016-08-17 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-10348:
---
Target Version/s: 2.7.4  (was: 2.7.3)

2.7.3 is under release process, changing target-version to 2.7.4.

> Namenode report bad block method doesn't check whether the block belongs to 
> datanode before adding it to corrupt replicas map.
> --
>
> Key: HDFS-10348
> URL: https://issues.apache.org/jira/browse/HDFS-10348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Attachments: HDFS-10348-1.patch, HDFS-10348.patch
>
>
> Namenode (via report bad block nethod) doesn't check whether the block 
> belongs to the datanode before it adds to corrupt replicas map.
> In one of our cluster we found that there were 3 lingering corrupt blocks.
> It happened in the following order.
> 1. Two clients called getBlockLocations for a particular file.
> 2. Client C1 tried to open the file and encountered checksum error from   
> node N3 and it reported bad block (blk1) to the namenode.
> 3. Namenode added that node N3 and block blk1  to corrrupt replicas map   and 
> ask one of the good node (one of the 2 nodes) to replicate the block to 
> another node N4.
> 4. After receiving the block, N4 sends an IBR (with RECEIVED_BLOCK) to 
> namenode.
> 5. Namenode removed the block and node N3 from corrupt replicas map.
>It also removed N3's storage from triplets and queued an invalidate 
> request for N3.
> 6. In the mean time, Client C2 tries to open the file and the request went to 
> node N3.
>C2 also encountered the checksum exception and reported bad block to 
> namenode.
> 7. Namenode added the corrupt block blk1 and node N3 to the corrupt replicas 
> map without confirming whether node N3 has the block or not.
> After deleting the block, N3 sends an IBR (with DELETED) and the namenode 
> simply ignores the report since the N3's storage is no longer in the 
> triplets(from step 5)
> We took the node out of rotation, but still the block was present only in the 
> corruptReplciasMap. 
> Since on removing the node, we only goes through the block which are present 
> in the triplets for a given datanode.
> [~kshukla]'s patch fixed this bug via 
> https://issues.apache.org/jira/browse/HDFS-9958.
> But I think the following check should be made in the 
> BlockManager#markBlockAsCorrupt instead of 
> BlockManager#findAndMarkBlockAsCorrupt.
> {noformat}
> if (storage == null) {
>   storage = storedBlock.findStorageInfo(node);
> }
> if (storage == null) {
>   blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
>   blk, dn);
>   return;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10348) Namenode report bad block method doesn't check whether the block belongs to datanode before adding it to corrupt replicas map.

2016-05-03 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-10348:
--
Status: Patch Available  (was: Open)

Attaching a new patch addressing white space and find bugs warnings.
TestBlockManager passes on my local box on both jdk versions.
Failure of TestHFlush is being tracked under multiple jiras: HDFS-2043, 
HDFS-3041 and HDFS-4504
TestDataNodeLifeline is failing on local box with and without my patch.
But I don't see it failing on any other jenkins build except this one.
I will wait for another jenkins build on my latest patch.
If it fails again then I will file a jira.

> Namenode report bad block method doesn't check whether the block belongs to 
> datanode before adding it to corrupt replicas map.
> --
>
> Key: HDFS-10348
> URL: https://issues.apache.org/jira/browse/HDFS-10348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Attachments: HDFS-10348-1.patch, HDFS-10348.patch
>
>
> Namenode (via report bad block nethod) doesn't check whether the block 
> belongs to the datanode before it adds to corrupt replicas map.
> In one of our cluster we found that there were 3 lingering corrupt blocks.
> It happened in the following order.
> 1. Two clients called getBlockLocations for a particular file.
> 2. Client C1 tried to open the file and encountered checksum error from   
> node N3 and it reported bad block (blk1) to the namenode.
> 3. Namenode added that node N3 and block blk1  to corrrupt replicas map   and 
> ask one of the good node (one of the 2 nodes) to replicate the block to 
> another node N4.
> 4. After receiving the block, N4 sends an IBR (with RECEIVED_BLOCK) to 
> namenode.
> 5. Namenode removed the block and node N3 from corrupt replicas map.
>It also removed N3's storage from triplets and queued an invalidate 
> request for N3.
> 6. In the mean time, Client C2 tries to open the file and the request went to 
> node N3.
>C2 also encountered the checksum exception and reported bad block to 
> namenode.
> 7. Namenode added the corrupt block blk1 and node N3 to the corrupt replicas 
> map without confirming whether node N3 has the block or not.
> After deleting the block, N3 sends an IBR (with DELETED) and the namenode 
> simply ignores the report since the N3's storage is no longer in the 
> triplets(from step 5)
> We took the node out of rotation, but still the block was present only in the 
> corruptReplciasMap. 
> Since on removing the node, we only goes through the block which are present 
> in the triplets for a given datanode.
> [~kshukla]'s patch fixed this bug via 
> https://issues.apache.org/jira/browse/HDFS-9958.
> But I think the following check should be made in the 
> BlockManager#markBlockAsCorrupt instead of 
> BlockManager#findAndMarkBlockAsCorrupt.
> {noformat}
> if (storage == null) {
>   storage = storedBlock.findStorageInfo(node);
> }
> if (storage == null) {
>   blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
>   blk, dn);
>   return;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10348) Namenode report bad block method doesn't check whether the block belongs to datanode before adding it to corrupt replicas map.

2016-05-03 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-10348:
--
Attachment: HDFS-10348-1.patch

> Namenode report bad block method doesn't check whether the block belongs to 
> datanode before adding it to corrupt replicas map.
> --
>
> Key: HDFS-10348
> URL: https://issues.apache.org/jira/browse/HDFS-10348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Attachments: HDFS-10348-1.patch, HDFS-10348.patch
>
>
> Namenode (via report bad block nethod) doesn't check whether the block 
> belongs to the datanode before it adds to corrupt replicas map.
> In one of our cluster we found that there were 3 lingering corrupt blocks.
> It happened in the following order.
> 1. Two clients called getBlockLocations for a particular file.
> 2. Client C1 tried to open the file and encountered checksum error from   
> node N3 and it reported bad block (blk1) to the namenode.
> 3. Namenode added that node N3 and block blk1  to corrrupt replicas map   and 
> ask one of the good node (one of the 2 nodes) to replicate the block to 
> another node N4.
> 4. After receiving the block, N4 sends an IBR (with RECEIVED_BLOCK) to 
> namenode.
> 5. Namenode removed the block and node N3 from corrupt replicas map.
>It also removed N3's storage from triplets and queued an invalidate 
> request for N3.
> 6. In the mean time, Client C2 tries to open the file and the request went to 
> node N3.
>C2 also encountered the checksum exception and reported bad block to 
> namenode.
> 7. Namenode added the corrupt block blk1 and node N3 to the corrupt replicas 
> map without confirming whether node N3 has the block or not.
> After deleting the block, N3 sends an IBR (with DELETED) and the namenode 
> simply ignores the report since the N3's storage is no longer in the 
> triplets(from step 5)
> We took the node out of rotation, but still the block was present only in the 
> corruptReplciasMap. 
> Since on removing the node, we only goes through the block which are present 
> in the triplets for a given datanode.
> [~kshukla]'s patch fixed this bug via 
> https://issues.apache.org/jira/browse/HDFS-9958.
> But I think the following check should be made in the 
> BlockManager#markBlockAsCorrupt instead of 
> BlockManager#findAndMarkBlockAsCorrupt.
> {noformat}
> if (storage == null) {
>   storage = storedBlock.findStorageInfo(node);
> }
> if (storage == null) {
>   blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
>   blk, dn);
>   return;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10348) Namenode report bad block method doesn't check whether the block belongs to datanode before adding it to corrupt replicas map.

2016-05-03 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-10348:
--
Status: Open  (was: Patch Available)

> Namenode report bad block method doesn't check whether the block belongs to 
> datanode before adding it to corrupt replicas map.
> --
>
> Key: HDFS-10348
> URL: https://issues.apache.org/jira/browse/HDFS-10348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Attachments: HDFS-10348.patch
>
>
> Namenode (via report bad block nethod) doesn't check whether the block 
> belongs to the datanode before it adds to corrupt replicas map.
> In one of our cluster we found that there were 3 lingering corrupt blocks.
> It happened in the following order.
> 1. Two clients called getBlockLocations for a particular file.
> 2. Client C1 tried to open the file and encountered checksum error from   
> node N3 and it reported bad block (blk1) to the namenode.
> 3. Namenode added that node N3 and block blk1  to corrrupt replicas map   and 
> ask one of the good node (one of the 2 nodes) to replicate the block to 
> another node N4.
> 4. After receiving the block, N4 sends an IBR (with RECEIVED_BLOCK) to 
> namenode.
> 5. Namenode removed the block and node N3 from corrupt replicas map.
>It also removed N3's storage from triplets and queued an invalidate 
> request for N3.
> 6. In the mean time, Client C2 tries to open the file and the request went to 
> node N3.
>C2 also encountered the checksum exception and reported bad block to 
> namenode.
> 7. Namenode added the corrupt block blk1 and node N3 to the corrupt replicas 
> map without confirming whether node N3 has the block or not.
> After deleting the block, N3 sends an IBR (with DELETED) and the namenode 
> simply ignores the report since the N3's storage is no longer in the 
> triplets(from step 5)
> We took the node out of rotation, but still the block was present only in the 
> corruptReplciasMap. 
> Since on removing the node, we only goes through the block which are present 
> in the triplets for a given datanode.
> [~kshukla]'s patch fixed this bug via 
> https://issues.apache.org/jira/browse/HDFS-9958.
> But I think the following check should be made in the 
> BlockManager#markBlockAsCorrupt instead of 
> BlockManager#findAndMarkBlockAsCorrupt.
> {noformat}
> if (storage == null) {
>   storage = storedBlock.findStorageInfo(node);
> }
> if (storage == null) {
>   blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
>   blk, dn);
>   return;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10348) Namenode report bad block method doesn't check whether the block belongs to datanode before adding it to corrupt replicas map.

2016-05-02 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-10348:
--
Status: Patch Available  (was: Open)

> Namenode report bad block method doesn't check whether the block belongs to 
> datanode before adding it to corrupt replicas map.
> --
>
> Key: HDFS-10348
> URL: https://issues.apache.org/jira/browse/HDFS-10348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Attachments: HDFS-10348.patch
>
>
> Namenode (via report bad block nethod) doesn't check whether the block 
> belongs to the datanode before it adds to corrupt replicas map.
> In one of our cluster we found that there were 3 lingering corrupt blocks.
> It happened in the following order.
> 1. Two clients called getBlockLocations for a particular file.
> 2. Client C1 tried to open the file and encountered checksum error from   
> node N3 and it reported bad block (blk1) to the namenode.
> 3. Namenode added that node N3 and block blk1  to corrrupt replicas map   and 
> ask one of the good node (one of the 2 nodes) to replicate the block to 
> another node N4.
> 4. After receiving the block, N4 sends an IBR (with RECEIVED_BLOCK) to 
> namenode.
> 5. Namenode removed the block and node N3 from corrupt replicas map.
>It also removed N3's storage from triplets and queued an invalidate 
> request for N3.
> 6. In the mean time, Client C2 tries to open the file and the request went to 
> node N3.
>C2 also encountered the checksum exception and reported bad block to 
> namenode.
> 7. Namenode added the corrupt block blk1 and node N3 to the corrupt replicas 
> map without confirming whether node N3 has the block or not.
> After deleting the block, N3 sends an IBR (with DELETED) and the namenode 
> simply ignores the report since the N3's storage is no longer in the 
> triplets(from step 5)
> We took the node out of rotation, but still the block was present only in the 
> corruptReplciasMap. 
> Since on removing the node, we only goes through the block which are present 
> in the triplets for a given datanode.
> [~kshukla]'s patch fixed this bug via 
> https://issues.apache.org/jira/browse/HDFS-9958.
> But I think the following check should be made in the 
> BlockManager#markBlockAsCorrupt instead of 
> BlockManager#findAndMarkBlockAsCorrupt.
> {noformat}
> if (storage == null) {
>   storage = storedBlock.findStorageInfo(node);
> }
> if (storage == null) {
>   blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
>   blk, dn);
>   return;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10348) Namenode report bad block method doesn't check whether the block belongs to datanode before adding it to corrupt replicas map.

2016-05-02 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-10348:
--
Attachment: HDFS-10348.patch

> Namenode report bad block method doesn't check whether the block belongs to 
> datanode before adding it to corrupt replicas map.
> --
>
> Key: HDFS-10348
> URL: https://issues.apache.org/jira/browse/HDFS-10348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
> Attachments: HDFS-10348.patch
>
>
> Namenode (via report bad block nethod) doesn't check whether the block 
> belongs to the datanode before it adds to corrupt replicas map.
> In one of our cluster we found that there were 3 lingering corrupt blocks.
> It happened in the following order.
> 1. Two clients called getBlockLocations for a particular file.
> 2. Client C1 tried to open the file and encountered checksum error from   
> node N3 and it reported bad block (blk1) to the namenode.
> 3. Namenode added that node N3 and block blk1  to corrrupt replicas map   and 
> ask one of the good node (one of the 2 nodes) to replicate the block to 
> another node N4.
> 4. After receiving the block, N4 sends an IBR (with RECEIVED_BLOCK) to 
> namenode.
> 5. Namenode removed the block and node N3 from corrupt replicas map.
>It also removed N3's storage from triplets and queued an invalidate 
> request for N3.
> 6. In the mean time, Client C2 tries to open the file and the request went to 
> node N3.
>C2 also encountered the checksum exception and reported bad block to 
> namenode.
> 7. Namenode added the corrupt block blk1 and node N3 to the corrupt replicas 
> map without confirming whether node N3 has the block or not.
> After deleting the block, N3 sends an IBR (with DELETED) and the namenode 
> simply ignores the report since the N3's storage is no longer in the 
> triplets(from step 5)
> We took the node out of rotation, but still the block was present only in the 
> corruptReplciasMap. 
> Since on removing the node, we only goes through the block which are present 
> in the triplets for a given datanode.
> [~kshukla]'s patch fixed this bug via 
> https://issues.apache.org/jira/browse/HDFS-9958.
> But I think the following check should be made in the 
> BlockManager#markBlockAsCorrupt instead of 
> BlockManager#findAndMarkBlockAsCorrupt.
> {noformat}
> if (storage == null) {
>   storage = storedBlock.findStorageInfo(node);
> }
> if (storage == null) {
>   blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
>   blk, dn);
>   return;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10348) Namenode report bad block method doesn't check whether the block belongs to datanode before adding it to corrupt replicas map.

2016-04-29 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-10348:
--
Description: 
Namenode (via report bad block nethod) doesn't check whether the block belongs 
to the datanode before it adds to corrupt replicas map.
In one of our cluster we found that there were 3 lingering corrupt blocks.
It happened in the following order.
1. Two clients called getBlockLocations for a particular file.
2. Client C1 tried to open the file and encountered checksum error from   node 
N3 and it reported bad block (blk1) to the namenode.
3. Namenode added that node N3 and block blk1  to corrrupt replicas map   and 
ask one of the good node (one of the 2 nodes) to replicate the block to another 
node N4.
4. After receiving the block, N4 sends an IBR (with RECEIVED_BLOCK) to namenode.
5. Namenode removed the block and node N3 from corrupt replicas map.
   It also removed N3's storage from triplets and queued an invalidate request 
for N3.
6. In the mean time, Client C2 tries to open the file and the request went to 
node N3.
   C2 also encountered the checksum exception and reported bad block to 
namenode.
7. Namenode added the corrupt block blk1 and node N3 to the corrupt replicas 
map without confirming whether node N3 has the block or not.

After deleting the block, N3 sends an IBR (with DELETED) and the namenode 
simply ignores the report since the N3's storage is no longer in the 
triplets(from step 5)

We took the node out of rotation, but still the block was present only in the 
corruptReplciasMap. 
Since on removing the node, we only goes through the block which are present in 
the triplets for a given datanode.

[~kshukla]'s patch fixed this bug via 
https://issues.apache.org/jira/browse/HDFS-9958.
But I think the following check should be made in the 
BlockManager#markBlockAsCorrupt instead of 
BlockManager#findAndMarkBlockAsCorrupt.
{noformat}
if (storage == null) {
  storage = storedBlock.findStorageInfo(node);
}

if (storage == null) {
  blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
  blk, dn);
  return;
}
{noformat}

  was:
Namenode (via report bad block nethod) doesn't check whether the block belongs 
to the datanode before it adds to corrupt replicas map.
In one of our cluster we found that there were 3 lingering corrupt blocks.
It happened in the following order.
1. Two clients called getBlockLocations for a particular file.
2. Client C1 tried to open the file and encountered checksum error from   node 
N3 and it reported bad block (blk1) to the namenode.
3. Namenode added that node N3 and block blk1  to corrrupt replicas map   and 
ask one of the good node (one of the 2 nodes) to replicate the block to another 
node N4.
4. After receiving the block, N4 sends an IBR (with RECEIVED_BLOCK) to namenode.
5. Namenode removed the block and node N3 from corrupt replicas map.
   It also removed N3's storage from triplets and queued an invalidate request 
for N3.
6. In the mean time, Client C2 tries to open the file and the request went to 
node N3.
   C2 also encountered the checksum exception and reported bad block to 
namenode.
7. Namenode added the corrupt block blk1 and node N3 to the corrupt replicas 
map without confirming whether node N3 has the block or not.

After deleting the block, N3 sends an IBR (with DELETED) and the namenode 
simply ignores the report since the N3's storage is no longer in the 
triplets(from step 5)

We took the node out of rotation, but still the block was present only in the 
corruptReplciasMap. 
Since on removing the node, we only goes through the block which are present in 
the triplets for a given datanode.

Kuhu's patch fixed this bug via https://issues.apache.org/jira/browse/HDFS-9958.
But I think the following check should be made in the 
BlockManager#markBlockAsCorrupt instead of 
BlockManager#findAndMarkBlockAsCorrupt.
{noformat}
if (storage == null) {
  storage = storedBlock.findStorageInfo(node);
}

if (storage == null) {
  blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
  blk, dn);
  return;
}
{noformat}


> Namenode report bad block method doesn't check whether the block belongs to 
> datanode before adding it to corrupt replicas map.
> --
>
> Key: HDFS-10348
> URL: https://issues.apache.org/jira/browse/HDFS-10348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>
> Namenode (via report bad block nethod) doesn't check whether the block 
> belongs to the datanode before it adds to corrupt replicas map.
> In one of our cluster we found that