[GitHub] [hadoop] ayushtkn commented on pull request #5583: HDFS-16987. [BugFix] NameNode should remove all invalid corrupt blocks when starting active service
ayushtkn commented on PR #5583:
URL: https://github.com/apache/hadoop/pull/5583#issuecomment-1522037750

I think yes, we should fix the `markBlockAsCorrupt` method itself while keeping the regular flow intact. If a replica with a newer genstamp exists on the same storage, we should pass a flag to both `countNodes` and `invalidateBlock` to make sure the storage holding the newer replica doesn't get removed.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
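One possible reading of the proposed flag, sketched as a toy model rather than against the real `BlockManager`. The scenario is the one discussed in this PR: stale genstamp-1001 replicas and the current genstamp-1002 replica sit on the same three DataNodes. The names `NewerReplicaFlagSketch`, `markAllCorrupt`, and `protectNewer` are hypothetical; block locations are modeled as a set of DataNode names, minimum replication is assumed to be 1, and "invalidated" only records that a stale replica was scheduled for deletion.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy sketch (hypothetical, not Hadoop code) of one reading of the proposal:
 * when the location reporting a stale replica also holds the newer replica,
 * count it as live and do not remove it from the block's locations.
 */
public class NewerReplicaFlagSketch {
    // Assumed minimum replication (dfs.namenode.replication.min defaults to 1).
    static final int MIN_REPLICATION = 1;

    /**
     * Mutates storedLocations in place and returns the DataNodes whose stale
     * replica was scheduled for deletion. protectNewer stands in for the
     * hypothetical flag passed to countNodes and invalidateBlock.
     */
    static List<String> markAllCorrupt(Set<String> storedLocations,
                                       List<String> staleReplicaNodes,
                                       boolean protectNewer) {
        Set<String> corruptLocations = new HashSet<>();
        List<String> invalidated = new ArrayList<>();
        for (String node : staleReplicaNodes) {
            // Flag: this location already holds the newer-genstamp replica.
            boolean holdsNewer = protectNewer && storedLocations.contains(node);
            storedLocations.add(node);
            if (!holdsNewer) {
                corruptLocations.add(node); // only a purely stale location counts as corrupt
            }
            // countNodes(): live replicas are stored locations not marked corrupt.
            long live = storedLocations.stream()
                    .filter(n -> !corruptLocations.contains(n))
                    .count();
            if (live >= MIN_REPLICATION) {
                invalidated.add(node); // schedule deletion of the stale replica
                if (!holdsNewer) {
                    // Drop the location only when it holds nothing newer.
                    storedLocations.remove(node);
                    corruptLocations.remove(node);
                }
            }
        }
        return invalidated;
    }

    public static void main(String[] args) {
        for (boolean flag : new boolean[] {false, true}) {
            Set<String> stored = new LinkedHashSet<>(List.of("dn1", "dn2", "dn3"));
            List<String> invalidated =
                    markAllCorrupt(stored, List.of("dn1", "dn2", "dn3"), flag);
            // flag=false invalidated=[dn1, dn2] stored=[dn3]
            // flag=true invalidated=[dn1, dn2, dn3] stored=[dn1, dn2, dn3]
            System.out.println("flag=" + flag + " invalidated=" + invalidated
                    + " stored=" + stored);
        }
    }
}
```

With the flag off, the model loses the locations of the newer replica and the last stale replica survives; with it on, every stale replica is scheduled for deletion and the three locations stay in place. Whether the real fix should behave exactly like this is left to the PR discussion.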
ayushtkn commented on PR #5583:
URL: https://github.com/apache/hadoop/pull/5583#issuecomment-1520440465

I think this requires the replica with the older genstamp to be on the same datanode as the replica with the newer genstamp. Are you able to reproduce it when the 1001 replicas and the 1002 replicas are on different datanodes? Otherwise, this block deletes the corrupt replica immediately:

```
// the block is over-replicated so invalidate the replicas immediately
invalidateBlock(b, node, numberOfReplicas);
```

If you debug your test and step into `invalidateBlock()`:

```
// we already checked the number of replicas in the caller of this
// function and know there are enough live replicas, so we can delete it.
addToInvalidates(b.getCorrupted(), dn);
removeStoredBlock(b.getStored(), node);
```

Here, `removeStoredBlock(b.getStored(), node);` removes `node` from the stored block, even though that node also held the replica with genstamp 1002, because the 1001 and 1002 replicas are all on the same three DNs. For the first two iterations, the check

```
boolean minReplicationSatisfied = hasMinStorage(b.getStored(), numUsableReplicas);
```

stays satisfied. But while getting rid of the 1001 replicas, those first two iterations also removed the storages that actually held 1002 from the `blocksMap`, so in the last iteration the check evaluates to false, and hence the last corrupt replica isn't deleted.

So I feel that for this to trigger, the 1001 and 1002 replicas need to be on the same datanodes.
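The sequence traced above can be replayed in a toy model to see why only the same-datanode layout leaves a corrupt replica behind. This is a minimal sketch under stated assumptions, not Hadoop code: `StaleReplicaReplay` and `markAllCorrupt` are hypothetical names, a block's locations are modeled as a set of DataNode names, minimum replication is assumed to be 1 (the default of `dfs.namenode.replication.min`), and the only deletion path modeled is the one gated by `minReplicationSatisfied`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy replay (hypothetical, not Hadoop's BlockManager) of repeated
 * markBlockAsCorrupt calls for one block with replication factor 3.
 */
public class StaleReplicaReplay {
    // Assumed minimum replication (dfs.namenode.replication.min defaults to 1).
    static final int MIN_REPLICATION = 1;

    /**
     * @param currentReplicaNodes DataNodes holding the genstamp-1002 replica
     * @param staleReplicaNodes   DataNodes reporting a genstamp-1001 replica, in report order
     * @return DataNodes whose stale replica was scheduled for deletion
     */
    static List<String> markAllCorrupt(List<String> currentReplicaNodes,
                                       List<String> staleReplicaNodes) {
        Set<String> storedLocations = new LinkedHashSet<>(currentReplicaNodes);
        Set<String> corruptLocations = new HashSet<>();
        List<String> invalidated = new ArrayList<>();
        for (String node : staleReplicaNodes) {
            // The reporting location is added to the block if absent and marked corrupt.
            storedLocations.add(node);
            corruptLocations.add(node);
            // countNodes(): live replicas are stored locations not marked corrupt.
            long live = storedLocations.stream()
                    .filter(n -> !corruptLocations.contains(n))
                    .count();
            boolean minReplicationSatisfied = live >= MIN_REPLICATION;
            if (minReplicationSatisfied) {
                invalidated.add(node);         // addToInvalidates(b.getCorrupted(), dn)
                storedLocations.remove(node);  // removeStoredBlock(b.getStored(), node):
                corruptLocations.remove(node); // drops the whole location, 1002 included
            }
        }
        return invalidated;
    }

    public static void main(String[] args) {
        List<String> dns = List.of("dn1", "dn2", "dn3");
        // 1001 and 1002 on the same datanodes: the last stale replica survives.
        System.out.println(markAllCorrupt(dns, dns));                          // [dn1, dn2]
        // 1001 on different datanodes: every stale replica is invalidated.
        System.out.println(markAllCorrupt(dns, List.of("dn4", "dn5", "dn6"))); // [dn4, dn5, dn6]
    }
}
```

In the same-datanode case the live count drops from 2 to 1 to 0 across the three reports, matching the observation that `minReplicationSatisfied` only turns false on the last iteration.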
ayushtkn commented on PR #5583:
URL: https://github.com/apache/hadoop/pull/5583#issuecomment-1519599529

@ZanderXu When the NN marks it as corrupt, does it get passed to `postponeBlock`? Why doesn't `dfs.namenode.corrupt.block.delete.immediately.enabled` delete it immediately?
ayushtkn commented on PR #5583:
URL: https://github.com/apache/hadoop/pull/5583#issuecomment-1519145142

@ZanderXu There is some confusion here. Are you describing something like this: the replica got marked as corrupt, but the NameNode didn't delete it because it had recently transitioned to active, which marked all the storages as stale and then delayed deletion of the blocks until the next block report (BR)? Some extension of [HDFS-15200](https://issues.apache.org/jira/browse/HDFS-15200)?