[GitHub] [hadoop] ayushtkn commented on pull request #5583: HDFS-16987. [BugFix] NameNode should remove all invalid corrupt blocks when starting active service

2023-04-25 Thread via GitHub


ayushtkn commented on PR #5583:
URL: https://github.com/apache/hadoop/pull/5583#issuecomment-1522037750

   I think yes, we should fix the ``markBlockAsCorrupt`` method itself while 
maintaining the regular flow. If a newer block exists on the same storage, we 
should pass a flag to both ``countNodes`` and ``invalidateBlock`` to make sure 
the storage holding the newer block doesn't get removed.
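One way the proposed flag could behave is sketched below as a toy Java model, not Hadoop code. Assumptions: each datanode is treated as a single storage, minReplication is 1, every node holds both a stale (older genstamp) corrupt replica and the newer stored replica, and the flag name ``newerOnSameStorage`` is hypothetical. With the flag set, the node keeps counting as live and stays in the stored block's locations, so every stale replica can be queued for deletion.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model (not Hadoop code) of the flag proposed for countNodes and invalidateBlock. */
class FlaggedInvalidation {
  // Assumed value of dfs.namenode.replication.min (its default is 1).
  static final int MIN_REPLICATION = 1;

  /** Outcome after every stale replica has been reported corrupt. */
  static final class Result {
    final List<String> queuedForDeletion = new ArrayList<>();
    final Set<String> storedLocations = new LinkedHashSet<>();
  }

  /**
   * Every node in {@code nodes} holds both a stale replica and the newer
   * stored replica. {@code useFlag} toggles the hypothetical fix.
   */
  static Result markAllStaleCorrupt(List<String> nodes, boolean useFlag) {
    Result r = new Result();
    r.storedLocations.addAll(nodes);          // blocksMap locations of the stored block
    Set<String> corruptMap = new HashSet<>(); // stands in for the corrupt replicas map
    for (String dn : nodes) {
      // Hypothetical flag: this node's storage still holds the newer replica.
      boolean newerOnSameStorage = useFlag && r.storedLocations.contains(dn);
      if (!newerOnSameStorage) {
        corruptMap.add(dn); // without the flag, the node is counted as corrupt
      }
      // Live replicas = stored locations not marked corrupt.
      int live = 0;
      for (String s : r.storedLocations) {
        if (!corruptMap.contains(s)) {
          live++;
        }
      }
      if (live >= MIN_REPLICATION) {
        r.queuedForDeletion.add(dn); // models addToInvalidates(b.getCorrupted(), dn)
        if (!newerOnSameStorage) {
          // models removeStoredBlock(b.getStored(), node): also drops the newer replica's location
          r.storedLocations.remove(dn);
          corruptMap.remove(dn);
        }
      }
    }
    return r;
  }

  public static void main(String[] args) {
    List<String> dns = List.of("dn1", "dn2", "dn3");
    for (boolean useFlag : new boolean[] {false, true}) {
      Result r = markAllStaleCorrupt(dns, useFlag);
      System.out.println("flag=" + useFlag + " deleted=" + r.queuedForDeletion
          + " remaining=" + r.storedLocations);
    }
  }
}
```

Without the flag the model reproduces the reported behavior: only dn1 and dn2 are queued for deletion and dn3's stale replica survives. With the flag, all three stale replicas are queued and all three locations of the newer replica stay in place.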


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] ayushtkn commented on pull request #5583: HDFS-16987. [BugFix] NameNode should remove all invalid corrupt blocks when starting active service

2023-04-24 Thread via GitHub


ayushtkn commented on PR #5583:
URL: https://github.com/apache/hadoop/pull/5583#issuecomment-1520440465

   I think this requires the replica with the older genstamp to be on the 
same datanode as the replica with the newer genstamp.
   Are you able to reproduce it when the 1001 replicas and 1002 replicas are on 
different datanodes?
   Otherwise, this block deletes the corrupt replica immediately:
   ```
 // the block is over-replicated so invalidate the replicas immediately
 invalidateBlock(b, node, numberOfReplicas);
   ```
   If you debug your test and step into ``invalidateBlock()``:
   ```
 // we already checked the number of replicas in the caller of this
 // function and know there are enough live replicas, so we can delete it.
 addToInvalidates(b.getCorrupted(), dn);
 removeStoredBlock(b.getStored(), node);
   ```
   This ``removeStoredBlock(b.getStored(), node);`` call removes ``node`` 
from the stored block, even though that node also holds the 1002 genstamp replica.
   
   Since the 1001 and 1002 replicas are all on the same 3 DNs, for the first 2 iterations
   ```
   boolean minReplicationSatisfied = hasMinStorage(b.getStored(),
       numUsableReplicas);
   ```
   
   this check stays satisfied. But because those 2 iterations removed the storages 
that actually contained 1002 from the blocksMap (to get rid of 1001), in the last 
iteration it evaluates to false, and hence the last corrupt replica isn't deleted.
   
   So I feel that, for this to trigger, the 1001 and 1002 replicas need to be on the same datanode.
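The iteration argument above can be replayed with a toy Java model, not Hadoop code. It keeps only the ``hasMinStorage`` check from the comment and assumes minReplication is 1, that the other deletion conditions in ``markBlockAsCorrupt`` hold, and that corrupt replicas are reported one node at a time. It contrasts the same-datanode layout with one where the 1001 replicas live on other datanodes (node names are made up).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy replay (not Hadoop code) of the corrupt-replica loop, without any fix. */
class StaleReplicaLayoutTrace {
  // Assumed value of dfs.namenode.replication.min (its default is 1).
  static final int MIN_REPLICATION = 1;

  /**
   * @param newerNodes nodes holding the 1002 (stored) replica
   * @param staleNodes nodes holding a 1001 replica, reported corrupt in order
   * @return nodes whose 1001 replica gets queued for deletion
   */
  static List<String> run(List<String> newerNodes, List<String> staleNodes) {
    Set<String> storedLocations = new LinkedHashSet<>(newerNodes); // blocksMap view
    Set<String> corruptMap = new HashSet<>(); // stands in for the corrupt replicas map
    List<String> deleted = new ArrayList<>();
    for (String dn : staleNodes) {
      corruptMap.add(dn);
      // Usable replicas: stored locations not also marked corrupt.
      int usable = 0;
      for (String s : storedLocations) {
        if (!corruptMap.contains(s)) {
          usable++;
        }
      }
      boolean minReplicationSatisfied = usable >= MIN_REPLICATION;
      if (minReplicationSatisfied) {
        deleted.add(dn);            // models addToInvalidates(b.getCorrupted(), dn)
        storedLocations.remove(dn); // models removeStoredBlock(b.getStored(), node)
        corruptMap.remove(dn);
      }
    }
    return deleted;
  }

  public static void main(String[] args) {
    List<String> dns = List.of("dn1", "dn2", "dn3");
    System.out.println("same DNs:      " + run(dns, dns));
    System.out.println("different DNs: " + run(dns, List.of("dn4", "dn5", "dn6")));
  }
}
```

In the same-datanode layout only dn1 and dn2 are returned: each deletion also strips a 1002 location, so by dn3 no usable replica is left. When the 1001 replicas are on dn4 to dn6, removing their nodes never touches the 1002 locations and all three are deleted.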





[GitHub] [hadoop] ayushtkn commented on pull request #5583: HDFS-16987. [BugFix] NameNode should remove all invalid corrupt blocks when starting active service

2023-04-24 Thread via GitHub


ayushtkn commented on PR #5583:
URL: https://github.com/apache/hadoop/pull/5583#issuecomment-1519599529

   @ZanderXu When the NN marks it as corrupt, does it get added to the 
postponed blocks via ``postponeBlock``? Why doesn't 
``dfs.namenode.corrupt.block.delete.immediately.enabled`` delete it 
immediately?
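The question hinges on a single decision, sketched here as a toy Java function (not Hadoop code). Assumption: as we understand HDFS-15200, invalidating a corrupt replica is postponed (via ``postponeBlock``) while some replicas sit on stale storages, for example right after a failover before the next block report, unless ``dfs.namenode.corrupt.block.delete.immediately.enabled`` is true. The class and enum names are hypothetical.

```java
/** Toy sketch (not Hadoop code) of the postpone-or-delete check for a corrupt replica. */
class CorruptReplicaDeletionPolicy {
  enum Action { INVALIDATE_NOW, POSTPONE }

  /**
   * @param replicasOnStaleNodes replicas whose storages have not sent a fresh block report
   * @param deleteCorruptReplicaImmediately assumed value of
   *     dfs.namenode.corrupt.block.delete.immediately.enabled
   */
  static Action decide(int replicasOnStaleNodes, boolean deleteCorruptReplicaImmediately) {
    if (replicasOnStaleNodes > 0 && !deleteCorruptReplicaImmediately) {
      // Assumed to go through postponeBlock(...) and be retried after block reports arrive.
      return Action.POSTPONE;
    }
    return Action.INVALIDATE_NOW;
  }

  public static void main(String[] args) {
    System.out.println(decide(3, true));  // stale storages, config enabled
    System.out.println(decide(3, false)); // stale storages, config disabled
    System.out.println(decide(0, false)); // no stale storages
  }
}
```

Under this assumption, stale storages alone should not delay deletion while the config is enabled, which is why the postponed path needs explaining.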





[GitHub] [hadoop] ayushtkn commented on pull request #5583: HDFS-16987. [BugFix] NameNode should remove all invalid corrupt blocks when starting active service

2023-04-23 Thread via GitHub


ayushtkn commented on PR #5583:
URL: https://github.com/apache/hadoop/pull/5583#issuecomment-1519145142

   @ZanderXu Some confusion here. Are you talking about a case like this: the block got 
marked as a corrupt replica, but the NameNode didn't delete it due to its recent 
transition to Active, which marked all the storages as stale and then 
delayed deletion of the blocks until the next block report (BR)?
   
   Is this some extension of 
[HDFS-15200](https://issues.apache.org/jira/browse/HDFS-15200)?

