[ https://issues.apache.org/jira/browse/HDFS-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901269#action_12901269 ]
sam rash commented on HDFS-1350:
--------------------------------

I saw a case where the only existing replica had mismatched data and checksum lengths. It was not used, and we lost the block. I need to double-check the code, but the DN exception said the block was not valid and couldn't be used.

It seems to me the logic is simple: take the longest length you can get. It doesn't matter whether data and checksum match, as far as I can tell (though I think a matching replica is typically longer than a mismatched one). Truncation only happens after the NN picks the length of the blocks.

As I said, I think the bug, at least in our patched rev (I need to look at stock 20-append), is that replicas with mismatched lengths can't participate in lease recovery at all, which seems broken.

> make datanodes do graceful shutdown
> -----------------------------------
>
>                 Key: HDFS-1350
>                 URL: https://issues.apache.org/jira/browse/HDFS-1350
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: sam rash
>            Assignee: sam rash
>
> we found that the Datanode doesn't do a graceful shutdown and a block can be
> corrupted (data + checksum amounts off)
> we can make the DN do a graceful shutdown in case there are open files. if
> this presents a problem to a timely shutdown, we can make it a parameter of
> how long to wait for the full graceful shutdown before just exiting

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
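
For illustration only, the shutdown behavior proposed in the issue description (wait for open files to finish, but only up to a configurable limit before exiting anyway) could be sketched roughly as below. This is not the Datanode code: the class name, the `shutdownGracefully` helper, and the `maxWaitMillis` parameter are hypothetical, and a plain `ExecutorService` stands in for whatever threads are writing open blocks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names here are assumptions, not Hadoop APIs.
class GracefulShutdownSketch {

    /**
     * Stops accepting new work, waits up to maxWaitMillis for in-flight
     * writers to finish, then forces shutdown if they have not.
     * Returns true if everything finished within the limit.
     */
    static boolean shutdownGracefully(ExecutorService writers, long maxWaitMillis) {
        writers.shutdown(); // no new tasks; already-submitted tasks keep running
        try {
            if (writers.awaitTermination(maxWaitMillis, TimeUnit.MILLISECONDS)) {
                return true; // all writers finished cleanly
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        writers.shutdownNow(); // limit exceeded or interrupted: interrupt remaining writers
        return false;
    }

    public static void main(String[] args) {
        ExecutorService writers = Executors.newFixedThreadPool(2);
        // Stands in for a writer flushing data and checksum for an open block.
        writers.submit(() -> { });
        boolean clean = shutdownGracefully(writers, 5_000);
        System.out.println(clean ? "graceful" : "forced");
    }
}
```

The wait limit is what keeps a stuck writer from blocking a timely shutdown; how a real Datanode would track open files and flush data and checksum consistently is outside this sketch.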