[ 
https://issues.apache.org/jira/browse/HDFS-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901269#action_12901269
 ] 

sam rash commented on HDFS-1350:
--------------------------------

I saw a case where the only existing replica did not have matching data and 
checksum lengths.  It was not used and we lost the block.  I need to 
double-check the code to be sure, but the DN exception said that the block was 
not valid and couldn't be used.

It seems to me the logic is simple: take the longest length you can get.  It 
doesn't matter whether data and checksum match, as far as I can tell (though I 
think a matching replica is typically longer than a mismatching one).

Truncation only happens after the NN picks the length of the blocks.  As I 
said, I think the bug, at least in our patched rev (I need to look at stock 
20-append), is that replicas with mismatching lengths can't participate at all 
in lease recovery, which seems broken.
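
A rough sketch of the "take the longest length you can get" idea from the comment above. This is not the HDFS lease recovery code: ReplicaInfo, usableLength and pickRecoveryLength are hypothetical names, and treating a mismatched replica's usable length as the smaller of its data length and its checksum-covered length is an assumption made only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class RecoveryLengthSketch {

    /** Hypothetical view of one replica as reported by a datanode; not an HDFS class. */
    public static final class ReplicaInfo {
        final String datanode;
        final long dataBytes;         // bytes present in the block data file
        final long checksummedBytes;  // bytes of data covered by the checksum file

        public ReplicaInfo(String datanode, long dataBytes, long checksummedBytes) {
            this.datanode = datanode;
            this.dataBytes = dataBytes;
            this.checksummedBytes = checksummedBytes;
        }

        /** Assumption: only bytes that have both data and a checksum are usable. */
        long usableLength() {
            return Math.min(dataBytes, checksummedBytes);
        }
    }

    /**
     * Every replica participates, matching or not, and the longest usable
     * length wins. Returns -1 if there are no replicas at all.
     */
    public static long pickRecoveryLength(List<ReplicaInfo> replicas) {
        long best = -1;
        for (ReplicaInfo r : replicas) {
            best = Math.max(best, r.usableLength());
        }
        return best;
    }

    public static void main(String[] args) {
        // A single replica whose data and checksum lengths disagree still
        // yields a recovery length instead of being rejected outright.
        List<ReplicaInfo> replicas =
            Arrays.asList(new ReplicaInfo("dn1", 1536L, 1024L));
        System.out.println(pickRecoveryLength(replicas));
    }
}
```

In this sketch, truncating each replica down to the chosen length would happen afterwards, as a separate step.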


> make datanodes do graceful shutdown
> -----------------------------------
>
>                 Key: HDFS-1350
>                 URL: https://issues.apache.org/jira/browse/HDFS-1350
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: sam rash
>            Assignee: sam rash
>
> We found that the Datanode doesn't do a graceful shutdown and a block can be 
> corrupted (data and checksum lengths out of sync).
> We can make the DN do a graceful shutdown in case there are open files. If 
> this presents a problem for a timely shutdown, we can make it a parameter 
> how long to wait for the full graceful shutdown before just exiting.
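
A minimal sketch of the bounded graceful shutdown described in the issue, assuming the open block writers run on a java.util.concurrent.ExecutorService. The class name, the shutdown method and the maxWaitMillis parameter are hypothetical and do not correspond to any existing Datanode code or configuration key.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GracefulShutdownSketch {

    /**
     * Lets in-flight writers finish for up to maxWaitMillis, then gives up
     * and interrupts whatever is still running.
     *
     * @return true if every writer finished within the wait, false if forced
     */
    public static boolean shutdown(ExecutorService writers, long maxWaitMillis) {
        writers.shutdown(); // accept no new work; running tasks continue
        try {
            if (writers.awaitTermination(maxWaitMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        writers.shutdownNow(); // timed out or interrupted: stop waiting
        return false;
    }

    public static void main(String[] args) {
        ExecutorService writers = Executors.newSingleThreadExecutor();
        writers.submit(new Runnable() {
            public void run() {
                // Stand-in for flushing data and checksum for an open block.
            }
        });
        System.out.println("graceful: " + shutdown(writers, 5000L));
    }
}
```

A wait of zero would give roughly the current behaviour of exiting straight away, which is why a tunable limit fits the "timely shutdown" concern.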

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
