[ https://issues.apache.org/jira/browse/HDFS-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883855#action_12883855 ]
sam rash commented on HDFS-1262:
--------------------------------

that's probably better. this was dependent on it as i was killing the datanodes to simulate the pipeline failure. i ended up tuning the test case to use mockito to throw exceptions at the end of a NN rpc call for both append() and create(), so I think that dependency is gone.

can we mark this as dependent on that if it turns out to be needed?

> Failed pipeline creation during append leaves lease hanging on NN
> -----------------------------------------------------------------
>
>                 Key: HDFS-1262
>                 URL: https://issues.apache.org/jira/browse/HDFS-1262
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client, name-node
>    Affects Versions: 0.20-append
>            Reporter: Todd Lipcon
>            Assignee: sam rash
>            Priority: Critical
>             Fix For: 0.20-append
>
>         Attachments: hdfs-1262-1.txt
>
>
> Ryan Rawson came upon this nasty bug in HBase cluster testing. What happened
> was the following:
> 1) File's original writer died
> 2) Recovery client tried to open file for append - looped for a minute or so
> until soft lease expired, then append call initiated recovery
> 3) Recovery completed successfully
> 4) Recovery client calls append again, which succeeds on the NN
> 5) For some reason, the block recovery that happens at the start of append
> pipeline creation failed on all datanodes 6 times, causing the append() call
> to throw an exception back to HBase master. HBase assumed the file wasn't
> open and put it back on a queue to try later
> 6) Some time later, it tried append again, but the lease was still assigned
> to the same DFS client, so it wasn't able to recover.
> The recovery failure in step 5 is a separate issue, but the problem for this
> JIRA is that the client can think it failed to open a file for append when the
> NN thinks the writer holds a lease. Since the writer keeps renewing its lease,
> recovery never happens, and no one can open or recover the file until the DFS
> client shuts down.
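For readers who want to see steps 4 through 6 in isolation, here is a minimal Java sketch of the lease hang. It is a toy model, not HDFS code: `ToyNameNode`, `tryAppend`, and `abandon` are hypothetical names invented for illustration, the pipeline failure is injected with a plain boolean flag rather than the Mockito approach mentioned in the comment, and the cleanup shown is only one possible remedy, not necessarily what the attached patch does.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

final class LeaseHangSketch {

    /** Hypothetical, heavily simplified stand-in for the NN lease table. */
    static final class ToyNameNode {
        private final Map<String, String> leaseHolder = new HashMap<>();

        /** Records a lease for the client, or refuses if one already exists. */
        void append(String path, String client) throws IOException {
            String holder = leaseHolder.get(path);
            if (holder != null) {
                throw new IOException("lease on " + path + " held by " + holder);
            }
            leaseHolder.put(path, client);
        }

        /** Assumed cleanup hook: drops the lease only if this client holds it. */
        void abandon(String path, String client) {
            leaseHolder.remove(path, client);
        }

        String holderOf(String path) {
            return leaseHolder.get(path);
        }
    }

    /**
     * Toy client-side append. Returns true only if both the NN call and the
     * (simulated) pipeline setup succeed.
     */
    static boolean tryAppend(ToyNameNode nn, String path, String client,
                             boolean pipelineFails, boolean cleanUpOnFailure) {
        try {
            nn.append(path, client);          // step 4: NN records the lease
        } catch (IOException e) {
            return false;                     // step 6: NN refuses the retry
        }
        if (pipelineFails) {                  // step 5: injected pipeline failure
            if (cleanUpOnFailure) {
                nn.abandon(path, client);     // release the lease before giving up
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Without cleanup: the failed append leaves the lease behind.
        ToyNameNode buggy = new ToyNameNode();
        tryAppend(buggy, "/hbase/log", "client-1", true, false);
        boolean buggyRetry = tryAppend(buggy, "/hbase/log", "client-1", false, false);
        System.out.println("buggy retry succeeded: " + buggyRetry);   // false

        // With cleanup: the retry can open the file again.
        ToyNameNode fixed = new ToyNameNode();
        tryAppend(fixed, "/hbase/log", "client-1", true, true);
        boolean fixedRetry = tryAppend(fixed, "/hbase/log", "client-1", false, true);
        System.out.println("fixed retry succeeded: " + fixedRetry);   // true
    }
}
```

The model leaves out lease renewal and soft-limit expiry. In the real system the client keeps renewing the lease, which is why the stale entry in the first scenario never expires on its own.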