[jira] [Updated] (HDFS-1262) Failed pipeline creation during append leaves lease hanging on NN
[ https://issues.apache.org/jira/browse/HDFS-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated HDFS-1262:
-----------------------------------
    Fix Version/s:     (was: 0.20-append)

> Failed pipeline creation during append leaves lease hanging on NN
> ------------------------------------------------------------------
>
>                 Key: HDFS-1262
>                 URL: https://issues.apache.org/jira/browse/HDFS-1262
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client, namenode
>    Affects Versions: 0.20-append
>            Reporter: Todd Lipcon
>            Assignee: sam rash
>            Priority: Critical
>         Attachments: hdfs-1262-1.txt, hdfs-1262-2.txt, hdfs-1262-3.txt,
>                      hdfs-1262-4.txt, hdfs-1262-5.txt
>
>
> Ryan Rawson came upon this nasty bug in HBase cluster testing. What happened
> was the following:
> 1) File's original writer died
> 2) Recovery client tried to open file for append - looped for a minute or so
> until soft lease expired, then append call initiated recovery
> 3) Recovery completed successfully
> 4) Recovery client called append again, which succeeded on the NN
> 5) For some reason, the block recovery that happens at the start of append
> pipeline creation failed on all datanodes 6 times, causing the append() call
> to throw an exception back to HBase master. HBase assumed the file wasn't
> open and put it back on a queue to try later
> 6) Some time later, it tried append again, but the lease was still assigned
> to the same DFS client, so it wasn't able to recover.
> The recovery failure in step 5 is a separate issue, but the problem for this
> JIRA is that the client can think it failed to open a file for append when the
> NN thinks the writer holds a lease. Since the writer keeps renewing its lease,
> recovery never happens, and no one can open or recover the file until the DFS
> client shuts down.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
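The sequence in steps 4 to 6 can be illustrated with a toy model. This is a minimal sketch, not HDFS code: `ToyNameNode`, `open_for_append`, `simulate`, the 60-second soft limit, and the 30-second renewal interval are all assumptions made for illustration. The `release_on_failure` flag models one plausible remedy (giving up the lease when pipeline setup fails), which is inferred from the attachment comments in this thread rather than taken from the patch itself.

```python
class LeaseHeldError(Exception):
    """Raised when a file is under construction and its lease is still live."""


class ToyNameNode:
    SOFT_LIMIT = 60  # seconds; assumed, based on "a minute or so" in step 2

    def __init__(self):
        self.leases = {}  # path -> (holder, time of last renewal)

    def append(self, path, holder, now):
        lease = self.leases.get(path)
        if lease is not None and now - lease[1] < self.SOFT_LIMIT:
            # Lease is live: nobody (not even the same holder) may reopen.
            raise LeaseHeldError(path)
        # No lease, or soft limit expired: hand the lease to the caller.
        self.leases[path] = (holder, now)

    def renew(self, holder, now):
        # A client renews every lease registered under its name at once.
        for path, (owner, _) in list(self.leases.items()):
            if owner == holder:
                self.leases[path] = (holder, now)

    def abandon(self, path, holder):
        lease = self.leases.get(path)
        if lease is not None and lease[0] == holder:
            del self.leases[path]


def open_for_append(nn, path, holder, now, pipeline_ok, release_on_failure):
    nn.append(path, holder, now)          # step 4: succeeds on the NN
    if not pipeline_ok:                   # step 5: pipeline creation fails
        if release_on_failure:
            nn.abandon(path, holder)      # hypothetical fix: give the lease back
        raise IOError("pipeline creation failed")


def simulate(release_on_failure):
    nn = ToyNameNode()
    client = "DFSClient_master"           # hypothetical client name
    try:
        open_for_append(nn, "/hlog", client, 0, False, release_on_failure)
    except IOError:
        pass                              # caller assumes the file is not open
    for t in range(30, 600, 30):          # the client keeps renewing its leases
        nn.renew(client, t)
    try:
        open_for_append(nn, "/hlog", client, 600, True, release_on_failure)
        return "opened"
    except LeaseHeldError:
        return "stuck"                    # step 6: the lease never expires
```

Calling `simulate(False)` reproduces the hang described above, because the renewals keep the orphaned lease alive indefinitely. Calling `simulate(True)` lets the later append go through.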
[jira] Updated: (HDFS-1262) Failed pipeline creation during append leaves lease hanging on NN
[ https://issues.apache.org/jira/browse/HDFS-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sam rash updated HDFS-1262:
---------------------------
    Attachment: hdfs-1262-1.txt

- Test case for append and create failures.
- Tried to get both cases to fail fast, but create will hit the test timeout (the default for a create that gets AlreadyBeingCreatedException is 5 retries with a 60s sleep).
- Without the fix, the append case fails in 30s in the worst case.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
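The create timeout mentioned above follows from the retry arithmetic: 5 retries with a 60s sleep add up to 300s of waiting before the call gives up. The sketch below is only an illustration of that retry pattern. `AlreadyBeingCreated`, `create_with_retries`, and `worst_case_wait` are hypothetical names, and it assumes "5 retries" means five attempts in total, each followed by a sleep when it fails.

```python
import time


class AlreadyBeingCreated(Exception):
    """Stand-in for HDFS's AlreadyBeingCreatedException."""


def create_with_retries(create, retries=5, sleep_seconds=60, sleep=time.sleep):
    """Call create() up to `retries` times, sleeping after each failure."""
    last_error = None
    for _ in range(retries):
        try:
            return create()
        except AlreadyBeingCreated as err:
            last_error = err
            sleep(sleep_seconds)
    raise last_error


def worst_case_wait(retries=5, sleep_seconds=60):
    """Total sleep time when every attempt fails (no real sleeping is done)."""
    slept = []

    def always_fails():
        raise AlreadyBeingCreated("lease still held")

    try:
        # Record the requested sleeps instead of actually waiting.
        create_with_retries(always_fails, retries, sleep_seconds, sleep=slept.append)
    except AlreadyBeingCreated:
        pass
    return sum(slept)
```

Passing the sleep function in as a parameter keeps the sketch testable without waiting five minutes. A test against the real client would have to shorten the interval or accept the timeout, as the comment above notes.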
[jira] Updated: (HDFS-1262) Failed pipeline creation during append leaves lease hanging on NN
[ https://issues.apache.org/jira/browse/HDFS-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sam rash updated HDFS-1262:
---------------------------
    Attachment: hdfs-1262-2.txt

Removed the HDFS-894 change from the patch (to be committed to 0.20-append separately).
[jira] Updated: (HDFS-1262) Failed pipeline creation during append leaves lease hanging on NN
[ https://issues.apache.org/jira/browse/HDFS-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sam rash updated HDFS-1262:
---------------------------
    Attachment: hdfs-1262-3.txt

removed empty file MockitoUtil
[jira] Updated: (HDFS-1262) Failed pipeline creation during append leaves lease hanging on NN
[ https://issues.apache.org/jira/browse/HDFS-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sam rash updated HDFS-1262:
---------------------------
    Attachment: hdfs-1262-4.txt

Fixed a bug where calling append() to trigger lease recovery resulted in a client-side exception (trying to abandon a file that you don't own the lease on). DFSClient now catches this exception and logs it.
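The catch-and-log behavior described above can be sketched as a best-effort cleanup helper. This is not the DFSClient code from the patch, which is not shown in this thread. `LeaseNotOwnedError`, `abandon_quietly`, and the logger name are hypothetical, and the `abandon` callable stands in for whatever RPC the client uses to give up the file.

```python
import logging

log = logging.getLogger("toy_dfs_client")  # hypothetical logger name


class LeaseNotOwnedError(Exception):
    """Stand-in for the error raised when abandoning a file leased to someone else."""


def abandon_quietly(abandon, path):
    """Try to abandon `path`; log and swallow the lease-ownership error.

    Returns True if the abandon call succeeded, False if it was rejected.
    """
    try:
        abandon(path)
        return True
    except LeaseNotOwnedError as err:
        # The append() was only meant to trigger recovery, so failing to
        # abandon a file we never owned is expected and not worth rethrowing.
        log.warning("could not abandon %s: %s", path, err)
        return False
```

Only the one expected exception type is swallowed here, so unrelated failures still propagate to the caller.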
[jira] Updated: (HDFS-1262) Failed pipeline creation during append leaves lease hanging on NN
[ https://issues.apache.org/jira/browse/HDFS-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sam rash updated HDFS-1262:
---------------------------
    Attachment: hdfs-1262-5.txt

Addressed Todd's comments (except for RPC compatibility, which is pending discussion).