[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525158#comment-14525158 ]

Hadoop QA commented on HDFS-5546:

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 6s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 7s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 41s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | common tests | 22m 35s | Tests passed in hadoop-common. |
| | | 60m 41s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12652070/HDFS-5546.2.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10713/artifact/patchprocess/testrun_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10713/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10713/console |

This message was automatically generated.

> race condition crashes "hadoop ls -R" when directories are moved/removed
>
> Key: HDFS-5546
> URL: https://issues.apache.org/jira/browse/HDFS-5546
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.2.0
> Reporter: Colin Patrick McCabe
> Assignee: Lei (Eddy) Xu
> Priority: Minor
> Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, HDFS-5546.2.004.patch
>
> This seems to be a rare race condition where we have a sequence of events like this:
> 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
> 2. someone deletes or moves directory D
> 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException.
> 4. ls command terminates with FNF

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
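The four-step sequence in the issue description can be reproduced on a local file system. The following is a minimal sketch, not Hadoop code: it uses {{java.nio.file}} as a stand-in for DFS, {{Files.isDirectory}} as the analogue of {{getFileStatus}}, and {{NoSuchFileException}} as the analogue of {{FileNotFoundException}}. The class name {{StatThenListRace}} and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration on the local file system; not Hadoop code.
class StatThenListRace {

  /** Lists dir; returns null if dir vanished between the stat and the list. */
  static List<String> listOrNull(Path dir, Runnable interloper) throws IOException {
    if (!Files.isDirectory(dir)) {   // step 1: the "getFileStatus" analogue
      return null;
    }
    interloper.run();                // step 2: someone deletes or moves the directory
    List<String> names = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) { // step 3
      for (Path p : entries) {
        names.add(p.getFileName().toString());
      }
    } catch (NoSuchFileException e) { // local analogue of FileNotFoundException
      return null;                    // step 4 avoided: report the loss instead of dying
    }
    return names;
  }

  /** Forces the race deterministically by deleting the directory between the two calls. */
  static String demo() {
    try {
      Path d = Files.createTempDirectory("race-demo");
      List<String> result = listOrNull(d, () -> {
        try {
          Files.delete(d);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
      return result == null ? "directory vanished mid-listing" : result.toString();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The point of the sketch is only that a successful stat says nothing about the later list call; any fix has to handle the failure at the list call itself.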
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525075#comment-14525075 ]

Hadoop QA commented on HDFS-5546:

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 33s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 28s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 30s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 5s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 40s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | common tests | 24m 9s | Tests passed in hadoop-common. |
| | | 60m 56s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12652070/HDFS-5546.2.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10663/artifact/patchprocess/testrun_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10663/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10663/console |

This message was automatically generated.
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134560#comment-14134560 ]

Colin Patrick McCabe commented on HDFS-5546:

Which patch is current? HDFS-5546.2.004.patch? I'm not completely happy with that patch since the return code is still 0 even after errors are encountered.
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134044#comment-14134044 ]

Daryn Sharp commented on HDFS-5546:

Consistency is always good. However, the issue with curly set expansion is that it arguably shouldn't have been part of the globber. The shell pre-expands curly sets before attempting to do glob expansion. Currently that's all done by the globber, so I'm not sure there's much we can do. At least in this jira.

I'm +1 on the current patch assuming others are too.
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133367#comment-14133367 ]

Hadoop QA commented on HDFS-5546:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12652070/HDFS-5546.2.004.patch
against trunk revision 14e2639.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common:

org.apache.hadoop.ipc.TestFairCallQueue
org.apache.hadoop.crypto.key.TestValueQueue
org.apache.hadoop.ha.TestZKFailoverControllerStress
org.apache.hadoop.ipc.TestDecayRpcScheduler
org.apache.hadoop.ipc.TestIPC
org.apache.hadoop.crypto.random.TestOsSecureRandom

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8022//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8022//console

This message is automatically generated.
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043956#comment-14043956 ]

Colin Patrick McCabe commented on HDFS-5546:

I agree with a lot of the stuff that's been presented, but I also think our behavior should be consistent between "{{ls /a1/b /a2/b}}" and "{{ls /a\{1,2\}/b}}", and right now I can't see a good way to achieve that if we catch IOE (since the globber does not catch IOE).

On the other hand, if we catch FNF and continue if a top-level directory disappears on us, then we are making things more consistent, since the globber catches and ignores IOEs (when dealing with globs).

bq. Colin Patrick McCabe shouldn't the globStatus() be out of scope for this JIRA? Maybe we should open another related JIRA?

I'm not sure how the globber would report IOE other than throwing it. We'd have to return a list of {{Option}} or something? It doesn't seem like the kind of change that could be made compatibly, since we'd need a new interface.

So overall I would lean towards just catching FNF at the top level, like the earlier patch did, and maybe revisiting this later if we have better ideas about how to handle this in the globber as well. [~daryn], [~eddyxu], does that make sense? Or am I trying too hard to be consistent? :)
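To make the "list of {{Option}}" idea above concrete, here is a rough sketch of an interface that reports a per-path result or error instead of throwing on the first failure. It is an assumption about what such an interface could look like, not a proposal from the thread and not an existing Hadoop API: {{PartialGlobSketch}}, {{Outcome}}, {{Lister}}, and {{resolveAll}} are all hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types exist in Hadoop.
class PartialGlobSketch {

  /** Either the entries found under a path, or the error hit while resolving it. */
  static final class Outcome {
    final String path;
    final List<String> entries; // null when error is set
    final IOException error;    // null on success

    Outcome(String path, List<String> entries, IOException error) {
      this.path = path;
      this.entries = entries;
      this.error = error;
    }

    boolean ok() {
      return error == null;
    }
  }

  /** Stand-in for a file system listing call that may fail for an individual path. */
  interface Lister {
    List<String> list(String path) throws IOException;
  }

  /** Resolves every path, recording failures instead of aborting the whole operation. */
  static List<Outcome> resolveAll(List<String> paths, Lister lister) {
    List<Outcome> out = new ArrayList<>();
    for (String p : paths) {
      try {
        out.add(new Outcome(p, lister.list(p), null));
      } catch (IOException e) {
        out.add(new Outcome(p, null, e)); // record the error and keep going
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Fake lister: /a/r is inaccessible, everything else contains one entry "c".
    Lister fake = p -> {
      if (p.equals("/a/r")) {
        throw new IOException("Permission denied: " + p);
      }
      return Arrays.asList(p + "/c");
    };
    for (Outcome o : resolveAll(Arrays.asList("/a/b", "/a/r"), fake)) {
      System.out.println(o.ok() ? o.entries.toString() : "ls: " + o.error.getMessage());
    }
  }
}
```

The sketch also shows why this is hard to retrofit: callers that today receive a plain array of results would need to switch to the new return type, which is the compatibility concern raised above.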
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042442#comment-14042442 ]

Lei (Eddy) Xu commented on HDFS-5546:

[~daryn] Would you mind taking another look at this patch? Thank you very much!
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041420#comment-14041420 ]

Hadoop QA commented on HDFS-5546:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12652070/HDFS-5546.2.004.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common:

org.apache.hadoop.ha.TestZKFailoverController
org.apache.hadoop.ha.TestZKFailoverControllerStress

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7216//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7216//console

This message is automatically generated.
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041352#comment-14041352 ]

Lei (Eddy) Xu commented on HDFS-5546:

[~daryn] was right on this one: we should just replace FNF with IOException in the first patch. Two test cases that verify the expected behaviors have been added, though.

[~cmccabe] shouldn't the {{globStatus()}} be out of scope for this JIRA? Maybe we should open another related JIRA?
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041159#comment-14041159 ] Colin Patrick McCabe commented on HDFS-5546: I think what Daryn is advocating is that when attempting to recurse into a directory, we should catch IOE for the {{listStatus}} operation, not just FNF. Although this makes sense to me, there is a bit of a fly in the ointment-- if we have a glob expression like {{/\*/\*}}, the Globber internally will throw an exception if there is a path error while resolving the globs. For example, if you have {{/a/b/c}} and {{/a/r/c}}, and /a/r is inaccessible to you, {{ls /\*/\*/c}} will fail with an {{AccessControlException}} before displaying anything. This behavior has existed basically forever in the globber code (it wasn't added by the globber rewrite) and unfortunately, there is no good way to fix it now. The problem is that there is no way to indicate that we got an error other than throwing an exception, and an exception terminates the whole glob operation, even if there were other valid results. So in the interest of consistency, perhaps we should keep things the way they are, and only catch FNF? {{ls /a/b/c /a/r/c}} seems similar conceptually to {{ls /\*/\*/c}}... it is tricky to explain why an exception should terminate one but not the other... Eddy, can you take a look at the internal JIRA that prompted this and see if it was user error? I'm less and less convinced we should change {{ls -R}}... 
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041020#comment-14041020 ]

Lei (Eddy) Xu commented on HDFS-5546:

Maybe I misunderstood this JIRA. If printing an FNF exception in the middle of the ls output is normal behavior, as it is for {{/bin/ls}}, then the current {{trunk}} works correctly and does not need to be fixed.
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039207#comment-14039207 ]

Daryn Sharp commented on HDFS-5546:

In your example, what are you listing? "hadoop fs -ls -R /" or "hadoop fs -ls -R /test /other"? Why will it stop? The exception will be caught, displayed, and it moves to the next item.

BTW, checked gnu coreutils and ls will immediately emit a warning about a path that disappears, and eventually exit with status 1 when done which matches FsShell's behavior.

{noformat}
/* Exit statuses. */
enum
  {
    /* "ls" had a minor problem. E.g., while processing a directory,
       ls obtained the name of an entry via readdir, yet was later
       unable to stat that name. This happens when listing a directory
       in which entries are actively being removed or renamed. */
    LS_MINOR_PROBLEM = 1,
{noformat}
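The "caught, displayed, and it moves to the next item" behavior, together with a non-zero exit status at the end, can be sketched as follows. This is a simplified stand-in and not the FsShell implementation: {{LsExitStatusSketch}}, {{Lister}}, and {{run}} are hypothetical names, and the constant merely mirrors the {{LS_MINOR_PROBLEM}} value quoted above.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the FsShell implementation.
class LsExitStatusSketch {
  static final int OK = 0;
  static final int MINOR_PROBLEM = 1; // mirrors LS_MINOR_PROBLEM in the coreutils excerpt

  /** Stand-in for a listing call that may fail for an individual argument. */
  interface Lister {
    List<String> list(String path) throws IOException;
  }

  /** Lists every argument; an error on one is displayed and the loop moves on. */
  static int run(List<String> args, Lister lister, PrintStream out, PrintStream err) {
    int status = OK;
    for (String arg : args) {
      try {
        for (String entry : lister.list(arg)) {
          out.println(entry);
        }
      } catch (IOException e) {   // IOException, not only FileNotFoundException
        err.println("ls: " + e.getMessage());
        status = MINOR_PROBLEM;   // remember the failure, but keep listing
      }
    }
    return status;
  }

  public static void main(String[] args) {
    // Fake lister: /gone has disappeared, every other path holds one entry.
    Lister fake = p -> {
      if (p.equals("/gone")) {
        throw new FileNotFoundException(p + ": No such file or directory");
      }
      return Arrays.asList(p + "/1");
    };
    int status = run(Arrays.asList("/test", "/gone", "/other"), fake, System.out, System.err);
    System.exit(status);
  }
}
```

Catching the broader {{IOException}} here is a design choice for the sketch: a path can fail for reasons other than disappearing, and the loop treats all of them the same way.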
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039140#comment-14039140 ]

Lei (Eddy) Xu commented on HDFS-5546:

My question is this: suppose we have directories {{/test/0..10}} and {{/other/1..10}}, and {{/test/1}} is deleted during execution. The first patch will only print {{/test/0}} and {{/other/1..10}}, so {{/test/2..10}} are ignored in this case.
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039131#comment-14039131 ]

Daryn Sharp commented on HDFS-5546:

I'm not sure I understand the question. The first patch didn't stop. It just displayed the FNF for the command line path and kept going. If it traps IOE instead, I think that's all we need.
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039008#comment-14039008 ]

Lei (Eddy) Xu commented on HDFS-5546:

Thanks [~daryn]. You are right. {{ls}} should not print the warning message.

Just one question: would it be preferable to stop iterating a namespace when the first FNF happens, as the first patch did? I thought {{/bin/ls -R}} prints the rest of the namespace even if some files are deleted during the execution?
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038848#comment-14038848 ]

Daryn Sharp commented on HDFS-5546:

{{displayError}} does not terminate the operation. It just prints the exception. The even numbered directories are listed because they existed at the time the {{listStatus}} returned. This is expected behavior so there's no need to re-test existence before displaying. As Colin pointed out, ls is not and cannot be a perfect snapshot in time.

The current patch prints an awkward warning (non-posix warning - *nix ls doesn't display this) when a command line path is removed. That's inconsistent with the non-race output when the path never existed.

All that's needed is the first patch tweaked to catch IOE, not just FNF because it's not the only exception that may occur. This will make ls consistent with all other commands.
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038289#comment-14038289 ]

Hadoop QA commented on HDFS-5546:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12651573/HDFS-5546.2.003.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7184//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7184//console

This message is automatically generated.
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037999#comment-14037999 ] Lei (Eddy) Xu commented on HDFS-5546: - It appears to me that the actual behavior is that the FNF exceptions are printed by {{Command#displayError(e)}} in the middle of the LS output, which makes it look as if `LS` terminated because of the FNF. I will write a test case to verify this. I am actually working on a change that only prints out existing directories/files and puts the captured FNF exceptions into {{Command#exceptions}}. At the end of {{LS}}, it checks whether there is an FNF in {{exceptions}} and prints a human-readable message if there are one or more FNFs. What do you think, [~daryn] and [~cmccabe]? > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Assignee: Lei (Eddy) Xu >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, > HDFS-5546.2.001.patch, HDFS-5546.2.002.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
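The idea in the comment above (capture the FNF exceptions instead of printing them mid-listing, then report once at the end) can be sketched roughly as follows. This is only an illustration: the class DeferredFnfReport, its methods, and the wording of the message are hypothetical and are not taken from the patch or from Hadoop's Command class.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical collector; it only mimics the idea of Command#exceptions. */
class DeferredFnfReport {
  private final List<IOException> exceptions = new ArrayList<>();

  /** Called where the listing would otherwise print the error inline. */
  void record(IOException e) {
    exceptions.add(e);
  }

  /** Returns the end-of-run message, or an empty string if no FNF was captured. */
  String summary() {
    int missing = 0;
    for (IOException e : exceptions) {
      if (e instanceof FileNotFoundException) {
        missing++;
      }
    }
    if (missing == 0) {
      return "";
    }
    return "ls: " + missing + " path(s) disappeared while the listing was running";
  }
}
```

With this shape the listing output stays contiguous, and the caller decides at the end whether to print the summary and which exit code to return.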
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037978#comment-14037978 ] Colin Patrick McCabe commented on HDFS-5546: Hmm. I think you're right about this. {{Command#processPaths}} wraps the call to {{recursePath}} in an IOE try... catch block. So if we can't recurse into an individual child, we should still be able to move forward with the other ones. Of course, there is no such protection for the paths specified on the command-line. I tried looking for all the places IOE could be thrown, but didn't manage to spot one that would abort the recursion because of a problem with a child. Eddy, can you run your unit test against trunk to verify all this? > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Assignee: Lei (Eddy) Xu >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, > HDFS-5546.2.001.patch, HDFS-5546.2.002.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037346#comment-14037346 ] Daryn Sharp commented on HDFS-5546: --- I took a look at the source. Ls doesn't stop during a descent. It appears to only prematurely abort when invoked with multiple args and one of those args disappears after the command starts. Ex. "hadoop fs -ls /dir1 ...". If /dir1 disappears, then it aborts w/o processing subsequent paths which I agree is wrong. Displaying FNF and returning non-zero is the correct behavior though since the directory was explicitly requested to be listed. The first patch appears to be the correct solution, but instead of catching only FNF and displaying the error, I think it should catch IOE. > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Assignee: Lei (Eddy) Xu >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, > HDFS-5546.2.001.patch, HDFS-5546.2.002.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
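A minimal sketch of the behavior described in the comment above: each command-line argument is processed on its own, any IOException (not only FNF) is reported, the remaining arguments are still processed, and the exit code is non-zero if anything failed. MultiArgLister and PathAction are hypothetical names used for illustration; this is not the code from HDFS-5546.1.patch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a shell command; not Hadoop's actual Command class. */
class MultiArgLister {
  /** Hypothetical per-argument action that may fail with any IOException. */
  interface PathAction {
    void apply(String path) throws IOException;
  }

  /** Error lines that would be shown to the user. */
  final List<String> errors = new ArrayList<>();

  /** Processes every argument; returns 0 if all succeeded and 1 otherwise. */
  int run(List<String> args, PathAction action) {
    int exitCode = 0;
    for (String arg : args) {
      try {
        action.apply(arg);
      } catch (IOException e) {   // IOException, not just FileNotFoundException
        errors.add("ls: " + e.getMessage());
        exitCode = 1;             // the path was explicitly requested, so report failure
      }
    }
    return exitCode;
  }
}
```

For example, if "/dir1" disappears after the command starts, "/dir2" and "/dir3" are still listed, one error line is recorded, and run returns 1.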
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036087#comment-14036087 ] Colin Patrick McCabe commented on HDFS-5546: bq. Maybe we should print more verbose messages and tell the users re-run `ls -R`? If you run {{/bin/ls -R}}, it doesn't fail with a warning message if someone is making changes to the directories you're listing. I think we should stick to trying to implement that behavior, which is what users expect. > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Assignee: Lei (Eddy) Xu >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, > HDFS-5546.2.001.patch, HDFS-5546.2.002.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034879#comment-14034879 ] Lei (Eddy) Xu commented on HDFS-5546: - [~daryn] Thank you very much for your detailed comments, and thanks to [~cmccabe] for the explanation! As this patch deals with a _very rare_ case and is just for the CLI command {{ls}}, I would rather avoid crashing a user-facing CLI program, even at the cost of a slightly heavier load on the NN in such a rare case. Moreover, the overhead of looking up non-existent files/dirs on the NN in this rare case is no more than that of a normal {{ls}} on this namespace without the sub-directories being deleted. If there were a data point showing how rarely such a race condition occurs, it would help us justify this design, though. Maybe we should print more verbose messages and tell the users to re-run `ls -R`? > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Assignee: Lei (Eddy) Xu >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, > HDFS-5546.2.001.patch, HDFS-5546.2.002.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034712#comment-14034712 ] Colin Patrick McCabe commented on HDFS-5546: bq. Maybe I'm misunderstanding the description. Is this jira only trying to address a tiny race if the path existed when the command started, but disappeared before being listed? If yes, then FNF is exactly the correct behavior. After that, the stats you see being checked in the code are supposed to be from listStatus. The problem is that right now, we have a race between getting back a directory entry from {{listStatus}} on the parent directory, and calling {{listStatus}} on it. Think of the following interleaving: 1. Eddy issues "hadoop fs -ls -R /" 2. ls command calls {{listStatus("/")}} and gets back status_a, status_b, status_c 3. ls command uses status_a to print out a line describing /a 4. Colin removes directory a 5. ls command calls {{listStatus("/a")}} 6. {{FileNotFoundException}} aborts the whole ls command. Nothing else is printed. Basically, this makes the {{ls -R}} command unusable in situations where files are changing. From a user's perspective, this just translates to "{{ls -R}} is broken" since you effectively can't really use it. bq. If you are trying to make ls always forge ahead when it gets FNF while in a subdir, that has some peril associated with it. What if the item being listed isn't what was deleted? What if an ancestor directory was deleted? Should ls keep pounding on the NN to list every directory it thinks should be there? And then as it ascends back up the tree should keep trying to list other siblings it thinks should be there? We're never going to be able to provide a 100% consistent view of the filesystem via {{ls -R}}. HDFS simply doesn't have a way of getting back a snapshot of an entire subtree (well, except HDFS snapshots, which I think we can all agree are overkill here). 
You are going to need multiple calls to {{listStatus}}, and things may change in between those calls. These are just the facts of life, and something we have to accept. After all, we can't even get back a 100% consistent view of a single large directory via {{listStatus}}. Large directories will need multiple {{listStatus}} RPC calls, and something may have changed in between RPCs. The {{/bin/ls}} command on UNIX has similar issues. But clearly, despite the lack of snapshot consistency, people do find {{ls}} to be a useful command. Unless I'm missing something, there is no major harm if we just do forge ahead and try to call {{listStatus}} on subdirectories we retrieved from the previous {{listStatus}} call. The worst that can happen is we try to list something that isn't there and get an FNF which we ignore. We could also print out the FNFs, but I'm not sure what the user would do with this information. bq. This is not an acceptable patch. It's not ok to swallow the FNF and return a success exit code. We still throw an FNF if the directory where {{ls -R}} starts doesn't exist. It's just that we don't shut down the whole enterprise if something underneath that directory changes during our recursion. Does that make sense? > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Assignee: Lei (Eddy) Xu >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, > HDFS-5546.2.001.patch, HDFS-5546.2.002.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). 
This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
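The interleaving described in this comment can be reproduced deterministically with a small in-memory model. The sketch below is an assumption-laden illustration: LsRaceSketch, FakeNamespace, Entry and the onFirstList hook are all hypothetical and do not correspond to Hadoop classes. It shows the proposed behavior, where an FNF on the starting directory still propagates but an FNF on a child found during recursion is skipped.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the interleaving; none of this is Hadoop code. */
class LsRaceSketch {
  /** A directory entry as seen at listing time (a stand-in for a file status). */
  static class Entry {
    final String path;
    final boolean isDir;
    Entry(String path, boolean isDir) { this.path = path; this.isDir = isDir; }
  }

  /** Fake namespace: maps each directory path to the names of its children. */
  static class FakeNamespace {
    private final Map<String, List<String>> dirs = new HashMap<>();
    private Runnable hook;   // simulates another user acting in the middle of the listing

    void mkdir(String path, String... children) { dirs.put(path, Arrays.asList(children)); }
    void remove(String path) { dirs.remove(path); }
    void onFirstList(Runnable action) { hook = action; }

    List<Entry> listStatus(String dir) throws FileNotFoundException {
      List<String> names = dirs.get(dir);
      if (names == null) {
        throw new FileNotFoundException(dir);
      }
      List<Entry> result = new ArrayList<>();
      for (String name : names) {
        String child = dir.endsWith("/") ? dir + name : dir + "/" + name;
        result.add(new Entry(child, dirs.containsKey(child)));
      }
      if (hook != null) {      // fire the hook once, right after the first listing
        Runnable action = hook;
        hook = null;
        action.run();
      }
      return result;
    }
  }

  /** FNF on the starting directory propagates; FNF on a child is skipped. */
  static List<String> lsR(FakeNamespace fs, String start) throws FileNotFoundException {
    List<String> out = new ArrayList<>();
    recurse(fs, start, out);
    return out;
  }

  private static void recurse(FakeNamespace fs, String dir, List<String> out)
      throws FileNotFoundException {
    for (Entry e : fs.listStatus(dir)) {
      out.add(e.path);
      if (e.isDir) {
        try {
          recurse(fs, e.path, out);
        } catch (FileNotFoundException gone) {
          // The child vanished after its parent was listed; carry on with its siblings.
        }
      }
    }
  }
}
```

With a root containing a, b and c, where /b holds y and the hook removes /a after the first listing, lsR returns /a, /b, /b/y and /c: the line for /a is still printed from the earlier status, and the listing continues past the missing directory.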
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034671#comment-14034671 ] Hadoop QA commented on HDFS-5546: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650918/HDFS-5546.2.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7154//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7154//console This message is automatically generated. > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Assignee: Lei (Eddy) Xu >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, > HDFS-5546.2.001.patch, HDFS-5546.2.002.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. 
org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034630#comment-14034630 ] Daryn Sharp commented on HDFS-5546: --- This is not an acceptable patch. It's not ok to swallow the FNF and return a success exit code. Maybe I'm misunderstanding the description. Is this jira only trying to address a tiny race if the path existed when the command started, but disappeared before being listed? If yes, then FNF is exactly the correct behavior. After that, the stats you see being checked in the code are supposed to be from listStatus. If you are trying to make ls always forge ahead when it gets FNF while in a subdir, that has some peril associated with it. What if the item being listed isn't what was deleted? What if an ancestor directory was deleted? Should ls keep pounding on the NN to list every directory it thinks should be there? And then as it ascends back up the tree should keep trying to list other siblings it thinks should be there? > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Assignee: Lei (Eddy) Xu >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, > HDFS-5546.2.001.patch, HDFS-5546.2.002.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034535#comment-14034535 ] Hadoop QA commented on HDFS-5546: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650888/HDFS-5546.2.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.fs.TestPath org.apache.hadoop.fs.shell.TestPathData The following test timeouts occurred in hadoop-common-project/hadoop-common: org.apache.hadoop.http.TestHttpServer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7151//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7151//console This message is automatically generated. 
> race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, > HDFS-5546.2.001.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033393#comment-14033393 ] Colin Patrick McCabe commented on HDFS-5546: It's nondeterministic in the sense that the unit test might not fail on buggy code 100% of the time. It should not be nondeterministic in the sense that it fails on good code :) > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Assignee: Kousuke Saruta >Priority: Minor > Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033240#comment-14033240 ] Lei (Eddy) Xu commented on HDFS-5546: - I will try to use MockFileSystem to test this issue. > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Assignee: Kousuke Saruta >Priority: Minor > Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033150#comment-14033150 ] Lei (Eddy) Xu commented on HDFS-5546: - Thanks [~cmccabe]. Would it be nondeterministic to use multiple threads to test these race conditions? > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Assignee: Kousuke Saruta >Priority: Minor > Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032895#comment-14032895 ] Colin Patrick McCabe commented on HDFS-5546: I think we need a unit test to go with this that tests a few threads doing {{mkdir}} and {{remove}} while another thread is using {{Ls}} > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Assignee: Kousuke Saruta >Priority: Minor > Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
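A rough sketch of the kind of test suggested above: a few threads keep creating and removing directories while another thread lists the tree recursively. To stay self-contained it runs against the local filesystem through java.nio.file rather than against HDFS and the Hadoop Ls command, so it only illustrates the shape of such a test. ConcurrentLsSketch, walk and stress are hypothetical names. A test like this can miss a bug on any given run, but it should never fail against a lister that tolerates vanished directories.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stress harness on the local filesystem; not the test from the patch. */
class ConcurrentLsSketch {
  /** Recursively counts entries under dir, skipping anything that vanishes mid-walk. */
  static int walk(Path dir) {
    int seen = 0;
    try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
      for (Path child : children) {
        seen++;
        if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
          seen += walk(child);
        }
      }
    } catch (IOException | DirectoryIteratorException e) {
      // The directory was removed between being listed and being opened or read.
    }
    return seen;
  }

  /** Returns true if the lister survived every round while mutators churned the tree. */
  static boolean stress(int mutators, int rounds) throws IOException, InterruptedException {
    Path root = Files.createTempDirectory("ls-race");
    ExecutorService pool = Executors.newFixedThreadPool(mutators);
    AtomicBoolean stop = new AtomicBoolean(false);
    for (int i = 0; i < mutators; i++) {
      Path dir = root.resolve("d" + i);
      pool.execute(() -> {
        while (!stop.get()) {
          try {
            Files.createDirectories(dir.resolve("sub"));   // mkdir
            Files.deleteIfExists(dir.resolve("sub"));      // remove
            Files.deleteIfExists(dir);
          } catch (IOException ignored) {
            // Mutator failures are irrelevant here; only the lister is under test.
          }
        }
      });
    }
    boolean survived = true;
    try {
      for (int r = 0; r < rounds; r++) {
        walk(root);
      }
    } catch (RuntimeException e) {
      survived = false;   // a lister that does not tolerate the race lets the failure escape
    } finally {
      stop.set(true);
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    }
    for (int i = 0; i < mutators; i++) {   // best-effort cleanup of leftovers
      Files.deleteIfExists(root.resolve("d" + i).resolve("sub"));
      Files.deleteIfExists(root.resolve("d" + i));
    }
    Files.deleteIfExists(root);
    return survived;
  }
}
```

Each mutator works on its own directory so the mutators never interfere with one another; the only contention is between the mutators and the lister.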
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830418#comment-13830418 ] Colin Patrick McCabe commented on HDFS-5546: Thanks, Kousuke. I think the goal is to have it continue, ignoring the failure to stat that entry. This will be a bit tricky since when listing a single file, we can't ignore that error. It probably makes sense to print out a warning at the end, as well. > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-5546.1.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830366#comment-13830366 ] Kousuke Saruta commented on HDFS-5546: -- I see, and I will try to modify that. > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-5546.1.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830353#comment-13830353 ] Colin Patrick McCabe commented on HDFS-5546:
{code}
@@ -86,7 +87,11 @@ protected void processOptions(LinkedList args)
   protected void processPathArgument(PathData item) throws IOException {
     // implicitly recurse once for cmdline directories
     if (dirRecurse && item.stat.isDirectory()) {
-      recursePath(item);
+      try {
+        recursePath(item);
+      } catch (FileNotFoundException e){
+        displayError(e);
+      }
     } else {
       super.processPathArgument(item);
     }
{code}
This will result in the first moved/removed file aborting the entire recursive ls with an error message. Basically, the same behavior as now, only with a process exit code of 0 rather than nonzero. That's not what we want. > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-5546.1.patch > > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829261#comment-13829261 ] Colin Patrick McCabe commented on HDFS-5546: The best solution is probably to catch the FNF in #3, and simply not put that directory in the listing, since it doesn't exist by that name any more. I guess you could argue that we should re-list the parent directory in this case to make sure new stuff doesn't exist in it (like a renamed version of D), but that seems like it would open a difficult can of worms since we'd have arbitrary levels of backtracking. Also, we can't really know whether any of the work we had already done is still valid, since the names of directories could all have changed. > race condition crashes "hadoop ls -R" when directories are moved/removed > > > Key: HDFS-5546 > URL: https://issues.apache.org/jira/browse/HDFS-5546 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Colin Patrick McCabe >Priority: Minor > > This seems to be a rare race condition where we have a sequence of events > like this: > 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. > 2. someone deletes or moves directory D > 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which > calls DFS#listStatus(D). This throws FileNotFoundException. > 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.1#6144)