[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2015-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525158#comment-14525158
 ] 

Hadoop QA commented on HDFS-5546:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m  6s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 52s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  7s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 41s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | common tests |  22m 35s | Tests passed in 
hadoop-common. |
| | |  60m 41s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12652070/HDFS-5546.2.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10713/artifact/patchprocess/testrun_hadoop-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10713/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10713/console |


This message was automatically generated.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, 
 HDFS-5546.2.004.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2015-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525075#comment-14525075
 ] 

Hadoop QA commented on HDFS-5546:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 33s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 28s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 30s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  5s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 40s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | common tests |  24m  9s | Tests passed in 
hadoop-common. |
| | |  60m 56s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12652070/HDFS-5546.2.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10663/artifact/patchprocess/testrun_hadoop-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10663/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10663/console |


This message was automatically generated.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, 
 HDFS-5546.2.004.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-09-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134044#comment-14134044
 ] 

Daryn Sharp commented on HDFS-5546:
---

Consistency is always good.  However, the issue in the curly set expansions is 
it arguably shouldn't have been part of the globber.  The shell pre-expands 
curly sets before attempting to do glob expansion.  Currently that's all done 
by the globber so I'm not sure there's much we can do.  At least in this jira.

I'm +1 on the current patch assuming others are too.



 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, 
 HDFS-5546.2.004.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133367#comment-14133367
 ] 

Hadoop QA commented on HDFS-5546:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652070/HDFS-5546.2.004.patch
  against trunk revision 14e2639.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common:

  org.apache.hadoop.ipc.TestFairCallQueue
  org.apache.hadoop.crypto.key.TestValueQueue
  org.apache.hadoop.ha.TestZKFailoverControllerStress
  org.apache.hadoop.ipc.TestDecayRpcScheduler
  org.apache.hadoop.ipc.TestIPC
  org.apache.hadoop.crypto.random.TestOsSecureRandom

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8022//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8022//console

This message is automatically generated.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, 
 HDFS-5546.2.004.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043956#comment-14043956
 ] 

Colin Patrick McCabe commented on HDFS-5546:


I agree with a lot of the stuff that's been presented, but I also think our 
behavior should be consistent beween {{ls /a1/b /a2/b}} and {{ls 
/a\{1,2\}/b}}, and right now I can't see a good way to achieve that if we 
catch IOE (since the globber does not catch IOE)  On the other hand, if we 
catch FNF and continue if a top-level directory disappears on us, then we are 
making things more consistent, since the globber catches and ignores IOEs (when 
dealing with globs).

bq. Colin Patrick McCabe shouldn't the globStatus() be out of scope for this 
JIRA? Maybe we should open another related JIRA?

I'm not sure how the globber would report IOE other than throwing it.  We'd 
have to return a list of {{OptionFileStatus, IOException}} or something?  It 
doesn't seem like the kind of change that could be made compatibly, since we'd 
need a new interface.

So overall I would lean towards just catching FNF at the top-level, like the 
earlier patch did.  And maybe revisiting this later if we have better ideas 
about how to handle this in the globber as well.  [~daryn], [~eddyxu], does 
that make sense?  Or am I trying too hard to be consistent? :)

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, 
 HDFS-5546.2.004.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-24 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042442#comment-14042442
 ] 

Lei (Eddy) Xu commented on HDFS-5546:
-

[~daryn] Would you mind to take another look on this patch?  Thank you very 
much!

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, 
 HDFS-5546.2.004.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-23 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041020#comment-14041020
 ] 

Lei (Eddy) Xu commented on HDFS-5546:
-

Maybe I misunderstand this JIRA. If printing FNF exception during printing out 
ls information is normal behavior as what {{/bin/ls}} do, the current {{trunk}} 
works correctly and thus it does not need to be fixed. 

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-23 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041159#comment-14041159
 ] 

Colin Patrick McCabe commented on HDFS-5546:


I think what Daryn is advocating is that when attempting to recurse into a 
directory, we should catch IOE for the {{listStatus}} operation, not just FNF.

Although this makes sense to me, there is a bit of a fly in the ointment-- if 
we have a glob expression like {{/\*/\*}}, the Globber internally will throw an 
exception if there is a path error while resolving the globs.  For example, if 
you have {{/a/b/c}} and {{/a/r/c}}, and /a/r is inaccessible to you, {{ls 
/\*/\*/c}} will fail with an {{AccessControlException}} before displaying 
anything.

This behavior has existed basically forever in the globber code (it wasn't 
added by the globber rewrite) and unfortunately, there is no good way to fix it 
now.  The problem is that there is no way to indicate that we got an error 
other than throwing an exception, and an exception terminates the whole glob 
operation, even if there were other valid results.  So in the interest of 
consistency, perhaps we should keep things the way they are, and only catch 
FNF?  {{ls /a/b/c /a/r/c}} seems similar conceptually to {{ls /\*/\*/c}}... it 
is tricky to explain why an exception should terminate one but not the other...

Eddy, can you take a look at the internal JIRA that prompted this and see if it 
was user error?  I'm less and less convinced we should change {{ls -R}}...

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-23 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041352#comment-14041352
 ] 

Lei (Eddy) Xu commented on HDFS-5546:
-

[~daryn] was right on this one, we should just replace FNF to IOException in 
the first patch. Two test cases to verify the expected behaviors are added 
though.

[~cmccabe] shouldn't the {{globStatus()}} be out of scope for this JIRA? Maybe 
we should open another related JIRA?



 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, 
 HDFS-5546.2.004.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041420#comment-14041420
 ] 

Hadoop QA commented on HDFS-5546:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652070/HDFS-5546.2.004.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common:

  org.apache.hadoop.ha.TestZKFailoverController
  org.apache.hadoop.ha.TestZKFailoverControllerStress

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7216//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7216//console

This message is automatically generated.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch, 
 HDFS-5546.2.004.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-20 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038848#comment-14038848
 ] 

Daryn Sharp commented on HDFS-5546:
---

{{displayError}} does not terminate the operation.  It just prints the 
exception.  The even numbered directories are listed because they existed at 
the time the {{listStatus}} returned.  This is expected behavior so there's no 
need to re-test existence before displaying.  As Colin pointed out, ls is not 
and cannot be a perfect snapshot in time.

The current patch prints an awkward warning (non-posix warning - *nix ls 
doesn't display this) when a command line path is removed.  That's inconsistent 
with the non-race output when the path never existed.  All that's needed is the 
first patch tweaked to catch IOE, not just FNF because it's not the only 
exception that may occur.  This will make ls consistent with all other commands.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-20 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039008#comment-14039008
 ] 

Lei (Eddy) Xu commented on HDFS-5546:
-

Thanks [~daryn]. You are right. {{ls}} should not print the Warning message.

Just one question, would it be preferable to stop iterating a namespace when 
the first FNF happens, as what the first patch did? I thought {{/bin/ls -R}} 
printing the rest of the namespace even some files are deleted during the 
execution? 

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-20 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039131#comment-14039131
 ] 

Daryn Sharp commented on HDFS-5546:
---

I'm not sure I understand the question.  The first patch didn't stop.  It just 
displayed the FNF for the command line path and kept going.  If it traps IOE 
instead, I think that's all we need.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-20 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039140#comment-14039140
 ] 

Lei (Eddy) Xu commented on HDFS-5546:
-

My question is that, suppose we have directories {{/test/0..10}} and 
{{/other/1..10}}, and deleted {{/test/1}} during execution. The first patch 
will only print {{/test/0}} and {{/other/1..10}}, so that {{/test/2..10}} are 
ignored in this case.



 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-20 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039207#comment-14039207
 ] 

Daryn Sharp commented on HDFS-5546:
---

In your example, what are you listing? hadoop fs -ls -R / or hadoop fs -ls 
-R /test /other?  Why will it stop?  The exception will be caught, displayed, 
and it moves to the next item.

BTW, checked gnu coreutils and ls will immediately emit a warning about a path 
that disappears, and eventually exit with status 1 when done which matches 
FsShell's behavior.

{noformat}
/* Exit statuses.  */
enum
  {
/* ls had a minor problem.  E.g., while processing a directory,
   ls obtained the name of an entry via readdir, yet was later
   unable to stat that name.  This happens when listing a directory
   in which entries are actively being removed or renamed.  */
LS_MINOR_PROBLEM = 1,
{noformat}



 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-19 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037346#comment-14037346
 ] 

Daryn Sharp commented on HDFS-5546:
---

I took a look at the source.  Ls doesn't stop during a descent.  It appears to 
only prematurely abort when invoked with multiple args and one of those args 
disappears after the command starts.  Ex.  hadoop fs -ls /dir1   If /dir1 
disappears, then it aborts w/o processing subsequent paths which I agree is 
wrong.  Displaying FNF and returning non-zero is the correct behavior though 
since the directory was explicitly requested to be listed.

The first patch appears to be the correct solution, but instead of catching 
only FNF and displaying the error, I think it should catch IOE.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-19 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037978#comment-14037978
 ] 

Colin Patrick McCabe commented on HDFS-5546:


Hmm.  I think you're right about this.  {{Command#processPaths}} wraps the call 
to {{recursePath}} in an IOE try... catch block.  So if we can't recurse into 
an individual child, we should still be able to move forward with the other 
ones.  Of course, there is no such protection for the paths specified on the 
command-line.  I tried looking for all the places IOE could be thrown, but 
didn't manage to spot one that would abort the recursion because of a problem 
with a child.  Eddy, can you run your unit test against trunk to verify all 
this?

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-19 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037999#comment-14037999
 ] 

Lei (Eddy) Xu commented on HDFS-5546:
-

It appears to me that the actual behavior is that the FNF exceptions are 
printed by {{Command#displayError(e)}} in the middle of LS output, which looks 
like that it terminates `LS` because of FNF. I will write a test case to verify 
this. 

I am actually working on a change that only prints out existing 
directories/files and puts the captured FNF exception to 
{{Command#exceptions}}. In the end of {{LS}}, it checks whether there is FNF in 
{{exceptions}} and prints a human readable message if there is one or more FNF. 
What do you think, [~daryn] and [~cmccabe]?

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038289#comment-14038289
 ] 

Hadoop QA commented on HDFS-5546:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651573/HDFS-5546.2.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7184//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7184//console

This message is automatically generated.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-18 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034879#comment-14034879
 ] 

Lei (Eddy) Xu commented on HDFS-5546:
-

[~daryn] Thank you very much for your detailed comments and thank [~cmccabe] 
for explanation!

As this patch deals with a _very rare_ case and is just for the CLI command 
{{ls}}, I would prefer avoiding crashing user oriented CLI program to a little 
bit heavier load for NN in such a rare case. Moreover, the overhead of looking 
up non-existed files/dirs on NN in this rare case is not more than a normal 
{{ls}} on this namespace, without deleting sub-directories.

If there is a data point to back up how rare such an race condition will occur, 
it would be great for us to justify this design though.

Maybe we should print more verbose messages and tell the users re-run `ls -R`?

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036087#comment-14036087
 ] 

Colin Patrick McCabe commented on HDFS-5546:


bq. Maybe we should print more verbose messages and tell the users re-run `ls 
-R`?

If you run {{/bin/ls -R}}, it doesn't fail with a warning message if someone is 
making changes to the directories you're listing.  I think we should stick to 
trying to implement that behavior, which is what users expect.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034535#comment-14034535
 ] 

Hadoop QA commented on HDFS-5546:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650888/HDFS-5546.2.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common:

  org.apache.hadoop.fs.TestPath
  org.apache.hadoop.fs.shell.TestPathData

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common:

org.apache.hadoop.http.TestHttpServer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7151//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7151//console

This message is automatically generated.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Kousuke Saruta
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-17 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034630#comment-14034630
 ] 

Daryn Sharp commented on HDFS-5546:
---

This is not an acceptable patch.  It's not ok to swallow the FNF and return a 
success exit code.

Maybe I'm misunderstanding the description.  Is this jira only trying to 
address a tiny race if the path existed when the command started, but 
disappeared before being listed?  If yes, then FNF is exactly the correct 
behavior.  After that, the stats you see being checked in the code are supposed 
to be from listStatus.

If you are trying to make ls always forge ahead when it gets FNF while in a 
subdir, that has some peril associated with it.  What if the item being listed 
isn't what was deleted?  What if an ancestor directory was deleted?  Should ls 
keep pounding on the NN to list every directory it thinks should be there?  And 
then as it ascends back up the tree should keep trying to list other siblings 
it thinks should be there?

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034671#comment-14034671
 ] 

Hadoop QA commented on HDFS-5546:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650918/HDFS-5546.2.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7154//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7154//console

This message is automatically generated.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034712#comment-14034712
 ] 

Colin Patrick McCabe commented on HDFS-5546:


bq. Maybe I'm misunderstanding the description. Is this jira only trying to 
address a tiny race if the path existed when the command started, but 
disappeared before being listed? If yes, then FNF is exactly the correct 
behavior. After that, the stats you see being checked in the code are supposed 
to be from listStatus.

The problem is that right now, we have a race between getting back a directory 
entry from {{listStatus}} on the parent directory, and calling {{listStatus}} 
on it.  Think of the following interleaving:

1. Eddy issues hadoop fs -ls -R /
2. ls command calls {{listStatus( /)}} and gets back status_a, status_b, 
status_c
3. ls command uses status_a to print out a line describing /a
4. Colin removes directory a
5. ls command calls {{listStatus(/a)}}
6. {{FileNotFoundException}} aborts the whole ls command.  Nothing else is 
printed.

Basically, this makes the {{ls -R}} command unusable in situations where files 
are changing.  From a user's perspective, this just translates to {{ls -R}} is 
broken since you effectively can't really use it.

bq. If you are trying to make ls always forge ahead when it gets FNF while in a 
subdir, that has some peril associated with it. What if the item being listed 
isn't what was deleted? What if an ancestor directory was deleted? Should ls 
keep pounding on the NN to list every directory it thinks should be there? And 
then as it ascends back up the tree should keep trying to list other siblings 
it thinks should be there?

We're never going to be able to provide a 100% consistent view of the 
filesystem via {{ls -R}}.  HDFS simply doesn't have a way of getting back a 
snapshot an entire subtree (well, except HDFS snapshots, which I think we can 
all agree are overkill here.).  You are going to need multiple calls to 
{{listDir}}, and things may change in between those calls.  These are just the 
facts of life, and something we have to accept.

After all, we can't even get back a 100% consistent view of a single large 
directory via {{listStatus}}.  Large directories will need multiple 
{{listStatus}} RPC calls in between and something may have changed in between 
RPCs.  The {{/bin/ls}} command on UNIX has similar issues.  But clearly, 
despite the lack of snapshot consistency, people do find {{ls}} to be a useful 
command, though.

Unless I'm missing something, there is no major harm if we just do forge ahead 
and try to call {{listStatus}} on subdirectories we retrieved from the previous 
{{listStatus}} call.  The worst that can happen is we try to list something 
that isn't there and get an FNF which we ignore.  We could also print out the 
FNFs, but I'm not sure what the user would do with this information.

bq. This is not an acceptable patch. It's not ok to swallow the FNF and return 
a success exit code.

We still throw an FNF if the directory where {{ls -R}} starts doesn't exist.  
It's just that we don't shut down the whole enterprise if something underneath 
that directory changes during our recursion.  Does that make sense?

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Lei (Eddy) Xu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
 HDFS-5546.2.001.patch, HDFS-5546.2.002.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032895#comment-14032895
 ] 

Colin Patrick McCabe commented on HDFS-5546:


I think we need a unit test to go with this that tests a few threads doing 
{{mkdir}} and {{remove}} while another thread is using {{Ls}}

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Kousuke Saruta
Priority: Minor
 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-16 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033150#comment-14033150
 ] 

Lei (Eddy) Xu commented on HDFS-5546:
-

Thanks [~cmccabe]. Would it be nondeterministic to use multithreads to test 
these race conditions?

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Kousuke Saruta
Priority: Minor
 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-16 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033240#comment-14033240
 ] 

Lei (Eddy) Xu commented on HDFS-5546:
-

I will try to use MockFileSystem to test this issue. 

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Kousuke Saruta
Priority: Minor
 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2014-06-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033393#comment-14033393
 ] 

Colin Patrick McCabe commented on HDFS-5546:


It's nondeterministic in the sense that the unit test might not fail on buggy 
code 100% of the time.  It should not be nondeterministic in the sense that it 
fails on good code :)

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Assignee: Kousuke Saruta
Priority: Minor
 Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2013-11-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830353#comment-13830353
 ] 

Colin Patrick McCabe commented on HDFS-5546:


{code}
@@ -86,7 +87,11 @@ protected void processOptions(LinkedListString args)
   protected void processPathArgument(PathData item) throws IOException {
 // implicitly recurse once for cmdline directories
 if (dirRecurse  item.stat.isDirectory()) {
-  recursePath(item);
+  try {
+recursePath(item);
+  } catch (FileNotFoundException e){
+displayError(e);
+  } 
 } else {
   super.processPathArgument(item);
 }
{code}

This will result in the first moved/removed file aborting the entire recursive 
ls with an error message.  Basically, the same behavior as now, only with a 
process exit code of 0 rather than nonzero.  That's not what we want.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-5546.1.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2013-11-22 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830366#comment-13830366
 ] 

Kousuke Saruta commented on HDFS-5546:
--

I see, and I will try to modify that.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-5546.1.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2013-11-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830418#comment-13830418
 ] 

Colin Patrick McCabe commented on HDFS-5546:


Thanks, Kousuke.  I think the goal is to have it continue, ignoring the failure 
to stat that entry.  This will be a bit tricky since when listing a single 
file, we can't ignore that error.  It probably makes sense to print out a 
warning at the end, as well.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-5546.1.patch


 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed

2013-11-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829261#comment-13829261
 ] 

Colin Patrick McCabe commented on HDFS-5546:


The best solution is probably to catch the FNF in #3, and simply not put that 
directory in the listing, since it doesn't exist by that name any more. I guess 
you could argue that we should re-list the parent directory in this case to 
make sure new stuff doesn't exist in it (like a renamed version of D), but that 
seems like it would open a difficult can of worms since we'd have arbitrary 
levels of backtracking. Also, we can't really know whether any of the work we 
had already done is still valid, since the names of directories could all have 
changed.

 race condition crashes hadoop ls -R when directories are moved/removed
 

 Key: HDFS-5546
 URL: https://issues.apache.org/jira/browse/HDFS-5546
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Colin Patrick McCabe
Priority: Minor

 This seems to be a rare race condition where we have a sequence of events 
 like this:
 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
 2. someone deletes or moves directory D
 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
 calls DFS#listStatus(D). This throws FileNotFoundException.
 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.1#6144)