[ 
https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041159#comment-14041159
 ] 

Colin Patrick McCabe commented on HDFS-5546:
--------------------------------------------

I think what Daryn is advocating is that when attempting to recurse into a 
directory, we should catch any {{IOException}} (IOE) from the {{listStatus}} 
operation, not just {{FileNotFoundException}} (FNF).
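
To make the two options concrete, here is a minimal sketch against an 
in-memory stand-in. Everything in it ({{FakeFs}}, {{AccessDeniedException}}, 
{{lsR}}, the {{catchAllIo}} flag) is hypothetical and only models the idea; it 
is not the actual {{Ls}} or {{PathData}} code.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RecursiveLsSketch {
    /** Models a permission failure; a stand-in, not Hadoop's AccessControlException. */
    static class AccessDeniedException extends IOException {
        AccessDeniedException(String path) { super("Permission denied: " + path); }
    }

    /** Hypothetical in-memory file system: directory path -> child paths. */
    static class FakeFs {
        final Map<String, List<String>> dirs = new HashMap<>();
        final Set<String> vanished = new HashSet<>(); // removed after the parent was listed
        final Set<String> denied = new HashSet<>();   // not readable by the caller

        boolean isDirectory(String path) {
            return dirs.containsKey(path) || vanished.contains(path) || denied.contains(path);
        }

        List<String> listStatus(String dir) throws IOException {
            if (vanished.contains(dir)) throw new FileNotFoundException(dir);
            if (denied.contains(dir)) throw new AccessDeniedException(dir);
            List<String> kids = dirs.get(dir);
            if (kids == null) throw new FileNotFoundException(dir);
            return kids;
        }
    }

    /**
     * Recursive listing. FNF on a subdirectory is always recorded and skipped.
     * Any other IOException is recorded and skipped only when catchAllIo is
     * true; otherwise it propagates and ends the whole listing.
     */
    static void lsR(FakeFs fs, String dir, boolean catchAllIo,
                    List<String> out, List<String> errors) throws IOException {
        List<String> children;
        try {
            children = fs.listStatus(dir);
        } catch (FileNotFoundException e) {
            errors.add(dir); // the directory disappeared between the two calls
            return;
        } catch (IOException e) {
            if (!catchAllIo) throw e;
            errors.add(dir);
            return;
        }
        for (String child : children) {
            out.add(child);
            if (fs.isDirectory(child)) lsR(fs, child, catchAllIo, out, errors);
        }
    }

    public static void main(String[] args) throws IOException {
        FakeFs fs = new FakeFs();
        fs.dirs.put("/a", Arrays.asList("/a/b", "/a/r", "/a/s"));
        fs.dirs.put("/a/b", Arrays.asList("/a/b/c"));
        fs.vanished.add("/a/r");
        fs.denied.add("/a/s");
        List<String> out = new ArrayList<>();
        List<String> errors = new ArrayList<>();
        lsR(fs, "/a", true, out, errors);
        System.out.println("listed=" + out + " skipped=" + errors);
    }
}
```

With {{catchAllIo}} set to false, the same call still skips the vanished 
{{/a/r}} but stops with an exception as soon as it reaches the unreadable 
{{/a/s}}.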

Although this makes sense to me, there is a bit of a fly in the ointment: if 
we have a glob expression like {{/\*/\*}}, the Globber internally will throw an 
exception if there is a path error while resolving the globs.  For example, if 
you have {{/a/b/c}} and {{/a/r/c}}, and {{/a/r}} is inaccessible to you, {{ls 
/\*/\*/c}} will fail with an {{AccessControlException}} before displaying 
anything.

This behavior has existed basically forever in the globber code (it wasn't 
added by the globber rewrite) and unfortunately, there is no good way to fix it 
now.  The problem is that there is no way to indicate that we got an error 
other than throwing an exception, and an exception terminates the whole glob 
operation, even if there were other valid results.  So in the interest of 
consistency, perhaps we should keep things the way they are, and only catch 
FNF?  {{ls /a/b/c /a/r/c}} seems similar conceptually to {{ls /\*/\*/c}}... it 
is tricky to explain why an exception should terminate one but not the other...
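
As a rough illustration of the all-or-nothing behavior, here is a toy glob 
expander over an in-memory tree. The names ({{GlobSketch}}, {{FakeFs}}, 
{{AccessDeniedException}}, {{glob}}) are made up, and the matching is 
simplified to {{*}} and literal components, so treat it as a model of the 
problem rather than the real Globber.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GlobSketch {
    /** Models a permission failure; a stand-in, not Hadoop's AccessControlException. */
    static class AccessDeniedException extends IOException {
        AccessDeniedException(String path) { super("Permission denied: " + path); }
    }

    /** Hypothetical tree: directory path -> child names. */
    static class FakeFs {
        final Map<String, List<String>> children = new HashMap<>();
        final Set<String> denied = new HashSet<>(); // directories the caller cannot list

        List<String> list(String dir) throws IOException {
            if (denied.contains(dir)) throw new AccessDeniedException(dir);
            List<String> kids = children.get(dir);
            return kids == null ? Collections.<String>emptyList() : kids;
        }
    }

    /**
     * Expands an absolute pattern one component at a time. The only channels
     * back to the caller are the returned list and a thrown exception, so a
     * single failing directory discards matches already found elsewhere.
     */
    static List<String> glob(FakeFs fs, String pattern) throws IOException {
        List<String> candidates = new ArrayList<>();
        candidates.add(""); // the empty prefix stands for the root
        for (String component : pattern.substring(1).split("/")) {
            List<String> next = new ArrayList<>();
            for (String parent : candidates) {
                String dir = parent.isEmpty() ? "/" : parent;
                for (String name : fs.list(dir)) { // may throw and abort everything
                    if (component.equals("*") || component.equals(name)) {
                        next.add(parent + "/" + name);
                    }
                }
            }
            candidates = next;
        }
        return candidates;
    }

    public static void main(String[] args) {
        FakeFs fs = new FakeFs();
        fs.children.put("/", Arrays.asList("a"));
        fs.children.put("/a", Arrays.asList("b", "r"));
        fs.children.put("/a/b", Arrays.asList("c"));
        fs.children.put("/a/r", Arrays.asList("c"));
        fs.denied.add("/a/r");
        try {
            System.out.println(glob(fs, "/*/*/c"));
        } catch (IOException e) {
            // /a/b/c had already matched, but the caller never sees it.
            System.out.println("glob failed: " + e.getMessage());
        }
    }
}
```

In this model, returning partial results would need a different return type 
(matches plus per-path errors), which is the kind of interface change that is 
hard to make now.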

Eddy, can you take a look at the internal JIRA that prompted this and see if it 
was user error?  I'm less and less convinced we should change {{ls -R}}...

> race condition crashes "hadoop ls -R" when directories are moved/removed
> ------------------------------------------------------------------------
>
>                 Key: HDFS-5546
>                 URL: https://issues.apache.org/jira/browse/HDFS-5546
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Lei (Eddy) Xu
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, 
> HDFS-5546.2.001.patch, HDFS-5546.2.002.patch, HDFS-5546.2.003.patch
>
>
> This seems to be a rare race condition where we have a sequence of events 
> like this:
> 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
> 2. someone deletes or moves directory D
> 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which 
> calls DFS#listStatus(D). This throws FileNotFoundException.
> 4. ls command terminates with FNF



--
This message was sent by Atlassian JIRA
(v6.2#6252)
