[
https://issues.apache.org/jira/browse/HADOOP-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709509#action_12709509
]
Ian Nowland commented on HADOOP-5836:
-------------------------------------
The main fix here is to detect this empty file in listStatus() and not return
it. Along with this, I broadened the handling in all S3N methods of the
different ways of designating directories in S3, as follows:
* A note about directories. S3 of course has no "native" support for them.
* The idiom we choose then is: for any directory created by this class,
* we use an empty object "#{dirpath}_$folder$" as a marker.
* Further, to interoperate with other S3 tools, we also accept the following:
* - an object "#{dirpath}/" denoting a directory marker
* - if there exist any objects with the prefix "#{dirpath}/", then the
* directory is said to exist
* - if both a file with the name of a directory and a marker for that
* directory exist, then the *file masks the directory*, and the directory
* is never returned.
In particular this meant fixing delete() and rename() to handle all three
possible meanings of directory without failing.
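As a rough illustration of the rules above (not the code in the patch), the
following sketch classifies a path against a sorted set of object keys and
lists a directory's children while skipping marker objects. The class name
S3DirectorySketch, the methods classify() and listChildren(), and the
in-memory key set standing in for a bucket are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not the actual S3N implementation. */
final class S3DirectorySketch {
    static final String FOLDER_SUFFIX = "_$folder$";

    enum Kind { FILE, DIRECTORY, MISSING }

    /** Classifies a path (no trailing slash) against the keys in a bucket. */
    static Kind classify(SortedSet<String> keys, String path) {
        // A plain object with the same name masks any directory marker.
        if (keys.contains(path)) {
            return Kind.FILE;
        }
        // Marker written by this class, or the "dirpath/" marker of other tools.
        if (keys.contains(path + FOLDER_SUFFIX) || keys.contains(path + "/")) {
            return Kind.DIRECTORY;
        }
        // Implicit directory: at least one key has the prefix "dirpath/".
        String prefix = path + "/";
        SortedSet<String> tail = keys.tailSet(prefix);
        if (!tail.isEmpty() && tail.first().startsWith(prefix)) {
            return Kind.DIRECTORY;
        }
        return Kind.MISSING;
    }

    /** Lists the immediate child names of a directory, skipping marker objects. */
    static List<String> listChildren(SortedSet<String> keys, String dir) {
        String prefix = dir + "/";
        SortedSet<String> names = new TreeSet<>();
        for (String key : keys.tailSet(prefix)) {
            if (!key.startsWith(prefix)) {
                break; // keys sharing the prefix are contiguous in sorted order
            }
            String rest = key.substring(prefix.length());
            if (rest.isEmpty()) {
                continue; // the "dir/" marker itself: the empty-named "file" from this bug
            }
            int slash = rest.indexOf('/');
            String name = slash < 0 ? rest : rest.substring(0, slash);
            if (name.endsWith(FOLDER_SUFFIX)) {
                // Report a "_$folder$" marker under its directory name.
                name = name.substring(0, name.length() - FOLDER_SUFFIX.length());
            }
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return new ArrayList<>(names);
    }
}
```

The check order in classify() is what makes a file mask a directory of the
same name; the real filesystem would issue S3 requests rather than consult an
in-memory set.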
This patch also includes the following:
- Add logging any time a file in S3 is accessed for read or write, so that
when accessing or using a file fails, its name will be in the task log.
- When opening a file for reading that doesn't exist, immediately throw a
FileNotFoundException, rather than causing a hard-to-debug NPE later when
the file is closed.
- Rewrite rename so that it only deletes the source files after every
destination file has been written, so you never end up with half the files in
each location.
- Set up a retryer so rename automatically retries on S3 errors.
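To make the rename ordering concrete, here is a minimal sketch (again, not the
patch itself) of a copy-everything-then-delete rename together with a simple
retry wrapper. The class S3RenameSketch, the methods renamePrefix() and
withRetries(), and the Map standing in for a bucket are hypothetical; the
sketch assumes the source and destination prefixes do not overlap and models
S3 errors as RuntimeExceptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not the actual S3N implementation. */
final class S3RenameSketch {

    /** Runs op up to maxAttempts times and rethrows the last failure. */
    static <T> T withRetries(int maxAttempts, Supplier<T> op) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    /**
     * Moves every key under srcPrefix to dstPrefix and returns the number of
     * objects moved. Assumes the two prefixes do not overlap.
     */
    static int renamePrefix(Map<String, byte[]> store, String srcPrefix, String dstPrefix) {
        // Snapshot the source keys so the map is not modified while iterating it.
        List<String> sources = new ArrayList<>();
        for (String key : store.keySet()) {
            if (key.startsWith(srcPrefix)) {
                sources.add(key);
            }
        }
        // Phase 1: write every destination object. A failure here leaves
        // all source objects untouched.
        for (String key : sources) {
            String dstKey = dstPrefix + key.substring(srcPrefix.length());
            store.put(dstKey, store.get(key));
        }
        // Phase 2: delete the sources only after all copies have succeeded.
        for (String key : sources) {
            store.remove(key);
        }
        return sources.size();
    }
}
```

A caller could wrap the whole operation, for example
withRetries(3, () -> renamePrefix(store, "a/", "c/")). Retrying is safe in this
sketch because a failed attempt can only leave extra destination copies, never
missing sources.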
> Bug in S3N handling of directory markers using an object with a trailing "/"
> causes jobs to fail
> ------------------------------------------------------------------------------------------------
>
> Key: HADOOP-5836
> URL: https://issues.apache.org/jira/browse/HADOOP-5836
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 0.18.3
> Reporter: Ian Nowland
>
> Some tools which upload to S3 use an object terminated with a "/" as a
> directory marker, for instance "s3n://mybucket/mydir/". If asked to iterate
> that "directory" via listStatus(), the current code will return an empty
> file "", which the InputFormatter happily assigns to a split, and which later
> causes a task to fail, and probably the job to fail.