[jira] [Commented] (HADOOP-11487) FileNotFound on distcp to s3n/s3a due to creation inconsistency

2017-06-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051749#comment-16051749
 ] 

Steve Loughran commented on HADOOP-11487:
-

The specific codepath here is addressed by HADOOP-13145; other inconsistency 
issues are different and addressed elsewhere (HADOOP-13345, HADOOP-13786). 
Closing as fixed.

> FileNotFound on distcp to s3n/s3a due to creation inconsistency 
> 
>
> Key: HADOOP-11487
> URL: https://issues.apache.org/jira/browse/HADOOP-11487
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs, fs/s3
>Affects Versions: 2.7.2
>Reporter: Paulo Motta
>Assignee: John Zhuge
> Fix For: 2.8.0
>
>
> I'm trying to copy a large number of files from HDFS to S3 via distcp, and I'm 
> getting the following exception:
> {code:java}
> 2015-01-16 20:53:18,187 ERROR [main] org.apache.hadoop.tools.mapred.CopyMapper: Failure in copying hdfs://10.165.35.216/hdfsFolder/file.gz to s3n://s3-bucket/file.gz
> java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz'
>   at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445)
>   at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187)
>   at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233)
>   at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> 2015-01-16 20:53:18,276 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz'
>   at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445)
>   at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187)
>   at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233)
>   at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> {code}
> However, when I try hadoop fs -ls s3n://s3-bucket/file.gz, the file is there, 
> so the job failure is probably due to Amazon's S3 eventual consistency.
> In my opinion, to fix this problem NativeS3FileSystem.getFileStatus must use 
> the fs.s3.maxRetries property to avoid failures like this.
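
As a minimal illustration of the retry approach the reporter suggests, here is 
a sketch built on the stock FileSystem API and the fs.s3.maxRetries / 
fs.s3.sleepTimeSeconds settings. It is illustrative only, not the actual 
NativeS3FileSystem code:

{code:java}
// Sketch only: retry getFileStatus until the new key becomes visible,
// bounded by fs.s3.maxRetries, sleeping fs.s3.sleepTimeSeconds between tries.
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RetryingStatus {
  public static FileStatus getFileStatusWithRetries(FileSystem fs, Path path)
      throws IOException, InterruptedException {
    Configuration conf = fs.getConf();
    int maxRetries = conf.getInt("fs.s3.maxRetries", 4);
    long sleepMillis = conf.getLong("fs.s3.sleepTimeSeconds", 10) * 1000L;
    for (int attempt = 0; ; attempt++) {
      try {
        return fs.getFileStatus(path);     // HEAD on the object
      } catch (FileNotFoundException e) {
        if (attempt >= maxRetries) {
          throw e;                         // still missing: give up
        }
        Thread.sleep(sleepMillis);         // wait for S3 to catch up
      }
    }
  }
}
{code}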






[jira] [Commented] (HADOOP-11487) FileNotFound on distcp to s3n/s3a due to creation inconsistency

2017-01-09 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15811707#comment-15811707
 ] 

Steve Loughran commented on HADOOP-11487:
-

Linking to HADOOP-13950, as it may be a lower-cost solution.

AWS caches negative GETs on a path, and FS.create() does an existence check as 
part of its getFileStatus call, which is always done to check whether the 
destination is a directory.

If overwrite=true, we don't care whether the file already exists, so we only 
need the two directory probes, not the direct-path HEAD; that avoids the 
possibility of a negative HEAD result being cached. This *may* be all that is 
needed.
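
As a rough sketch of that probe sequence ({{headObject}}, {{isDirectoryMarker}} 
and {{listOneKey}} are hypothetical placeholders for the underlying S3 calls, 
not the real S3A internals):

{code:java}
// Hypothetical sketch of the create() existence checks described above.
import java.io.IOException;
import org.apache.hadoop.fs.FileAlreadyExistsException;
import org.apache.hadoop.fs.Path;

abstract class CreateProbeSketch {
  abstract boolean headObject(Path p) throws IOException;        // HEAD p
  abstract boolean isDirectoryMarker(Path p) throws IOException; // HEAD "p/"
  abstract boolean listOneKey(Path p) throws IOException;        // LIST prefix

  void checkCreatePreconditions(Path dest, boolean overwrite)
      throws IOException {
    if (!overwrite) {
      // The direct-path HEAD: a 404 here is what AWS may cache negatively.
      if (headObject(dest)) {
        throw new FileAlreadyExistsException(dest.toString());
      }
    }
    // The two directory probes are always needed: dest must not be a dir.
    if (isDirectoryMarker(dest) || listOneKey(dest)) {
      throw new FileAlreadyExistsException(dest + " is a directory");
    }
    // With overwrite=true the file HEAD was never issued, so no negative
    // result for the destination path can enter the cache.
  }
}
{code}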




[jira] [Commented] (HADOOP-11487) FileNotFound on distcp to s3n/s3a due to creation inconsistency

2016-07-21 Thread $iddhe$h Divekar (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388499#comment-15388499
 ] 

$iddhe$h Divekar commented on HADOOP-11487:
---

Np, will start watching it.
Thanks for the help.




[jira] [Commented] (HADOOP-11487) FileNotFound on distcp to s3n/s3a due to creation inconsistency

2016-07-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388497#comment-15388497
 ] 

Chris Nauroth commented on HADOOP-11487:


bq. Is the patch for HADOOP-13345 available for general use?

No, that's just a prototype right now.  It's still under development.  I can't 
recommend running it.




[jira] [Commented] (HADOOP-11487) FileNotFound on distcp to s3n/s3a due to creation inconsistency

2016-07-21 Thread $iddhe$h Divekar (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388474#comment-15388474
 ] 

$iddhe$h Divekar commented on HADOOP-11487:
---

Cool, thanks, will take a look.
Is the patch for HADOOP-13345 available for general use?




[jira] [Commented] (HADOOP-11487) FileNotFound on distcp to s3n/s3a due to creation inconsistency

2016-07-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388426#comment-15388426
 ] 

Chris Nauroth commented on HADOOP-11487:


bq. Does listStatus fall outside the above consistency?

Yes, it does.  {{FileSystem#listStatus}} maps to an operation listing the keys 
in an S3 bucket.  For that listing operation, the consistency model you quoted 
does not apply.  Instead, it follows an eventual consistency model.  There may 
be propagation delays between creating a key and that key becoming visible in 
listings.

There are more details on this behavior in the AWS S3 consistency model doc:

http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel
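
To make the distinction concrete, a small hedged example (bucket and paths are 
made up; the actual timing depends on S3's behavior in your region):

{code:java}
// Illustrates the HEAD-vs-LIST consistency gap described above.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListConsistencyDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(
        URI.create("s3n://example-bucket/"), new Configuration());
    Path file = new Path("s3n://example-bucket/dir/newfile");

    fs.create(file, true).close();  // PUT the new object

    // Direct probe (HEAD): covered by read-after-write for new keys.
    System.out.println("exists: " + fs.exists(file));

    // Listing probe (LIST): eventually consistent; the new key may be
    // absent from this listing for a while even though HEAD sees it.
    for (FileStatus st : fs.listStatus(file.getParent())) {
      System.out.println("listed: " + st.getPath());
    }
  }
}
{code}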





[jira] [Commented] (HADOOP-11487) FileNotFound on distcp to s3n/s3a due to creation inconsistency

2016-07-21 Thread $iddhe$h Divekar (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388417#comment-15388417
 ] 

$iddhe$h Divekar commented on HADOOP-11487:
---

Hi Chris,
Thanks for replying.
As per the AWS forum, all S3 regions now support read-after-write consistency 
for new objects added to Amazon S3:
https://forums.aws.amazon.com/ann.jspa?annID=3112
Does listStatus fall outside the above consistency?

For Hadoop 2.7 we started using s3a as per the Spark recommendations, but 
after moving to s3a we saw a 3x performance degradation, so we moved back to 
s3n.

When will the patch be available for general use?




[jira] [Commented] (HADOOP-11487) FileNotFound on distcp to s3n/s3a due to creation inconsistency

2016-07-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388286#comment-15388286
 ] 

Chris Nauroth commented on HADOOP-11487:


Hello [~$iddhe$h].  The stack trace indicates a problem during a 
{{FileSystem#listStatus}} call.  Listing calls against S3 are subject to 
eventual consistency.  The goals of the S3Guard project, tracked in issue 
HADOOP-13345, would help address this scenario.  However, please note that 
S3Guard targets the S3A file system, which is where our ongoing development 
on Hadoop S3 integration is happening.  (Your stack trace indicates you are 
currently using S3N.)
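
For anyone switching, a minimal sketch of addressing a bucket through S3A 
instead of S3N; the bucket name and credentials are placeholders, and 
fs.s3a.access.key / fs.s3a.secret.key are the standard S3A credential 
properties (usually set in core-site.xml rather than in code):

{code:java}
// Hypothetical example: pointing at a bucket via S3A rather than S3N.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3ASwitchExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY");  // placeholder
    conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY");  // placeholder
    // s3a:// URLs select S3AFileSystem, where current development happens.
    FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
    System.out.println(fs.exists(new Path("s3a://example-bucket/some/key")));
  }
}
{code}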




[jira] [Commented] (HADOOP-11487) FileNotFound on distcp to s3n/s3a due to creation inconsistency

2016-07-21 Thread $iddhe$h Divekar (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388227#comment-15388227
 ] 

$iddhe$h Divekar commented on HADOOP-11487:
---

Hi,
We are processing data in US West and are still seeing the consistency issue.
As per the forums, US West should not have consistency issues, but we are 
updating a table, and I'm not sure whether 'read-after-write' consistency also 
covers 'read-after-update'.

Will 9565 help us here?

Below is the backtrace of the issue we are seeing.
org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:154)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:106)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:106)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:106)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:189)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:239)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:221)
at com.foo.vAnalytics.xyz_load$.main(xyz_load.scala:130)
at com.foo.vAnalytics.xyz_load.main(xyz_load.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317)
at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: File s3n://foo-hive/warehouse/fooabcxyz0719/_temporary/0/task_201607210010_0005_m_41 does not exist.
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.listStatus(NativeS3FileSystem.java:506)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:360)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
at org.apache.p

[jira] [Commented] (HADOOP-11487) FileNotFound on distcp to s3n/s3a due to creation inconsistency

2016-05-13 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283275#comment-15283275
 ] 

Chris Nauroth commented on HADOOP-11487:


I just attached a patch to HADOOP-13145 to prevent this {{getFileInfo}} call 
when DistCp is run without the {{-p}} option.  I suspect that patch can help us 
get past this eventual-consistency problem on DistCp to an S3A destination, at 
least when {{-p}} is not used.

I filed that patch on a new JIRA instead of here, because the discussion here 
indicates this issue may expand to a larger scope, addressing a wider set of 
eventual-consistency concerns.  HADOOP-13145 is more of a spot performance 
enhancement.
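
For example, a copy that omits {{-p}} (cluster and paths are placeholders), 
which with the HADOOP-13145 patch applied avoids the post-copy 
{{getFileInfo}} against the S3 destination:

{code}
hadoop distcp hdfs://namenode:8020/data/src s3a://example-bucket/data/dest
{code}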




[jira] [Commented] (HADOOP-11487) FileNotFound on distcp to s3n/s3a due to creation inconsistency

2016-05-12 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281472#comment-15281472
 ] 

Steve Loughran commented on HADOOP-11487:
-

eu-central is consistent, so it's not create inconsistency. But S3 everywhere 
lags in listing consistency: if you GET a file it should be there, but if you 
list the pattern, it may not show up yet.
