[jira] [Commented] (MAPREDUCE-3638) Yarn trying to download cacheFile to container but Path is a local file

Philip Su (Commented) (JIRA) Mon, 30 Jan 2012 13:23:33 -0800

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196427#comment-13196427 ]


Philip Su commented on MAPREDUCE-3638:
--------------------------------------

I did some more follow-up testing on this, and I think I can now say more 
precisely where the problem is. 

1) The failure occurs when running a streaming job with the -cacheFile option 
pointing at a file on the local file system, using file:///<path>. 
2) I ran hdfs dfs -ls file:///<path> to make sure the file exists. 
3) I ran the same streaming job with the same value as in 1), but used -files 
instead of the deprecated -cacheFile option. The job ran and passed. 

So it seems that when the streaming job is run with the deprecated -cacheFile 
option against a file on the local file system, the file does not get the 
correct permissions. 
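
Steps 1) and 3) above differ only in which option ships the local file. The 
following is a minimal sketch in Python that builds both command lines so the 
two runs can be compared side by side. The helper name streaming_args is 
hypothetical, the input/output paths and the #testlink suffix are copied from 
the issue description, and file:///<path> is left as the placeholder used 
above. Placing -files ahead of the streaming options, and attaching the same 
#link suffix to it, are assumptions based on how Hadoop generic options are 
normally passed; this is not taken from the actual test script.

```python
import shlex


def streaming_args(local_uri, link, use_files_option):
    """Build the argv for the streaming job (hypothetical helper)."""
    base = ["hadoop", "--config", "conf/hadoop/", "jar", "hadoop-streaming.jar"]
    job = [
        "-input", "Streaming/streaming-610/input.txt",
        "-mapper", "xargs cat",
        "-reducer", "cat",
        "-output", "Streaming/streaming-610/Output",
    ]
    spec = f"{local_uri}#{link}"
    if use_files_option:
        # Assumption: -files is a generic option, so it is placed before
        # the streaming-specific options.
        return base + ["-files", spec] + job
    # Deprecated streaming option from step 1).
    return base + job + ["-cacheFile", spec]


# file:///<path> is a placeholder, exactly as in the comment above.
failing = streaming_args("file:///<path>", "testlink", use_files_option=False)
passing = streaming_args("file:///<path>", "testlink", use_files_option=True)

# shlex.join quotes each argument so the lines can be pasted into a shell.
print(shlex.join(failing))
print(shlex.join(passing))
```

Only the option carrying the file changes between the two lists, which is 
what narrows the failure down to the -cacheFile code path.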

                
> Yarn trying to download cacheFile to container but Path is a local file
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3638
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3638
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Thomas Graves
>            Assignee: Mahadev konar
>
> It looks like the AM, which is running on
> host1.com, is trying to access a local file but the file is on host2.com
> (where the command was run).
> ran:
> hadoop --config conf/hadoop/ jar hadoop-streaming.jar \
>     -Dmapreduce.job.acl-view-job=* \
>     -input Streaming/streaming-610/input.txt \
>     -mapper 'xargs cat' \
>     -reducer cat \
>     -output Streaming/streaming-610/Output \
>     -cacheFile file://Streaming/data/streaming-610//InputFile#testlink \
>     -jobconf mapred.map.tasks=1 \
>     -jobconf mapred.reduce.tasks=1 \
>     -jobconf mapred.job.name=streamingTest-610 \
>     -jobconf mapreduce.job.acl-view-job=*
> failure:
> 11/11/10 07:48:06 INFO mapreduce.Job: Job job_1320887371559_0215 failed with 
> state FAILED due to: Application
> application_1320887371559_0215 failed 1 times due to AM Container for 
> appattempt_1320887371559_0215_000001 exited with 
> exitCode: -1000 due to: java.io.FileNotFoundException: File
> file:/Streaming/data/streaming-610/InputFile
> does not exist
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:431)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:315)
>         at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:85)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:152)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
