[jira] [Commented] (MAPREDUCE-3638) Yarn trying to download cacheFile to container but Path is a local file
[ https://issues.apache.org/jira/browse/MAPREDUCE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219605#comment-13219605 ]

Ramya Sunil commented on MAPREDUCE-3638:

cacheFile for local FS was never supported. cacheFile downloads files from HDFS only. It is a deprecated option, and the files option has to be used for downloading files from local FS. This is not an issue.

Yarn trying to download cacheFile to container but Path is a local file
---

    Key: MAPREDUCE-3638
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-3638
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: mrv2
    Affects Versions: 0.23.0
    Reporter: Thomas Graves
    Assignee: Mahadev konar

It looks like the AM, which is running on host1.com, is trying to access a local file but the file is on host2.com (where the command was run).

ran:
hadoop --config conf/hadoop/ jar hadoop-streaming.jar -Dmapreduce.job.acl-view-job=* -input Streaming/streaming-610/input.txt -mapper 'xargs cat' -reducer cat -output Streaming/streaming-610/Output -cacheFile file://Streaming/data/streaming-610//InputFile#testlink -jobconf mapred.map.tasks=1 -jobconf mapred.reduce.tasks=1 -jobconf mapred.job.name=streamingTest-610 -jobconf mapreduce.job.acl-view-job=*

failure:
11/11/10 07:48:06 INFO mapreduce.Job: Job job_1320887371559_0215 failed with state FAILED due to: Application application_1320887371559_0215 failed 1 times due to AM Container for appattempt_1320887371559_0215_01 exited with exitCode: -1000 due to: java.io.FileNotFoundException: File file:/Streaming/data/streaming-610/InputFile does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:431)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:315)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:85)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:152)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
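The report and Ramya's comment both turn on where a cache URI is resolved: a file: URI names a path on whichever host happens to read it, while an hdfs: URI names the same file from every node. The following is a minimal sketch of that distinction using only java.net.URI. The class name, method names, host name, and paths are hypothetical and are not part of Hadoop, and treating a scheme-less path as "not node-local" is an assumption made for illustration.

```java
import java.net.URI;

public class CacheUriCheck {

    /** Returns true when the URI uses the file: scheme, i.e. it names a path on one machine's local disk. */
    static boolean isNodeLocal(String cacheUri) {
        String scheme = URI.create(cacheUri).getScheme();
        // Assumption: a URI without a scheme is resolved against the default
        // filesystem, so this sketch does not count it as node-local.
        return scheme != null && scheme.equalsIgnoreCase("file");
    }

    /** Returns the part after '#', which the command in the report uses as the link name. */
    static String linkName(String cacheUri) {
        return URI.create(cacheUri).getFragment();
    }

    public static void main(String[] args) {
        String[] samples = {
            "file:///tmp/data/InputFile#testlink",             // hypothetical local path
            "hdfs://namenode:8020/user/me/InputFile#testlink"  // hypothetical HDFS path
        };
        for (String s : samples) {
            System.out.println(s + " nodeLocal=" + isNodeLocal(s) + " link=" + linkName(s));
        }
    }
}
```

A check like this could run on the submitting host before the job is launched, so that a node-local URI passed to -cacheFile is flagged there rather than surfacing later as a FileNotFoundException in a container on another host.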
[jira] [Commented] (MAPREDUCE-3638) Yarn trying to download cacheFile to container but Path is a local file
[ https://issues.apache.org/jira/browse/MAPREDUCE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196427#comment-13196427 ]

Philip Su commented on MAPREDUCE-3638:
--

I did some more follow-up testing on this and I think I know more precisely where the problem is.

1) The failure occurs when running a streaming job with the -cacheFile option on a local file system using file:///path.
2) I ran hdfs dfs -ls file:///path to make sure the file exists.
3) I ran the same streaming job using the same value from 1), but used -files instead of the deprecated -cacheFile option. The job ran and passed.

So it seems that when running the streaming job using the deprecated -cacheFile option on a local file system, it is not getting the correct file permission on it.
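The substitution described in step 3 of Philip's comment can be sketched as a small argument rewrite. This is only an illustration under stated assumptions: the class and method names are hypothetical and not part of Hadoop streaming, the sample URI is made up, and the rewritten option is placed first on the assumption that generic options such as -files must precede streaming-specific options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StreamingArgs {

    /**
     * Returns a copy of the streaming arguments in which every
     * "-cacheFile <uri>" pair is removed and the collected URIs are
     * passed once, at the front, as "-files <uri1>,<uri2>,...".
     */
    static List<String> preferFilesOption(List<String> args) {
        List<String> uris = new ArrayList<>();
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.size(); i++) {
            if (args.get(i).equals("-cacheFile") && i + 1 < args.size()) {
                uris.add(args.get(++i)); // consume the URI that follows the flag
            } else {
                rest.add(args.get(i));
            }
        }
        if (uris.isEmpty()) {
            return rest; // nothing to rewrite
        }
        List<String> result = new ArrayList<>();
        result.add("-files");
        result.add(String.join(",", uris));
        result.addAll(rest);
        return result;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList(
            "-input", "in.txt",
            "-cacheFile", "file:///tmp/data/InputFile#testlink", // hypothetical path
            "-mapper", "cat");
        System.out.println(preferFilesOption(original));
    }
}
```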
[jira] [Commented] (MAPREDUCE-3638) Yarn trying to download cacheFile to container but Path is a local file
[ https://issues.apache.org/jira/browse/MAPREDUCE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196499#comment-13196499 ]

Philip Su commented on MAPREDUCE-3638:
--

It's not urgent. We do have 4 regression tests blocked by this, so it would be good to have this fixed at some point in the near future. Thanks!
[jira] [Commented] (MAPREDUCE-3638) Yarn trying to download cacheFile to container but Path is a local file
[ https://issues.apache.org/jira/browse/MAPREDUCE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195877#comment-13195877 ]

Arun C Murthy commented on MAPREDUCE-3638:
--

This looks like a very long-standing bug; this code hasn't changed since 2009...

Yarn trying to download cacheFile to container but Path is a local file
---

    Key: MAPREDUCE-3638
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-3638
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: mrv2
    Affects Versions: 0.23.0
    Reporter: Thomas Graves
    Assignee: Mahadev konar
    Priority: Blocker