Re: distcp failing
I'm not sure that's the issue. I basically tarred up the Hadoop directory from the cluster and copied it over to the non-datanode machine. But I do agree I've likely got a setting wrong, since I can run distcp from the namenode and it works fine; the question is which one.

On Mon, Sep 8, 2008 at 7:04 PM, Aaron Kimball [EMAIL PROTECTED] wrote:

It is likely that your mapred.system.dir and/or fs.default.name settings are incorrect on the non-datanode machine that you are launching the task from. These two settings (in your conf/hadoop-site.xml file) must match the settings on the cluster itself.
- Aaron

On Sun, Sep 7, 2008 at 8:58 PM, Michael Di Domenico [EMAIL PROTECTED] wrote:

I'm attempting to load data into Hadoop (version 0.17.1) from a non-datanode machine in the cluster. I can run jobs and copyFromLocal works fine, but when I try to use distcp I get the output below. I don't understand the error; can anyone help? Thanks

blue:hadoop-0.17.1 mdidomenico$ time bin/hadoop distcp -overwrite file:///Users/mdidomenico/hadoop/1gTestfile /user/mdidomenico/1gTestfile
08/09/07 23:56:06 INFO util.CopyFiles: srcPaths=[file:/Users/mdidomenico/hadoop/1gTestfile]
08/09/07 23:56:06 INFO util.CopyFiles: destPath=/user/mdidomenico/1gTestfile1
08/09/07 23:56:07 INFO util.CopyFiles: srcCount=1
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.ipc.RemoteException: java.io.IOException: /tmp/hadoop-hadoop/mapred/system/job_200809072254_0005/job.xml: No such file or directory
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
    at org.apache.hadoop.mapred.JobInProgress.init(JobInProgress.java:175)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
    at org.apache.hadoop.ipc.Client.call(Client.java:557)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
    at $Proxy1.submitJob(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.submitJob(Unknown Source)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:758)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
    at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:604)
    at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:743)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:763)
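For reference, the two settings Aaron names live in conf/hadoop-site.xml, which overrides the shipped defaults. A minimal sketch of what the client-side file might contain; the namenode host, port, and path below are placeholders, and the real values must be copied from the cluster's own configuration:

```xml
<?xml version="1.0"?>
<!-- Sketch only: the namenode host, port, and system dir are placeholder
     values. Both properties must match what the cluster itself uses. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/hadoop/mapred/system</value>
  </property>
</configuration>
```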
Re: distcp failing
A little more digging, and it appears I cannot run distcp as someone other than hadoop on the namenode.

/tmp/hadoop-hadoop/mapred/system/job_200809091231_0005/job.xml

Looking at this directory from the error: the system directory does not exist on the namenode; I only have a local directory.

On Tue, Sep 9, 2008 at 12:41 PM, Michael Di Domenico [EMAIL PROTECTED] wrote:
Re: distcp failing
Manually creating the system directory gets me past the first error, but now I get this. I'm not necessarily sure it's a step forward, though, because the map task never shows up in the jobtracker.

[EMAIL PROTECTED] hadoop-0.17.1]$ bin/hadoop distcp file:///home/mdidomenico/1gTestfile 1gTestfile
08/09/09 13:12:06 INFO util.CopyFiles: srcPaths=[file:/home/mdidomenico/1gTestfile]
08/09/09 13:12:06 INFO util.CopyFiles: destPath=1gTestfile
08/09/09 13:12:07 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:07 INFO dfs.DFSClient: Abandoning block blk_5758513071638050362
08/09/09 13:12:13 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:13 INFO dfs.DFSClient: Abandoning block blk_1691495306775808049
08/09/09 13:12:17 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:17 INFO dfs.DFSClient: Abandoning block blk_1027634596973755899
08/09/09 13:12:19 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:19 INFO dfs.DFSClient: Abandoning block blk_4535302510016050282
08/09/09 13:12:23 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:23 INFO dfs.DFSClient: Abandoning block blk_7022658012001626339
08/09/09 13:12:25 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:25 INFO dfs.DFSClient: Abandoning block blk_-4509681241839967328
08/09/09 13:12:29 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:29 INFO dfs.DFSClient: Abandoning block blk_8318033979013580420
08/09/09 13:12:31 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
08/09/09 13:12:31 WARN dfs.DFSClient: Error Recovery for block blk_-4509681241839967328 bad datanode[0]
08/09/09 13:12:35 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:35 INFO dfs.DFSClient: Abandoning block blk_2848354798649979411
08/09/09 13:12:41 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
08/09/09 13:12:41 WARN dfs.DFSClient: Error Recovery for block blk_2848354798649979411 bad datanode[0]
Exception in thread "Thread-0" java.util.ConcurrentModificationException
    at java.util.TreeMap$PrivateEntryIterator.nextEntry(Unknown Source)
    at java.util.TreeMap$KeyIterator.next(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient.close(DFSClient.java:217)
    at org.apache.hadoop.dfs.DistributedFileSystem.close(DistributedFileSystem.java:214)
    at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1324)
    at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:224)
    at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:209)
08/09/09 13:12:41 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
08/09/09 13:12:41 INFO dfs.DFSClient: Abandoning block blk_9189111926428577428

On Tue, Sep 9, 2008 at 1:03 PM, Michael Di Domenico [EMAIL PROTECTED] wrote:
Re: distcp failing
Apparently the fix for my original error is that Hadoop is set up for a single local machine out of the box, and I had to change these directories to live in HDFS instead of under hadoop.tmp.dir:

<property>
  <name>mapred.local.dir</name>
  <value>/hadoop/mapred/local</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>
</property>
<property>
  <name>mapred.temp.dir</name>
  <value>/hadoop/mapred/temp</value>
</property>

So now distcp works as a non-hadoop user, and mapred works as a non-hadoop user from the namenode. However, from a workstation I now get this:

blue:hadoop-0.17.1 mdidomenico$ bin/hadoop distcp file:///Users/mdidomenico/hadoop/1gTestfile 1gTestfile-1
08/09/09 13:44:19 INFO util.CopyFiles: srcPaths=[file:/Users/mdidomenico/hadoop/1gTestfile]
08/09/09 13:44:19 INFO util.CopyFiles: destPath=1gTestfile-1
08/09/09 13:44:20 INFO util.CopyFiles: srcCount=1
08/09/09 13:44:22 INFO mapred.JobClient: Running job: job_200809091332_0004
08/09/09 13:44:23 INFO mapred.JobClient: map 0% reduce 0%
08/09/09 13:44:31 INFO mapred.JobClient: Task Id : task_200809091332_0004_m_00_0, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
    at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
08/09/09 13:44:50 INFO mapred.JobClient: Task Id : task_200809091332_0004_m_00_1, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
    at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
08/09/09 13:45:07 INFO mapred.JobClient: Task Id : task_200809091332_0004_m_00_2, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
    at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
08/09/09 13:45:26 INFO mapred.JobClient: map 100% reduce 100%
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
    at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:604)
    at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:743)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:763)

On Tue, Sep 9, 2008 at 1:14 PM, Michael Di Domenico [EMAIL PROTECTED] wrote:
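The /tmp/hadoop-hadoop/... path in the earlier errors is what one would expect from the stock single-machine defaults, where the mapred directories are derived from hadoop.tmp.dir and therefore vary per user. A sketch of the assumed 0.17-era defaults (recalled from hadoop-default.xml, so treat the exact values as an assumption):

```xml
<!-- Assumed 0.17-era defaults: hadoop.tmp.dir is per-user under /tmp, and
     mapred.system.dir hangs off it, which would explain why the job files
     landed in /tmp/hadoop-hadoop/mapred/system when running as the hadoop
     user and broke for everyone else. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>${hadoop.tmp.dir}/mapred/system</value>
</property>
```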
Re: distcp failing
Looking in the tasktracker log, I see the error below. The file does exist on my local workstation, but it does not exist on the namenode/datanodes in my cluster. So it begs the question: have I misunderstood the use of distcp, or is there still something wrong? I'm looking for something that will read a file from my workstation and load it into the DFS, but instead of going through the namenode like copyFromLocal seems to do, I'd like it to load the data via the datanodes directly. If distcp doesn't work this way, is there anything that does?

2008-09-09 14:00:54,418 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2008-09-09 14:00:54,662 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2008-09-09 14:00:54,894 INFO org.apache.hadoop.util.CopyFiles: FAIL 1gTestfile : java.io.FileNotFoundException: File file:/Users/mdidomenico/hadoop/1gTestfile does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:402)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:242)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:116)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:274)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:380)
    at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.copy(CopyFiles.java:366)
    at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.map(CopyFiles.java:493)
    at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.map(CopyFiles.java:268)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
2008-09-09 14:01:03,950 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
    at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

On Tue, Sep 9, 2008 at 1:47 PM, Michael Di Domenico [EMAIL PROTECTED] wrote:
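One reading of the FileNotFoundException above: distcp runs its copies as map tasks on the cluster, so a file:/// source path is opened on whichever tasktracker runs the map, not on the submitting workstation. A hedged sketch (0.17-era command names; the command is echoed rather than executed, since this sketch assumes no reachable cluster):

```shell
# A file:/// source only works with distcp if the same path is readable on
# every tasktracker node (e.g. a shared NFS mount). For a file that exists
# only on the workstation, the client-side copy is the usual route; the HDFS
# client streams block data directly to the datanodes, with only metadata
# traffic going through the namenode.
SRC=/Users/mdidomenico/hadoop/1gTestfile
DST=1gTestfile

# Echoed, not executed: no cluster is assumed here.
echo "bin/hadoop fs -put $SRC $DST"
```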
Re: distcp failing
It is likely that your mapred.system.dir and/or fs.default.name settings are incorrect on the non-datanode machine that you are launching the task from. These two settings (in your conf/hadoop-site.xml file) must match the settings on the cluster itself.
- Aaron

On Sun, Sep 7, 2008 at 8:58 PM, Michael Di Domenico [EMAIL PROTECTED] wrote: