A problem when running Hadoop
Hi, I have a problem when using Hadoop. I copy the input files into the distributed filesystem:

$ bin/hadoop dfs -put conf input
08/08/01 17:42:05 WARN dfs.DFSClient: NotReplicatedYetException sleeping /user/yicha-a-183/yicha/input/configuration.xsl retries left 2
08/08/01 17:42:06 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/yicha-a-183/yicha/input/configuration.xsl could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1127)
        at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:312)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)

        at org.apache.hadoop.ipc.Client.call(Client.java:512)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:199)
        at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2074)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1967)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$9(DFSClient.java:1953)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1601)
08/08/01 17:42:06 WARN dfs.DFSClient: NotReplicatedYetException sleeping /user/yicha-a-183/yicha/input/configuration.xsl retries left 1
08/08/01 17:42:09 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/yicha-a-183/yicha/input/configuration.xsl could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1127)
        at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:312)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)
08/08/01 17:42:09 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
put: Could not get block locations. Aborting...

When I stop the daemons I get:

$ bin/stop-all.sh
stopping jobtracker
localhost: no tasktracker to stop
stopping namenode
localhost: no datanode to stop
localhost: no secondarynamenode to stop

So the tasktracker, datanode, and secondarynamenode were never started. Is the error above related to these three daemons not starting? Why did the three not start, and why is there no log for them? The environment and the steps I followed are below.
Thank you~
Jinglu, 2008-8-1

The environment: Cygwin on Windows 2000.

$ ssh localhost
Last login: Fri Aug 1 16:46:33 2008 from 127.0.0.1

So SSH is already configured correctly; I can ssh to localhost without a passphrase.

I use the following conf/hadoop-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/zxf/hadoop/tmp/</value>
  </property>
</configuration>

I format a new distributed filesystem:

$ bin/hadoop namenode -format
08/08/01 17:16:09 INFO dfs.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = yicha-a-183/192.168.1.139
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.16.4
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
************************************************************/
08/08/01 17:16:09 INFO fs.F
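For anyone hitting the same wall: "could only be replicated to 0 nodes" means the NameNode knows of no live DataNodes, which matches the daemons never starting. A quick way to check both, sketched with the 0.16-era command line (the log path assumes the default HADOOP_LOG_DIR under the install directory):

$ jps
# JDK tool that lists running Java processes; in pseudo-distributed mode you
# would expect NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker

$ bin/hadoop dfsadmin -report
# asks the NameNode for its live DataNodes; "replicated to 0 nodes" means
# this list is empty

$ ls logs/
# each daemon writes a hadoop-<user>-<daemon>-<host>.log file on startup;
# if no datanode log exists at all, the DataNode JVM never launched, so the
# startup scripts (or the Cygwin environment they run in) are the place to look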
Re: help, error "...failed to report status for xxx seconds..."
The MapReduce framework kills map/reduce tasks if they don't report status within 10 minutes. If your mapper/reducer needs more time, it should report status using the Reporter: http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/Reporter.html

More documentation is at http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Reporter

You can also increase the task timeout by setting mapred.task.timeout.

Thanks
Amareshwari

wangxu wrote:
> Hi, all
> I always get this kind of error when running a map job:
>
> Task task_200807130149_0067_m_00_0 failed to report status for 604
> seconds. Killing!
>
> I am using hadoop-0.16.4-core.jar, one namenode, one datanode.
>
> What does this error message suggest? Does it mean the functions in the
> mapper are too slow?
> I assume there is no network connection issue.
> What can I do about this error?
>
> Thanks,
> Xu
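To make the Reporter advice concrete, here is a minimal sketch against the old org.apache.hadoop.mapred interfaces; SlowMapper and doExpensiveWork are made-up names, and the exact generic signatures may need adjusting to the release in use:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// A mapper whose per-record work takes longer than mapred.task.timeout
// (600000 ms by default) must ping the framework, or the task is killed.
public class SlowMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    for (int stage = 0; stage < 10; stage++) {
      doExpensiveWork(value, stage);               // hypothetical slow step
      reporter.progress();                         // resets the timeout clock
      reporter.setStatus("record stage " + stage); // shows up in the web UI
    }
    output.collect(new Text("done"), value);
  }

  private void doExpensiveWork(Text value, int stage) {
    // placeholder for the real per-stage computation
  }
}

If the work genuinely cannot be broken into stages, raising the limit is the fallback the reply mentions, e.g. jobConf.set("mapred.task.timeout", "1200000") for 20 minutes (the value is in milliseconds).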
Re: mapper input file name
You can get the file name accessed by the mapper using the config property "map.input.file".

Thanks
Amareshwari

Deyaa Adranale wrote:
> Hi,
> I need to know, inside my mapper, the name of the file that contains the
> current record. I saw that I can access the names of the input directories
> inside mapper.config(), but my input contains different files and I need
> to know the name of the current one.
> Any hints?
> thanks in advance,
> Deyaa
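A minimal sketch of reading that property with the old org.apache.hadoop.mapred API (FileAwareMapper is a made-up name): the framework sets map.input.file per split for file-based input formats, so it can be picked up once in configure():

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileAwareMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private String inputFile;

  // configure() runs once per task, before any map() call;
  // "map.input.file" holds the path of the file this task's split came from
  public void configure(JobConf job) {
    inputFile = job.get("map.input.file");
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // e.g. tag every record with the file it came from
    output.collect(new Text(inputFile), value);
  }
}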
Re: Could not find any valid local directory for task
The error "Could not find any valid local directory for task" means that the task could not find a local directory to write file, mostly because there is no enough space on any of the disks. Thanks Amareshwari Shirley Cohen wrote: Hi, Does anyone know what the following error means? hadoop-0.16.4/logs/userlogs/task_200808021906_0002_m_14_2]$ cat syslog 2008-08-02 20:28:00,443 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId= 2008-08-02 20:28:00,684 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 15 2008-08-02 20:30:08,594 WARN org.apache.hadoop.mapred.TaskTracker: Error running child java.io.IOException at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:719) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:209) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084) Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for task_200808021906_0002_m_14_2/spill4.out Please let me know if you need more information about my setup. Thanks in advance, Shirley
Re: Running mapred job from remote machine to a pseudo-distributed hadoop
Arv Mistry wrote:
> I'll try again: can anyone tell me, should it be possible to run Hadoop in
> pseudo-distributed mode (i.e. everything on one machine) and then submit a
> mapred job using the ToolRunner from another machine against that Hadoop
> configuration?
> Cheers Arv

Yes, it is possible. You can start a Hadoop cluster on a single node; documentation is available at http://hadoop.apache.org/core/docs/current/quickstart.html#PseudoDistributed

Once the cluster is up, you can submit jobs from any client, but the client configuration must be aware of the NameNode and JobTracker nodes. You can use the generic options *-fs* and *-jt* on the command line for the same.

Thanks
Amareshwari

-----Original Message-----
From: Arv Mistry [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 31, 2008 2:32 PM
To: core-user@hadoop.apache.org
Subject: Running mapred job from remote machine to a pseudo-distributed hadoop

I have Hadoop set up in pseudo-distributed mode, i.e. everything on one machine, and I'm trying to submit a Hadoop mapred job from another machine to that Hadoop setup. At the point that I run the mapred job I get the following error. Any ideas as to what I'm doing wrong? Is this possible in pseudo-distributed mode?

Cheers Arv

INFO | jvm 1| 2008/07/31 14:01:00 | 2008-07-31 14:01:00,547 ERROR [HadoopJobTool] java.io.IOException: /tmp/hadoop-root/mapred/system/job_200807310809_0006/job.xml: No such file or directory
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
INFO | jvm 1| 2008/07/31 14:01:00 | at java.lang.reflect.Method.invoke(Method.java:597)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
INFO | jvm 1| 2008/07/31 14:01:00 |
INFO | jvm 1| 2008/07/31 14:01:00 | org.apache.hadoop.ipc.RemoteException: java.io.IOException: /tmp/hadoop-root/mapred/system/job_200807310809_0006/job.xml: No such file or directory
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
INFO | jvm 1| 2008/07/31 14:01:00 | at java.lang.reflect.Method.invoke(Method.java:597)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
INFO | jvm 1| 2008/07/31 14:01:00 |
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.ipc.Client.call(Client.java:557)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
INFO | jvm 1| 2008/07/31 14:01:00 | at $Proxy5.submitJob(Unknown Source)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl
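To make the -fs/-jt advice concrete, a sketch of such a remote submission (the host name, jar, and driver class are placeholders; the driver's main() must go through ToolRunner.run() so that GenericOptionsParser consumes the generic options):

$ bin/hadoop jar myjob.jar com.example.MyJob \
      -fs hdfs://hadoop-box:9000 \
      -jt hadoop-box:9001 \
      input output

The host and ports must match the fs.default.name and mapred.job.tracker values the cluster was started with, and they must be reachable from the client machine.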