A problem when running Hadoop

2008-08-03 Thread 纯 郭

 
Hi, I have a problem when using Hadoop.
 
Copy the input files into the distributed filesystem:

$ bin/hadoop dfs -put conf input
08/08/01 17:42:05 WARN dfs.DFSClient: NotReplicatedYetException sleeping /user/yicha-a-183/yicha/input/configuration.xsl retries left 2
08/08/01 17:42:06 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/yicha-a-183/yicha/input/configuration.xsl could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1127)
    at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:312)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)

    at org.apache.hadoop.ipc.Client.call(Client.java:512)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:199)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2074)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1967)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$9(DFSClient.java:1953)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1601)
08/08/01 17:42:06 WARN dfs.DFSClient: NotReplicatedYetException sleeping /user/yicha-a-183/yicha/input/configuration.xsl retries left 1
08/08/01 17:42:09 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/yicha-a-183/yicha/input/configuration.xsl could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1127)
    at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:312)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)
08/08/01 17:42:09 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
put: Could not get block locations. Aborting...
When I stop the daemons I get:

$ bin/stop-all.sh
stopping jobtracker
localhost: no tasktracker to stop
stopping namenode
localhost: no datanode to stop
localhost: no secondarynamenode to stop

This shows that the tasktracker, datanode, and secondarynamenode were never started. Is the error above related to these three not starting? Why did they not start, and why did they leave no logs?
 
The environment and the steps I followed are below. Thank you~
 

jinglu

2008-8-1
 
 
 
 
 
 
The environment: Cygwin on Windows 2000
 
$ ssh localhost
Last login: Fri Aug  1 16:46:33 2008 from 127.0.0.1

This shows that ssh is already configured correctly; I can ssh to localhost without a passphrase.
 
Use the following conf/hadoop-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/zxf/hadoop/tmp/</value>
  </property>
</configuration>
 
Format a new distributed filesystem:

$ bin/hadoop namenode -format
08/08/01 17:16:09 INFO dfs.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = yicha-a-183/192.168.1.139
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.16.4
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May  2 00:18:12 UTC 2008
08/08/01 17:16:09 INFO fs.F

Re: help,error "...failed to report status for xxx seconds..."

2008-08-03 Thread Amareshwari Sriramadasu
The MapRed framework kills map/reduce tasks if they don't report
status within 10 minutes. If your mapper/reducer needs more time, it
should report status using
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/Reporter.html
More documentation is at
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Reporter
You can increase the task timeout by setting mapred.task.timeout.
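For example, the timeout can be raised in the configuration; this is only a sketch, and the 20-minute value below is illustrative (the property is in milliseconds, and 0 disables the timeout):

```xml
<!-- In hadoop-site.xml, or set programmatically on the JobConf for a
     single job. 1200000 ms = 20 minutes; the value is only an example. -->
<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value>
</property>
```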

Thanks
Amareshwari
wangxu wrote:
> Hi,all
> I always run into this kind of error when running a map job.
>
> Task task_200807130149_0067_m_00_0 failed to report status for 604 
> seconds. Killing!
>
>
>
> I am using hadoop-0.16.4-core.jar, one namenode, one datanode.
>
> What does this error message suggest? Does it mean the functions in my 
> mapper are too slow? 
> I assume there is no network connection issue.
> What can I do about this error?
>
>
>
> Thanks,
> Xu



Re: mapper input file name

2008-08-03 Thread Amareshwari Sriramadasu
You can get the file name accessed by the mapper using the config 
property "map.input.file"
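With the old org.apache.hadoop.mapred API, a mapper can read the property in configure(). The sketch below is illustrative and untested, and the class name is made up:

```java
// Untested sketch against the 0.16-era org.apache.hadoop.mapred API:
// read "map.input.file" from the JobConf in configure(), then use it
// in map() -- here it just tags each output value with the file name.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileNameMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private String inputFile;

  public void configure(JobConf job) {
    // Path of the file the current map task's split was taken from.
    inputFile = job.get("map.input.file");
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    output.collect(new Text(inputFile), value);
  }
}
```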


Thanks
Amareshwari
Deyaa Adranale wrote:

Hi,

I need to know inside my mapper, the name of the file that contains 
the current record.
I saw that I can access the name of the input directories inside 
mapper.config(), but my input contains different files and I need to 
know the name of the current one.


any hints?

thanks in advance,

Deyaa




Re: Could not find any valid local directory for task

2008-08-03 Thread Amareshwari Sriramadasu
The error "Could not find any valid local directory for task" means that 
the task could not find a local directory to write its files to, usually 
because there is not enough space on any of the disks.
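A quick check is to look at free space on the directories configured in mapred.local.dir (by default under hadoop.tmp.dir); the /tmp path below is only an example, so substitute your own setting:

```shell
# Check free space where the tasktracker writes spill files.
# Substitute the directories from your own mapred.local.dir setting;
# /tmp stands in for the default location here.
df -h /tmp
```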


Thanks
Amareshwari

Shirley Cohen wrote:

Hi,

Does anyone know what the following error means?

hadoop-0.16.4/logs/userlogs/task_200808021906_0002_m_14_2]$ cat 
syslog
2008-08-02 20:28:00,443 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=MAP, sessionId=
2008-08-02 20:28:00,684 INFO org.apache.hadoop.mapred.MapTask: 
numReduceTasks: 15
2008-08-02 20:30:08,594 WARN org.apache.hadoop.mapred.TaskTracker: 
Error running child

java.io.IOException
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:719)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:209)
at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: 
Could not find any valid local directory for 
task_200808021906_0002_m_14_2/spill4.out


Please let me know if you need more information about my setup.

Thanks in advance,

Shirley




Re: Running mapred job from remote machine to a pseudo-distributed hadoop

2008-08-03 Thread Amareshwari Sriramadasu

Arv Mistry wrote:

I'll try again, can anyone tell me should it be possible to run hadoop
in a pseudo-distributed mode (i.e. everything on one machine) and then
submit a mapred job using the ToolRunner from another machine on that
hadoop configuration?

Cheers Arv
 
  

Yes, it is possible. You can start a Hadoop cluster on a single node; 
documentation is available at 
http://hadoop.apache.org/core/docs/current/quickstart.html#PseudoDistributed
Once the cluster is up, you can submit jobs from any client, but the 
client configuration must know the NameNode and JobTracker addresses. 
You can use the generic options *-fs* and *-jt* on the command line for this.
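On the command line this could look like the sketch below; "namenode-host", "myjob.jar" and "MyJob" are placeholders for your own setup, and the ports assume the quickstart configuration (9000 for the NameNode, 9001 for the JobTracker):

```shell
# Submit from a remote client machine; the generic options point the
# client at the pseudo-distributed cluster. Host name, jar and class
# are placeholders.
bin/hadoop jar myjob.jar MyJob \
  -fs hdfs://namenode-host:9000 \
  -jt namenode-host:9001 \
  input output
```

This assumes MyJob runs through ToolRunner, so that GenericOptionsParser picks up -fs and -jt before the job's own arguments.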


Thanks
Amareshwari


-Original Message-
From: Arv Mistry [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 31, 2008 2:32 PM

To: core-user@hadoop.apache.org
Subject: Running mapred job from remote machine to a pseudo-distributed
hadoop

 
I have hadoop set up in pseudo-distributed mode, i.e. everything on one
machine, and I'm trying to submit a hadoop mapred job from another
machine to that hadoop setup.

At the point that I run the mapred job I get the following error. Any
ideas as to what I'm doing wrong?
Is this possible in a pseudo-distributed mode?

Cheers Arv

 INFO   | jvm 1| 2008/07/31 14:01:00 | 2008-07-31 14:01:00,547 ERROR
[HadoopJobTool] java.io.IOException:
/tmp/hadoop-root/mapred/system/job_200807310809_0006/job.xml: No such
file or directory
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
java.lang.reflect.Method.invoke(Method.java:597)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
INFO   | jvm 1| 2008/07/31 14:01:00 |
INFO   | jvm 1| 2008/07/31 14:01:00 |
org.apache.hadoop.ipc.RemoteException: java.io.IOException:
/tmp/hadoop-root/mapred/system/job_200807310809_0006/job.xml: No such
file or directory
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
java.lang.reflect.Method.invoke(Method.java:597)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
INFO   | jvm 1| 2008/07/31 14:01:00 |
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.ipc.Client.call(Client.java:557)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
$Proxy5.submitJob(Unknown Source)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl
