Hello all,

I'm new to this list and to Hadoop too. I'm testing some basic
configurations before I start on my own experiments. I've installed a
two-machine Hadoop cluster as explained here:

http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)

I'm not using Ubuntu but Debian Lenny, with Java 1.6.x and Hadoop
0.18.1 installed on both systems.

All daemons are running correctly and HDFS is working properly, with
the replication level set to 2 so that every file is replicated on both
machines. Both hosts have their clocks set correctly.
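
For reference, I believe the relevant setting in conf/hadoop-site.xml
looks roughly like this (following the tutorial; my exact file may
differ slightly):

  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>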

The problem begins when I try to run the classic wordcount example. I
load some Project Gutenberg files onto HDFS, and then run:

$ bin/hadoop jar hadoop-0.18.1-examples.jar wordcount gutemberg gutemberg-output
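
(For reference, I had loaded the input files beforehand with something
like the following; the exact local path is from memory:)

$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg gutemberg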

The map phase starts and reaches 100%; then the reduce phase starts and
freezes at around 14%. I waited several minutes, but the job never
finished.

running "hadoop job -list" gives me this ouput:

$ bin/hadoop job -list
1 jobs currently running
JobId   State   StartTime       UserName
job_200810151758_0003   1       1224106290709   hadoop

...and I can kill it successfully.
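
For completeness, I kill it like this, using the job id from the
listing above:

$ bin/hadoop job -kill job_200810151758_0003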

In my last test I left the job running, and about an hour later it
terminated with the following messages:


08/10/15 20:56:38 INFO mapred.JobClient: Task Id : attempt_200810151952_0001_m_000002_0, Status : FAILED
Too many fetch-failures
08/10/15 20:59:47 WARN mapred.JobClient: Error reading task outputConnection timed out
08/10/15 21:02:56 WARN mapred.JobClient: Error reading task outputConnection timed out
08/10/15 21:02:57 INFO mapred.JobClient: Job complete: job_200810151952_0001
08/10/15 21:02:57 INFO mapred.JobClient: Counters: 16
08/10/15 21:02:57 INFO mapred.JobClient:   File Systems
08/10/15 21:02:57 INFO mapred.JobClient:     HDFS bytes read=6945126
08/10/15 21:02:57 INFO mapred.JobClient:     HDFS bytes written=1410309
08/10/15 21:02:57 INFO mapred.JobClient:     Local bytes read=3472685
08/10/15 21:02:57 INFO mapred.JobClient:     Local bytes written=6422750
08/10/15 21:02:57 INFO mapred.JobClient:   Job Counters
08/10/15 21:02:57 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/15 21:02:57 INFO mapred.JobClient:     Launched map tasks=12
08/10/15 21:02:57 INFO mapred.JobClient:     Data-local map tasks=12
08/10/15 21:02:57 INFO mapred.JobClient:   Map-Reduce Framework
08/10/15 21:02:57 INFO mapred.JobClient:     Reduce input groups=128360
08/10/15 21:02:57 INFO mapred.JobClient:     Combine output records=329346
08/10/15 21:02:57 INFO mapred.JobClient:     Map input records=137114
08/10/15 21:02:57 INFO mapred.JobClient:     Reduce output records=128360
08/10/15 21:02:57 INFO mapred.JobClient:     Map output bytes=11428977
08/10/15 21:02:57 INFO mapred.JobClient:     Map input bytes=6945126
08/10/15 21:02:57 INFO mapred.JobClient:     Combine input records=1375481
08/10/15 21:02:57 INFO mapred.JobClient:     Map output records=1174495
08/10/15 21:02:57 INFO mapred.JobClient:     Reduce input records=128360

When I start the cluster with only the master server (running the
namenode, jobtracker, datanode and tasktracker), the job works
perfectly, so I suppose there is some problem with the communication
with the slave node.
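
In case it helps, my conf/masters and conf/slaves files are set up as
in the tutorial, roughly like this (hostnames changed):

conf/masters:
master

conf/slaves:
master
slave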

Any help would be appreciated.

--
Lucas Di Pentima - http://lucas.di-pentima.com.ar
GnuPG Public Key:
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x6AA54FC9
Key fingerprint = BD3B 08C4 661A 8C3B 1855  740C 8F98 3FCF 6AA5 4FC9
