Hello all, I'm new to this list and to Hadoop. I'm testing some basic configurations before I start running my own experiments. I've installed a two-machine Hadoop cluster as explained here:
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)

I'm not using Ubuntu but Debian Lenny, with Java 1.6.x and Hadoop 0.18.1 installed on both systems. All daemons are running correctly and HDFS is working properly: with the default replication level of 2, all files are replicated on both PCs. Both hosts have their clocks set correctly.

The problem begins when I try to run the classic wordcount test. I load some Project Gutenberg files into HDFS and then run:

$ bin/hadoop jar hadoop-0.18.1-examples.jar wordcount gutemberg gutemberg-output

The map phase starts and reaches 100%; then the reduce phase starts and freezes at approximately 14%. I waited several minutes, but the job didn't finish. Running "hadoop job -list" gives me this output:

$ bin/hadoop job -list
1 jobs currently running
JobId                  State  StartTime      UserName
job_200810151758_0003  1      1224106290709  hadoop

...and I can kill the job successfully.

In my last test I left the job running, and one hour later it terminated with the following messages:

08/10/15 20:56:38 INFO mapred.JobClient: Task Id : attempt_200810151952_0001_m_000002_0, Status : FAILED
Too many fetch-failures
08/10/15 20:59:47 WARN mapred.JobClient: Error reading task outputConnection timed out
08/10/15 21:02:56 WARN mapred.JobClient: Error reading task outputConnection timed out
08/10/15 21:02:57 INFO mapred.JobClient: Job complete: job_200810151952_0001
08/10/15 21:02:57 INFO mapred.JobClient: Counters: 16
08/10/15 21:02:57 INFO mapred.JobClient:   File Systems
08/10/15 21:02:57 INFO mapred.JobClient:     HDFS bytes read=6945126
08/10/15 21:02:57 INFO mapred.JobClient:     HDFS bytes written=1410309
08/10/15 21:02:57 INFO mapred.JobClient:     Local bytes read=3472685
08/10/15 21:02:57 INFO mapred.JobClient:     Local bytes written=6422750
08/10/15 21:02:57 INFO mapred.JobClient:   Job Counters
08/10/15 21:02:57 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/15 21:02:57 INFO mapred.JobClient:     Launched map tasks=12
08/10/15 21:02:57 INFO mapred.JobClient:     Data-local map tasks=12
08/10/15 21:02:57 INFO mapred.JobClient:   Map-Reduce Framework
08/10/15 21:02:57 INFO mapred.JobClient:     Reduce input groups=128360
08/10/15 21:02:57 INFO mapred.JobClient:     Combine output records=329346
08/10/15 21:02:57 INFO mapred.JobClient:     Map input records=137114
08/10/15 21:02:57 INFO mapred.JobClient:     Reduce output records=128360
08/10/15 21:02:57 INFO mapred.JobClient:     Map output bytes=11428977
08/10/15 21:02:57 INFO mapred.JobClient:     Map input bytes=6945126
08/10/15 21:02:57 INFO mapred.JobClient:     Combine input records=1375481
08/10/15 21:02:57 INFO mapred.JobClient:     Map output records=1174495
08/10/15 21:02:57 INFO mapred.JobClient:     Reduce input records=128360

When I start the cluster with only the master server (namenode, jobtracker, datanode and tasktracker), the job works perfectly, so I suppose there is some problem with the communication with the slave node. Any help will be appreciated.

--
Lucas Di Pentima - http://lucas.di-pentima.com.ar
GnuPG Public Key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x6AA54FC9
Key fingerprint = BD3B 08C4 661A 8C3B 1855 740C 8F98 3FCF 6AA5 4FC9
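P.S. In case it's relevant: I've read that "Too many fetch-failures" can be related to hostname resolution, since the reducers fetch map output from the other node over HTTP by hostname. Below is a sketch of how I understand /etc/hosts should look on both machines for this kind of setup (the hostnames and addresses are placeholders, not my actual ones):

```text
# /etc/hosts on both nodes (placeholder addresses and hostnames)
127.0.0.1    localhost
# Debian's installer also adds a "127.0.1.1 <hostname>" line by default;
# I'm not sure whether that could confuse Hadoop's hostname lookups.
192.168.0.1  master
192.168.0.2  slave
```

Could that default 127.0.1.1 line be part of the problem, if each node resolves its own name to a loopback address instead of its LAN address?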