Hello all.
I have a two-node cluster: one master and one slave.
The names "master1" and "slave1" are listed in /etc/hosts on both hosts, and I have verified that they are correct.

conf/masters:
master1

conf/slaves:
master1
slave1

"conf/slaves" and "conf/masters" are empty on the "slave1" node. I tried filling them in several ways, but it didn't help.

"master1" is AMD-64, "slave1" is Xeon-32.
I compiled a single C++ wordcount-simple binary on the 32-bit machine and put it on HDFS.
The binary runs successfully on both machines.

I have 5 files in "/input" on HDFS:
i1.txt - 2 MB
i2.txt - 2 MB
i3.txt - 2 MB
i4.txt ~ 50 MB
i5.txt ~ 50 MB

I have tried 0.18.3, 0.19.1, the "trunk" svn dir and the "branch-0.20" svn dir. The result is the same in every case.

Running the job on "master1":
localhost$> bin/hadoop pipes -conf src/examples/pipes/conf/word.xml -input /input -output /o1

word.xml: http://pastebin.com/m25577ea4
conf/hadoop-default.xml: http://pastebin.com/m199c08f0
conf/hadoop-site.xml: http://pastebin.com/m321ead97
conf/hadoop-env.sh: http://pastebin.com/m41c36f2f



Console output on "master1" contains WARN messages about fetching errors:

09/03/06 09:44:23 WARN mapred.JobClient: Error reading task outputhttp://localhost:50060/tasklog?plaintext=true&taskid=attempt_200903060939_0001_m_000000_0&filter=stdout

[master1] logs/hadoop-hadoop-tasktracker-localhost.log contains this many times:

2009-03-06 09:41:51,178 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_200903060939_0001_m_000000_0,1) failed : org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200903060939_0001/attempt_200903060939_0001_m_000000_0/output/file.out.index in any of the configured local directories
2009-03-06 09:41:51,179 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: attempt_200903060939_0001_m_000000_0. Ignored.
2009-03-06 09:41:51,224 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.0.1:50060, dest: 127.0.0.1:53917, bytes: 0, op: MAPRED_SHUFFLE, cliID: attempt_200903060939_0001_m_000000_0
2009-03-06 09:41:51,224 WARN org.mortbay.log: /mapOutput: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200903060939_0001/attempt_200903060939_0001_m_000000_0/output/file.out.index in any of the configured local directories

[slave1] logs/hadoop-hadoop-tasktracker-srv.log contains this:

...
2009-03-06 09:40:50,094 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000000_0 0.61383915% hdfs://master1:9000/inputi4.txt:0+53360000
2009-03-06 09:40:50,188 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000001_0 0.59977823% hdfs://master1:9000/inputi5.txt:0+53360000
2009-03-06 09:40:53,097 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000000_0 0.66882175% hdfs://master1:9000/inputi4.txt:0+53360000
2009-03-06 09:40:53,191 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000001_0 0.64430434% hdfs://master1:9000/inputi5.txt:0+53360000
2009-03-06 09:40:56,100 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000000_0 0.7192957% hdfs://master1:9000/inputi4.txt:0+53360000
2009-03-06 09:40:56,194 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000001_0 0.68883044% hdfs://master1:9000/inputi5.txt:0+53360000
2009-03-06 09:40:59,103 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000000_0 0.7661652% hdfs://master1:9000/inputi4.txt:0+53360000
2009-03-06 09:40:59,212 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000001_0 0.7263261% hdfs://master1:9000/inputi5.txt:0+53360000
2009-03-06 09:41:02,106 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000000_0 0.80600435% hdfs://master1:9000/inputi4.txt:0+53360000
2009-03-06 09:41:02,271 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000001_0 0.7802261% hdfs://master1:9000/inputi5.txt:0+53360000
...

I have read some mailing list threads discussing whether the nodes are able to open network connections to each other, but I can't see where my error is. The iptables rules are empty, and I can ssh from master to slave and from slave to master. I also checked TCP connections in both directions on ports 9000, 9001 and others (using "nc").
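
For anyone who wants to repeat the port checks without "nc", here is a minimal Python sketch of the same idea. It only tests whether a TCP connection can be opened, nothing Hadoop-specific. The function names check_port and report are made up for illustration, and which ports to probe is an assumption: 9000 and 9001 are the ones mentioned above, and 50060 is the port that appears in the tasklog URL in the console output.

```python
import socket


def check_port(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the name and connects; closing the
        # socket right away is enough for a reachability test.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def report(hosts, ports, timeout=2.0):
    """Return a dict mapping (host, port) to True/False reachability."""
    return {(h, p): check_port(h, p, timeout) for h in hosts for p in ports}
```

Calling report(["master1", "slave1"], [9000, 9001, 50060]) on each node in turn should give roughly the same picture as the "nc" runs. A False entry only says the connection could not be opened from that node; it does not say why.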

Just another description of this problem:
http://dramele.livejournal.com/101634.html

Pavel.
