Hello to all.
I have a two-node cluster: one master and one slave.
The names "master1" and "slave1" are defined in /etc/hosts on both hosts, and
the entries are correct.
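For anyone who wants to repeat the name check, here is a minimal sketch of how it can be done from Python. The function name `resolve_and_flag` is only illustrative (it is not part of Hadoop). It resolves each name and flags those that map to a loopback address, which seems worth ruling out here because the tasktracker log file name and the task log URL on "master1" both contain "localhost".

```python
import ipaddress
import socket


def resolve_and_flag(names):
    """Resolve each host name; report its address and whether it is loopback.

    Returns a dict: name -> (address or None, is_loopback).
    """
    results = {}
    for name in names:
        try:
            addr = socket.gethostbyname(name)
        except OSError:
            # The name did not resolve at all.
            results[name] = (None, False)
            continue
        results[name] = (addr, ipaddress.ip_address(addr).is_loopback)
    return results
```

Calling `resolve_and_flag(["master1", "slave1", socket.gethostname()])` on each node shows what every name resolves to on that node. A loopback result for a node's own name would be one possible explanation for the "localhost" seen in the logs, though that is an assumption and not something confirmed here.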
conf/masters:
master1
conf/slaves:
master1
slave1
On the "slave1" node, "conf/slaves" and "conf/masters" are both empty. I tried
filling them in several ways, but it did not help.
"master1" is AMD-64, "slave1" is Xeon-32.
I compiled a single C++ wordcount-simple binary on the 32-bit machine and put
it on HDFS.
The binary runs successfully on both machines.
I have 5 files in "/input" on HDFS:
i1.txt - 2 MB
i2.txt - 2 MB
i3.txt - 2 MB
i4.txt ~ 50 MB
i5.txt ~ 50 MB
I have tried 0.18.3, 0.19.1, the svn "trunk" dir and the svn "branch-0.20" dir.
The result is the same in all of them.
Running the job on "master1":
localhost$> bin/hadoop pipes -conf src/examples/pipes/conf/word.xml -input /input -output /o1
word.xml: http://pastebin.com/m25577ea4
conf/hadoop-default.xml: http://pastebin.com/m199c08f0
conf/hadoop-site.xml: http://pastebin.com/m321ead97
conf/hadoop-env.sh: http://pastebin.com/m41c36f2f
Console output on "master1" contains WARN messages about fetching errors:
09/03/06 09:44:23 WARN mapred.JobClient: Error reading task
outputhttp://localhost:50060/tasklog?plaintext=true&taskid=attempt_200903060939_0001_m_000000_0&filter=stdout
[master1] logs/hadoop-hadoop-tasktracker-localhost.log contains the following,
repeated many times:
2009-03-06 09:41:51,178 WARN org.apache.hadoop.mapred.TaskTracker:
getMapOutput(attempt_200903060939_0001_m_000000_0,1) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200903060939_0001/attempt_200903060939_0001_m_000000_0/output/file.out.index
in any of the configured local directories
2009-03-06 09:41:51,179 WARN org.apache.hadoop.mapred.TaskTracker: Unknown
child with bad map output: attempt_200903060939_0001_m_000000_0. Ignored.
2009-03-06 09:41:51,224 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.0.1:50060,
dest: 127.0.0.1:53917, bytes: 0, op: MAPRED_SHUFFLE, cliID:
attempt_200903060939_0001_m_000000_0
2009-03-06 09:41:51,224 WARN org.mortbay.log: /mapOutput:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200903060939_0001/attempt_200903060939_0001_m_000000_0/output/file.out.index
in any of the configured local directories
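To see which map attempts fail to be fetched, and how often, the tasktracker log can be summarized with a few lines of Python. This is only a sketch. It assumes that in the real log file the "getMapOutput(...) failed" text sits on a single line (the wrapping above comes from the mail), and it treats the number after the attempt id as the reduce number, which is my reading of the message. The name `count_failed_fetches` is illustrative.

```python
import re
from collections import Counter

# Matches the TaskTracker warning quoted above, for example:
#   getMapOutput(attempt_200903060939_0001_m_000000_0,1) failed
FAILED_FETCH = re.compile(r"getMapOutput\((attempt_\w+),(\d+)\) failed")


def count_failed_fetches(lines):
    """Count failed map-output fetches per (attempt id, second field)."""
    counts = Counter()
    for line in lines:
        match = FAILED_FETCH.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

It can be fed an open file directly, e.g. `count_failed_fetches(open("logs/hadoop-hadoop-tasktracker-localhost.log"))`, since a file object iterates over its lines.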
[slave1] logs/hadoop-hadoop-tasktracker-srv.log contains this:
...
2009-03-06 09:40:50,094 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200903060939_0001_m_000000_0 0.61383915%
hdfs://master1:9000/inputi4.txt:0+53360000
2009-03-06 09:40:50,188 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200903060939_0001_m_000001_0 0.59977823%
hdfs://master1:9000/inputi5.txt:0+53360000
2009-03-06 09:40:53,097 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200903060939_0001_m_000000_0 0.66882175%
hdfs://master1:9000/inputi4.txt:0+53360000
2009-03-06 09:40:53,191 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200903060939_0001_m_000001_0 0.64430434%
hdfs://master1:9000/inputi5.txt:0+53360000
2009-03-06 09:40:56,100 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200903060939_0001_m_000000_0 0.7192957%
hdfs://master1:9000/inputi4.txt:0+53360000
2009-03-06 09:40:56,194 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200903060939_0001_m_000001_0 0.68883044%
hdfs://master1:9000/inputi5.txt:0+53360000
2009-03-06 09:40:59,103 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200903060939_0001_m_000000_0 0.7661652%
hdfs://master1:9000/inputi4.txt:0+53360000
2009-03-06 09:40:59,212 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200903060939_0001_m_000001_0 0.7263261%
hdfs://master1:9000/inputi5.txt:0+53360000
2009-03-06 09:41:02,106 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200903060939_0001_m_000000_0 0.80600435%
hdfs://master1:9000/inputi4.txt:0+53360000
2009-03-06 09:41:02,271 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200903060939_0001_m_000001_0 0.7802261%
hdfs://master1:9000/inputi5.txt:0+53360000
...
I have read some mailing lists and saw discussions about whether the nodes are
able to open network connections to each other, but I cannot see where my
error is. The iptables rules are empty, and I can ssh from master to slave and
from slave to master. I also checked TCP connections from one node to the
other on ports 9000, 9001 and others (by running "nc").
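The same kind of check as with "nc" can be scripted so that it is easy to repeat from both nodes. This is a minimal sketch using only the Python standard library. The function names are illustrative, and the port list is an assumption: 9000 and 9001 are the ports mentioned above, and 50060 is the port that appears in the task log URL and in the MAPRED_SHUFFLE line.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable names.
        return False


def check_ports(hosts, ports, timeout=3.0):
    """Try every host/port pair; return a dict (host, port) -> bool."""
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}
```

For example, `check_ports(["master1", "slave1"], [9000, 9001, 50060])` run on each node reports which of those ports can be reached from that node. A successful connection only shows that something is listening there, not that the shuffle itself works.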
Another description of this problem:
http://dramele.livejournal.com/101634.html
Pavel.