Fetch errors. 2 node cluster.

pavelkolodin Thu, 05 Mar 2009 23:12:32 -0800


Hello to all.
I have 2 nodes in the cluster: a master and a slave.

The names "master1" and "slave1" are stored in /etc/hosts on both hosts, and they are 100% correct.
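One way to double-check the /etc/hosts entries is to resolve each node name on each node and confirm that none of them maps to a loopback address. This matters here because the logs further down mention localhost and 127.0.0.1. Below is a minimal Python sketch. It assumes the node names are "master1" and "slave1" and that you run it on each node in turn. It is only a check, not a diagnosis.

```python
import ipaddress
import socket


def resolve_all(name):
    """Return the sorted list of IPv4 addresses the resolver gives for `name`."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def check_hosts(names):
    """Map each name to (addresses, uses_loopback); unresolvable names map to ([], None)."""
    report = {}
    for name in names:
        try:
            addrs = resolve_all(name)
        except socket.gaierror:
            report[name] = ([], None)
            continue
        loopback = any(ipaddress.ip_address(a).is_loopback for a in addrs)
        report[name] = (addrs, loopback)
    return report


if __name__ == "__main__":
    # Also check this machine's own hostname: an /etc/hosts line such as
    # "127.0.0.1 <hostname>" would make it resolve to loopback.
    names = ["master1", "slave1", socket.gethostname()]
    for name, (addrs, loopback) in check_hosts(names).items():
        if loopback is None:
            print(f"{name}: does not resolve")
        else:
            flag = "  <-- loopback" if loopback else ""
            print(f"{name}: {', '.join(addrs)}{flag}")
```

If any line is flagged as loopback, the other node will not be able to reach the service that advertises that name.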


conf/masters:
master1

conf/slaves:
master1
slave1

"conf/slaves" and "conf/masters" are empty on the "slave1" node. I tried to fill them in many ways, but it didn't help.


"master1" is AMD-64, "slave1" is Xeon-32.

I compiled a single C++ wordcount-simple binary on the 32-bit machine and put it on HDFS.

The binary successfully runs on both machines.

I have 5 files in "/input" on HDFS:
i1.txt - 2 MB
i2.txt - 2 MB
i3.txt - 2 MB
i4.txt ~ 50 MB
i5.txt ~ 50 MB

I have tried 0.18.3, 0.19.1, the svn "trunk" directory and the svn "branch-0.20" directory. The result is the same...


Running the job on "master1":

localhost$> bin/hadoop pipes -conf src/examples/pipes/conf/word.xml -input /input -output /o1


word.xml: http://pastebin.com/m25577ea4
conf/hadoop-default.xml: http://pastebin.com/m199c08f0
conf/hadoop-site.xml: http://pastebin.com/m321ead97
conf/hadoop-env.sh: http://pastebin.com/m41c36f2f



Console output on "master1" contains WARN messages about fetching errors:

09/03/06 09:44:23 WARN mapred.JobClient: Error reading task output http://localhost:50060/tasklog?plaintext=true&taskid=attempt_200903060939_0001_m_000000_0&filter=stdout

[master1] logs/hadoop-hadoop-tasktracker-localhost.log contains this many times:

2009-03-06 09:41:51,178 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_200903060939_0001_m_000000_0,1) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200903060939_0001/attempt_200903060939_0001_m_000000_0/output/file.out.index in any of the configured local directories
2009-03-06 09:41:51,179 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: attempt_200903060939_0001_m_000000_0. Ignored.
2009-03-06 09:41:51,224 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.0.1:50060, dest: 127.0.0.1:53917, bytes: 0, op: MAPRED_SHUFFLE, cliID: attempt_200903060939_0001_m_000000_0
2009-03-06 09:41:51,224 WARN org.mortbay.log: /mapOutput: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200903060939_0001/attempt_200903060939_0001_m_000000_0/output/file.out.index in any of the configured local directories


[slave1] logs/hadoop-hadoop-tasktracker-srv.log contains this:

...

2009-03-06 09:40:50,094 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000000_0 0.61383915% hdfs://master1:9000/input/i4.txt:0+53360000
2009-03-06 09:40:50,188 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000001_0 0.59977823% hdfs://master1:9000/input/i5.txt:0+53360000
2009-03-06 09:40:53,097 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000000_0 0.66882175% hdfs://master1:9000/input/i4.txt:0+53360000
2009-03-06 09:40:53,191 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000001_0 0.64430434% hdfs://master1:9000/input/i5.txt:0+53360000
2009-03-06 09:40:56,100 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000000_0 0.7192957% hdfs://master1:9000/input/i4.txt:0+53360000
2009-03-06 09:40:56,194 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000001_0 0.68883044% hdfs://master1:9000/input/i5.txt:0+53360000
2009-03-06 09:40:59,103 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000000_0 0.7661652% hdfs://master1:9000/input/i4.txt:0+53360000
2009-03-06 09:40:59,212 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000001_0 0.7263261% hdfs://master1:9000/input/i5.txt:0+53360000
2009-03-06 09:41:02,106 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000000_0 0.80600435% hdfs://master1:9000/input/i4.txt:0+53360000
2009-03-06 09:41:02,271 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200903060939_0001_m_000001_0 0.7802261% hdfs://master1:9000/input/i5.txt:0+53360000

...

I have read some mailing-list threads about whether the nodes can open network connections to each other, but I can't figure out where my error is. iptables is empty, and I can ssh from master to slave and from slave to master. I also checked TCP connections from one node to the other on ports 9000, 9001 and others (by running "nc").
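The "nc" check above can be repeated from both nodes in a single pass. Below is a minimal Python sketch of the same idea: it attempts a plain TCP connect to each host/port pair. The host names come from the post. The port list is an assumption: it includes 9000 and 9001, which the post names, plus 50060, which appears in the logs above in the tasklog URL and in the MAPRED_SHUFFLE line. Adjust the list to match hadoop-site.xml.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable names (gaierror).
        return False


if __name__ == "__main__":
    # Assumed port list; 50060 is the TaskTracker HTTP port seen in the logs.
    ports = [9000, 9001, 50060]
    for host in ["master1", "slave1"]:
        for port in ports:
            state = "open" if port_open(host, port) else "unreachable"
            print(f"{host}:{port} {state}")
```

Like "nc", this only shows that something accepts connections on that port. It does not show which address the daemon advertises to the other node.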


Just another description of this problem:
http://dramele.livejournal.com/101634.html

Pavel.

