Hi all,

My problem is the same as the one in http://issues.apache.org/jira/browse/HADOOP-3362, and no solution is given there :(
1. I am using Hadoop 0.20.1. My setup is very simple: two machines, both running Ubuntu.

   machine1 = namenode, jobtracker, datanode and tasktracker (we will call this the master)
   machine2 = datanode and tasktracker (we will call this the slave)

   This is the same setup as described in http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster), with one difference: I have not changed my /etc/hosts file, because I am using IP addresses in the conf files. *Is that OK?*

2. The program runs fine in standalone mode, but in multi-node mode it stalls in the reduce phase for a long time and only eventually completes successfully. I am running just the word count example.

/**************************/
10/01/27 12:08:21 INFO input.FileInputFormat: Total input paths to process : 17
10/01/27 12:08:21 INFO mapred.JobClient: Running job: job_201001271157_0002
10/01/27 12:08:22 INFO mapred.JobClient: map 0% reduce 0%
10/01/27 12:08:39 INFO mapred.JobClient: map 11% reduce 0%
10/01/27 12:08:46 INFO mapred.JobClient: map 23% reduce 0%
10/01/27 12:08:53 INFO mapred.JobClient: map 35% reduce 0%
10/01/27 12:08:56 INFO mapred.JobClient: map 47% reduce 3%
10/01/27 12:09:02 INFO mapred.JobClient: map 58% reduce 7%
10/01/27 12:09:05 INFO mapred.JobClient: map 70% reduce 7%
10/01/27 12:09:08 INFO mapred.JobClient: map 82% reduce 11%
10/01/27 12:09:11 INFO mapred.JobClient: map 88% reduce 11%
10/01/27 12:09:14 INFO mapred.JobClient: map 100% reduce 11%
10/01/27 12:09:23 INFO mapred.JobClient: map 100% reduce 17%
10/01/27 12:16:39 INFO mapred.JobClient: Task Id : attempt_201001271157_0002_m_000002_0, Status : FAILED
Too many fetch-failures
10/01/27 12:16:54 INFO mapred.JobClient: map 100% reduce 19%
10/01/27 12:26:52 INFO mapred.JobClient: Task Id : attempt_201001271157_0002_m_000003_0, Status : FAILED
Too many fetch-failures
10/01/27 12:27:08 INFO mapred.JobClient: map 100% reduce 21%
10/01/27 12:37:08 INFO mapred.JobClient: Task Id : attempt_201001271157_0002_m_000006_0, Status : FAILED
Too many fetch-failures
10/01/27 12:37:24 INFO mapred.JobClient: map 100% reduce 23%
10/01/27 12:47:24 INFO mapred.JobClient: Task Id : attempt_201001271157_0002_m_000007_0, Status : FAILED
Too many fetch-failures
10/01/27 12:47:28 INFO mapred.JobClient: map 94% reduce 23%
10/01/27 12:47:31 INFO mapred.JobClient: map 100% reduce 23%
10/01/27 12:47:40 INFO mapred.JobClient: map 100% reduce 25%
10/01/27 12:57:38 INFO mapred.JobClient: Task Id : attempt_201001271157_0002_m_000010_0, Status : FAILED
Too many fetch-failures
10/01/27 12:57:54 INFO mapred.JobClient: map 100% reduce 27%
10/01/27 13:07:55 INFO mapred.JobClient: Task Id : attempt_201001271157_0002_m_000011_0, Status : FAILED
Too many fetch-failures
10/01/27 13:08:11 INFO mapred.JobClient: map 100% reduce 29%
10/01/27 13:18:11 INFO mapred.JobClient: Task Id : attempt_201001271157_0002_m_000014_0, Status : FAILED
Too many fetch-failures
10/01/27 13:18:27 INFO mapred.JobClient: map 100% reduce 31%
10/01/27 13:28:24 INFO mapred.JobClient: Task Id : attempt_201001271157_0002_m_000015_0, Status : FAILED
Too many fetch-failures
10/01/27 13:28:40 INFO mapred.JobClient: map 100% reduce 100%
10/01/27 13:28:42 INFO mapred.JobClient: Job complete: job_201001271157_0002
10/01/27 13:28:42 INFO mapred.JobClient: Counters: 17
10/01/27 13:28:42 INFO mapred.JobClient: Job Counters
10/01/27 13:28:42 INFO mapred.JobClient: Launched reduce tasks=1
10/01/27 13:28:42 INFO mapred.JobClient: Launched map tasks=25
10/01/27 13:28:42 INFO mapred.JobClient: Data-local map tasks=25
10/01/27 13:28:42 INFO mapred.JobClient: FileSystemCounters
10/01/27 13:28:42 INFO mapred.JobClient: FILE_BYTES_READ=16584
10/01/27 13:28:42 INFO mapred.JobClient: HDFS_BYTES_READ=18805
10/01/27 13:28:42 INFO mapred.JobClient: FILE_BYTES_WRITTEN=33808
10/01/27 13:28:42 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=10731
10/01/27 13:28:42 INFO mapred.JobClient: Map-Reduce Framework
10/01/27 13:28:42 INFO mapred.JobClient: Reduce input groups=0
10/01/27 13:28:42 INFO mapred.JobClient: Combine output records=821
10/01/27 13:28:42 INFO mapred.JobClient: Map input records=580
10/01/27 13:28:42 INFO mapred.JobClient: Reduce shuffle bytes=16680
10/01/27 13:28:42 INFO mapred.JobClient: Reduce output records=0
10/01/27 13:28:42 INFO mapred.JobClient: Spilled Records=1642
10/01/27 13:28:42 INFO mapred.JobClient: Map output bytes=25180
10/01/27 13:28:42 INFO mapred.JobClient: Combine input records=1818
10/01/27 13:28:42 INFO mapred.JobClient: Map output records=1818
10/01/27 13:28:42 INFO mapred.JobClient: Reduce input records=821
/**************************/

I checked the namenode, jobtracker, datanode and tasktracker logs (attached). There are no exceptions in them; the only failure messages are these in the jobtracker log:

/*-------------------------*/
2010-01-27 12:26:51,554 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201001271157_0002_m_000003_1' to tip task_201001271157_0002_m_000003, for tracker 'tracker_hadoop-desktop2:localhost/127.0.0.1:55734'
2010-01-27 12:26:51,554 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201001271157_0002_m_000003
2010-01-27 12:26:54,350 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201001271157_0002_m_000003_0' from 'tracker_hadoop-desktop1:localhost/127.0.0.1:36778'
2010-01-27 12:26:54,626 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201001271157_0002_m_000003_1' has completed task_201001271157_0002_m_000003 successfully.
2010-01-27 12:26:54,627 INFO org.apache.hadoop.mapred.ResourceEstimator: completedMapsUpdates:19 completedMapsInputSize:23876 completedMapsOutputSize:22641
2010-01-27 12:29:30,987 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #1 for task attempt_201001271157_0002_m_000006_0
2010-01-27 12:32:07,410 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task attempt_201001271157_0002_m_000006_0
2010-01-27 12:37:08,075 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #3 for task attempt_201001271157_0002_m_000006_0
2010-01-27 12:37:08,075 INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task: attempt_201001271157_0002_m_000006_0 ... killing it
2010-01-27 12:37:08,075 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201001271157_0002_m_000006_0: Too many fetch-failures
2010-01-27 12:37:08,076 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201001271157_0002_m_000006_1' to tip task_201001271157_0002_m_000006, for tracker 'tracker_hadoop-desktop2:localhost/127.0.0.1:55734'
2010-01-27 12:37:08,076 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201001271157_0002_m_000006
2010-01-27 12:37:10,613 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201001271157_0002_m_000006_0' from 'tracker_hadoop-desktop1:localhost/127.0.0.1:36778'
2010-01-27 12:37:11,084 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201001271157_0002_m_000006_1' has completed task_201001271157_0002_m_000006 successfully.
2010-01-27 12:37:11,084 INFO org.apache.hadoop.mapred.ResourceEstimator: completedMapsUpdates:20 completedMapsInputSize:25072 completedMapsOutputSize:23508
2010-01-27 12:39:47,424 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #1 for task attempt_201001271157_0002_m_000007_0
2010-01-27 12:42:23,822 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task attempt_201001271157_0002_m_000007_0
2010-01-27 12:47:24,576 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #3 for task attempt_201001271157_0002_m_000007_0
2010-01-27 12:47:24,578 INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task: attempt_201001271157_0002_m_000007_0 ... killing it
2010-01-27 12:47:24,578 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201001271157_0002_m_000007_0: Too many fetch-failures
2010-01-27 12:47:24,578 INFO org.apache.hadoop.mapred.JobInProgress: TaskTracker at 'hadoop-desktop1' turned 'flaky'
2010-01-27 12:47:24,579 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201001271157_0002_m_000007_1' to tip task_201001271157_0002_m_000007, for tracker 'tracker_hadoop-desktop2:localhost/127.0.0.1:55734'
2010-01-27 12:47:24,579 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201001271157_0002_m_000007
/*--------------------------*/

More info:
1. The map tasks that failed with "Too many fetch-failures" ran on the *master machine only*; the slave was able to finish those same tasks.
2. From both the master and the slave, we cannot access the web UI when we give the master's IP address. On the master itself, the web UI is accessible when we use localhost instead of the master's IP address.

- Nachiket

On Fri, Jan 22, 2010 at 7:13 PM, Sayali <sayali.kulka...@gmail.com> wrote:
> Hey Nachiket!
> So nice to hear from you! I recently rejoined PSL and am currently working
> hard on adjusting to the new environment :) I guess you can understand
> what I mean -- after 2 years at IIT, it's tough to get back :)
>
> Well... your news server needs to be tested! It should not give out such
> false info! :P (but anyway, reporters have a habit of adding spice when
> they tell a story... so read into it what you will :) )
>
> Jokes apart... I have worked a little bit on Hadoop, so let me know what
> help you need. I will try to help as much as my little memory allows..
>
> :)
> --s
>
> On Fri, Jan 22, 2010 at 9:39 PM, Nachiket Vaidya <vaidy...@gmail.com> wrote:
>
>> Hey Sayali,
>> How are you? Where are you now?
>>
>> I am using Hadoop. From the news server I got the info that you are the
>> boss of Hadoop. I want some help with it.
>> Can you help me?
>>
>> - Nachiket
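One quick way to check the /etc/hosts question from point 1: the tracker names in the jobtracker log (for example `tracker_hadoop-desktop1:localhost/127.0.0.1:36778`) suggest that each machine's own hostname resolves to a loopback address. On Ubuntu, the default /etc/hosts maps the machine's own hostname to 127.0.1.1. The sketch below is a hypothetical helper script (not part of Hadoop); it resolves each hostname given on the command line through the system resolver, which consults /etc/hosts, and flags names that do not resolve or that resolve to a 127.x.x.x address. It assumes, without confirming it from these logs, that reducers fetch map output using the tasktracker's hostname, so every node should resolve every other node's hostname to its real LAN address. Run it on both machines with both hostnames, e.g. `python check_hosts.py hadoop-desktop1 hadoop-desktop2`.

```python
import ipaddress
import socket
import sys


def resolve(host):
    """Resolve `host` to an IPv4 address string via the system resolver, or None."""
    try:
        return socket.gethostbyname(host)
    except OSError:  # socket.gaierror / socket.herror are OSError subclasses
        return None


def is_loopback(addr):
    """True if `addr` is a loopback address (anything in 127.0.0.0/8, or ::1)."""
    return ipaddress.ip_address(addr).is_loopback


def check(hosts):
    """Return (host, address, problem) tuples; `problem` is None when the host looks OK."""
    results = []
    for host in hosts:
        addr = resolve(host)
        if addr is None:
            problem = "does not resolve"
        elif is_loopback(addr):
            problem = "resolves to loopback"
        else:
            problem = None
        results.append((host, addr, problem))
    return results


if __name__ == "__main__":
    # Hostnames to check; defaults to this machine's own hostname.
    # check_hosts.py is a hypothetical file name for this sketch.
    hosts = sys.argv[1:] or [socket.gethostname()]
    for host, addr, problem in check(hosts):
        print(f"{host:20} {addr or '-':15} {problem or 'ok'}")
```

If either hostname is reported as "resolves to loopback" or "does not resolve" on either machine, adding both machines' real IP addresses and hostnames to /etc/hosts on both nodes (as the tutorial above does) would be the first thing to try.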