FYI: The issue was happening due to a bloated distributed cache. I was reusing the same Configuration object between the jobs and adding the same number of jar files to the distributed cache in every job. Once I reduced the number of cached jar files, it ran more jobs before failing (earlier it was failing in the 3rd job). I tried DistributedCache.purgeCache(conf) between jobs, but it did not fix the problem; could it be a bug somewhere? The workaround was to use a different Configuration object per job. Thanks, Murali Krishna
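For anyone hitting the same thing, a minimal plain-Java sketch of why reusing one mutable config object bloats the cache list is below. Note this is an illustration only: JobConfig and its jar list are hypothetical stand-ins, not Hadoop's Configuration or DistributedCache API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: JobConfig stands in for a per-job configuration
// whose distributed-cache file list is mutable, accumulating state.
public class CacheBloat {
    static class JobConfig {
        final List<String> cachedJars = new ArrayList<>();
    }

    // Buggy pattern: one shared config reused across jobs, so the same
    // jars get appended again for every job and the list keeps growing.
    static int jarsAfterReusingConfig(int jobs, String[] jars) {
        JobConfig shared = new JobConfig();
        for (int job = 0; job < jobs; job++) {
            for (String j : jars) shared.cachedJars.add(j); // duplicates accumulate
        }
        return shared.cachedJars.size();
    }

    // Workaround from the thread: construct a fresh config per job,
    // so each job carries only its own jar entries.
    static int jarsWithFreshConfig(String[] jars) {
        JobConfig perJob = new JobConfig();
        for (String j : jars) perJob.cachedJars.add(j);
        return perJob.cachedJars.size();
    }

    public static void main(String[] args) {
        String[] jars = {"a.jar", "b.jar", "c.jar"};
        System.out.println("reused config after 3 jobs: " + jarsAfterReusingConfig(3, jars)); // 9 entries
        System.out.println("fresh config per job: " + jarsWithFreshConfig(jars));             // 3 entries
    }
}
```

With a real Hadoop Configuration the same accumulation happens in the cache-file properties, which is why a fresh object per job avoided the bloat.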
________________________________ From: Murali Krishna. P <muralikpb...@yahoo.com> To: common-user@hadoop.apache.org Sent: Fri, 15 October, 2010 6:01:10 PM Subject: Re: Hadoop starting extra map tasks and eventually failing

Thanks Amareshwari, that explains the spurious extra tasks in the log. However, I am not getting the userlogs for the failed setup task because the JVM it tries to run in fails immediately. I get only a tasktracker log like this:

2010-10-15 03:46:53,397 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201010140533_0157_m_-1758278022
2010-10-15 03:46:53,398 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201010140533_0157_m_-1758278022 spawned.
2010-10-15 03:46:53,918 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201010140533_0157_m_-1758278022 exited. Number of tasks it ran: 0
2010-10-15 03:46:56,946 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201010140533_0157_m_000005_1 done; removing files.
2010-10-15 03:46:56,946 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2010-10-15 03:46:58,050 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201010140533_0157_m_000005_2 task's state:UNASSIGNED

It is tough to figure out what is going wrong in my setup task without userlogs. It is a series of the same job run with different input; usually the first 2 jobs succeed and the 3rd job fails. What exactly gets run in the setup task? I guess the split calculation etc. Since the JVM is exiting within a few milliseconds according to the above log, I am not sure whether it is reaching the application's code at all.
Thanks, Murali Krishna

________________________________ From: Amareshwari Sri Ramadasu <amar...@yahoo-inc.com> To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org> Sent: Fri, 15 October, 2010 5:17:38 PM Subject: Re: Hadoop starting extra map tasks and eventually failing

These extra tasks are job-setup and job-cleanup tasks, which use map/reduce slots to run. It looks like the job-setup task failed for your second job even after retries, so no maps were scheduled. But you should see tasklogs for the failed tasks. Thanks, Amareshwari

On 10/15/10 5:11 PM, "Murali Krishna. P" <muralikpb...@yahoo.com> wrote:

Hi, I have attached the relevant part of the jobtracker log. Job 1 had 3 splits, but it started 5 map tasks, m_00000 through m_00004 (I have speculative execution turned off). The job somehow succeeds, though the log files for the 4th and 5th tasks didn't get any records. However, the next job again has 3 splits, but this time it schedules only m_00003 and m_00004, and both of them fail. There are no userlogs created for these 2 tasks. The tasktracker log mentions that the JVM spawned and exited immediately. It does not schedule the first 3 map tasks, and the job fails since the 4th and 5th tasks fail even after retries. Why are extra tasks getting scheduled? How did those tasks pass in the first case? Why are the right tasks not scheduled in the second job? This is easily reproducible; please take a look at the JT log and advise. Thanks, Murali Krishna