Hi,
    I have attached the relevant part of the JobTracker log. The first job
(job_201010140533_0134) had 3 splits, but it started 5 map tasks, m_000000
through m_000004. (I have speculative execution turned off.) The job somehow
succeeded, although the log files for the 4th and 5th tasks did not get any
records. However, the next job (job_201010140533_0135) again has 3 splits, but
this time it schedules only m_000003 and m_000004, and both of them fail. No
userlogs are created for these 2 tasks. The TaskTracker log mentions that the
JVM was spawned and exited immediately. The first 3 map tasks are never
scheduled, and the job fails because the 4th and 5th tasks fail even after
retries.

Why are extra tasks getting scheduled?
How did those tasks pass in the first case?
Why are the right tasks not scheduled in the second job?

This is easily reproducible. Please take a look at the JobTracker log below and advise.

 Thanks,
Murali Krishna
2010-10-15 02:33:26,192 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201010140533_0134
2010-10-15 02:33:26,192 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201010140533_0134
2010-10-15 02:33:26,294 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201010140533_0134 = 0. Number of splits = 3
2010-10-15 02:33:26,294 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201010140533_0134_m_000000 has split on node:/default-rack/machine1.com
2010-10-15 02:33:26,294 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201010140533_0134_m_000001 has split on node:/default-rack/machine1.com
2010-10-15 02:33:26,294 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201010140533_0134_m_000002 has split on node:/default-rack/machine1.com
2010-10-15 02:33:26,349 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_m_000004_0' to tip task_201010140533_0134_m_000004, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:33:59,799 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_m_000004_0' has completed task_201010140533_0134_m_000004 successfully.
2010-10-15 02:33:59,800 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_m_000000_0' to tip task_201010140533_0134_m_000000, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:33:59,800 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_201010140533_0134_m_000000
2010-10-15 02:33:59,801 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_m_000001_0' to tip task_201010140533_0134_m_000001, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:33:59,801 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_201010140533_0134_m_000001
2010-10-15 02:39:16,361 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_m_000000_0' has completed task_201010140533_0134_m_000000 successfully.
2010-10-15 02:39:16,362 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_m_000002_0' to tip task_201010140533_0134_m_000002, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:39:16,362 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_201010140533_0134_m_000002
2010-10-15 02:39:16,363 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_r_000000_0' to tip task_201010140533_0134_r_000000, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:39:19,375 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_r_000001_0' to tip task_201010140533_0134_r_000001, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:39:28,603 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_m_000001_0' has completed task_201010140533_0134_m_000001 successfully.
2010-10-15 02:42:59,694 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_m_000002_0' has completed task_201010140533_0134_m_000002 successfully.
2010-10-15 02:43:23,786 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_r_000000_0' has completed task_201010140533_0134_r_000000 successfully.
2010-10-15 02:43:23,787 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_r_000001_0' has completed task_201010140533_0134_r_000001 successfully.
2010-10-15 02:43:23,787 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_r_000002_0' to tip task_201010140533_0134_r_000002, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:43:26,794 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_r_000003_0' to tip task_201010140533_0134_r_000003, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:43:54,053 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_r_000002_0' has completed task_201010140533_0134_r_000002 successfully.
2010-10-15 02:43:54,054 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_r_000004_0' to tip task_201010140533_0134_r_000004, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:43:57,058 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_r_000003_0' has completed task_201010140533_0134_r_000003 successfully.
2010-10-15 02:43:57,059 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_r_000005_0' to tip task_201010140533_0134_r_000005, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:44:27,179 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_r_000004_0' has completed task_201010140533_0134_r_000004 successfully.
2010-10-15 02:44:27,180 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_r_000005_0' has completed task_201010140533_0134_r_000005 successfully.
2010-10-15 02:44:27,181 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_r_000006_0' to tip task_201010140533_0134_r_000006, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:44:30,376 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_r_000007_0' to tip task_201010140533_0134_r_000007, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:44:57,474 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_r_000006_0' has completed task_201010140533_0134_r_000006 successfully.
2010-10-15 02:44:57,475 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_r_000008_0' to tip task_201010140533_0134_r_000008, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:00,482 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_r_000007_0' has completed task_201010140533_0134_r_000007 successfully.
2010-10-15 02:45:00,483 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_r_000009_0' to tip task_201010140533_0134_r_000009, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:27,596 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_r_000009_0' has completed task_201010140533_0134_r_000009 successfully.
2010-10-15 02:45:30,789 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_r_000008_0' has completed task_201010140533_0134_r_000008 successfully.
2010-10-15 02:45:30,790 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0134_m_000003_0' to tip task_201010140533_0134_m_000003, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,798 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201010140533_0134_m_000003_0' has completed task_201010140533_0134_m_000003 successfully.
2010-10-15 02:45:33,799 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201010140533_0134 has completed successfully.
2010-10-15 02:45:33,816 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_m_000000_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,816 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_m_000001_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,816 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_m_000002_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,816 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_m_000003_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,816 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_m_000004_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,816 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_r_000000_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,816 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_r_000001_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,816 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_r_000002_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,816 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_r_000003_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,816 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_r_000004_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,816 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_r_000005_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,816 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_r_000006_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,817 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_r_000007_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,817 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_r_000008_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:45:33,817 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0134_r_000009_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'




2010-10-15 02:46:37,155 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201010140533_0135
2010-10-15 02:46:37,156 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201010140533_0135
2010-10-15 02:46:37,263 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201010140533_0135 = 0. Number of splits = 3
2010-10-15 02:46:37,263 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201010140533_0135_m_000000 has split on node:/default-rack/machine1.com
2010-10-15 02:46:37,263 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201010140533_0135_m_000001 has split on node:/default-rack/machine1.com
2010-10-15 02:46:37,263 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201010140533_0135_m_000002 has split on node:/default-rack/machine1.com
2010-10-15 02:46:40,159 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0135_m_000004_0' to tip task_201010140533_0135_m_000004, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:16,439 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0135_m_000004_1' to tip task_201010140533_0135_m_000004, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:16,439 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0135_m_000004_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:22,452 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0135_m_000004_2' to tip task_201010140533_0135_m_000004, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:22,452 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0135_m_000004_1' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:28,463 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0135_m_000004_3' to tip task_201010140533_0135_m_000004, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:28,463 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0135_m_000004_2' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:34,663 INFO org.apache.hadoop.mapred.TaskInProgress: TaskInProgress task_201010140533_0135_m_000004 has failed 4 times.
2010-10-15 02:47:34,663 INFO org.apache.hadoop.mapred.JobInProgress: Aborting job job_201010140533_0135
2010-10-15 02:47:34,663 INFO org.apache.hadoop.mapred.JobInProgress: Killing job 'job_201010140533_0135'
2010-10-15 02:47:34,663 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0135_m_000003_0' to tip task_201010140533_0135_m_000003, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:34,664 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0135_m_000004_3' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:40,683 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0135_m_000003_1' to tip task_201010140533_0135_m_000003, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:40,684 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0135_m_000003_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:46,695 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0135_m_000003_2' to tip task_201010140533_0135_m_000003, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:46,695 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0135_m_000003_1' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:52,707 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201010140533_0135_m_000003_3' to tip task_201010140533_0135_m_000003, for tracker 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:52,707 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0135_m_000003_2' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:58,715 INFO org.apache.hadoop.mapred.TaskInProgress: TaskInProgress task_201010140533_0135_m_000003 has failed 4 times.
2010-10-15 02:47:58,715 INFO org.apache.hadoop.mapred.JobInProgress: Aborting job job_201010140533_0135
2010-10-15 02:47:58,729 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0135_m_000003_3' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:58,729 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0135_m_000004_0' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:58,729 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0135_m_000004_1' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:58,730 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0135_m_000004_2' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
2010-10-15 02:47:58,730 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201010140533_0135_m_000004_3' from 'tracker_machine2.com:localhost.localdomain/127.0.0.1:33439'
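
For anyone who wants to check the same thing on their own JobTracker log, here is a minimal sketch that compares the map task indexes that were scheduled against the "Number of splits" reported for each job. It only assumes the two log line shapes visible above; the function name `summarize_map_tasks` and the result layout are made up for illustration and are not part of Hadoop.

```python
import re
from collections import defaultdict

# Patterns for the two JobTracker log line shapes seen in the excerpt above.
SPLITS_RE = re.compile(
    r"Input size for job (job_\d+_\d+) = \d+\. Number of splits = (\d+)"
)
ADDING_RE = re.compile(r"Adding task 'attempt_(\d+_\d+)_m_(\d+)_\d+'")


def summarize_map_tasks(log_lines):
    """Hypothetical helper: per job, list map indexes beyond the split
    count ("extra") and split indexes that were never scheduled ("missing")."""
    splits = {}                    # job id -> number of splits
    scheduled = defaultdict(set)   # job id -> set of scheduled map indexes
    for line in log_lines:
        m = SPLITS_RE.search(line)
        if m:
            splits[m.group(1)] = int(m.group(2))
            continue
        m = ADDING_RE.search(line)
        if m:
            scheduled["job_" + m.group(1)].add(int(m.group(2)))

    summary = {}
    for job, n in splits.items():
        seen = scheduled[job]
        summary[job] = {
            "extra": sorted(i for i in seen if i >= n),
            "missing": [i for i in range(n) if i not in seen],
        }
    return summary
```

Calling `summarize_map_tasks(open("jobtracker.log"))` (the file name is a placeholder) returns one entry per job, so the two jobs above can be compared side by side.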
