Hi, First of all, I use Hadoop-0.20.2 on Windows XP Pro with Eclipse Plug-In. I have a Cluster with 1 Master (Jobtracker and Namenode) and 4 Slaves(Datanode and TaskTracker).
I have some problems about my Hadoop-Cluster last few weeks. When I start a job with big Input(4GB - it`s may be not to big, but algorithm require some time), then i lose one of my Task Trackers in several minutes or seconds, but Datanode still alive. That means, i have still commucation with this slave. In addition, it is not always happen. I can say 50 procent of my Jobs... I examined my Log files and I saw that I lose my Task Tracker after this Line in tasktracker.log : "JVM Runner jvm_201103311133_0005_m_850277002 spawned." one slice of my tasktracker.log until Lost Task Tracker problem: 2011-03-31 16:10:03,522 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201103311133_0021_m_000132_0 is done. 2011-03-31 16:10:03,522 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201103311133_0021_m_000132_0 was 0 2011-03-31 16:10:03,538 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2 2011-03-31 16:10:03,741 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201103311133_0021_m_-1306155067 exited. Number of tasks it ran: 1 2011-03-31 16:10:03,741 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201103311133_0021_m_-470981146 exited. Number of tasks it ran: 1 2011-03-31 16:10:04,069 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201103311133_0021_m_000138_0 task's state:UNASSIGNED 2011-03-31 16:10:04,069 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201103311133_0021_m_000138_0 2011-03-31 16:10:04,069 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201103311133_0021_m_000138_0 - 2011-03-31 16:10:05,882 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201103311133_0021_m_34147226 - 2011-03-31 16:10:05,882 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201103311133_0021_m_34147226 spawned. - 2011-03-31 16:10:07,100 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201103311133_0021_m_000140_0 task's state:UNASSIGNED - 2011-03-31 16:10:07,100 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201103311133_0021_m_000140_0 - 2011-03-31 16:10:07,100 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201103311133_0021_m_000140_0 - 2011-03-31 16:10:07,460 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201103311133_0021_m_1797481322 - 2011-03-31 16:10:07,460 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201103311133_0021_m_1797481322 spawned. - 2011-03-31 16:10:10,116 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201103311133_0021_m_34147226 given task: attempt_201103311133_0021_m_000138_0 - 2011-03-31 16:10:14,007 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201103311133_0021_m_1797481322 given task: attempt_201103311133_0021_m_000140_0 - 2011-03-31 16:10:18,694 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000138_0 0.0% 2011-03-31 16:10:20,803 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000140_0 0.0% 2011-03-31 16:10:21,538 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 4685696 bytes for reduce: 0 from map: attempt_201103311133_0021_m_000130_0 given 4685696/4685692 2011-03-31 16:10:21,538 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 157.163.194.181:50060, dest: 157.163.200.75:2223, bytes: 4685696, op: MAPRED_SHUFFLE, cliID: attempt_201103311133_0021_m_000130_0 2011-03-31 16:10:21,710 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000138_0 0.050699152% 2011-03-31 16:10:23,803 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000140_0 0.0% 2011-03-31 16:10:24,725 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000138_0 0.21863972% 2011-03-31 16:10:26,803 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000140_0 0.022273919% 2011-03-31 16:10:27,757 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000138_0 0.3422284% 2011-03-31 16:10:29,803 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000140_0 0.119258955% 2011-03-31 16:10:30,850 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000138_0 0.44045892% 2011-03-31 16:10:31,178 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 4618152 bytes for reduce: 0 from map: attempt_201103311133_0021_m_000132_0 given 4618152/4618148 2011-03-31 16:10:31,178 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 157.163.194.181:50060, dest: 157.163.200.75:2227, bytes: 4618152, op: MAPRED_SHUFFLE, cliID: attempt_201103311133_0021_m_000132_0 2011-03-31 16:10:32,944 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000140_0 0.20685153% 2011-03-31 16:10:33,882 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000138_0 0.55659634% 2011-03-31 16:10:36,022 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000140_0 0.28004178% 2011-03-31 16:10:36,897 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000138_0 0.6907835% 2011-03-31 16:10:39,132 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000140_0 0.36914998% 2011-03-31 16:10:39,944 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000138_0 0.77318066% 2011-03-31 16:10:42,225 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000140_0 0.4423412% 2011-03-31 16:10:42,991 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000138_0 0.85240847% 2011-03-31 16:10:45,319 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000140_0 0.5346344% 2011-03-31 16:10:46,038 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000138_0 0.94430083% 2011-03-31 16:10:48,319 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000140_0 0.636471% 2011-03-31 16:10:49,053 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000138_0 1.0% 2011-03-31 16:10:51,319 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000140_0 0.7351242% 2011-03-31 16:10:52,085 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000138_0 1.0% 2011-03-31 16:10:54,335 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000140_0 0.8974207% 2011-03-31 16:10:57,335 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000140_0 1.0% 2011-03-31 16:10:58,944 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000138_0 1.0% 2011-03-31 16:10:58,944 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201103311133_0021_m_000138_0 is done. 2011-03-31 16:10:58,944 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201103311133_0021_m_000138_0 was 0 2011-03-31 16:10:58,944 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 1 2011-03-31 16:10:59,007 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201103311133_0021_m_000140_0 1.0% 2011-03-31 16:10:59,007 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201103311133_0021_m_000140_0 is done. 2011-03-31 16:10:59,007 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201103311133_0021_m_000140_0 was 0 2011-03-31 16:10:59,007 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2 2011-03-31 16:10:59,210 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201103311133_0021_m_34147226 exited. Number of tasks it ran: 1 2011-03-31 16:10:59,319 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201103311133_0021_m_1797481322 exited. Number of tasks it ran: 1 - 2011-03-31 16:11:03,210 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201103311133_0021_m_000145_0 task's state:UNASSIGNED - 2011-03-31 16:11:03,210 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201103311133_0021_m_000145_0 - 2011-03-31 16:11:03,210 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201103311133_0021_m_000145_0 - 2011-03-31 16:11:03,210 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201103311133_0021_m_000146_0 task's state:UNASSIGNED - 2011-03-31 16:11:04,975 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201103311133_0021_m_000146_0 - 2011-03-31 16:11:04,975 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201103311133_0021_m_000146_0 - 2011-03-31 16:11:05,022 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201103311133_0021_m_129451324 - 2011-03-31 16:11:05,022 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201103311133_0021_m_129451324 spawned. I await after last line "JVM with ID....", but after this line within 600 seconds I lose my Task Tracker. I want to know, what happend after "JVM Runner spawned" in Hadoop??? If I know that, I can maybe find a solution. and one slice of my jobtracker.log until Lost Task Tracker problem: 2011-03-31 16:13:55,049 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201103311133_0021_m_000162 2011-03-31 16:13:55,049 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201103311133_0021_m_000163_0' to tip task_201103311133_0021_m_000163, for tracker 'tracker_eh410pvc.ww005.siemens.net:localhost/127.0.0.1:3264' 2011-03-31 16:13:55,049 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201103311133_0021_m_000163 2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.JobTracker: attempt_201103311133_0021_m_000145_0 is 184557 ms debug. 2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.JobTracker: Launching task attempt_201103311133_0021_m_000145_0 timed out. 2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201103311133_0021_m_000145_0: Error launching task 2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.JobTracker: attempt_201103311133_0021_m_000146_0 is 184557 ms debug. 2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.JobTracker: Launching task attempt_201103311133_0021_m_000146_0 timed out. 2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201103311133_0021_m_000146_0: Error launching task 2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker 'tracker_eh40z1cc.ww005.siemens.net:localhost/127.0.0.1:2861' 2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201103311133_0021_m_000000_0: Lost task tracker: tracker_eh40z1cc.ww005.siemens.net:localhost/127.0.0.1:2861 2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201103311133_0021_m_000001_0: Lost task tracker: tracker_eh40z1cc.ww005.siemens.net:localhost/127.0.0.1:2861 2011-03-31 16:14:07,798 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201103311133_0021_m_000014_0: Lost task tracker: tracker_eh40z1cc.ww005.siemens.net:localhost/127.0.0.1:2861 It can't run the task, and then "Error launching task". Why? or it can be ok... but , why am I lose my Task Tracker?? Thanks, Regards, Baran PS: I opened a new topic, because it looks like an another problem, that I previously one.
