Are there no userlogs from the failed tasks? TaskTracker logs won't carry user-code (task) logs. Could you paste those syslog lines (from the task) to pastebin/etc. since the lists may not be accepting attachments?
On Tue, Aug 2, 2011 at 12:51 AM, Stevens, Keith D. <steven...@llnl.gov> wrote: > Hi all, > > I'm running a simple mapreduce job that connects to an hbase table, reads > each row, counts some co-occurrence frequencies, and writes everything out to > hdfs at the end. Everything seems to be going smoothly until the last 5, out > of 108, tasks run. The last 5 tasks seem to be stuck initializing. As far > as I can tell, setup is never called, and eventually, after 600 seconds, the > task is killed. The task jumps around different nodes to try and run but > regardless of the node, it fails to initialize and is killed. > > My first guess is that it's trying to connect to an hbase region server and > failing, but I don't see anything like this in the task tracker nodes. Here > are the log lines related to one of the failed tasks from the task trackers > logs: > > 2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_201107281508_0028_m_000027_0 task's > state:UNASSIGNED > 2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: Trying to > launch : attempt_201107281508_0028_m_000027_0 which needs 1 slots > 2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: In > TaskLauncher, current free slots : 1 and trying to launch > attempt_201107281508_0028_m_000027_0 which needs 1 slots > 2011-08-01 12:01:12,243 INFO org.apache.hadoop.mapred.TaskTracker: JVM with > ID: jvm_201107281508_0028_m_-1189914759 given task: > attempt_201107281508_0028_m_000027_0 > 2011-08-01 12:11:09,462 INFO org.apache.hadoop.mapred.TaskTracker: > attempt_201107281508_0028_m_000027_0: Task > attempt_201107281508_0028_m_000027_0 failed to report status for 600 seconds. > Killing! > 2011-08-01 12:11:09,467 INFO org.apache.hadoop.mapred.TaskTracker: About to > purge task: attempt_201107281508_0028_m_000027_0 > 2011-08-01 12:11:14,488 INFO org.apache.hadoop.mapred.TaskRunner: > attempt_201107281508_0028_m_000027_0 done; removing files. > 2011-08-01 12:11:14,489 INFO org.apache.hadoop.mapred.IndexCache: Map ID > attempt_201107281508_0028_m_000027_0 not found in cache > 2011-08-01 12:11:14,495 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_201107281508_0028_m_000027_0 task's > state:FAILED_UNCLEAN > 2011-08-01 12:11:14,496 INFO org.apache.hadoop.mapred.TaskTracker: Trying to > launch : attempt_201107281508_0028_m_000027_0 which needs 1 slots > 2011-08-01 12:11:14,496 INFO org.apache.hadoop.mapred.TaskTracker: In > TaskLauncher, current free slots : 1 and trying to launch > attempt_201107281508_0028_m_000027_0 which needs 1 slots > 2011-08-01 12:11:15,045 INFO org.apache.hadoop.mapred.TaskTracker: JVM with > ID: jvm_201107281508_0028_m_-1869983962 given task: > attempt_201107281508_0028_m_000027_0 > 2011-08-01 12:11:15,346 INFO org.apache.hadoop.mapred.TaskTracker: > attempt_201107281508_0028_m_000027_0 0.0% > 2011-08-01 12:11:15,348 INFO org.apache.hadoop.mapred.TaskTracker: > attempt_201107281508_0028_m_000027_0 0.0% cleanup > 2011-08-01 12:11:15,349 INFO org.apache.hadoop.mapred.TaskTracker: Task > attempt_201107281508_0028_m_000027_0 is done. > 2011-08-01 12:11:15,349 INFO org.apache.hadoop.mapred.TaskTracker: reported > output size for attempt_201107281508_0028_m_000027_0 was -1 > 2011-08-01 12:11:15,354 INFO org.apache.hadoop.mapred.TaskRunner: > attempt_201107281508_0028_m_000027_0 done; removing files. > 2011-08-01 12:11:17,495 INFO org.apache.hadoop.mapred.TaskRunner: > attempt_201107281508_0028_m_000027_0 done; removing files. > > And here are the syslog lines: > In my job, I set the stats when i enter and exit setup, and I set counters in > map. None of these are triggered for this task. Nothing is written to > stderr or stdout, and the syslogs for the task have nothing beyond the > zookeeper client connection lines. > > Any thoughts as to what might be causing this issue? Is there another log > that indicates which region server this task is trying to connect to? > > Thanks! > --Keith Stevens -- Harsh J