In short, there are no userlogs. stderr and stdout are both empty. I copied the output from syslog to the following pastebin: http://pastebin.com/0XXE9Jze. The first 22 lines look to be exactly the same as the syslogs for other, non-dying, tasks. The main departure is on line 23 where the loader can't seem to load native-hadoop libraries, and this happens about 10 minutes after starting up.
--Keith

On Aug 1, 2011, at 1:00 PM, Harsh J wrote:

> Are there no userlogs from the failed tasks? TaskTracker logs won't
> carry user-code (task) logs. Could you paste those syslog lines (from
> the task) to pastebin/etc. since the lists may not be accepting
> attachments?
>
> On Tue, Aug 2, 2011 at 12:51 AM, Stevens, Keith D. <steven...@llnl.gov> wrote:
>> Hi all,
>>
>> I'm running a simple mapreduce job that connects to an hbase table, reads
>> each row, counts some co-occurrence frequencies, and writes everything out
>> to hdfs at the end. Everything seems to be going smoothly until the last 5,
>> out of 108, tasks run. The last 5 tasks seem to be stuck initializing. As
>> far as I can tell, setup is never called, and eventually, after 600 seconds,
>> the task is killed. The task jumps around different nodes to try and run
>> but regardless of the node, it fails to initialize and is killed.
>>
>> My first guess is that it's trying to connect to an hbase region server and
>> failing, but I don't see anything like this in the task tracker nodes. Here
>> are the log lines related to one of the failed tasks from the task trackers
>> logs:
>>
>> 2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201107281508_0028_m_000027_0 task's state:UNASSIGNED
>> 2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201107281508_0028_m_000027_0 which needs 1 slots
>> 2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201107281508_0028_m_000027_0 which needs 1 slots
>> 2011-08-01 12:01:12,243 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201107281508_0028_m_-1189914759 given task: attempt_201107281508_0028_m_000027_0
>> 2011-08-01 12:11:09,462 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_000027_0: Task attempt_201107281508_0028_m_000027_0 failed to report status for 600 seconds. Killing!
>> 2011-08-01 12:11:09,467 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201107281508_0028_m_000027_0
>> 2011-08-01 12:11:14,488 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_000027_0 done; removing files.
>> 2011-08-01 12:11:14,489 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201107281508_0028_m_000027_0 not found in cache
>> 2011-08-01 12:11:14,495 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201107281508_0028_m_000027_0 task's state:FAILED_UNCLEAN
>> 2011-08-01 12:11:14,496 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201107281508_0028_m_000027_0 which needs 1 slots
>> 2011-08-01 12:11:14,496 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201107281508_0028_m_000027_0 which needs 1 slots
>> 2011-08-01 12:11:15,045 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201107281508_0028_m_-1869983962 given task: attempt_201107281508_0028_m_000027_0
>> 2011-08-01 12:11:15,346 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_000027_0 0.0%
>> 2011-08-01 12:11:15,348 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_000027_0 0.0% cleanup
>> 2011-08-01 12:11:15,349 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201107281508_0028_m_000027_0 is done.
>> 2011-08-01 12:11:15,349 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201107281508_0028_m_000027_0 was -1
>> 2011-08-01 12:11:15,354 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_000027_0 done; removing files.
>> 2011-08-01 12:11:17,495 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_000027_0 done; removing files.
>>
>> And here are the syslog lines:
>> In my job, I set the stats when i enter and exit setup, and I set counters
>> in map. None of these are triggered for this task. Nothing is written to
>> stderr or stdout, and the syslogs for the task have nothing beyond the
>> zookeeper client connection lines.
>>
>> Any thoughts as to what might be causing this issue? Is there another log
>> that indicates which region server this task is trying to connect to?
>>
>> Thanks!
>> --Keith Stevens
>
>
>
> --
> Harsh J