Hi Lance,

Is it possible that your mapred.local.dir is in /tmp and you have a cron job that cleans it up at night (default on many systems)?
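One rough way to check this is the sketch below. It reads the text of a Hadoop config file, expands `${...}` references, and lists any mapred.local.dir entries that end up under /tmp. The fallback values are the ones I believe hadoop-default.xml ships for this release line (hadoop.tmp.dir = /tmp/hadoop-${user.name}, mapred.local.dir = ${hadoop.tmp.dir}/mapred/local). The function names and the `user` argument are made up for illustration, and the expansion is only an approximation of what Hadoop's Configuration class does.

```python
import posixpath
import re
import xml.etree.ElementTree as ET

# Assumed defaults (believed to match hadoop-default.xml for this release line).
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "mapred.local.dir": "${hadoop.tmp.dir}/mapred/local",
}

VAR = re.compile(r"\$\{([^}]+)\}")


def load_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop config document."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def expand(value, props, user):
    """Substitute ${name} references; an approximation, not Hadoop's own code."""
    for _ in range(20):  # bounded, so a self-referencing value cannot loop forever
        match = VAR.search(value)
        if match is None:
            break
        key = match.group(1)
        if key == "user.name":
            replacement = user
        elif key in props:
            replacement = props[key]
        else:
            break  # unknown variable: leave the rest as written
        value = value[:match.start()] + replacement + value[match.end():]
    return value


def local_dirs_under_tmp(xml_text, user="hadoop"):
    """List the mapred.local.dir entries that resolve to /tmp or below it."""
    props = dict(DEFAULTS)
    props.update(load_properties(xml_text))
    raw = expand(props["mapred.local.dir"], props, user)
    dirs = [posixpath.normpath(d.strip()) for d in raw.split(",") if d.strip()]
    return [d for d in dirs if d == "/tmp" or d.startswith("/tmp/")]
```

You would pass it the contents of your site config (for example the text of conf/hadoop-site.xml, if that is where your overrides live) and the user the TaskTracker runs as. A non-empty result means at least one local directory is exposed to whatever cleans /tmp on that machine.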
Thanks
-Todd

On Fri, May 22, 2009 at 9:33 AM, Lance Riedel <la...@dotspots.com> wrote:

> Version 19.1 with patches:
> 4780-2v19.patch (Jira 4780)
> closeAll3.patch (Jira 3998)
> I have confirmed that https://issues.apache.org/jira/browse/HADOOP-4924 patch is in, so that is not the fix.
>
> We are having task trackers die every night with a null pointer exception.
> Usually 2 or so out of 8 (25% each night).
>
> Here are the logs:
>
> Version 19.1 with
> 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_200905211749_0451
> 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200905211749_0451_m_000000_0 done; removing files.
> 2009-05-22 02:46:54,911 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out in any of the configured local directories
> 2009-05-22 02:47:13,968 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_200905211749_0444
> 2009-05-22 02:47:13,969 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200905211749_0444_m_000000_0 done; removing files.
> 2009-05-22 02:47:18,968 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out in any of the configured local directories
> 2009-05-22 02:48:52,324 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out in any of the configured local directories
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200905211749_0452_m_000006_0 task's state:UNASSIGNED
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200905211749_0452_m_000006_0
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 4 and trying to launch attempt_200905211749_0452_m_000006_0
> 2009-05-22 02:49:15,274 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200905211749_0452_m_1998728288 spawned.
> 2009-05-22 02:49:15,765 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_200905211749_0452_m_1998728288 given task: attempt_200905211749_0452_m_000006_0
> 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out in any of the configured local directories
> 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0452/attempt_200905211749_0452_m_000006_0/output/file.out in any of the configured local directories
> 2009-05-22 02:49:19,784 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200905211749_0452_m_000006_0 1.0% hdfs://ec2-75-101-247-52.compute-1.amazonaws.com:54310/paragraphInstances/2009-05-22/rollup#06h#3e04c188-245a-4856-9a54-2fec60e85e3d.seq:0+9674259
> 2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_200905211749_0452_m_000006_0 is done.
> 2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_200905211749_0452_m_000006_0 was 0
> 2009-05-22 02:49:19,787 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 4
> 2009-05-22 02:49:19,954 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200905211749_0452_m_1998728288 exited. Number of tasks it ran: 1
> 2009-05-22 02:59:19,297 INFO org.apache.hadoop.mapred.TaskTracker: Recieved RenitTrackerAction from JobTracker
> 2009-05-22 02:59:19,298 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.lang.NullPointerException
>         at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2300)
>         at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:2273)
>         at org.apache.hadoop.mapred.TaskTracker.close(TaskTracker.java:840)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1728)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2785)
>
> 2009-05-22 02:59:19,300 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down TaskTracker at domU-12-31-38-01-AD-91/10.253.178.95
> ************************************************************/