Hi Lance,

Is it possible that your mapred.local.dir is in /tmp and you have a cron job
that cleans it up at night (default on many systems)?
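
If it helps, here is a rough sketch for checking that. It assumes the stock
defaults (hadoop.tmp.dir = /tmp/hadoop-${user.name} and mapred.local.dir =
${hadoop.tmp.dir}/mapred/local) and that you feed it the text of your
hadoop-site.xml. The function names are mine, not anything from Hadoop:

```python
import getpass
import posixpath
import xml.etree.ElementTree as ET

# Assumed stock defaults; values in your site file override them.
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "mapred.local.dir": "${hadoop.tmp.dir}/mapred/local",
}


def load_properties(xml_text):
    """Merge <property> name/value pairs from the config text over DEFAULTS."""
    props = dict(DEFAULTS)
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name and value is not None:
            props[name.strip()] = value.strip()
    return props


def expand(value, props, user):
    """Resolve ${user.name} and ${some.property} references (bounded passes)."""
    for _ in range(10):
        expanded = value.replace("${user.name}", user)
        for key, val in props.items():
            expanded = expanded.replace("${" + key + "}", val)
        if expanded == value:
            break
        value = expanded
    return value


def local_dirs_under_tmp(xml_text, user=None):
    """Return the mapred.local.dir entries that resolve to /tmp or below it."""
    user = user or getpass.getuser()
    props = load_properties(xml_text)
    resolved = expand(props["mapred.local.dir"], props, user)
    risky = []
    for entry in resolved.split(","):
        path = posixpath.normpath(entry.strip())
        if path == "/tmp" or path.startswith("/tmp/"):
            risky.append(path)
    return risky
```

Anything it returns is a directory that a nightly /tmp cleaner (tmpwatch or
similar) could empty while tasks are still using it. Pass user= if the
TaskTracker runs as a different account than the one running the script.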

Thanks
-Todd

On Fri, May 22, 2009 at 9:33 AM, Lance Riedel <la...@dotspots.com> wrote:

> Hadoop 0.19.1 with patches:
> 4780-2v19.patch (HADOOP-4780)
> closeAll3.patch (HADOOP-3998)
> I have confirmed that the
> https://issues.apache.org/jira/browse/HADOOP-4924 patch is in, so that
> is not the fix.
>
>
> Our TaskTrackers are dying every night with a NullPointerException,
> usually about 2 out of 8 (25%) each night.
>
>
> Here are the logs:
>
> 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskTracker: Received
> 'KillJobAction' for job: job_200905211749_0451
> 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_200905211749_0451_m_000000_0 done; removing files.
> 2009-05-22 02:46:54,911 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:47:13,968 INFO org.apache.hadoop.mapred.TaskTracker: Received
> 'KillJobAction' for job: job_200905211749_0444
> 2009-05-22 02:47:13,969 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_200905211749_0444_m_000000_0 done; removing files.
> 2009-05-22 02:47:18,968 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:48:52,324 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_200905211749_0452_m_000006_0
> task's state:UNASSIGNED
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> to launch : attempt_200905211749_0452_m_000006_0
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: In
> TaskLauncher, current free slots : 4 and trying to launch
> attempt_200905211749_0452_m_000006_0
> 2009-05-22 02:49:15,274 INFO org.apache.hadoop.mapred.JvmManager: JVM
> Runner jvm_200905211749_0452_m_1998728288 spawned.
> 2009-05-22 02:49:15,765 INFO org.apache.hadoop.mapred.TaskTracker: JVM with
> ID: jvm_200905211749_0452_m_1998728288 given task:
> attempt_200905211749_0452_m_000006_0
> 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> taskTracker/jobcache/job_200905211749_0452/attempt_200905211749_0452_m_000006_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:49:19,784 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200905211749_0452_m_000006_0 1.0%
> hdfs://ec2-75-101-247-52.compute-1.amazonaws.com:54310/paragraphInstances/2009-05-22/rollup#06h#3e04c188-245a-4856-9a54-2fec60e85e3d.seq:0+9674259
> 2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: Task
> attempt_200905211749_0452_m_000006_0 is done.
> 2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: reported
> output size for attempt_200905211749_0452_m_000006_0  was 0
> 2009-05-22 02:49:19,787 INFO org.apache.hadoop.mapred.TaskTracker:
> addFreeSlot : current free slots : 4
> 2009-05-22 02:49:19,954 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_200905211749_0452_m_1998728288 exited. Number of tasks it ran: 1
> 2009-05-22 02:59:19,297 INFO org.apache.hadoop.mapred.TaskTracker: Recieved
> RenitTrackerAction from JobTracker
> 2009-05-22 02:59:19,298 ERROR org.apache.hadoop.mapred.TaskTracker: Can not
> start task tracker because java.lang.NullPointerException
>        at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2300)
>        at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:2273)
>        at org.apache.hadoop.mapred.TaskTracker.close(TaskTracker.java:840)
>        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1728)
>        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2785)
>
> 2009-05-22 02:59:19,300 INFO org.apache.hadoop.mapred.TaskTracker:
> SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down TaskTracker at domU-12-31-38-01-AD-91/10.253.178.95
> ************************************************************/
>
