Hi Todd,

We had looked at that before. Here is the location of the tmp directory:
[dotsp...@domu-12-31-38-00-80-21 hadoop-0.19.1]$ du -sh /dist/app/hadoop-0.19.1/tmp
248G    /dist/app/hadoop-0.19.1/tmp

There are no cron jobs that would have anything to do with that directory.

Here is the /tmp:
[dotsp...@domu-12-31-38-00-80-21 hadoop-0.19.1]$ du -sh /tmp
204K    /tmp

Does this look like a disk error? I had seen that the "org.apache.hadoop.util.DiskChecker$DiskErrorException" is bogus.

Thanks!
Lance

On Fri, May 22, 2009 at 9:33 AM, Lance Riedel <la...@dotspots.com> wrote:
> Version 19.1 with patches:
> 4780-2v19.patch (Jira 4780)
> closeAll3.patch (Jira 3998)
> I have confirmed that https://issues.apache.org/jira/browse/HADOOP-4924 patch
> is in, so that is not the fix.
>
> We are having task trackers die every night with a null pointer exception.
> Usually 2 or so out of 8 (25% each night).
>
> Here are the logs:
>
> Version 19.1 with
> 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_200905211749_0451
> 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200905211749_0451_m_000000_0 done; removing files.
> 2009-05-22 02:46:54,911 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out in any of the configured local directories
> 2009-05-22 02:47:13,968 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_200905211749_0444
> 2009-05-22 02:47:13,969 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200905211749_0444_m_000000_0 done; removing files.
> 2009-05-22 02:47:18,968 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out in any of the configured local directories
> 2009-05-22 02:48:52,324 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out in any of the configured local directories
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200905211749_0452_m_000006_0 task's state:UNASSIGNED
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200905211749_0452_m_000006_0
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 4 and trying to launch attempt_200905211749_0452_m_000006_0
> 2009-05-22 02:49:15,274 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200905211749_0452_m_1998728288 spawned.
> 2009-05-22 02:49:15,765 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_200905211749_0452_m_1998728288 given task: attempt_200905211749_0452_m_000006_0
> 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_000009_0/output/file.out in any of the configured local directories
> 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905211749_0452/attempt_200905211749_0452_m_000006_0/output/file.out in any of the configured local directories
> 2009-05-22 02:49:19,784 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200905211749_0452_m_000006_0 1.0% hdfs://ec2-75-101-247-52.compute-1.amazonaws.com:54310/paragraphInstances/2009-05-22/rollup#06h#3e04c188-245a-4856-9a54-2fec60e85e3d.seq:0+9674259
> 2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_200905211749_0452_m_000006_0 is done.
> 2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_200905211749_0452_m_000006_0 was 0
> 2009-05-22 02:49:19,787 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 4
> 2009-05-22 02:49:19,954 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200905211749_0452_m_1998728288 exited. Number of tasks it ran: 1
> 2009-05-22 02:59:19,297 INFO org.apache.hadoop.mapred.TaskTracker: Recieved RenitTrackerAction from JobTracker
> 2009-05-22 02:59:19,298 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.lang.NullPointerException
>     at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2300)
>     at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:2273)
>     at org.apache.hadoop.mapred.TaskTracker.close(TaskTracker.java:840)
>     at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1728)
>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2785)
>
> 2009-05-22 02:59:19,300 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down TaskTracker at domU-12-31-38-01-AD-91/10.253.178.95
> ************************************************************/