Here is the point in the logs where the infinite loop begins - see time stamp 2009-05-14 04:03:56,348 : (JobTracker)
2009-05-14 04:03:56,324 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200905122015_1168_m_000029_0' from 'tracker_domU-12-31-38-01-74-F1.compute-1.internal:localhost.localdomain/ 127.0.0.1:35214' 2009-05-14 04:03:56,326 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200905122015_1183_r_000007_0' to tip task_200905122015_1183_r_000007, for tracker 'tracker_domU-12-31-38-00-F0-41.compute-1.internal:localhost.localdomain/ 127.0.0.1:58504' 2009-05-14 04:03:56,327 INFO org.apache.hadoop.mapred.JobTracker: Adding task (cleanup)'attempt_200905122015_1184_m_000000_1' to tip task_200905122015_1184_m_000000, for tracker 'tracker_domU-12-31-38-00-80-21.compute-1.internal:localhost.localdomain/ 127.0.0.1 :57741' 2009-05-14 04:03:56,330 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200905122015_1182_r_000011_0' has completed task_200905122015_1182_r_000011 successfully. 2009-05-14 04:03:56,330 INFO org.apache.hadoop.mapred.JobTracker: Adding task (cleanup)'attempt_200905122015_1182_r_000010_1' to tip task_200905122015_1182_r_000010, for tracker 'tracker_domU-12-31-38-01-5C-41.compute-1.internal:localhost.localdomain/ 127.0.0.1 :46248' 2009-05-14 04:03:56,331 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, cannot find taskid attempt_200905122015_0499_r_000004_1 2009-05-14 04:03:56,331 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200905122015_1184_m_000004_1' has completed task_200905122015_1184_m_000004 successfully. 2009-05-14 04:03:56,331 INFO org.apache.hadoop.mapred.ResourceEstimator: measured blowup on task_200905122015_1184_m_000004 was 20150008/21581175 = 0.93368447269437372009-05-14 04:03:56,331 INFO org.apache.hadoop.mapred.ResourceEstimator: new estimate is blowup = 0.9152383292400812 2009-05-14 04:03:56,335 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200905122015_1183_r_000008_0' to tip task_200905122015_1183_r_000008, for tracker 'tracker_domU-12-31-38-00-80-21.compute-1.internal:localhost.localdomain/ 127.0.0.1:57741' 2009-05-14 04:03:56,336 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200905122015_1183_r_000009_0' to tip task_200905122015_1183_r_000009, for tracker 'tracker_domU-12-31-38-01-74-F1.compute-1.internal:localhost.localdomain/ 127.0.0.1:35214' 2009-05-14 04:03:56,336 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200905122015_1181_r_000009_0' from 'tracker_domU-12-31-38-01-74-F1.compute-1.internal:localhost.localdomain/ 127.0.0.1:35214' 2009-05-14 04:03:56,336 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, cannot find taskid attempt_200905122015_0499_r_000004_1 2009-05-14 04:03:56,337 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200905122015_1183_r_000010_0' to tip task_200905122015_1183_r_000010, for tracker 'tracker_domU-12-31-38-01-81-31.compute-1.internal:localhost.localdomain/ 127.0.0.1:46518' 2009-05-14 04:03:56,343 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, cannot find taskid attempt_200905122015_1070_r_000014_1 2009-05-14 04:03:56,344 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200905122015_1183_r_000011_0' to tip task_200905122015_1183_r_000011, for tracker 'tracker_domU-12-31-38-01-AD-91.compute-1.internal:localhost.localdomain/ 127.0.0.1:33929' 2009-05-14 04:03:56,348 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54311, call heartbeat(org.apache.hadoop.mapred.tasktrackersta...@a4fe4, false, true, 21461) from 10.253.178.95:50709: error: java.io.IOException: java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923) at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) 2009-05-14 04:03:56,351 INFO org.apache.hadoop.mapred.JobTracker: attempt_200905122015_1183_r_000007_0 is 24 ms debug. 2009-05-14 04:03:56,419 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200905122015_1183_r_000012_0' to tip task_200905122015_1183_r_000012, for tracker 'tracker_domU-12-31-38-00-F0-41.compute-1.internal:localhost.localdomain/ 127.0.0.1:58504' 2009-05-14 04:03:56,459 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, cannot find taskid attempt_200905122015_1070_r_000014_1 2009-05-14 04:03:56,459 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'KILLED' for 'attempt_200905122015_1182_r_000009_1' of TIP 'task_200905122015_1182_r_000009' 2009-05-14 04:03:56,460 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200905122015_1183_r_000013_0' to tip task_200905122015_1183_r_000013, for tracker 'tracker_domU-12-31-38-01-AD-91.compute-1.internal:localhost.localdomain/ 127.0.0.1:33929' 2009-05-14 04:03:56,460 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 54311, call heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1f1c507, false, true, 21461) from 10.253.178.95:50709: error: java.io.IOException: java.lang.NullPointerExceptio n java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923) at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) 2009-05-14 04:03:56,464 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, cannot find taskid attempt_200905122015_1070_r_000014_1 2009-05-14 04:03:56,464 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'KILLED' for 'attempt_200905122015_1182_r_000009_1' of TIP 'task_200905122015_1182_r_000009' 2009-05-14 04:03:56,465 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200905122015_1183_r_000014_0' to tip task_200905122015_1183_r_000014, for tracker 'tracker_domU-12-31-38-01-AD-91.compute-1.internal:localhost.localdomain/ 127.0.0.1:33929' 2009-05-14 04:03:56,465 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54311, call heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1d54fc9, false, true, 21461) from 10.253.178.95:50709: error: java.io.IOException: java.lang.NullPointerExceptio n <and on and on an on for 10 gigs worth in a few hours> On Thu, May 14, 2009 at 8:37 AM, Lance Riedel <la...@dotspots.com> wrote: > Just had another cluster crash with the same issue. This is still a huge > issue for us- still crashing our cluster every other night (actually almost > every night now). > > Should we move to .20? Is there more information i can provide? Is this > related to my other email "Constantly getting DiskErrorExceptions - but > logged as INFO"? I haven't seen responses on that. > > Thanks! > Lance > > On Thu, May 14, 2009 at 7:48 AM, Lance Riedel <la...@dotspots.com> wrote: > >> Here is the latest here.. Haven't heard any more, but every other night we >> get 10 gigs logs and tons of failed tasks and have to restart the cluster >> >> ---------- Forwarded message ---------- >> From: Lance Riedel <la...@dotspots.com> >> Date: Fri, May 8, 2009 at 10:49 AM >> Subject: Re: Infinite Loop Resending status from task tracker >> To: core-user@hadoop.apache.org >> Cc: Brian Long <br...@dotspots.com> >> >> >> Hi Todd, >> Sorry, my response got hung up in my outbox for a couple of days.. arghh >> >> >> >> Confirmed that 1) we are not running out of space and 2) that our >> mapred.local.dir directory is not in /tmp >> >> Not sure if this an ec2 problem with a mounted drive? >> >> We had the same thing happen again, exact same logs and symptoms >> (simultaneous in jobtracker and tasktracker) >> >> Thinking about moving to .20 because of this, any thoughts on that? >> >> Thanks, >> Lance >> >> >> On May 4, 2009, at 4:18 PM, Todd Lipcon wrote: >> >> Hi Lance, >>> >>> Two thoughts here that might be the culprit: >>> >>> 1) Is it possible that the partition that your mapred.local.dir is on is >>> out >>> of space on that task tracker? >>> >>> 2) Is it possible that you're using a directory under /tmp for >>> mapred.local.dir and some system cron script cleared out /tmp? >>> >>> -Todd >>> >>> On Sat, May 2, 2009 at 9:01 AM, Lance Riedel <la...@dotspots.com> wrote: >>> >>> Hi Todd, >>>> Not sure if this is related, but our hadoop cluster in general is >>>> getting >>>> more and more unstable. the logs are full of this error message (but >>>> having >>>> trouble tracking down the root problem): >>>> >>>> 2009-05-02 11:30:39,294 INFO org.apache.hadoop.mapred.TaskTracker: >>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find >>>> >>>> taskTracker/jobcache/job_200904301103_1111/attempt_200904301103_1111_m_000001_1/output/file.out >>>> in any of the configured local directories >>>> 2009-05-02 11:30:39,294 INFO org.apache.hadoop.mapred.TaskTracker: >>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find >>>> >>>> taskTracker/jobcache/job_200904301103_1675/attempt_200904301103_1675_r_000012_1/output/file.out >>>> in any of the configured local directories >>>> 2009-05-02 11:30:44,295 INFO org.apache.hadoop.mapred.TaskTracker: >>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find >>>> >>>> taskTracker/jobcache/job_200904301103_0944/attempt_200904301103_0944_r_000015_0/output/file.out >>>> in any of the configured local directories >>>> 2009-05-02 11:30:44,295 INFO org.apache.hadoop.mapred.TaskTracker: >>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find >>>> >>>> taskTracker/jobcache/job_200904301103_1111/attempt_200904301103_1111_m_000001_1/output/file.out >>>> in any of the configured local directories >>>> 2009-05-02 11:30:44,295 INFO org.apache.hadoop.mapred.TaskTracker: >>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find >>>> >>>> taskTracker/jobcache/job_200904301103_1675/attempt_200904301103_1675_r_000012_1/output/file.out >>>> in any of the configured local directories >>>> 2009-05-02 11:30:49,296 INFO org.apache.hadoop.mapred.TaskTracker: >>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find >>>> >>>> taskTracker/jobcache/job_200904301103_0944/attempt_200904301103_0944_r_000015_0/output/file.out >>>> in any of the configured local directories >>>> 2009-05-02 11:30:49,296 INFO org.apache.hadoop.mapred.TaskTracker: >>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find >>>> >>>> taskTracker/jobcache/job_200904301103_1111/attempt_200904301103_1111_m_000001_1/output/file.out >>>> in any of the configured local directories >>>> 2009-05-02 11:30:49,297 INFO org.apache.hadoop.mapred.TaskTracker: >>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find >>>> >>>> taskTracker/jobcache/job_200904301103_1675/attempt_200904301103_1675_r_000012_1/output/file.out >>>> in any of the configured local directories >>>> 2009-05-02 11:30:54,298 INFO org.apache.hadoop.mapred.TaskTracker: >>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find >>>> >>>> taskTracker/jobcache/job_200904301103_0944/attempt_200904301103_0944_r_000015_0/output/file.out >>>> in any of the configured local directories >>>> >>>> >>>> Lance >>>> >>>> >>>> On Apr 30, 2009, at 12:04 PM, Todd Lipcon wrote: >>>> >>>> Hey Lance, >>>> >>>>> >>>>> Thanks for the logs. They definitely confirmed my suspicion. There are >>>>> two >>>>> problems here: >>>>> >>>>> 1) If the JobTracker throws an exception during processing of a >>>>> heartbeat, >>>>> the tasktracker retries with no delay, since lastHeartbeat isn't >>>>> updated >>>>> in >>>>> TaskTracker.offerService. This is related to HADOOP-3987 >>>>> >>>>> 2) If the TaskTracker sends a task in COMMIT_PENDING state with an >>>>> invalid >>>>> task id, the jobtracker will trigger a NullPointerException in >>>>> JobTracker.getTasksToSave. Instead it should probably create a >>>>> KillTaskAction. I just filed a JIRA to track this issue: >>>>> >>>>> https://issues.apache.org/jira/browse/HADOOP-5761 >>>>> >>>>> 3) The TaskTracker somehow had a task attempt in COMMIT_PENDING state >>>>> that >>>>> the JobTracker didn't know about. How it got there is a separate >>>>> problem >>>>> that's a bit harder to track down. >>>>> >>>>> Thanks >>>>> -Todd >>>>> >>>>> On Thu, Apr 30, 2009 at 11:17 AM, Lance Riedel <la...@dotspots.com> >>>>> wrote: >>>>> >>>>> Here are the job tracker logs from the same time (and yes.. there is >>>>> >>>>>> something there!!): >>>>>> >>>>>> >>>>>> 2009-04-30 02:34:28,484 INFO org.apache.hadoop.mapred.JobTracker: >>>>>> Serious >>>>>> problem. While updating status, cannot find taskid >>>>>> attempt_200904291917_0252_r_000003_0 >>>>>> >>>>>> >>>>>> 2009-04-30 02:34:40,215 INFO org.apache.hadoop.ipc.Server: IPC Server >>>>>> handler 2 on 54311, call >>>>>> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1a93388, false, >>>>>> true, >>>>>> 5341) from 10.253.134.191:42688: error: java.io.IOException: >>>>>> java.lang.NullPointerException >>>>>> java.io.IOException: java.lang.NullPointerException >>>>>> at >>>>>> >>>>>> org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130) >>>>>> at >>>>>> org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923) >>>>>> at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) >>>>>> at >>>>>> >>>>>> >>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >>>>>> at java.lang.reflect.Method.invoke(Method.java:597) >>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) >>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) >>>>>> 2009-04-30 02:34:40,215 INFO org.apache.hadoop.mapred.JobTracker: >>>>>> Serious >>>>>> problem. While updating status, cannot find taskid >>>>>> attempt_200904291917_0296_r_000014_1 >>>>>> 2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker: >>>>>> Adding >>>>>> task 'attempt_200904291917_0352_r_000013_0' to tip >>>>>> task_200904291917_0352_r_000013, for tracker >>>>>> >>>>>> 'tracker_domU-12-31-38-00-F0-41.compute-1.internal:localhost.localdomain/ >>>>>> 127.0.0.1:42479' >>>>>> 2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker: >>>>>> Removed >>>>>> completed task 'attempt_200904291917_0343_m_000003_0' from >>>>>> >>>>>> 'tracker_domU-12-31-38-00-F0-41.compute-1.internal:localhost.localdomain/ >>>>>> 127.0.0.1:42479' >>>>>> 2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker: >>>>>> Removed >>>>>> completed task 'attempt_200904291917_0343_m_000007_0' from >>>>>> >>>>>> 'tracker_domU-12-31-38-00-F0-41.compute-1.internal:localhost.localdomain/ >>>>>> 127.0.0.1:42479' >>>>>> >>>>>> >>>>>> And then.. a LOT more >>>>>> >>>>>> >>>>>> 2009-04-30 02:34:40,433 INFO org.apache.hadoop.mapred.JobTracker: >>>>>> Serious >>>>>> problem. While updating status, cannot find taskid >>>>>> attempt_200904291917_0252_r_000003_0 >>>>>> 2009-04-30 02:34:40,433 WARN org.apache.hadoop.mapred.TaskInProgress: >>>>>> Recieved duplicate status update of 'KILLED' for >>>>>> 'attempt_200904291917_0352_m_000010_1' of TIP >>>>>> 'task_200904291917_0352_m_000010' >>>>>> 2009-04-30 02:34:40,433 INFO org.apache.hadoop.ipc.Server: IPC Server >>>>>> handler 1 on 54311, call >>>>>> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1b7b4c1, false, >>>>>> true, >>>>>> 5341) from 10.253.134.191:42688: error: java.io.IOException: >>>>>> java.lang.NullPointerException >>>>>> java.io.IOException: java.lang.NullPointerException >>>>>> at >>>>>> >>>>>> org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130) >>>>>> at >>>>>> org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923) >>>>>> at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) >>>>>> at >>>>>> >>>>>> >>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >>>>>> at java.lang.reflect.Method.invoke(Method.java:597) >>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) >>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) >>>>>> 2009-04-30 02:34:40,441 INFO org.apache.hadoop.mapred.JobTracker: >>>>>> Serious >>>>>> problem. While updating status, cannot find taskid >>>>>> attempt_200904291917_0252_r_000003_0 >>>>>> 2009-04-30 02:34:40,441 WARN org.apache.hadoop.mapred.TaskInProgress: >>>>>> Recieved duplicate status update of 'KILLED' for >>>>>> 'attempt_200904291917_0352_m_000010_1' of TIP >>>>>> 'task_200904291917_0352_m_000010' >>>>>> 2009-04-30 02:34:40,442 INFO org.apache.hadoop.ipc.Server: IPC Server >>>>>> handler 1 on 54311, call >>>>>> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1598c57, false, >>>>>> true, >>>>>> 5341) from 10.253.134.191:42688: error: java.io.IOException: >>>>>> java.lang.NullPointerException >>>>>> java.io.IOException: java.lang.NullPointerException >>>>>> at >>>>>> >>>>>> org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130) >>>>>> at >>>>>> org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923) >>>>>> at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) >>>>>> at >>>>>> >>>>>> >>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >>>>>> at java.lang.reflect.Method.invoke(Method.java:597) >>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) >>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) >>>>>> 2009-04-30 02:34:40,444 INFO org.apache.hadoop.mapred.JobTracker: >>>>>> Serious >>>>>> problem. While updating status, cannot find taskid >>>>>> attempt_200904291917_0252_r_000003_0 >>>>>> 2009-04-30 02:34:40,444 WARN org.apache.hadoop.mapred.TaskInProgress: >>>>>> Recieved duplicate status update of 'KILLED' for >>>>>> 'attempt_200904291917_0352_m_000010_1' of TIP >>>>>> 'task_200904291917_0352_m_000010' >>>>>> 2009-04-30 02:34:40,444 INFO org.apache.hadoop.ipc.Server: IPC Server >>>>>> handler 7 on 54311, call >>>>>> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@cb3d5, false, >>>>>> true, >>>>>> 5341) from 10.253.134.191:42688: error: java.io.IOException: >>>>>> java.lang.NullPointerException >>>>>> java.io.IOException: java.lang.NullPointerException >>>>>> at >>>>>> >>>>>> org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130) >>>>>> at >>>>>> org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923) >>>>>> at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) >>>>>> at >>>>>> >>>>>> >>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >>>>>> at java.lang.reflect.Method.invoke(Method.java:597) >>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) >>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) >>>>>> 2009-04-30 02:34:40,447 INFO org.apache.hadoop.mapred.JobTracker: >>>>>> Serious >>>>>> problem. While updating status, cannot find taskid >>>>>> attempt_200904291917_0252_r_000003_0 >>>>>> 2009-04-30 02:34:40,447 WARN org.apache.hadoop.mapred.TaskInProgress: >>>>>> Recieved duplicate status update of 'KILLED' for >>>>>> 'attempt_200904291917_0352_m_000010_1' of TIP >>>>>> 'task_200904291917_0352_m_000010' >>>>>> 2009-04-30 02:34:40,447 INFO org.apache.hadoop.ipc.Server: IPC Server >>>>>> handler 1 on 54311, call >>>>>> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1f27d98, false, >>>>>> true, >>>>>> 5341) from 10.253.134.191:42688: error: java.io.IOException: >>>>>> java.lang.NullPointerException >>>>>> java.io.IOException: java.lang.NullPointerException >>>>>> at >>>>>> >>>>>> org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130) >>>>>> at >>>>>> org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923) >>>>>> at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) >>>>>> at >>>>>> >>>>>> >>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >>>>>> at java.lang.reflect.Method.invoke(Method.java:597) >>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) >>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Apr 30, 2009, at 11:04 AM, Todd Lipcon wrote: >>>>>> >>>>>> Hey Lance, >>>>>> >>>>>> >>>>>>> Did you see any error messages in the JobTracker logs around the time >>>>>>> this >>>>>>> started? I think I understand how this might happen. >>>>>>> >>>>>>> Thanks, >>>>>>> -Todd >>>>>>> >>>>>>> On Thu, Apr 30, 2009 at 10:45 AM, Lance Riedel <la...@dotspots.com> >>>>>>> wrote: >>>>>>> >>>>>>> I have not been able to reproduce. We are using version 19.1 with >>>>>>> the >>>>>>> >>>>>>> following patches: >>>>>>>> 4780-2v19.patch (Jira 4780) >>>>>>>> closeAll3.patch (Jira 3998) >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Lance >>>>>>>> >>>>>>>> >>>>>>>> On Apr 30, 2009, at 10:40 AM, Todd Lipcon wrote: >>>>>>>> >>>>>>>> Hi Lance, >>>>>>>> >>>>>>>> >>>>>>>> Can I ask what version you were running when you saw this? Is it >>>>>>>>> reproducible? I'm trying to look at the code path that might >>>>>>>>> produce >>>>>>>>> such >>>>>>>>> a >>>>>>>>> behavior and want to make sure I'm looking at the right version. >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> -Todd >>>>>>>>> >>>>>>>>> On Thu, Apr 30, 2009 at 9:33 AM, Lance Riedel <la...@dotspots.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Has anyone seen this before? Our task tracker produced a 2.7 gig >>>>>>>>> log >>>>>>>>> file >>>>>>>>> >>>>>>>>> in a few hours. The entry is all the same (every 2 ms): >>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2009-04-30 02:34:40,207 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' >>>>>>>>>> with >>>>>>>>>> reponseId '5341 >>>>>>>>>> 2009-04-30 02:34:40,398 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' >>>>>>>>>> with >>>>>>>>>> reponseId '5341 >>>>>>>>>> 2009-04-30 02:34:40,403 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' >>>>>>>>>> with >>>>>>>>>> reponseId '5341 >>>>>>>>>> 2009-04-30 02:34:40,411 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' >>>>>>>>>> with >>>>>>>>>> reponseId '5341 >>>>>>>>>> 2009-04-30 02:34:40,414 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' >>>>>>>>>> with >>>>>>>>>> reponseId '5341 >>>>>>>>>> 2009-04-30 02:34:40,417 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' >>>>>>>>>> with >>>>>>>>>> reponseId '5341 >>>>>>>>>> 2009-04-30 02:34:40,420 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' >>>>>>>>>> with >>>>>>>>>> reponseId '5341 >>>>>>>>>> ... (And on and on and on...) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> These are the few lines before it started: >>>>>>>>>> >>>>>>>>>> 2009-04-30 02:34:29,780 INFO >>>>>>>>>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: >>>>>>>>>> xxx.xxx.xxx.xxx:50060, dest: 10.253.178.95:40268, bytes: 3341324, >>>>>>>>>> op: >>>>>>>>>> MAPRED_SHUFFLE, cliID: attempt_200904291917_0352_m_000006_0 >>>>>>>>>> 2009-04-30 02:34:31,522 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Sent >>>>>>>>>> out >>>>>>>>>> 418891 bytes for reduce: 12 from map: >>>>>>>>>> attempt_200904291917_0352_m_000007_0 >>>>>>>>>> given 418891/418887 from 4301462 with (22, 171) >>>>>>>>>> 2009-04-30 02:34:31,522 INFO >>>>>>>>>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: >>>>>>>>>> xxx.xxx.xxx.xxx:50060, dest: xxx.xxx.xxx.xxx:40268, bytes: 418891, >>>>>>>>>> op: >>>>>>>>>> MAPRED_SHUFFLE, cliID: attempt_200904291917_0352_m_000007_0 >>>>>>>>>> 2009-04-30 02:34:35,382 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> attempt_200904291917_0352_r_000003_0 0.3030303% reduce > copy (10 >>>>>>>>>> of >>>>>>>>>> 11 >>>>>>>>>> at >>>>>>>>>> 0.32 MB/s) > >>>>>>>>>> 2009-04-30 02:34:38,385 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> attempt_200904291917_0352_r_000003_0 0.3030303% reduce > copy (10 >>>>>>>>>> of >>>>>>>>>> 11 >>>>>>>>>> at >>>>>>>>>> 0.32 MB/s) > >>>>>>>>>> 2009-04-30 02:34:40,207 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' >>>>>>>>>> with >>>>>>>>>> reponseId '5341 >>>>>>>>>> 2009-04-30 02:34:40,398 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' >>>>>>>>>> with >>>>>>>>>> reponseId '5341 >>>>>>>>>> 2009-04-30 02:34:40,403 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' >>>>>>>>>> with >>>>>>>>>> reponseId '5341 >>>>>>>>>> 2009-04-30 02:34:40,411 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' >>>>>>>>>> with >>>>>>>>>> reponseId '5341 >>>>>>>>>> 2009-04-30 02:34:40,414 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' >>>>>>>>>> with >>>>>>>>>> reponseId '5341 >>>>>>>>>> 2009-04-30 02:34:40,417 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' >>>>>>>>>> with >>>>>>>>>> reponseId '5341 >>>>>>>>>> 2009-04-30 02:34:40,420 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com' >>>>>>>>>> with >>>>>>>>>> reponseId '5341 >>>>>>>>>> >>>>>>>>>> --And on for 2+ gigs >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>> >> >> >