Here is the point in the logs where the infinite loop begins - see time
stamp 2009-05-14 04:03:56,348 : (JobTracker)

2009-05-14 04:03:56,324 INFO org.apache.hadoop.mapred.JobTracker: Removed
completed task 'attempt_200905122015_1168_m_000029_0' from
'tracker_domU-12-31-38-01-74-F1.compute-1.internal:localhost.localdomain/
127.0.0.1:35214'
2009-05-14 04:03:56,326 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'attempt_200905122015_1183_r_000007_0' to tip
task_200905122015_1183_r_000007, for tracker
'tracker_domU-12-31-38-00-F0-41.compute-1.internal:localhost.localdomain/
127.0.0.1:58504'
2009-05-14 04:03:56,327 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (cleanup)'attempt_200905122015_1184_m_000000_1' to tip
task_200905122015_1184_m_000000, for tracker
'tracker_domU-12-31-38-00-80-21.compute-1.internal:localhost.localdomain/
127.0.0.1
:57741'
2009-05-14 04:03:56,330 INFO org.apache.hadoop.mapred.JobInProgress: Task
'attempt_200905122015_1182_r_000011_0' has completed
task_200905122015_1182_r_000011 successfully.
2009-05-14 04:03:56,330 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (cleanup)'attempt_200905122015_1182_r_000010_1' to tip
task_200905122015_1182_r_000010, for tracker
'tracker_domU-12-31-38-01-5C-41.compute-1.internal:localhost.localdomain/
127.0.0.1
:46248'
2009-05-14 04:03:56,331 INFO org.apache.hadoop.mapred.JobTracker: Serious
problem.  While updating status, cannot find taskid
attempt_200905122015_0499_r_000004_1
2009-05-14 04:03:56,331 INFO org.apache.hadoop.mapred.JobInProgress: Task
'attempt_200905122015_1184_m_000004_1' has completed
task_200905122015_1184_m_000004 successfully.
2009-05-14 04:03:56,331 INFO org.apache.hadoop.mapred.ResourceEstimator:
measured blowup on task_200905122015_1184_m_000004 was 20150008/21581175 =
0.93368447269437372009-05-14 04:03:56,331 INFO
org.apache.hadoop.mapred.ResourceEstimator: new estimate is blowup =
0.9152383292400812
2009-05-14 04:03:56,335 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'attempt_200905122015_1183_r_000008_0' to tip
task_200905122015_1183_r_000008, for tracker
'tracker_domU-12-31-38-00-80-21.compute-1.internal:localhost.localdomain/
127.0.0.1:57741'
2009-05-14 04:03:56,336 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'attempt_200905122015_1183_r_000009_0' to tip
task_200905122015_1183_r_000009, for tracker
'tracker_domU-12-31-38-01-74-F1.compute-1.internal:localhost.localdomain/
127.0.0.1:35214'
2009-05-14 04:03:56,336 INFO org.apache.hadoop.mapred.JobTracker: Removed
completed task 'attempt_200905122015_1181_r_000009_0' from
'tracker_domU-12-31-38-01-74-F1.compute-1.internal:localhost.localdomain/
127.0.0.1:35214'
2009-05-14 04:03:56,336 INFO org.apache.hadoop.mapred.JobTracker: Serious
problem.  While updating status, cannot find taskid
attempt_200905122015_0499_r_000004_1
2009-05-14 04:03:56,337 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'attempt_200905122015_1183_r_000010_0' to tip
task_200905122015_1183_r_000010, for tracker
'tracker_domU-12-31-38-01-81-31.compute-1.internal:localhost.localdomain/
127.0.0.1:46518'
2009-05-14 04:03:56,343 INFO org.apache.hadoop.mapred.JobTracker: Serious
problem.  While updating status, cannot find taskid
attempt_200905122015_1070_r_000014_1
2009-05-14 04:03:56,344 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'attempt_200905122015_1183_r_000011_0' to tip
task_200905122015_1183_r_000011, for tracker
'tracker_domU-12-31-38-01-AD-91.compute-1.internal:localhost.localdomain/
127.0.0.1:33929'
2009-05-14 04:03:56,348 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 54311, call
heartbeat(org.apache.hadoop.mapred.tasktrackersta...@a4fe4, false, true,
21461) from 10.253.178.95:50709: error: java.io.IOException:
java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
        at
org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130)
        at
org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923)
        at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown
Source)        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
2009-05-14 04:03:56,351 INFO org.apache.hadoop.mapred.JobTracker:
attempt_200905122015_1183_r_000007_0 is 24 ms debug.
2009-05-14 04:03:56,419 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'attempt_200905122015_1183_r_000012_0' to tip
task_200905122015_1183_r_000012, for tracker
'tracker_domU-12-31-38-00-F0-41.compute-1.internal:localhost.localdomain/
127.0.0.1:58504'
2009-05-14 04:03:56,459 INFO org.apache.hadoop.mapred.JobTracker: Serious
problem.  While updating status, cannot find taskid
attempt_200905122015_1070_r_000014_1
2009-05-14 04:03:56,459 WARN org.apache.hadoop.mapred.TaskInProgress:
Recieved duplicate status update of 'KILLED' for
'attempt_200905122015_1182_r_000009_1' of TIP
'task_200905122015_1182_r_000009'
2009-05-14 04:03:56,460 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'attempt_200905122015_1183_r_000013_0' to tip
task_200905122015_1183_r_000013, for tracker
'tracker_domU-12-31-38-01-AD-91.compute-1.internal:localhost.localdomain/
127.0.0.1:33929'
2009-05-14 04:03:56,460 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 54311, call
heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1f1c507, false, true,
21461) from 10.253.178.95:50709: error: java.io.IOException:
java.lang.NullPointerExceptio
n
java.io.IOException: java.lang.NullPointerException
        at
org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130)
at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923)
        at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
2009-05-14 04:03:56,464 INFO org.apache.hadoop.mapred.JobTracker: Serious
problem.  While updating status, cannot find taskid
attempt_200905122015_1070_r_000014_1
2009-05-14 04:03:56,464 WARN org.apache.hadoop.mapred.TaskInProgress:
Recieved duplicate status update of 'KILLED' for
'attempt_200905122015_1182_r_000009_1' of TIP
'task_200905122015_1182_r_000009'
2009-05-14 04:03:56,465 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'attempt_200905122015_1183_r_000014_0' to tip
task_200905122015_1183_r_000014, for tracker
'tracker_domU-12-31-38-01-AD-91.compute-1.internal:localhost.localdomain/
127.0.0.1:33929'
2009-05-14 04:03:56,465 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 54311, call
heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1d54fc9, false, true,
21461) from 10.253.178.95:50709: error: java.io.IOException:
java.lang.NullPointerExceptio
n


<and on and on an on for 10 gigs worth in a few hours>



On Thu, May 14, 2009 at 8:37 AM, Lance Riedel <la...@dotspots.com> wrote:

> Just had another cluster crash with the same issue. This is still a huge
> issue for us- still crashing our cluster every other night (actually almost
> every night now).
>
> Should we move to .20? Is there more information i can provide?  Is this
> related to my other email "Constantly getting DiskErrorExceptions - but
> logged as INFO"? I haven't seen responses on that.
>
> Thanks!
> Lance
>
> On Thu, May 14, 2009 at 7:48 AM, Lance Riedel <la...@dotspots.com> wrote:
>
>> Here is the latest here.. Haven't heard any more, but every other night we
>> get 10 gigs logs and tons of failed tasks and have to restart the cluster
>>
>> ---------- Forwarded message ----------
>> From: Lance Riedel <la...@dotspots.com>
>> Date: Fri, May 8, 2009 at 10:49 AM
>> Subject: Re: Infinite Loop Resending status from task tracker
>> To: core-user@hadoop.apache.org
>> Cc: Brian Long <br...@dotspots.com>
>>
>>
>> Hi Todd,
>> Sorry, my response got hung up in my outbox for a couple of days.. arghh
>>
>>
>>
>> Confirmed that 1) we are not running out of space and 2) that our
>> mapred.local.dir directory is not in /tmp
>>
>> Not sure if this an ec2 problem with a mounted drive?
>>
>> We had the same thing happen again, exact same logs and symptoms
>> (simultaneous in jobtracker and tasktracker)
>>
>> Thinking about moving to .20 because of this, any thoughts on that?
>>
>> Thanks,
>> Lance
>>
>>
>> On May 4, 2009, at 4:18 PM, Todd Lipcon wrote:
>>
>>  Hi Lance,
>>>
>>> Two thoughts here that might be the culprit:
>>>
>>> 1) Is it possible that the partition that your mapred.local.dir is on is
>>> out
>>> of space on that task tracker?
>>>
>>> 2) Is it possible that you're using a directory under /tmp for
>>> mapred.local.dir and some system cron script cleared out /tmp?
>>>
>>> -Todd
>>>
>>> On Sat, May 2, 2009 at 9:01 AM, Lance Riedel <la...@dotspots.com> wrote:
>>>
>>>  Hi Todd,
>>>> Not sure if this is related, but our hadoop cluster in general is
>>>> getting
>>>> more and more unstable.  the logs are full of this error message (but
>>>> having
>>>> trouble tracking down the root problem):
>>>>
>>>> 2009-05-02 11:30:39,294 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>>>>
>>>> taskTracker/jobcache/job_200904301103_1111/attempt_200904301103_1111_m_000001_1/output/file.out
>>>> in any of the configured local directories
>>>> 2009-05-02 11:30:39,294 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>>>>
>>>> taskTracker/jobcache/job_200904301103_1675/attempt_200904301103_1675_r_000012_1/output/file.out
>>>> in any of the configured local directories
>>>> 2009-05-02 11:30:44,295 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>>>>
>>>> taskTracker/jobcache/job_200904301103_0944/attempt_200904301103_0944_r_000015_0/output/file.out
>>>> in any of the configured local directories
>>>> 2009-05-02 11:30:44,295 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>>>>
>>>> taskTracker/jobcache/job_200904301103_1111/attempt_200904301103_1111_m_000001_1/output/file.out
>>>> in any of the configured local directories
>>>> 2009-05-02 11:30:44,295 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>>>>
>>>> taskTracker/jobcache/job_200904301103_1675/attempt_200904301103_1675_r_000012_1/output/file.out
>>>> in any of the configured local directories
>>>> 2009-05-02 11:30:49,296 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>>>>
>>>> taskTracker/jobcache/job_200904301103_0944/attempt_200904301103_0944_r_000015_0/output/file.out
>>>> in any of the configured local directories
>>>> 2009-05-02 11:30:49,296 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>>>>
>>>> taskTracker/jobcache/job_200904301103_1111/attempt_200904301103_1111_m_000001_1/output/file.out
>>>> in any of the configured local directories
>>>> 2009-05-02 11:30:49,297 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>>>>
>>>> taskTracker/jobcache/job_200904301103_1675/attempt_200904301103_1675_r_000012_1/output/file.out
>>>> in any of the configured local directories
>>>> 2009-05-02 11:30:54,298 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>>>>
>>>> taskTracker/jobcache/job_200904301103_0944/attempt_200904301103_0944_r_000015_0/output/file.out
>>>> in any of the configured local directories
>>>>
>>>>
>>>> Lance
>>>>
>>>>
>>>> On Apr 30, 2009, at 12:04 PM, Todd Lipcon wrote:
>>>>
>>>> Hey Lance,
>>>>
>>>>>
>>>>> Thanks for the logs. They definitely confirmed my suspicion. There are
>>>>> two
>>>>> problems here:
>>>>>
>>>>> 1) If the JobTracker throws an exception during processing of a
>>>>> heartbeat,
>>>>> the tasktracker retries with no delay, since lastHeartbeat isn't
>>>>> updated
>>>>> in
>>>>> TaskTracker.offerService. This is related to HADOOP-3987
>>>>>
>>>>> 2) If the TaskTracker sends a task in COMMIT_PENDING state with an
>>>>> invalid
>>>>> task id, the jobtracker will trigger a NullPointerException in
>>>>> JobTracker.getTasksToSave. Instead it should probably create a
>>>>> KillTaskAction. I just filed a JIRA to track this issue:
>>>>>
>>>>> https://issues.apache.org/jira/browse/HADOOP-5761
>>>>>
>>>>> 3) The TaskTracker somehow had a task attempt in COMMIT_PENDING state
>>>>> that
>>>>> the JobTracker didn't know about. How it got there is a separate
>>>>> problem
>>>>> that's a bit harder to track down.
>>>>>
>>>>> Thanks
>>>>> -Todd
>>>>>
>>>>> On Thu, Apr 30, 2009 at 11:17 AM, Lance Riedel <la...@dotspots.com>
>>>>> wrote:
>>>>>
>>>>> Here are the job tracker logs from the same time (and yes.. there is
>>>>>
>>>>>> something there!!):
>>>>>>
>>>>>>
>>>>>> 2009-04-30 02:34:28,484 INFO org.apache.hadoop.mapred.JobTracker:
>>>>>> Serious
>>>>>> problem.  While updating status, cannot find taskid
>>>>>> attempt_200904291917_0252_r_000003_0
>>>>>>
>>>>>>
>>>>>> 2009-04-30 02:34:40,215 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>>> handler 2 on 54311, call
>>>>>> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1a93388, false,
>>>>>> true,
>>>>>> 5341) from 10.253.134.191:42688: error: java.io.IOException:
>>>>>> java.lang.NullPointerException
>>>>>> java.io.IOException: java.lang.NullPointerException
>>>>>>    at
>>>>>>
>>>>>> org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130)
>>>>>>    at
>>>>>> org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923)
>>>>>>    at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
>>>>>>    at
>>>>>>
>>>>>>
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
>>>>>>    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
>>>>>> 2009-04-30 02:34:40,215 INFO org.apache.hadoop.mapred.JobTracker:
>>>>>> Serious
>>>>>> problem.  While updating status, cannot find taskid
>>>>>> attempt_200904291917_0296_r_000014_1
>>>>>> 2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker:
>>>>>> Adding
>>>>>> task 'attempt_200904291917_0352_r_000013_0' to tip
>>>>>> task_200904291917_0352_r_000013, for tracker
>>>>>>
>>>>>> 'tracker_domU-12-31-38-00-F0-41.compute-1.internal:localhost.localdomain/
>>>>>> 127.0.0.1:42479'
>>>>>> 2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker:
>>>>>> Removed
>>>>>> completed task 'attempt_200904291917_0343_m_000003_0' from
>>>>>>
>>>>>> 'tracker_domU-12-31-38-00-F0-41.compute-1.internal:localhost.localdomain/
>>>>>> 127.0.0.1:42479'
>>>>>> 2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker:
>>>>>> Removed
>>>>>> completed task 'attempt_200904291917_0343_m_000007_0' from
>>>>>>
>>>>>> 'tracker_domU-12-31-38-00-F0-41.compute-1.internal:localhost.localdomain/
>>>>>> 127.0.0.1:42479'
>>>>>>
>>>>>>
>>>>>> And then.. a LOT more
>>>>>>
>>>>>>
>>>>>> 2009-04-30 02:34:40,433 INFO org.apache.hadoop.mapred.JobTracker:
>>>>>> Serious
>>>>>> problem.  While updating status, cannot find taskid
>>>>>> attempt_200904291917_0252_r_000003_0
>>>>>> 2009-04-30 02:34:40,433 WARN org.apache.hadoop.mapred.TaskInProgress:
>>>>>> Recieved duplicate status update of 'KILLED' for
>>>>>> 'attempt_200904291917_0352_m_000010_1' of TIP
>>>>>> 'task_200904291917_0352_m_000010'
>>>>>> 2009-04-30 02:34:40,433 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>>> handler 1 on 54311, call
>>>>>> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1b7b4c1, false,
>>>>>> true,
>>>>>> 5341) from 10.253.134.191:42688: error: java.io.IOException:
>>>>>> java.lang.NullPointerException
>>>>>> java.io.IOException: java.lang.NullPointerException
>>>>>>    at
>>>>>>
>>>>>> org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130)
>>>>>>    at
>>>>>> org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923)
>>>>>>    at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
>>>>>>    at
>>>>>>
>>>>>>
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
>>>>>>    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
>>>>>> 2009-04-30 02:34:40,441 INFO org.apache.hadoop.mapred.JobTracker:
>>>>>> Serious
>>>>>> problem.  While updating status, cannot find taskid
>>>>>> attempt_200904291917_0252_r_000003_0
>>>>>> 2009-04-30 02:34:40,441 WARN org.apache.hadoop.mapred.TaskInProgress:
>>>>>> Recieved duplicate status update of 'KILLED' for
>>>>>> 'attempt_200904291917_0352_m_000010_1' of TIP
>>>>>> 'task_200904291917_0352_m_000010'
>>>>>> 2009-04-30 02:34:40,442 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>>> handler 1 on 54311, call
>>>>>> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1598c57, false,
>>>>>> true,
>>>>>> 5341) from 10.253.134.191:42688: error: java.io.IOException:
>>>>>> java.lang.NullPointerException
>>>>>> java.io.IOException: java.lang.NullPointerException
>>>>>>    at
>>>>>>
>>>>>> org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130)
>>>>>>    at
>>>>>> org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923)
>>>>>>    at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
>>>>>>    at
>>>>>>
>>>>>>
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
>>>>>>    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
>>>>>> 2009-04-30 02:34:40,444 INFO org.apache.hadoop.mapred.JobTracker:
>>>>>> Serious
>>>>>> problem.  While updating status, cannot find taskid
>>>>>> attempt_200904291917_0252_r_000003_0
>>>>>> 2009-04-30 02:34:40,444 WARN org.apache.hadoop.mapred.TaskInProgress:
>>>>>> Recieved duplicate status update of 'KILLED' for
>>>>>> 'attempt_200904291917_0352_m_000010_1' of TIP
>>>>>> 'task_200904291917_0352_m_000010'
>>>>>> 2009-04-30 02:34:40,444 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>>> handler 7 on 54311, call
>>>>>> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@cb3d5, false,
>>>>>> true,
>>>>>> 5341) from 10.253.134.191:42688: error: java.io.IOException:
>>>>>> java.lang.NullPointerException
>>>>>> java.io.IOException: java.lang.NullPointerException
>>>>>>    at
>>>>>>
>>>>>> org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130)
>>>>>>    at
>>>>>> org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923)
>>>>>>    at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
>>>>>>    at
>>>>>>
>>>>>>
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
>>>>>>    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
>>>>>> 2009-04-30 02:34:40,447 INFO org.apache.hadoop.mapred.JobTracker:
>>>>>> Serious
>>>>>> problem.  While updating status, cannot find taskid
>>>>>> attempt_200904291917_0252_r_000003_0
>>>>>> 2009-04-30 02:34:40,447 WARN org.apache.hadoop.mapred.TaskInProgress:
>>>>>> Recieved duplicate status update of 'KILLED' for
>>>>>> 'attempt_200904291917_0352_m_000010_1' of TIP
>>>>>> 'task_200904291917_0352_m_000010'
>>>>>> 2009-04-30 02:34:40,447 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>>>> handler 1 on 54311, call
>>>>>> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1f27d98, false,
>>>>>> true,
>>>>>> 5341) from 10.253.134.191:42688: error: java.io.IOException:
>>>>>> java.lang.NullPointerException
>>>>>> java.io.IOException: java.lang.NullPointerException
>>>>>>    at
>>>>>>
>>>>>> org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2130)
>>>>>>    at
>>>>>> org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923)
>>>>>>    at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
>>>>>>    at
>>>>>>
>>>>>>
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
>>>>>>    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Apr 30, 2009, at 11:04 AM, Todd Lipcon wrote:
>>>>>>
>>>>>> Hey Lance,
>>>>>>
>>>>>>
>>>>>>> Did you see any error messages in the JobTracker logs around the time
>>>>>>> this
>>>>>>> started? I think I understand how this might happen.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -Todd
>>>>>>>
>>>>>>> On Thu, Apr 30, 2009 at 10:45 AM, Lance Riedel <la...@dotspots.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> I have not been able to reproduce.  We are using version 19.1 with
>>>>>>> the
>>>>>>>
>>>>>>>  following patches:
>>>>>>>> 4780-2v19.patch (Jira  4780)
>>>>>>>> closeAll3.patch (Jira 3998)
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Lance
>>>>>>>>
>>>>>>>>
>>>>>>>> On Apr 30, 2009, at 10:40 AM, Todd Lipcon wrote:
>>>>>>>>
>>>>>>>> Hi Lance,
>>>>>>>>
>>>>>>>>
>>>>>>>>  Can I ask what version you were running when you saw this? Is it
>>>>>>>>> reproducible? I'm trying to look at the code path that might
>>>>>>>>> produce
>>>>>>>>> such
>>>>>>>>> a
>>>>>>>>> behavior and want to make sure I'm looking at the right version.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> -Todd
>>>>>>>>>
>>>>>>>>> On Thu, Apr 30, 2009 at 9:33 AM, Lance Riedel <la...@dotspots.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Has anyone seen this before? Our task tracker produced a 2.7 gig
>>>>>>>>> log
>>>>>>>>> file
>>>>>>>>>
>>>>>>>>> in a few hours. The entry is all the same (every 2 ms):
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2009-04-30 02:34:40,207 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
>>>>>>>>>> with
>>>>>>>>>> reponseId '5341
>>>>>>>>>> 2009-04-30 02:34:40,398 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
>>>>>>>>>> with
>>>>>>>>>> reponseId '5341
>>>>>>>>>> 2009-04-30 02:34:40,403 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
>>>>>>>>>> with
>>>>>>>>>> reponseId '5341
>>>>>>>>>> 2009-04-30 02:34:40,411 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
>>>>>>>>>> with
>>>>>>>>>> reponseId '5341
>>>>>>>>>> 2009-04-30 02:34:40,414 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
>>>>>>>>>> with
>>>>>>>>>> reponseId '5341
>>>>>>>>>> 2009-04-30 02:34:40,417 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
>>>>>>>>>> with
>>>>>>>>>> reponseId '5341
>>>>>>>>>> 2009-04-30 02:34:40,420 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
>>>>>>>>>> with
>>>>>>>>>> reponseId '5341
>>>>>>>>>> ... (And on and on and on...)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> These are the few lines before it started:
>>>>>>>>>>
>>>>>>>>>> 2009-04-30 02:34:29,780 INFO
>>>>>>>>>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src:
>>>>>>>>>> xxx.xxx.xxx.xxx:50060, dest: 10.253.178.95:40268, bytes: 3341324,
>>>>>>>>>> op:
>>>>>>>>>> MAPRED_SHUFFLE, cliID: attempt_200904291917_0352_m_000006_0
>>>>>>>>>> 2009-04-30 02:34:31,522 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Sent
>>>>>>>>>> out
>>>>>>>>>> 418891 bytes for reduce: 12 from map:
>>>>>>>>>> attempt_200904291917_0352_m_000007_0
>>>>>>>>>> given 418891/418887 from 4301462 with (22, 171)
>>>>>>>>>> 2009-04-30 02:34:31,522 INFO
>>>>>>>>>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src:
>>>>>>>>>> xxx.xxx.xxx.xxx:50060, dest: xxx.xxx.xxx.xxx:40268, bytes: 418891,
>>>>>>>>>> op:
>>>>>>>>>> MAPRED_SHUFFLE, cliID: attempt_200904291917_0352_m_000007_0
>>>>>>>>>> 2009-04-30 02:34:35,382 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> attempt_200904291917_0352_r_000003_0 0.3030303% reduce > copy (10
>>>>>>>>>> of
>>>>>>>>>> 11
>>>>>>>>>> at
>>>>>>>>>> 0.32 MB/s) >
>>>>>>>>>> 2009-04-30 02:34:38,385 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> attempt_200904291917_0352_r_000003_0 0.3030303% reduce > copy (10
>>>>>>>>>> of
>>>>>>>>>> 11
>>>>>>>>>> at
>>>>>>>>>> 0.32 MB/s) >
>>>>>>>>>> 2009-04-30 02:34:40,207 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
>>>>>>>>>> with
>>>>>>>>>> reponseId '5341
>>>>>>>>>> 2009-04-30 02:34:40,398 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
>>>>>>>>>> with
>>>>>>>>>> reponseId '5341
>>>>>>>>>> 2009-04-30 02:34:40,403 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
>>>>>>>>>> with
>>>>>>>>>> reponseId '5341
>>>>>>>>>> 2009-04-30 02:34:40,411 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
>>>>>>>>>> with
>>>>>>>>>> reponseId '5341
>>>>>>>>>> 2009-04-30 02:34:40,414 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
>>>>>>>>>> with
>>>>>>>>>> reponseId '5341
>>>>>>>>>> 2009-04-30 02:34:40,417 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
>>>>>>>>>> with
>>>>>>>>>> reponseId '5341
>>>>>>>>>> 2009-04-30 02:34:40,420 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>> Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
>>>>>>>>>> with
>>>>>>>>>> reponseId '5341
>>>>>>>>>>
>>>>>>>>>> --And on for    2+ gigs
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
>>
>

Reply via email to