Here are the job tracker logs from the same time (and yes.. there is
something there!!):
2009-04-30 02:34:28,484 INFO org.apache.hadoop.mapred.JobTracker:
Serious
problem. While updating status, cannot find taskid
attempt_200904291917_0252_r_000003_0
2009-04-30 02:34:40,215 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 54311, call
heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1a93388,
false, true,
5341) from 10.253.134.191:42688: error: java.io.IOException:
java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
at
org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:
2130)
at
org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923)
at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
at
sun
.reflect
.DelegatingMethodAccessorImpl
.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
2009-04-30 02:34:40,215 INFO org.apache.hadoop.mapred.JobTracker:
Serious
problem. While updating status, cannot find taskid
attempt_200904291917_0296_r_000014_1
2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker:
Adding
task 'attempt_200904291917_0352_r_000013_0' to tip
task_200904291917_0352_r_000013, for tracker
'tracker_domU-12-31-38-00-
F0-41.compute-1.internal:localhost.localdomain/
127.0.0.1:42479'
2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker:
Removed
completed task 'attempt_200904291917_0343_m_000003_0' from
'tracker_domU-12-31-38-00-
F0-41.compute-1.internal:localhost.localdomain/
127.0.0.1:42479'
2009-04-30 02:34:40,217 INFO org.apache.hadoop.mapred.JobTracker:
Removed
completed task 'attempt_200904291917_0343_m_000007_0' from
'tracker_domU-12-31-38-00-
F0-41.compute-1.internal:localhost.localdomain/
127.0.0.1:42479'
And then.. a LOT more
2009-04-30 02:34:40,433 INFO org.apache.hadoop.mapred.JobTracker:
Serious
problem. While updating status, cannot find taskid
attempt_200904291917_0252_r_000003_0
2009-04-30 02:34:40,433 WARN org.apache.hadoop.mapred.TaskInProgress:
Recieved duplicate status update of 'KILLED' for
'attempt_200904291917_0352_m_000010_1' of TIP
'task_200904291917_0352_m_000010'
2009-04-30 02:34:40,433 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 54311, call
heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1b7b4c1,
false, true,
5341) from 10.253.134.191:42688: error: java.io.IOException:
java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
at
org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:
2130)
at
org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923)
at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
at
sun
.reflect
.DelegatingMethodAccessorImpl
.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
2009-04-30 02:34:40,441 INFO org.apache.hadoop.mapred.JobTracker:
Serious
problem. While updating status, cannot find taskid
attempt_200904291917_0252_r_000003_0
2009-04-30 02:34:40,441 WARN org.apache.hadoop.mapred.TaskInProgress:
Recieved duplicate status update of 'KILLED' for
'attempt_200904291917_0352_m_000010_1' of TIP
'task_200904291917_0352_m_000010'
2009-04-30 02:34:40,442 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 54311, call
heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1598c57,
false, true,
5341) from 10.253.134.191:42688: error: java.io.IOException:
java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
at
org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:
2130)
at
org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923)
at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
at
sun
.reflect
.DelegatingMethodAccessorImpl
.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
2009-04-30 02:34:40,444 INFO org.apache.hadoop.mapred.JobTracker:
Serious
problem. While updating status, cannot find taskid
attempt_200904291917_0252_r_000003_0
2009-04-30 02:34:40,444 WARN org.apache.hadoop.mapred.TaskInProgress:
Recieved duplicate status update of 'KILLED' for
'attempt_200904291917_0352_m_000010_1' of TIP
'task_200904291917_0352_m_000010'
2009-04-30 02:34:40,444 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 54311, call
heartbeat(org.apache.hadoop.mapred.tasktrackersta...@cb3d5, false,
true,
5341) from 10.253.134.191:42688: error: java.io.IOException:
java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
at
org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:
2130)
at
org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923)
at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
at
sun
.reflect
.DelegatingMethodAccessorImpl
.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
2009-04-30 02:34:40,447 INFO org.apache.hadoop.mapred.JobTracker:
Serious
problem. While updating status, cannot find taskid
attempt_200904291917_0252_r_000003_0
2009-04-30 02:34:40,447 WARN org.apache.hadoop.mapred.TaskInProgress:
Recieved duplicate status update of 'KILLED' for
'attempt_200904291917_0352_m_000010_1' of TIP
'task_200904291917_0352_m_000010'
2009-04-30 02:34:40,447 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 54311, call
heartbeat(org.apache.hadoop.mapred.tasktrackersta...@1f27d98,
false, true,
5341) from 10.253.134.191:42688: error: java.io.IOException:
java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
at
org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:
2130)
at
org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1923)
at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
at
sun
.reflect
.DelegatingMethodAccessorImpl
.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
On Apr 30, 2009, at 11:04 AM, Todd Lipcon wrote:
Hey Lance,
Did you see any error messages in the JobTracker logs around the
time this
started? I think I understand how this might happen.
Thanks,
-Todd
On Thu, Apr 30, 2009 at 10:45 AM, Lance Riedel <la...@dotspots.com>
wrote:
I have not been able to reproduce. We are using version 19.1 with
the
following patches:
4780-2v19.patch (Jira 4780)
closeAll3.patch (Jira 3998)
Thanks,
Lance
On Apr 30, 2009, at 10:40 AM, Todd Lipcon wrote:
Hi Lance,
Can I ask what version you were running when you saw this? Is it
reproducible? I'm trying to look at the code path that might
produce
such
a
behavior and want to make sure I'm looking at the right version.
Thanks
-Todd
On Thu, Apr 30, 2009 at 9:33 AM, Lance Riedel <la...@dotspots.com>
wrote:
Has anyone seen this before? Our task tracker produced a 2.7 gig
log
file
in a few hours. The entry is all the same (every 2 ms):
2009-04-30 02:34:40,207 INFO
org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
with
reponseId '5341
2009-04-30 02:34:40,398 INFO
org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
with
reponseId '5341
2009-04-30 02:34:40,403 INFO
org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
with
reponseId '5341
2009-04-30 02:34:40,411 INFO
org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
with
reponseId '5341
2009-04-30 02:34:40,414 INFO
org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
with
reponseId '5341
2009-04-30 02:34:40,417 INFO
org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
with
reponseId '5341
2009-04-30 02:34:40,420 INFO
org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
with
reponseId '5341
... (And on and on and on...)
These are the few lines before it started:
2009-04-30 02:34:29,780 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src:
xxx.xxx.xxx.xxx:50060, dest: 10.253.178.95:40268, bytes:
3341324, op:
MAPRED_SHUFFLE, cliID: attempt_200904291917_0352_m_000006_0
2009-04-30 02:34:31,522 INFO
org.apache.hadoop.mapred.TaskTracker: Sent
out
418891 bytes for reduce: 12 from map:
attempt_200904291917_0352_m_000007_0
given 418891/418887 from 4301462 with (22, 171)
2009-04-30 02:34:31,522 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src:
xxx.xxx.xxx.xxx:50060, dest: xxx.xxx.xxx.xxx:40268, bytes:
418891, op:
MAPRED_SHUFFLE, cliID: attempt_200904291917_0352_m_000007_0
2009-04-30 02:34:35,382 INFO
org.apache.hadoop.mapred.TaskTracker:
attempt_200904291917_0352_r_000003_0 0.3030303% reduce > copy
(10 of 11
at
0.32 MB/s) >
2009-04-30 02:34:38,385 INFO
org.apache.hadoop.mapred.TaskTracker:
attempt_200904291917_0352_r_000003_0 0.3030303% reduce > copy
(10 of 11
at
0.32 MB/s) >
2009-04-30 02:34:40,207 INFO
org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
with
reponseId '5341
2009-04-30 02:34:40,398 INFO
org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
with
reponseId '5341
2009-04-30 02:34:40,403 INFO
org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
with
reponseId '5341
2009-04-30 02:34:40,411 INFO
org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
with
reponseId '5341
2009-04-30 02:34:40,414 INFO
org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
with
reponseId '5341
2009-04-30 02:34:40,417 INFO
org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
with
reponseId '5341
2009-04-30 02:34:40,420 INFO
org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
with
reponseId '5341
--And on for 2+ gigs