Can anyone shed some light on this exception? We are on a 60-node, 260-core cluster running 0.19.0, and everything hums along fine, but every one or two weeks we see a bunch of these in the logs: the first on heartbeat, and then one on submitJob for every job we try to start. The JobTracker becomes unresponsive and has to be killed and restarted, but the TaskTrackers all appear fine, and in fact we never stop or start them during this. After the restart, the same job submits without issue. The only thing I noticed in the logs leading up to it was that a job finished just before the job that first triggered this error started up. Here is the full trace leading up to it, and thanks for the help:

2009-06-07 01:12:12,578 INFO org.apache.hadoop.mapred.JobInProgress: Job job_200906022136_0106 has completed successfully.
2009-06-07 01:12:12,599 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54311, call heartbeat(org.apache.hadoop.mapred.tasktrackersta...@8df3ca7, false, true, -22339) from 172.21.30.46:60840: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:459)
    at org.apache.hadoop.ipc.Client.call(Client.java:686)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy4.complete(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy4.complete(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3129)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3053)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:59)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:79)
    at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:301)
    at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:130)
    at java.io.OutputStreamWriter.close(OutputStreamWriter.java:216)
    at java.io.BufferedWriter.close(BufferedWriter.java:248)
    at java.io.PrintWriter.close(PrintWriter.java:295)
    at org.apache.hadoop.mapred.JobHistory$JobInfo.logFinished(JobHistory.java:1024)
    at org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:1906)
    at org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:1855)
    at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:786)
    at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:2613)
    at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:2056)
    at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1866)
    at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
2009-06-07 01:12:14,632 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200906022136_0106_r_000028_1' from 'tracker_dup041.iad.hadoop.net:localhost.localdomain/127.0.0.1:56041'
2009-06-07 01:12:15,064 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200906022136_0106_r_000019_2' from 'tracker_dup022.iad.hadoop.net:localhost.localdomain/127.0.0.1:44525'
2009-06-07 01:12:15,463 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200906022136_0106_r_000009_2' from 'tracker_dup042.iad.hadoop.net:localhost.localdomain/127.0.0.1:47675'
2009-06-07 01:12:15,927 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200906022136_0106_r_000016_2' from 'tracker_dup008.iad.hadoop.net:localhost.localdomain/127.0.0.1:46724'
2009-06-07 01:12:15,986 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200906022136_0106_r_000007_2' from 'tracker_dup003.iad.hadoop.net:localhost.localdomain/127.0.0.1:44460'
2009-06-07 01:12:16,300 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200906022136_0106_r_000014_2' from 'tracker_dup034.iad.hadoop.net:localhost.localdomain/127.0.0.1:35848'
2009-06-07 01:12:16,400 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200906022136_0106_r_000025_2' from 'tracker_dup009.iad.hadoop.net:localhost.localdomain/127.0.0.1:53461'
2009-06-07 01:12:16,409 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200906022136_0106_m_003000_0' from 'tracker_dup046.iad.hadoop.net:localhost.localdomain/127.0.0.1:53340'
2009-06-07 01:12:16,494 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200906022136_0106_r_000018_2' from 'tracker_dup048.iad.hadoop.net:localhost.localdomain/127.0.0.1:51644'
2009-06-07 01:12:16,773 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200906022136_0106_r_000005_2' from 'tracker_dup060.iad.hadoop.net:localhost.localdomain/127.0.0.1:52454'
2009-06-07 01:12:19,717 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200906022136_0106_r_000032_1' from 'tracker_dup016.iad.hadoop.net:localhost.localdomain/127.0.0.1:41496'
2009-06-07 01:12:33,121 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54311, call submitJob(job_200906022136_0110) from 172.21.30.1:50229: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:459)
    at org.apache.hadoop.ipc.Client.call(Client.java:686)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy4.getFileInfo(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy4.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:578)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:192)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1214)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1195)
    at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:212)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:2230)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
2009-06-07 02:00:07,348 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54311, call submitJob(job_200906022136_0111) from 172.21.31.248:39823: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:459)
    at org.apache.hadoop.ipc.Client.call(Client.java:686)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy4.getFileInfo(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy4.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:578)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:192)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1214)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1195)
    at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:212)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:2230)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
2009-06-07 02:00:20,664 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54311, call submitJob(job_200906022136_0112) from 172.21.31.249:33716: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:459)