Re: tasktracker keep recevied KillJobAction and then delete unknown job while using hive
Hi,

+ hdfs-user (bcc'd)

Which JRE version do you use?

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 1, 2012, at 8:16 AM, Xiaobin She wrote:

Hi,

I'm using Hive to do some log analysis, and I have encountered a problem.

My cluster has 3 nodes: one for the NameNode/JobTracker and the other two for DataNode/TaskTracker.

One of the TaskTrackers repeatedly receives KillJobAction and then deletes unknown jobs. The logs look like:

2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.
2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0383
2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0383 being deleted.
2012-01-31 01:05:34,108 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0384
2012-01-31 01:05:34,108 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0384 being deleted.
2012-01-31 01:07:43,280 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0385
2012-01-31 01:07:43,280 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0385 being deleted.

This happens occasionally, and when it does, this TaskTracker does nothing but keep receiving KillJobAction and deleting unknown jobs, so performance drops.

To solve this problem I have to restart the cluster, but obviously this is not a good solution.

These jobs are eventually run on the other TaskTracker, where they run well and the job succeeds.

Has anybody encountered this problem, and can you give me some advice?

Occasionally there are also some error logs like:

2012-01-31 13:11:40,183 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
    at sun.nio.ch.IOUtil.read(IOUtil.java:175)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
    at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
    at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
    at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
    at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
    at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
2012-01-31 13:11:40,211 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201201311041_0071_r_-1096994286 exited. Number of tasks it ran: 0
2012-01-31 13:11:40,214 INFO org.apache.hadoop.mapred.TaskTracker: Killing unknown JVM jvm_201201311041_0071_r_-386575334
2012-01-31 13:11:40,221 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
    at sun.nio.ch.IOUtil.read(IOUtil.java:175)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
    at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
    at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
    at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
    at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
    at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)

Is there some connection between these two errors?

Thank you very much!

xiaobin
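For anyone trying to measure how often this happens, the TaskTracker messages quoted above can be tallied with a small script. This is only a rough sketch: the two patterns are copied from the log lines in this thread, while the function name summarize_kill_actions and the inline sample list are made up for illustration. In practice you would feed it the lines of the real TaskTracker log file.

```python
import re
from collections import Counter

# Patterns copied from the TaskTracker log lines quoted in this thread.
KILL_RE = re.compile(r"Received 'KillJobAction' for job: (job_\d+_\d+)")
UNKNOWN_RE = re.compile(r"Unknown job (job_\d+_\d+) being deleted")


def summarize_kill_actions(lines):
    """Count KillJobAction messages per job and collect jobs logged as unknown.

    Hypothetical helper, not part of Hadoop.
    """
    kills = Counter()
    unknown = set()
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            kills[match.group(1)] += 1
            continue
        match = UNKNOWN_RE.search(line)
        if match:
            unknown.add(match.group(1))
    return kills, unknown


# Sample input: four lines taken from the log excerpt in this thread.
sample = [
    "2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: "
    "Received 'KillJobAction' for job: job_201201301055_0381",
    "2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: "
    "Unknown job job_201201301055_0381 being deleted.",
    "2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: "
    "Received 'KillJobAction' for job: job_201201301055_0383",
    "2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: "
    "Unknown job job_201201301055_0383 being deleted.",
]

kills, unknown = summarize_kill_actions(sample)
for job in sorted(kills):
    status = "unknown to this TaskTracker" if job in unknown else "known"
    print(job, kills[job], status)
```

Running the same function over the log of the healthy TaskTracker would show whether the kill actions are specific to one node.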
Re: tasktracker keep recevied KillJobAction and then delete unknown job while using hive
Hi Alex,

I'm using JRE 1.6.0_24 with Hadoop 0.20.0 and Hive 0.80.

Thanks
Re: tasktracker keep recevied KillJobAction and then delete unknown job while using hive
How many NameNode handlers (dfs.namenode.handler.count) have you defined for your cluster?

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com
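One way to answer the handler-count question is to look the property up in the cluster's configuration file. The sketch below assumes the usual Hadoop layout of a configuration element containing property entries with name and value children. The helper read_hadoop_property and the sample XML (including the value 10) are hypothetical and only there to make the example runnable; on a real cluster you would read the text of your own hdfs-site.xml instead.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name):
    """Return the value of a property from Hadoop-style config XML, or None.

    Hypothetical helper, not part of Hadoop. None means the property is
    not set in this file, so whatever default the cluster ships with applies.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Illustrative config only; the value 10 is an assumption, not taken from the thread.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>10</value>
  </property>
</configuration>
"""

print(read_hadoop_property(SAMPLE_HDFS_SITE, "dfs.namenode.handler.count"))
```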