Hi Alex,

I did not set the value of dfs.namenode.handler.count in the config file, so it should be the default value, which I believe is 10.
I only have two datanodes. Is 10 not enough? And if it is not enough, why does the tasktracker keep receiving KillJobAction and deleting unknown jobs?

Thank you very much for your help!

2012/2/1 alo alt <wget.n...@googlemail.com>

> How many namenode handlers (dfs.namenode.handler.count) have you defined
> for your cluster?
>
> - Alex
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> On Feb 1, 2012, at 12:25 PM, Xiaobin She wrote:
>
> > hi Alex,
> >
> > I'm using jre 1.6.0_24
> > with hadoop 0.20.0
> > hive 0.80
> >
> > thx
> >
> > 2012/2/1 alo alt <wget.n...@googlemail.com>
> > Hi,
> >
> > + hdfs-user (bcc'd)
> >
> > Which JRE version do you use?
> >
> > - Alex
> >
> > --
> > Alexander Lorenz
> > http://mapredit.blogspot.com
> >
> > On Feb 1, 2012, at 8:16 AM, Xiaobin She wrote:
> >
> > > hi,
> > >
> > > I'm using hive to do some log analysis, and I have encountered a problem.
> > >
> > > My cluster has 3 nodes, one for NameNode/JobTracker and the other two for DataNode/TaskTracker.
> > >
> > > One of the tasktrackers will repeatedly receive KillJobAction and then delete unknown jobs.
> > >
> > > The logs look like:
> > >
> > > 2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
> > > 2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.
> > > 2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0383
> > > 2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0383 being deleted.
> > > 2012-01-31 01:05:34,108 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0384
> > > 2012-01-31 01:05:34,108 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0384 being deleted.
> > > 2012-01-31 01:07:43,280 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0385
> > > 2012-01-31 01:07:43,280 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0385 being deleted.
> > >
> > > This happens occasionally, and when it happens, this tasktracker does nothing but keep receiving KillJobAction and deleting unknown jobs, and thus performance drops.
> > >
> > > To solve this problem, I have to restart the cluster,
> > > but obviously this is not a good solution.
> > >
> > > These jobs are eventually run on the other tasktracker, where they run well and succeed.
> > >
> > > Has anybody encountered this problem and can give me some advice?
> > >
> > > Occasionally there are also some error logs like:
> > >
> > > 2012-01-31 13:11:40,183 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
> > > java.io.IOException: Connection reset by peer
> > >     at sun.nio.ch.FileDispatcher.read0(Native Method)
> > >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> > >     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
> > >     at sun.nio.ch.IOUtil.read(IOUtil.java:175)
> > >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> > >     at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
> > >     at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
> > >     at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
> > >     at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
> > >     at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
> > > 2012-01-31 13:11:40,211 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201201311041_0071_r_-1096994286 exited. Number of tasks it ran: 0
> > > 2012-01-31 13:11:40,214 INFO org.apache.hadoop.mapred.TaskTracker: Killing unknown JVM jvm_201201311041_0071_r_-386575334
> > > 2012-01-31 13:11:40,221 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
> > > java.io.IOException: Connection reset by peer
> > >     at sun.nio.ch.FileDispatcher.read0(Native Method)
> > >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> > >     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
> > >     at sun.nio.ch.IOUtil.read(IOUtil.java:175)
> > >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> > >     at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
> > >     at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
> > >     at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
> > >     at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
> > >     at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
> > >
> > > Is there some connection between these two errors?
> > >
> > > Thank you very much!
> > >
> > > xiaobin
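
For anyone trying to quantify how often a tasktracker gets into this state, here is a minimal Python sketch that scans TaskTracker log lines for the two messages quoted in this thread and lists the job IDs that were killed and then reported as unknown. The function name find_unknown_kills and the regular expressions are assumptions derived only from the log excerpts above; they are not part of Hadoop, and the sketch expects each log record on a single line (not wrapped as in the quoted mail).

```python
import re

# Patterns inferred from the TaskTracker log excerpts in this thread (assumption).
KILL_RE = re.compile(r"Received 'KillJobAction' for job: (job_\d+_\d+)")
UNKNOWN_RE = re.compile(r"Unknown job (job_\d+_\d+) being deleted")


def find_unknown_kills(lines):
    """Return, in first-seen order, job IDs that received a KillJobAction
    and were afterwards reported as 'Unknown job ... being deleted'."""
    killed = set()
    result = []
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            killed.add(match.group(1))
            continue
        match = UNKNOWN_RE.search(line)
        if match:
            job_id = match.group(1)
            if job_id in killed and job_id not in result:
                result.append(job_id)
    return result


# Sample input copied from the log lines quoted in this thread.
sample = [
    "2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: "
    "Received 'KillJobAction' for job: job_201201301055_0381",
    "2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: "
    "Unknown job job_201201301055_0381 being deleted.",
    "2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: "
    "Received 'KillJobAction' for job: job_201201301055_0383",
    "2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: "
    "Unknown job job_201201301055_0383 being deleted.",
]

if __name__ == "__main__":
    for job_id in find_unknown_kills(sample):
        print(job_id)
```

To run it against a real log, pass an open file object instead of the sample list, since the function only needs an iterable of lines.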