Re: tasktracker keeps receiving KillJobAction and then deleting unknown jobs while using hive

2012-02-01 Thread alo alt
Hi,

+ hdfs-user (bcc'd)

Which JRE version do you use?

- Alex  

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 1, 2012, at 8:16 AM, Xiaobin She wrote:

 Hi,
 
 I'm using Hive to do some log analysis, and I have encountered a problem.
 
 My cluster has 3 nodes: one for the NameNode/JobTracker and the other two for
 DataNode/TaskTracker.
 
 One of the TaskTrackers repeatedly receives KillJobAction and then deletes
 unknown jobs.
 
 The logs look like:
 
 2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
 2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.
 2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0383
 2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0383 being deleted.
 2012-01-31 01:05:34,108 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0384
 2012-01-31 01:05:34,108 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0384 being deleted.
 2012-01-31 01:07:43,280 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0385
 2012-01-31 01:07:43,280 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0385 being deleted.
 
 This happens occasionally, and when it does, this TaskTracker does nothing
 but keep receiving KillJobAction and deleting unknown jobs, so performance
 drops.
 
 To work around the problem I have to restart the cluster, but obviously
 this is not a good solution.
 
 These jobs eventually run on the other TaskTracker, where they run well and
 succeed.
 
 Has anybody encountered this problem, and can you give me some advice?
 
 Occasionally there are also error logs like:
 
 2012-01-31 13:11:40,183 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
 java.io.IOException: Connection reset by peer
     at sun.nio.ch.FileDispatcher.read0(Native Method)
     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
     at sun.nio.ch.IOUtil.read(IOUtil.java:175)
     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
     at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
     at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
     at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
     at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
     at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
 2012-01-31 13:11:40,211 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201201311041_0071_r_-1096994286 exited. Number of tasks it ran: 0
 2012-01-31 13:11:40,214 INFO org.apache.hadoop.mapred.TaskTracker: Killing unknown JVM jvm_201201311041_0071_r_-386575334
 2012-01-31 13:11:40,221 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
 java.io.IOException: Connection reset by peer
     at sun.nio.ch.FileDispatcher.read0(Native Method)
     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
     at sun.nio.ch.IOUtil.read(IOUtil.java:175)
     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
     at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
     at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
     at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
     at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
     at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
 
 Is there some connection between these two errors?
 
 Thank you very much!
 
 xiaobin
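
One way to quantify the KillJobAction / unknown-job pattern in the log excerpt above is to scan the TaskTracker log for those two messages. The following is only a minimal sketch in Python: the function name summarize_kill_actions and the sample text are illustrative and not part of Hadoop, and the regular expressions assume the message wording shown in the excerpt. The patterns tolerate line wrapping, so they work on a mailed excerpt as well as on the raw log.

```python
import re
from collections import Counter

# Message wording is taken from the log excerpt in this thread.
# \s+ lets the patterns match even when a mail client wrapped the line.
KILL_RE = re.compile(r"Received\s+'KillJobAction'\s+for\s+job:\s+(job_\d+_\d+)")
UNKNOWN_RE = re.compile(r"Unknown\s+job\s+(job_\d+_\d+)\s+being\s+deleted")


def summarize_kill_actions(log_text):
    """Count KillJobAction messages and list the jobs reported as unknown."""
    kills = Counter(KILL_RE.findall(log_text))
    unknown = set(UNKNOWN_RE.findall(log_text))
    return {
        "kill_actions": sum(kills.values()),  # total KillJobAction messages
        "jobs_killed": sorted(kills),         # distinct job IDs that were killed
        "unknown_jobs": sorted(unknown),      # job IDs the TaskTracker did not know
    }


if __name__ == "__main__":
    # Hypothetical input: two wrapped lines copied from the excerpt.
    sample = (
        "2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received\n"
        "'KillJobAction' for job: job_201201301055_0381\n"
        "2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown\n"
        "job job_201201301055_0381 being deleted.\n"
    )
    print(summarize_kill_actions(sample))
```

Comparing the counts from both TaskTrackers over the same period would show whether only one node is affected, as described in the message.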



Re: tasktracker keeps receiving KillJobAction and then deleting unknown jobs while using hive

2012-02-01 Thread Xiaobin She
Hi Alex,

I'm using JRE 1.6.0_24 with Hadoop 0.20.0 and Hive 0.80.

Thanks






Re: tasktracker keeps receiving KillJobAction and then deleting unknown jobs while using hive

2012-02-01 Thread alo alt
How many NameNode handlers (dfs.namenode.handler.count) have you defined for
your cluster?

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com
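
To answer the question about dfs.namenode.handler.count, the value can be read from the cluster's HDFS configuration file. The following is a minimal sketch in Python, assuming the property is set in an hdfs-site.xml file that uses the usual configuration/property/name/value layout. The function name read_hadoop_property and the sample value are hypothetical and only serve as an illustration.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name):
    """Return the value of a Hadoop configuration property, or None if unset."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


if __name__ == "__main__":
    # Hypothetical file content with an arbitrary value; on a real cluster,
    # read the text of the site's hdfs-site.xml instead (location is an assumption).
    sample = """<configuration>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>10</value>
  </property>
</configuration>"""
    print(read_hadoop_property(sample, "dfs.namenode.handler.count"))
```

A result of None means the property is not set in that file, in which case the cluster falls back to the built-in default for the installed Hadoop version.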



