Reduce tasks take too long

2010-04-01 Thread Zheng Lv
Hello Everyone,
One of our jobs has 4 reduce tasks. We find that one of them runs
normally, while the others take too long.
The following is the normal task's log:
2010-04-01 15:01:48,596 INFO org.apache.hadoop.mapred.Merger: Merging 1
sorted segments
2010-04-01 15:01:48,601 INFO org.apache.hadoop.mapred.Merger: Down to the
last merge-pass, with 1 segments left of total size: 9907055 bytes
2010-04-01 15:01:48,605 WARN org.apache.hadoop.mapred.JobConf: The variable
mapred.task.maxvmem is no longer used. Instead use mapred.job.map.memory.mb
and mapred.job.reduce.memory.mb
2010-04-01 15:01:48,622 WARN org.apache.hadoop.mapred.JobConf: The variable
mapred.task.maxvmem is no longer used. Instead use mapred.job.map.memory.mb
and mapred.job.reduce.memory.mb
2010-04-01 15:01:48,672 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new compressor
2010-04-01 15:02:03,744 INFO org.apache.hadoop.mapred.TaskRunner:
Task:attempt_201003301656_0139_r_01_0 is done. And is in the process of
commiting
2010-04-01 15:02:05,756 INFO org.apache.hadoop.mapred.TaskRunner: Task
attempt_201003301656_0139_r_01_0 is allowed to commit now
2010-04-01 15:02:05,762 INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of
task 'attempt_201003301656_0139_r_01_0' to
/user/root/nginxlog/sessionjob/output/20100401140001-20100401150001
2010-04-01 15:02:05,765 INFO org.apache.hadoop.mapred.TaskRunner: Task
'attempt_201003301656_0139_r_01_0' done.

And the following is the log of one of the others:
2010-04-01 15:01:49,549 INFO org.apache.hadoop.mapred.Merger: Merging 1
sorted segments
2010-04-01 15:01:49,554 INFO org.apache.hadoop.mapred.Merger: Down to the
last merge-pass, with 1 segments left of total size: 9793700 bytes
2010-04-01 15:01:49,563 WARN org.apache.hadoop.mapred.JobConf: The variable
mapred.task.maxvmem is no longer used. Instead use mapred.job.map.memory.mb
and mapred.job.reduce.memory.mb
2010-04-01 15:01:49,582 WARN org.apache.hadoop.mapred.JobConf: The variable
mapred.task.maxvmem is no longer used. Instead use mapred.job.map.memory.mb
and mapred.job.reduce.memory.mb
2010-04-01 15:04:49,690 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new compressor
2010-04-01 15:05:07,103 INFO org.apache.hadoop.mapred.TaskRunner:
Task:attempt_201003301656_0139_r_00_0 is done. And is in the process of
commiting
2010-04-01 15:05:09,114 INFO org.apache.hadoop.mapred.TaskRunner: Task
attempt_201003301656_0139_r_00_0 is allowed to commit now
2010-04-01 15:05:09,120 INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of
task 'attempt_201003301656_0139_r_00_0' to
/user/root/nginxlog/sessionjob/output/20100401140001-20100401150001
2010-04-01 15:05:09,123 INFO org.apache.hadoop.mapred.TaskRunner: Task
'attempt_201003301656_0139_r_00_0' done.

   It looks like something is waiting before the 2010-04-01 15:05:07,103
line, where org.apache.hadoop.mapred.TaskRunner reports
Task:attempt_201003301656_0139_r_00_0 is done. Any suggestions?
   Regards,
LvZheng
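
Separately, the JobConf warnings in both logs point at a deprecated
property. A mapred-site.xml sketch of the replacement settings the warning
names (the 1024 MB values are illustrative, not a recommendation):

```xml
<!-- Replaces the deprecated mapred.task.maxvmem; values are examples only. -->
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapred.job.reduce.memory.mb</name>
  <value>1024</value>
</property>
```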


Error when starting up with two dfs.name.dir

2010-03-03 Thread Zheng Lv
Hello Everyone,
  I added an NFS mount point to the dfs.name.dir configuration option, but
when I restarted the Hadoop cluster after that I got these:
/
2010-03-03 18:32:59,708 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=9000
2010-03-03 18:32:59,714 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: cactus207/
172.16.1.207:9000
2010-03-03 18:32:59,723 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-03-03 18:32:59,724 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-03-03 18:32:59,798 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
2010-03-03 18:32:59,798 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-03-03 18:32:59,798 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2010-03-03 18:32:59,805 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-03-03 18:32:59,806 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2010-03-03 18:33:29,856 INFO org.apache.hadoop.hdfs.server.common.Storage:
java.io.IOException: No locks available
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:879)
        at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:285)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:292)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
2010-03-03 18:33:29,858 ERROR
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
initialization failed.
java.io.IOException: No locks available
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:879)
        at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:285)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:292)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
2010-03-03 18:33:29,859 INFO org.apache.hadoop.ipc.Server: Stopping server
on 9000
2010-03-03 18:33:29,859 ERROR
org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: No
locks available
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:879)
        at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:285)
at
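
The "No locks available" IOException comes from FileChannel.tryLock on the
storage directory's lock file. On an NFS mount this usually means the mount
has no working lock manager (for example it is mounted with the nolock
option, or the NFS lock daemons are not running on client or server), so
the namenode cannot lock that dfs.name.dir entry. A sketch of the setting
in question (the paths are illustrative, not from the thread):

```xml
<property>
  <name>dfs.name.dir</name>
  <!-- One local directory plus one NFS-mounted directory; the NFS
       mount must support POSIX file locking for the namenode to start. -->
  <value>/home/hadoop/name,/mnt/nfs/name</value>
</property>
```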

Re: Many child processes don't exit

2010-02-23 Thread Zheng Lv
Thank you Jason, your reply is helpful.

2010/2/23 Jason Venner jason.had...@gmail.com

 Someone is using a thread pool that does not have daemon threads, and
 that is not shut down before the main method returns. The non-daemon
 threads prevent the JVM from exiting.
 We had this problem for a while and modified Child.main to exit
 explicitly, rather than trying to track down and fix the third-party
 library that ran the thread pool. This technique does, of course,
 prevent JVM reuse.

 On Sat, Feb 20, 2010 at 6:49 AM, Ted Yu yuzhih...@gmail.com wrote:
  Do you have System.exit() as the last line in your main() ?
 Job job = createSubmittableJob(conf, otherArgs);
 System.exit(job.waitForCompletion(true)? 0 : 1);
 
 
  On Sat, Feb 20, 2010 at 12:32 AM, Zheng Lv lvzheng19800...@gmail.com
 wrote:
 
  Hello Ted,
    Yes. Every hour a job is created and started, and after it finishes,
   the child process remains. The logs look normal. Do you know what can
   lead to this? Thank you.
 LvZheng
  2010/2/20 Ted Yu yuzhih...@gmail.com
 
   Did the number of child processes increase over time ?
  
   On Friday, February 19, 2010, Zheng Lv lvzheng19800...@gmail.com
  wrote:
Hello Edson,
  Thank you for your reply. I don't want to kill them, I want to
 know
  why
these child processes don't exit, and to know how to make them exit
successfully when they finish. Any ideas? Thank you.
LvZheng
   
2010/2/18 Edson Ramiro erlfi...@gmail.com
   
Do you want to kill them ?
   
if yes, you can use
   
 ./bin/slaves.sh pkill java
   
but it will kill the datanode and tasktracker processes
in all slaves and you'll need to start these processes again.
   
Edson Ramiro
   
   
On 14 February 2010 22:09, Zheng Lv lvzheng19800...@gmail.com
  wrote:
   
 any idea?

 2010/2/11 Zheng Lv lvzheng19800...@gmail.com

  Hello Everyone,
   We often find many child processes on the datanodes whose tasks
  finished long ago. The following is the jstack log:
  Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.3-b01 mixed mode):
  DestroyJavaVM prio=10 tid=0x2aaac8019800 nid=0x2422 waiting on condition [0x]
     java.lang.Thread.State: RUNNABLE
  NioProcessor-31 prio=10 tid=0x439fa000 nid=0x2826 runnable [0x4100a000]
     java.lang.Thread.State: RUNNABLE
          at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
          at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
          at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
          at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
          - locked 0x2aaab9b5f6f8 (a sun.nio.ch.Util$1)
          - locked 0x2aaab9b5f710 (a java.util.Collections$UnmodifiableSet)
          - locked 0x2aaab9b5f680 (a sun.nio.ch.EPollSelectorImpl)
          at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
          at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:65)
          at org.apache.mina.common.AbstractPollingIoProcessor$Worker.run(AbstractPollingIoProcessor.java:672)
          at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          at java.lang.Thread.run(Thread.java:619)
  pool-15-thread-1 prio=10 tid=0x2aaac802d000 nid=0x2825 waiting on condition [0x41604000]
     java.lang.Thread.State: WAITING (parking)
          at sun.misc.Unsafe.park(Native Method)
          - parking to wait for 0x2aaab9b61620 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
          at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
          at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Threa


 --
 Pro Hadoop, a book to guide you from beginner to hadoop mastery,
 http://www.amazon.com/dp/1430219424?tag=jewlerymall
 www.prohadoopbook.com a community for Hadoop Professionals
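
Jason's fix can be sketched as follows; the class name and factory are
illustrative, not code from the thread. A pool built from daemon threads
cannot keep the JVM alive after main() returns, while a default pool can:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class DaemonPoolDemo {
    // Marks every pool thread as a daemon so the pool cannot keep the
    // JVM alive after main() returns; the alternative is to call
    // pool.shutdown() explicitly before returning.
    static ThreadFactory daemonFactory() {
        return new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2, daemonFactory());
        System.out.println(pool.submit(new Callable<Integer>() {
            public Integer call() { return 42; }
        }).get());
        // No shutdown() needed here: daemon threads do not block JVM exit.
    }
}
```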



Re: Many child processes don't exit

2010-02-20 Thread Zheng Lv
Hello Ted,
  Yes. Every hour a job is created and started, and after it finishes,
the child process remains. The logs look normal. Do you know what can
lead to this? Thank you.
LvZheng
2010/2/20 Ted Yu yuzhih...@gmail.com

 Did the number of child processes increase over time ?

 On Friday, February 19, 2010, Zheng Lv lvzheng19800...@gmail.com wrote:
  Hello Edson,
Thank you for your reply. I don't want to kill them, I want to know why
  these child processes don't exit, and to know how to make them exit
  successfully when they finish. Any ideas? Thank you.
  LvZheng
 
  2010/2/18 Edson Ramiro erlfi...@gmail.com
 
  Do you want to kill them ?
 
  if yes, you can use
 
   ./bin/slaves.sh pkill java
 
  but it will kill the datanode and tasktracker processes
  in all slaves and you'll need to start these processes again.
 
  Edson Ramiro
 
 
  On 14 February 2010 22:09, Zheng Lv lvzheng19800...@gmail.com wrote:
 
   any idea?
  
   2010/2/11 Zheng Lv lvzheng19800...@gmail.com
  
Hello Everyone,
   We often find many child processes on the datanodes whose tasks
  finished long ago. The following is the jstack log:
  Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.3-b01 mixed mode):
  DestroyJavaVM prio=10 tid=0x2aaac8019800 nid=0x2422 waiting on condition [0x]
     java.lang.Thread.State: RUNNABLE
  NioProcessor-31 prio=10 tid=0x439fa000 nid=0x2826 runnable [0x4100a000]
     java.lang.Thread.State: RUNNABLE
          at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
          at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
          at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
          at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
          - locked 0x2aaab9b5f6f8 (a sun.nio.ch.Util$1)
          - locked 0x2aaab9b5f710 (a java.util.Collections$UnmodifiableSet)
          - locked 0x2aaab9b5f680 (a sun.nio.ch.EPollSelectorImpl)
          at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
          at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:65)
          at org.apache.mina.common.AbstractPollingIoProcessor$Worker.run(AbstractPollingIoProcessor.java:672)
          at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          at java.lang.Thread.run(Thread.java:619)
  pool-15-thread-1 prio=10 tid=0x2aaac802d000 nid=0x2825 waiting on condition [0x41604000]
     java.lang.Thread.State: WAITING (parking)
          at sun.misc.Unsafe.park(Native Method)
          - parking to wait for 0x2aaab9b61620 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
          at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
          at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Threa



Re: Many child processes don't exit

2010-02-19 Thread Zheng Lv
Hello Edson,
  Thank you for your reply. I don't want to kill them, I want to know why
these child processes don't exit, and to know how to make them exit
successfully when they finish. Any ideas? Thank you.
LvZheng

2010/2/18 Edson Ramiro erlfi...@gmail.com

 Do you want to kill them ?

 if yes, you can use

  ./bin/slaves.sh pkill java

 but it will kill the datanode and tasktracker processes
 in all slaves and you'll need to start these processes again.

 Edson Ramiro


 On 14 February 2010 22:09, Zheng Lv lvzheng19800...@gmail.com wrote:

  any idea?
 
  2010/2/11 Zheng Lv lvzheng19800...@gmail.com
 
   Hello Everyone,
   We often find many child processes on the datanodes whose tasks
   finished long ago. The following is the jstack log:
  Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.3-b01 mixed mode):
  DestroyJavaVM prio=10 tid=0x2aaac8019800 nid=0x2422 waiting on condition [0x]
     java.lang.Thread.State: RUNNABLE
  NioProcessor-31 prio=10 tid=0x439fa000 nid=0x2826 runnable [0x4100a000]
     java.lang.Thread.State: RUNNABLE
          at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
          at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
          at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
          at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
          - locked 0x2aaab9b5f6f8 (a sun.nio.ch.Util$1)
          - locked 0x2aaab9b5f710 (a java.util.Collections$UnmodifiableSet)
          - locked 0x2aaab9b5f680 (a sun.nio.ch.EPollSelectorImpl)
          at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
          at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:65)
          at org.apache.mina.common.AbstractPollingIoProcessor$Worker.run(AbstractPollingIoProcessor.java:672)
          at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          at java.lang.Thread.run(Thread.java:619)
  pool-15-thread-1 prio=10 tid=0x2aaac802d000 nid=0x2825 waiting on condition [0x41604000]
     java.lang.Thread.State: WAITING (parking)
          at sun.misc.Unsafe.park(Native Method)
          - parking to wait for 0x2aaab9b61620 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
          at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
          at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
          at java.lang.Thread.run(Thread.java:619)
  Attach Listener daemon prio=10 tid=0x43a22800 nid=0x2608 waiting on condition [0x]
     java.lang.Thread.State: RUNNABLE
  pool-3-thread-1 prio=10 tid=0x438ad000 nid=0x243f waiting on condition [0x42575000]
     java.lang.Thread.State: TIMED_WAITING (parking)
          at sun.misc.Unsafe.park(Native Method)
          - parking to wait for 0x2aaab9ae8770 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
          at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
          at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
          at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:583)
          at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:576)
          at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
          at java.lang.Thread.run(Thread.java:619)
  Thread for syncLogs daemon prio=10 tid=0x2aaac4446800 nid=0x2437 waiting on condition [0x40f09000]
     java.lang.Thread.State: TIMED_WAITING (sleeping)
          at java.lang.Thread.sleep(Native Method)
          at org.apache.hadoop.mapred.Child$2.run(Child.java:87)
  Low Memory Detector daemon prio=10 tid=0x43771800 nid=0x242d runnable [0x]
     java.lang.Thread.State: RUNNABLE
  CompilerThread1 daemon prio=10 tid=0x4376e800 nid=0x242c waiting

Re: Many child processes don't exit

2010-02-14 Thread Zheng Lv
any idea?

2010/2/11 Zheng Lv lvzheng19800...@gmail.com

 Hello Everyone,
   We often find many child processes on the datanodes whose tasks
 finished long ago. The following is the jstack log:
  Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.3-b01 mixed mode):
  DestroyJavaVM prio=10 tid=0x2aaac8019800 nid=0x2422 waiting on condition [0x]
     java.lang.Thread.State: RUNNABLE
  NioProcessor-31 prio=10 tid=0x439fa000 nid=0x2826 runnable [0x4100a000]
     java.lang.Thread.State: RUNNABLE
          at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
          at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
          at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
          at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
          - locked 0x2aaab9b5f6f8 (a sun.nio.ch.Util$1)
          - locked 0x2aaab9b5f710 (a java.util.Collections$UnmodifiableSet)
          - locked 0x2aaab9b5f680 (a sun.nio.ch.EPollSelectorImpl)
          at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
          at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:65)
          at org.apache.mina.common.AbstractPollingIoProcessor$Worker.run(AbstractPollingIoProcessor.java:672)
          at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          at java.lang.Thread.run(Thread.java:619)
  pool-15-thread-1 prio=10 tid=0x2aaac802d000 nid=0x2825 waiting on condition [0x41604000]
     java.lang.Thread.State: WAITING (parking)
          at sun.misc.Unsafe.park(Native Method)
          - parking to wait for 0x2aaab9b61620 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
          at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
          at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
          at java.lang.Thread.run(Thread.java:619)
  Attach Listener daemon prio=10 tid=0x43a22800 nid=0x2608 waiting on condition [0x]
     java.lang.Thread.State: RUNNABLE
  pool-3-thread-1 prio=10 tid=0x438ad000 nid=0x243f waiting on condition [0x42575000]
     java.lang.Thread.State: TIMED_WAITING (parking)
          at sun.misc.Unsafe.park(Native Method)
          - parking to wait for 0x2aaab9ae8770 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
          at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
          at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
          at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:583)
          at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:576)
          at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
          at java.lang.Thread.run(Thread.java:619)
  Thread for syncLogs daemon prio=10 tid=0x2aaac4446800 nid=0x2437 waiting on condition [0x40f09000]
     java.lang.Thread.State: TIMED_WAITING (sleeping)
          at java.lang.Thread.sleep(Native Method)
          at org.apache.hadoop.mapred.Child$2.run(Child.java:87)
  Low Memory Detector daemon prio=10 tid=0x43771800 nid=0x242d runnable [0x]
     java.lang.Thread.State: RUNNABLE
  CompilerThread1 daemon prio=10 tid=0x4376e800 nid=0x242c waiting on condition [0x]
     java.lang.Thread.State: RUNNABLE
  CompilerThread0 daemon prio=10 tid=0x4376a800 nid=0x242b waiting on condition [0x]
     java.lang.Thread.State: RUNNABLE
  Signal Dispatcher daemon prio=10 tid=0x43768800 nid=0x242a runnable [0x]
     java.lang.Thread.State: RUNNABLE
  Finalizer daemon prio=10 tid=0x43746000 nid=0x2429 in Object.wait() [0x42171000]
     java.lang.Thread.State: WAITING (on object monitor)
          at java.lang.Object.wait(Native Method)
          - waiting on 0x2aaab9a8afe0 (a java.lang.ref.ReferenceQueue$Lock)
          at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
          - locked 0x2aaab9a8afe0

Exception when using SequenceFile

2010-01-17 Thread Zheng Lv
Hello Everyone,
We have been using SequenceFile(GZIP) to save some data in our product,
but we got the following exceptions recently, which never occurred before:

java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:197)
        at java.io.DataInputStream.readFully(DataInputStream.java:169)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:414)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:517)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

  We googled it but found nothing. We have since restarted Hadoop and the
application, and so far the exception has not recurred. But we want to know
why it happened and how to avoid it.
  Any suggestion will help, thanks a lot.
  LvZheng.
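
The top frames of the trace are DataInputStream.readFully, which throws
EOFException when the stream ends before the requested number of bytes
arrives, consistent with a truncated or partially written file. A
self-contained illustration of that failure mode (plain JDK code, not
Hadoop's; the class name is hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class EofDemo {
    // readFully throws EOFException when the stream ends before the
    // requested number of bytes arrives -- the same failure mode a
    // truncated SequenceFile header produces in SequenceFile$Reader.init.
    public static boolean readsFully(byte[] data, int want) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            in.readFully(new byte[want]);
            return true;
        } catch (EOFException e) {
            return false;         // input shorter than expected
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(readsFully(new byte[4], 4));  // true
        System.out.println(readsFully(new byte[2], 4));  // false: truncated input
    }
}
```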


Re: Exceptions when starting hadoop

2009-09-27 Thread Zheng Lv
 org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 192.168.33.5:50010 is added to blk_-699408630358636044_1002 size 4
2009-09-27 12:00:02,644 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask
192.168.33.8:50010 to delete  blk_2342352795116225569_1001
2009-09-27 12:00:05,644 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask
192.168.33.6:50010 to delete  blk_2342352795116225569_1001
2009-09-27 12:03:49,176 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
ugi=cyd,cyd,adm,dialout,cdrom,plugdev,lpadmin,sambashare,admin  ip=/192.168.33.7  cmd=listStatus  src=/   dst=null  perm=null



2009/9/26 Starry SHI starr...@gmail.com

 Is that the first time you started your cluster? My experience is that
 when you start the cluster once, then change the conf (say, add another
 slave) and restart it, this sometimes generates IPC issues (like the
 timeout in the namenode log). The change can put the filesystem into
 safe mode, which is where the exceptions come from.

 The exception in your NN log is raised when the cluster tries to add a
 new node but reaches a timeout. If you have changed your cluster conf,
 you can run hadoop namenode -format to reformat the filesystem and then
 start it; your cluster should then come up. Note that formatting erases
 all existing HDFS data.

 Hope this can help.

 Best regards,
 Starry

 /* Tomorrow is another day. So is today. */


 On Sat, Sep 26, 2009 at 09:03, Zheng Lv lvzheng19800...@gmail.com wrote:

  No one can give some suggestions?
 
  2009/9/25 Zheng Lv lvzheng19800...@gmail.com
 
   Hello Everyone,
   We have a cluster with one namenode and three datanodes. And we got
   these logs when starting hadoop0.20. Is it normal?
   2009-09-25 10:45:00,616 INFO
   org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
   /
   STARTUP_MSG: Starting NameNode
   STARTUP_MSG:   host = ubuntu6/192.168.33.7
   STARTUP_MSG:   args = []
   STARTUP_MSG:   version = 0.20.1-dev
   STARTUP_MSG:   build =  -r ; compiled by 'cyd' on 2009-08-07 15:20:16 CST
   /
   2009-09-25 10:45:00,721 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
   Initializing RPC Metrics with hostName=NameNode, port=9000
   2009-09-25 10:45:00,767 INFO
   org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
 ubuntu6/
   192.168.33.7:9000
   2009-09-25 10:45:00,769 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
   Initializing JVM Metrics with processName=NameNode, sessionId=null
   2009-09-25 10:45:00,771 INFO
   org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
  Initializing
   NameNodeMeterics using context
   object:org.apache.hadoop.metrics.spi.NullContext
   2009-09-25 10:45:00,835 INFO
   org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
   fsOwner=cyd,cyd,adm,dialout,cdrom,plugdev,lpadmin,sambashare,admin
   2009-09-25 10:45:00,835 INFO
   org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
  supergroup=supergroup
   2009-09-25 10:45:00,835 INFO
   org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
   isPermissionEnabled=true
   2009-09-25 10:45:00,842 INFO
   org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
   Initializing FSNamesystemMetrics using context
   object:org.apache.hadoop.metrics.spi.NullContext
   2009-09-25 10:45:00,843 INFO
   org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
   FSNamesystemStatusMBean
   2009-09-25 10:45:00,871 INFO
  org.apache.hadoop.hdfs.server.common.Storage:
   Number of files = 2834
   2009-09-25 10:45:01,412 INFO
  org.apache.hadoop.hdfs.server.common.Storage:
   Number of files under construction = 1
   2009-09-25 10:45:01,416 INFO
  org.apache.hadoop.hdfs.server.common.Storage:
   Image file of size 329240 loaded in 0 seconds.
   2009-09-25 10:45:01,417 INFO
  org.apache.hadoop.hdfs.server.common.Storage:
   Edits file /home/cyd/hdfs2/name/current/edits of size 4 edits # 0
 loaded
  in
   0 seconds.
   2009-09-25 10:45:01,683 INFO
  org.apache.hadoop.hdfs.server.common.Storage:
   Image file of size 329240 saved in 0 seconds.
   2009-09-25 10:45:01,805 INFO
   org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading
   FSImage in 1007 msecs
   2009-09-25 10:45:01,821 INFO org.apache.hadoop.hdfs.StateChange: STATE*
   Safe mode ON.
   The ratio of reported blocks 0. has not reached the threshold
 0.9990.
   Safe mode will be turned off automatically.
   2009-09-25 10:45:07,162 INFO org.mortbay.log: Logging to
   org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
   org.mortbay.log.Slf4jLog
   2009-09-25 10:45:07,228 INFO org.apache.hadoop.http.HttpServer: Jetty
  bound
   to port 50070
   2009-09-25 10:45:07,229 INFO org.mortbay.log: jetty-6.1.14
   2009-09-25 10:47:34,305 INFO org.mortbay.log: Started
   selectchannelconnec...@0.0.0.0:50070
   2009-09-25 10:47:34,319 INFO
   org.apache.hadoop.hdfs.server.namenode.NameNode: Web

Exceptions when starting hadoop

2009-09-24 Thread Zheng Lv
Hello Everyone,
We have a cluster with one namenode and three datanodes, and we got
these logs when starting Hadoop 0.20. Is this normal?
2009-09-25 10:45:00,616 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu6/192.168.33.7
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.1-dev
STARTUP_MSG:   build =  -r ; compiled by 'cyd' on 2009-08-07 15:20:16 CST
/
2009-09-25 10:45:00,721 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=9000
2009-09-25 10:45:00,767 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: ubuntu6/
192.168.33.7:9000
2009-09-25 10:45:00,769 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2009-09-25 10:45:00,771 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2009-09-25 10:45:00,835 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
fsOwner=cyd,cyd,adm,dialout,cdrom,plugdev,lpadmin,sambashare,admin
2009-09-25 10:45:00,835 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2009-09-25 10:45:00,835 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2009-09-25 10:45:00,842 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
2009-09-25 10:45:00,843 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2009-09-25 10:45:00,871 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 2834
2009-09-25 10:45:01,412 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 1
2009-09-25 10:45:01,416 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 329240 loaded in 0 seconds.
2009-09-25 10:45:01,417 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /home/cyd/hdfs2/name/current/edits of size 4 edits # 0 loaded in
0 seconds.
2009-09-25 10:45:01,683 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 329240 saved in 0 seconds.
2009-09-25 10:45:01,805 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading
FSImage in 1007 msecs
2009-09-25 10:45:01,821 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe
mode ON.
The ratio of reported blocks 0. has not reached the threshold 0.9990.
Safe mode will be turned off automatically.
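(Editor's aside: the "ratio of reported blocks" line above is the namenode's safe-mode gate. A minimal sketch of that check, for illustration only; `canLeaveSafeMode` is a hypothetical helper, not Hadoop's actual method, and 0.9990 corresponds to the `dfs.safemode.threshold.pct` default of this era:)

```java
public class SafeModeCheck {
    // Illustrative sketch: the namenode leaves safe mode once the fraction
    // of blocks reported by datanodes reaches the configured threshold
    // (dfs.safemode.threshold.pct; 0.9990 in the log above).
    public static boolean canLeaveSafeMode(long reported, long total, double threshold) {
        if (total == 0) {
            return true; // empty namespace: nothing to wait for
        }
        return (double) reported / total >= threshold;
    }

    public static void main(String[] args) {
        // Right after startup no datanode has reported yet, as in the log.
        System.out.println(canLeaveSafeMode(0, 2834, 0.9990));    // false
        System.out.println(canLeaveSafeMode(2834, 2834, 0.9990)); // true
    }
}
```

Once enough datanodes check in and report their blocks, the ratio crosses the threshold and safe mode lifts on its own, which is why the message says it "will be turned off automatically."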
2009-09-25 10:45:07,162 INFO org.mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
2009-09-25 10:45:07,228 INFO org.apache.hadoop.http.HttpServer: Jetty bound
to port 50070
2009-09-25 10:45:07,229 INFO org.mortbay.log: jetty-6.1.14
2009-09-25 10:47:34,305 INFO org.mortbay.log: Started
selectchannelconnec...@0.0.0.0:50070
2009-09-25 10:47:34,319 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at:
0.0.0.0:50070
2009-09-25 10:47:34,320 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2009-09-25 10:47:34,321 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 9000: starting
2009-09-25 10:47:34,355 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 9000: starting
2009-09-25 10:47:34,355 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 9000: starting
2009-09-25 10:47:34,356 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 9000: starting
2009-09-25 10:47:34,356 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 9000: starting
2009-09-25 10:47:34,384 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 9000: starting
2009-09-25 10:47:34,397 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 9000: starting
2009-09-25 10:47:34,398 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 9000: starting
2009-09-25 10:47:34,400 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from
192.168.33.7:54977: output error
2009-09-25 10:47:34,401 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 9000: starting
2009-09-25 10:47:34,402 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 9000: starting
2009-09-25 10:47:34,404 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 9000: starting
2009-09-25 10:47:34,433 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 9000 caught: java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1195)
at 

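(Editor's aside: the ClosedChannelException in the trace above means the client hung up before the IPC server finished writing its reply, so the server's write hit an already-closed channel. This is harmless during startup. A minimal standalone reproduction of that JDK failure mode, not the namenode's actual code path:)

```java
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.SocketChannel;

public class ClosedChannelDemo {
    // Writing to a SocketChannel that has already been closed throws
    // ClosedChannelException, the same exception the IPC Responder logged.
    public static String writeAfterClose() throws Exception {
        SocketChannel ch = SocketChannel.open();
        ch.close(); // simulate the peer going away before the write
        try {
            ch.write(ByteBuffer.wrap("response".getBytes()));
            return "no exception";
        } catch (ClosedChannelException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeAfterClose()); // ClosedChannelException
    }
}
```
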
Exception when starting namenode

2009-08-20 Thread Zheng Lv
Hello,

I got these exceptions when I started the cluster; any suggestions?
We are using Hadoop 0.15.2.
2009-08-21 12:12:53,463 ERROR org.apache.hadoop.dfs.NameNode:
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:650)
at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:614)
at
org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:222)
at
org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:76)
at org.apache.hadoop.dfs.FSNamesystem.init(FSNamesystem.java:221)
at org.apache.hadoop.dfs.NameNode.init(NameNode.java:130)
at org.apache.hadoop.dfs.NameNode.init(NameNode.java:168)
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:804)
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:813)
Thank you,
LvZheng
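(Editor's aside: an EOFException from `DataInputStream.readInt()` inside `loadFSImage` means the fsimage file ended before the next 4-byte field could be read, i.e. the image on disk is truncated or corrupt. A minimal reproduction of just that failure mode; the byte arrays here are stand-ins, not real fsimage framing:)

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;

public class TruncatedImageDemo {
    // readInt() needs exactly 4 bytes; a stream cut short mid-field raises
    // EOFException, the same error loadFSImage reported for the fsimage.
    public static String readIntFrom(byte[] data) throws Exception {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return "read " + in.readInt();
        } catch (EOFException e) {
            return "EOFException";
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readIntFrom(new byte[] {0, 0, 0, 7})); // read 7
        System.out.println(readIntFrom(new byte[] {0, 0}));       // EOFException
    }
}
```
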