Reduce task takes too long
Hello Everyone,

One of our jobs has 4 reduce tasks, and we find that one of them runs normally while the others take far too long. Following is the normal task's log:

2010-04-01 15:01:48,596 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
2010-04-01 15:01:48,601 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9907055 bytes
2010-04-01 15:01:48,605 WARN org.apache.hadoop.mapred.JobConf: The variable mapred.task.maxvmem is no longer used. Instead use mapred.job.map.memory.mb and mapred.job.reduce.memory.mb
2010-04-01 15:01:48,622 WARN org.apache.hadoop.mapred.JobConf: The variable mapred.task.maxvmem is no longer used. Instead use mapred.job.map.memory.mb and mapred.job.reduce.memory.mb
2010-04-01 15:01:48,672 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2010-04-01 15:02:03,744 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201003301656_0139_r_01_0 is done. And is in the process of commiting
2010-04-01 15:02:05,756 INFO org.apache.hadoop.mapred.TaskRunner: Task attempt_201003301656_0139_r_01_0 is allowed to commit now
2010-04-01 15:02:05,762 INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_201003301656_0139_r_01_0' to /user/root/nginxlog/sessionjob/output/20100401140001-20100401150001
2010-04-01 15:02:05,765 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201003301656_0139_r_01_0' done.

And following is the log of one of the others:

2010-04-01 15:01:49,549 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
2010-04-01 15:01:49,554 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9793700 bytes
2010-04-01 15:01:49,563 WARN org.apache.hadoop.mapred.JobConf: The variable mapred.task.maxvmem is no longer used. Instead use mapred.job.map.memory.mb and mapred.job.reduce.memory.mb
2010-04-01 15:01:49,582 WARN org.apache.hadoop.mapred.JobConf: The variable mapred.task.maxvmem is no longer used. Instead use mapred.job.map.memory.mb and mapred.job.reduce.memory.mb
2010-04-01 15:04:49,690 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2010-04-01 15:05:07,103 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201003301656_0139_r_00_0 is done. And is in the process of commiting
2010-04-01 15:05:09,114 INFO org.apache.hadoop.mapred.TaskRunner: Task attempt_201003301656_0139_r_00_0 is allowed to commit now
2010-04-01 15:05:09,120 INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_201003301656_0139_r_00_0' to /user/root/nginxlog/sessionjob/output/20100401140001-20100401150001
2010-04-01 15:05:09,123 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201003301656_0139_r_00_0' done.

It looks like something is waiting for about three minutes between the JobConf warning at 15:01:49 and the "Got brand-new compressor" line at 15:04:49, before the task reports "is done. And is in the process of commiting". Any suggestions?

Regards,
LvZheng
Error on startup with two dfs.name.dir entries
Hello Everyone,

I added an NFS mount point to the dfs.name.dir configuration option, but when I then restarted the Hadoop cluster I got the following:

2010-03-03 18:32:59,708 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2010-03-03 18:32:59,714 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: cactus207/172.16.1.207:9000
2010-03-03 18:32:59,723 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-03-03 18:32:59,724 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-03-03 18:32:59,798 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
2010-03-03 18:32:59,798 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-03-03 18:32:59,798 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2010-03-03 18:32:59,805 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-03-03 18:32:59,806 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2010-03-03 18:33:29,856 INFO org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: No locks available
    at sun.nio.ch.FileChannelImpl.lock0(Native Method)
    at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:879)
    at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:285)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:292)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
2010-03-03 18:33:29,858 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: No locks available
    at sun.nio.ch.FileChannelImpl.lock0(Native Method)
    at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:879)
    at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:285)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:292)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
2010-03-03 18:33:29,859 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
2010-03-03 18:33:29,859 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: No locks available
    at sun.nio.ch.FileChannelImpl.lock0(Native Method)
    at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:879)
    at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:285)
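The "No locks available" IOException is FileChannel.tryLock failing (ENOLCK) on the NFS mount, which typically means the NFS lock daemon is not running or the filesystem was mounted with the nolock option. As a hedged sketch (this is plain JDK code, not Hadoop; the class name and default path are made up for illustration), you can probe a candidate directory with the same JDK call that Storage.tryLock relies on before pointing dfs.name.dir at it:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class LockProbe {
    // Returns true if the directory supports the exclusive file lock the
    // NameNode takes on each storage dir (Storage.tryLock ultimately calls
    // FileChannel.tryLock, which throws "No locks available" on NFS mounts
    // lacking a working lock daemon or mounted with -o nolock).
    static boolean supportsLocking(File dir) {
        File probe = new File(dir, "in_use.lock.probe");
        try (RandomAccessFile raf = new RandomAccessFile(probe, "rws");
             FileChannel ch = raf.getChannel();
             FileLock lock = ch.tryLock()) {
            return lock != null;
        } catch (IOException e) {
            return false;              // e.g. ENOLCK: "No locks available"
        } finally {
            probe.delete();
        }
    }

    public static void main(String[] args) {
        // Placeholder path: point this at the NFS mount you intend to add
        // to dfs.name.dir before restarting the NameNode.
        File dir = new File(args.length > 0 ? args[0]
                                            : System.getProperty("java.io.tmpdir"));
        System.out.println(dir + " locking "
                + (supportsLocking(dir) ? "OK" : "unavailable"));
    }
}
```

If the probe reports "unavailable" for the NFS directory, fixing the mount (enable the lock daemon, or remount without nolock) should let the NameNode start with both dfs.name.dir entries.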
Re: Many child processes don't exit
Thank you Jason, your reply is helpful.

2010/2/23 Jason Venner jason.had...@gmail.com

 Someone is using a thread pool that does not have daemon-priority threads and that is not shut down before the main method returns. The non-daemon threads prevent the JVM from exiting. We had this problem for a while and modified Child.main to exit, rather than trying to work out and fix the third-party library that ran the thread pool. This technique does, of course, prevent JVM reuse.

 On Sat, Feb 20, 2010 at 6:49 AM, Ted Yu yuzhih...@gmail.com wrote:

 Do you have System.exit() as the last line in your main()?

 Job job = createSubmittableJob(conf, otherArgs);
 System.exit(job.waitForCompletion(true) ? 0 : 1);

 On Sat, Feb 20, 2010 at 12:32 AM, Zheng Lv lvzheng19800...@gmail.com wrote:

 Hello Ted,
 Yes. Every hour a job is created and started, and when it finishes, the child process remains. The logs look normal; do you know what can cause this? Thank you.
 LvZheng

 2010/2/20 Ted Yu yuzhih...@gmail.com

 Did the number of child processes increase over time?

 On Friday, February 19, 2010, Zheng Lv lvzheng19800...@gmail.com wrote:

 Hello Edson,
 Thank you for your reply. I don't want to kill them; I want to know why these child processes don't exit, and how to make them exit successfully when they finish. Any ideas? Thank you.
 LvZheng

 2010/2/18 Edson Ramiro erlfi...@gmail.com

 Do you want to kill them? If yes, you can use ./bin/slaves.sh pkill java, but it will kill the datanode and tasktracker processes on all slaves and you'll need to start these processes again.
 Edson Ramiro

 On 14 February 2010 22:09, Zheng Lv lvzheng19800...@gmail.com wrote:

 Any idea?

 2010/2/11 Zheng Lv lvzheng19800...@gmail.com

 Hello Everyone,
 We often find many child processes on the datanodes that finished a long time ago. Following is the jstack log:

 Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.3-b01 mixed mode):

 DestroyJavaVM prio=10 tid=0x2aaac8019800 nid=0x2422 waiting on condition [0x]
    java.lang.Thread.State: RUNNABLE

 NioProcessor-31 prio=10 tid=0x439fa000 nid=0x2826 runnable [0x4100a000]
    java.lang.Thread.State: RUNNABLE
     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
     - locked 0x2aaab9b5f6f8 (a sun.nio.ch.Util$1)
     - locked 0x2aaab9b5f710 (a java.util.Collections$UnmodifiableSet)
     - locked 0x2aaab9b5f680 (a sun.nio.ch.EPollSelectorImpl)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
     at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:65)
     at org.apache.mina.common.AbstractPollingIoProcessor$Worker.run(AbstractPollingIoProcessor.java:672)
     at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:619)

 pool-15-thread-1 prio=10 tid=0x2aaac802d000 nid=0x2825 waiting on condition [0x41604000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for 0x2aaab9b61620 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Threa

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
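Jason's point about non-daemon pool threads can be illustrated with plain JDK code. This is a hedged sketch (not the actual third-party library involved; the class and method names are made up): an ExecutorService built with the default thread factory keeps the JVM alive after main() returns, while one whose factory marks its threads as daemons, or an explicit System.exit as Ted suggested, lets the process end.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class DaemonPoolDemo {
    // Thread factory that marks every pool thread as a daemon, so an
    // un-shut-down pool cannot keep the JVM alive after main() returns.
    static ThreadFactory daemonFactory() {
        return r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        };
    }

    static int compute() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2, daemonFactory());
        Future<Integer> f = pool.submit(() -> 21 * 2);
        int result = f.get();
        // Deliberately no pool.shutdown(): with daemon threads the JVM still
        // exits; with the default (non-daemon) factory this omission would
        // leave the process hanging, like the stuck Child processes above.
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("result=" + compute());
    }
}
```

With the default factory the same program never terminates, which is exactly why forcing System.exit in Child.main works around a misbehaving library at the cost of JVM reuse.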
Re: Many child processes don't exit
Hello Ted,
Yes. Every hour a job is created and started, and when it finishes, the child process remains. The logs look normal; do you know what can cause this? Thank you.
LvZheng

2010/2/20 Ted Yu yuzhih...@gmail.com

 Did the number of child processes increase over time?

 On Friday, February 19, 2010, Zheng Lv lvzheng19800...@gmail.com wrote:

 Hello Edson,
 Thank you for your reply. I don't want to kill them; I want to know why these child processes don't exit, and how to make them exit successfully when they finish. Any ideas? Thank you.
 LvZheng

 2010/2/18 Edson Ramiro erlfi...@gmail.com

 Do you want to kill them? If yes, you can use ./bin/slaves.sh pkill java, but it will kill the datanode and tasktracker processes on all slaves and you'll need to start these processes again.
 Edson Ramiro

 On 14 February 2010 22:09, Zheng Lv lvzheng19800...@gmail.com wrote:

 Any idea?

 2010/2/11 Zheng Lv lvzheng19800...@gmail.com

 Hello Everyone,
 We often find many child processes on the datanodes that finished a long time ago. Following is the jstack log:

 Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.3-b01 mixed mode):

 DestroyJavaVM prio=10 tid=0x2aaac8019800 nid=0x2422 waiting on condition [0x]
    java.lang.Thread.State: RUNNABLE

 NioProcessor-31 prio=10 tid=0x439fa000 nid=0x2826 runnable [0x4100a000]
    java.lang.Thread.State: RUNNABLE
     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
     - locked 0x2aaab9b5f6f8 (a sun.nio.ch.Util$1)
     - locked 0x2aaab9b5f710 (a java.util.Collections$UnmodifiableSet)
     - locked 0x2aaab9b5f680 (a sun.nio.ch.EPollSelectorImpl)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
     at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:65)
     at org.apache.mina.common.AbstractPollingIoProcessor$Worker.run(AbstractPollingIoProcessor.java:672)
     at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:619)

 pool-15-thread-1 prio=10 tid=0x2aaac802d000 nid=0x2825 waiting on condition [0x41604000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for 0x2aaab9b61620 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Threa
Re: Many child processes don't exit
Hello Edson,
Thank you for your reply. I don't want to kill them; I want to know why these child processes don't exit, and how to make them exit successfully when they finish. Any ideas? Thank you.
LvZheng

2010/2/18 Edson Ramiro erlfi...@gmail.com

 Do you want to kill them? If yes, you can use ./bin/slaves.sh pkill java, but it will kill the datanode and tasktracker processes on all slaves and you'll need to start these processes again.
 Edson Ramiro

 On 14 February 2010 22:09, Zheng Lv lvzheng19800...@gmail.com wrote:

 Any idea?

 2010/2/11 Zheng Lv lvzheng19800...@gmail.com

 Hello Everyone,
 We often find many child processes on the datanodes that finished a long time ago. Following is the jstack log:

 Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.3-b01 mixed mode):

 DestroyJavaVM prio=10 tid=0x2aaac8019800 nid=0x2422 waiting on condition [0x]
    java.lang.Thread.State: RUNNABLE

 NioProcessor-31 prio=10 tid=0x439fa000 nid=0x2826 runnable [0x4100a000]
    java.lang.Thread.State: RUNNABLE
     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
     - locked 0x2aaab9b5f6f8 (a sun.nio.ch.Util$1)
     - locked 0x2aaab9b5f710 (a java.util.Collections$UnmodifiableSet)
     - locked 0x2aaab9b5f680 (a sun.nio.ch.EPollSelectorImpl)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
     at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:65)
     at org.apache.mina.common.AbstractPollingIoProcessor$Worker.run(AbstractPollingIoProcessor.java:672)
     at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:619)

 pool-15-thread-1 prio=10 tid=0x2aaac802d000 nid=0x2825 waiting on condition [0x41604000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for 0x2aaab9b61620 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:619)

 Attach Listener daemon prio=10 tid=0x43a22800 nid=0x2608 waiting on condition [0x]
    java.lang.Thread.State: RUNNABLE

 pool-3-thread-1 prio=10 tid=0x438ad000 nid=0x243f waiting on condition [0x42575000]
    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for 0x2aaab9ae8770 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
     at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:583)
     at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:576)
     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:619)

 Thread for syncLogs daemon prio=10 tid=0x2aaac4446800 nid=0x2437 waiting on condition [0x40f09000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
     at java.lang.Thread.sleep(Native Method)
     at org.apache.hadoop.mapred.Child$2.run(Child.java:87)

 Low Memory Detector daemon prio=10 tid=0x43771800 nid=0x242d runnable [0x]
    java.lang.Thread.State: RUNNABLE

 CompilerThread1 daemon prio=10 tid=0x4376e800 nid=0x242c waiting
Re: Many child processes don't exit
Any idea?

2010/2/11 Zheng Lv lvzheng19800...@gmail.com

 Hello Everyone,
 We often find many child processes on the datanodes that finished a long time ago. Following is the jstack log:

 Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.3-b01 mixed mode):

 DestroyJavaVM prio=10 tid=0x2aaac8019800 nid=0x2422 waiting on condition [0x]
    java.lang.Thread.State: RUNNABLE

 NioProcessor-31 prio=10 tid=0x439fa000 nid=0x2826 runnable [0x4100a000]
    java.lang.Thread.State: RUNNABLE
     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
     - locked 0x2aaab9b5f6f8 (a sun.nio.ch.Util$1)
     - locked 0x2aaab9b5f710 (a java.util.Collections$UnmodifiableSet)
     - locked 0x2aaab9b5f680 (a sun.nio.ch.EPollSelectorImpl)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
     at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:65)
     at org.apache.mina.common.AbstractPollingIoProcessor$Worker.run(AbstractPollingIoProcessor.java:672)
     at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:619)

 pool-15-thread-1 prio=10 tid=0x2aaac802d000 nid=0x2825 waiting on condition [0x41604000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for 0x2aaab9b61620 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:619)

 Attach Listener daemon prio=10 tid=0x43a22800 nid=0x2608 waiting on condition [0x]
    java.lang.Thread.State: RUNNABLE

 pool-3-thread-1 prio=10 tid=0x438ad000 nid=0x243f waiting on condition [0x42575000]
    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for 0x2aaab9ae8770 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
     at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:583)
     at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:576)
     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:619)

 Thread for syncLogs daemon prio=10 tid=0x2aaac4446800 nid=0x2437 waiting on condition [0x40f09000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
     at java.lang.Thread.sleep(Native Method)
     at org.apache.hadoop.mapred.Child$2.run(Child.java:87)

 Low Memory Detector daemon prio=10 tid=0x43771800 nid=0x242d runnable [0x]
    java.lang.Thread.State: RUNNABLE

 CompilerThread1 daemon prio=10 tid=0x4376e800 nid=0x242c waiting on condition [0x]
    java.lang.Thread.State: RUNNABLE

 CompilerThread0 daemon prio=10 tid=0x4376a800 nid=0x242b waiting on condition [0x]
    java.lang.Thread.State: RUNNABLE

 Signal Dispatcher daemon prio=10 tid=0x43768800 nid=0x242a runnable [0x]
    java.lang.Thread.State: RUNNABLE

 Finalizer daemon prio=10 tid=0x43746000 nid=0x2429 in Object.wait() [0x42171000]
    java.lang.Thread.State: WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     - waiting on 0x2aaab9a8afe0 (a java.lang.ref.ReferenceQueue$Lock)
     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
     - locked 0x2aaab9a8afe0
Exception when using SequenceFile
Hello Everyone,

We have been using SequenceFile (GZIP) to save some data in our product, but we recently got the following exception, which never occurred before:

java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:197)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:414)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:517)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

We googled it but found nothing. We have since restarted Hadoop and the application, and so far the exception has not recurred. But we would like to know why it happened and how to avoid it. Any suggestion will help, thanks a lot.

LvZheng
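The trace shows DataInputStream.readFully throwing EOFException while SequenceFile.Reader reads the file header, which is what readFully does when a file is truncated or still being written when the reader opens it. A minimal, JDK-only sketch of that failure mode (the 'SEQ' bytes stand in for a partial SequenceFile header and the class is made up for illustration; this is not Hadoop code):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class TruncatedHeaderDemo {
    // Tries to read an n-byte header from the given bytes, the way a
    // reader consumes a fixed-size version/metadata block on open.
    // DataInputStream.readFully throws EOFException when fewer than
    // n bytes remain - exactly the exception in the stack trace above.
    static String readHeader(byte[] onDisk, int n) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(onDisk))) {
            byte[] header = new byte[n];
            in.readFully(header);
            return "ok";
        } catch (EOFException e) {
            return "EOFException";   // truncated / partially written file
        } catch (IOException e) {
            return "IOException";
        }
    }

    public static void main(String[] args) {
        byte[] truncated = {'S', 'E', 'Q'};  // only 3 of the expected bytes
        System.out.println(readHeader(truncated, 16));
    }
}
```

If a writer crashed mid-write, or a job read a file before it was closed and synced, the reader would hit exactly this truncated-header condition, which would also explain why the problem went away after a restart.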
Re: Exceptions when starting hadoop
org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 192.168.33.5:50010 is added to blk_-699408630358636044_1002 size 4
2009-09-27 12:00:02,644 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.33.8:50010 to delete blk_2342352795116225569_1001
2009-09-27 12:00:05,644 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.33.6:50010 to delete blk_2342352795116225569_1001
2009-09-27 12:03:49,176 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=cyd,cyd,adm,dialout,cdrom,plugdev,lpadmin,sambashare,admin ip=/192.168.33.7 cmd=listStatus src=/ dst=null perm=null

2009/9/26 Starry SHI starr...@gmail.com

 Is this the first time you start your cluster? My experience is that when you start the cluster once, then change the conf (say, add another slave) and restart the cluster, it sometimes generates IPC issues (like the timeout in the namenode log). This change will put the filesystem into safe mode, so there will be exceptions. The exception in your NN log was raised when the cluster was trying to add a new node but reached a timeout. If you have changed your cluster conf, you can use "hadoop namenode -format" to format the filesystem and then start; your cluster should then come up.

 Hope this can help.

 Best regards,
 Starry

 /* Tomorrow is another day. So is today. */

 On Sat, Sep 26, 2009 at 09:03, Zheng Lv lvzheng19800...@gmail.com wrote:

 No one can give some suggestions?

 2009/9/25 Zheng Lv lvzheng19800...@gmail.com

 Hello Everyone,
 We have a cluster with one namenode and three datanodes, and we got these logs when starting Hadoop 0.20. Is it normal?

 2009-09-25 10:45:00,616 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
 /************************************************************
 STARTUP_MSG: Starting NameNode
 STARTUP_MSG:   host = ubuntu6/192.168.33.7
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 0.20.1-dev
 STARTUP_MSG:   build = -r ; compiled by 'cyd' on 2009-08-07 15:20:16 CST
 ************************************************************/
 2009-09-25 10:45:00,721 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
 2009-09-25 10:45:00,767 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: ubuntu6/192.168.33.7:9000
 2009-09-25 10:45:00,769 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
 2009-09-25 10:45:00,771 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
 2009-09-25 10:45:00,835 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=cyd,cyd,adm,dialout,cdrom,plugdev,lpadmin,sambashare,admin
 2009-09-25 10:45:00,835 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
 2009-09-25 10:45:00,835 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
 2009-09-25 10:45:00,842 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
 2009-09-25 10:45:00,843 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
 2009-09-25 10:45:00,871 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 2834
 2009-09-25 10:45:01,412 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 1
 2009-09-25 10:45:01,416 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 329240 loaded in 0 seconds.
 2009-09-25 10:45:01,417 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /home/cyd/hdfs2/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
 2009-09-25 10:45:01,683 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 329240 saved in 0 seconds.
 2009-09-25 10:45:01,805 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1007 msecs
 2009-09-25 10:45:01,821 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON. The ratio of reported blocks 0. has not reached the threshold 0.9990. Safe mode will be turned off automatically.
 2009-09-25 10:45:07,162 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
 2009-09-25 10:45:07,228 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070
 2009-09-25 10:45:07,229 INFO org.mortbay.log: jetty-6.1.14
 2009-09-25 10:47:34,305 INFO org.mortbay.log: Started selectchannelconnec...@0.0.0.0:50070
 2009-09-25 10:47:34,319 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Web
Exceptions when starting hadoop
Hello Everyone,

We have a cluster with one namenode and three datanodes, and we got these logs when starting Hadoop 0.20. Is it normal?

2009-09-25 10:45:00,616 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu6/192.168.33.7
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.1-dev
STARTUP_MSG:   build = -r ; compiled by 'cyd' on 2009-08-07 15:20:16 CST
************************************************************/
2009-09-25 10:45:00,721 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2009-09-25 10:45:00,767 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: ubuntu6/192.168.33.7:9000
2009-09-25 10:45:00,769 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2009-09-25 10:45:00,771 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2009-09-25 10:45:00,835 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=cyd,cyd,adm,dialout,cdrom,plugdev,lpadmin,sambashare,admin
2009-09-25 10:45:00,835 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2009-09-25 10:45:00,835 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2009-09-25 10:45:00,842 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2009-09-25 10:45:00,843 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2009-09-25 10:45:00,871 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 2834
2009-09-25 10:45:01,412 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 1
2009-09-25 10:45:01,416 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 329240 loaded in 0 seconds.
2009-09-25 10:45:01,417 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /home/cyd/hdfs2/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
2009-09-25 10:45:01,683 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 329240 saved in 0 seconds.
2009-09-25 10:45:01,805 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1007 msecs
2009-09-25 10:45:01,821 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON. The ratio of reported blocks 0. has not reached the threshold 0.9990. Safe mode will be turned off automatically.
2009-09-25 10:45:07,162 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2009-09-25 10:45:07,228 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070
2009-09-25 10:45:07,229 INFO org.mortbay.log: jetty-6.1.14
2009-09-25 10:47:34,305 INFO org.mortbay.log: Started selectchannelconnec...@0.0.0.0:50070
2009-09-25 10:47:34,319 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at: 0.0.0.0:50070
2009-09-25 10:47:34,320 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2009-09-25 10:47:34,321 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: starting
2009-09-25 10:47:34,355 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000: starting
2009-09-25 10:47:34,355 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000: starting
2009-09-25 10:47:34,356 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000: starting
2009-09-25 10:47:34,356 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9000: starting
2009-09-25 10:47:34,384 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000: starting
2009-09-25 10:47:34,397 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000: starting
2009-09-25 10:47:34,398 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000: starting
2009-09-25 10:47:34,400 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from 192.168.33.7:54977: output error
2009-09-25 10:47:34,401 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9000: starting
2009-09-25 10:47:34,402 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9000: starting
2009-09-25 10:47:34,404 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000: starting
2009-09-25 10:47:34,433 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000 caught: java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
    at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1195)
Exception when starting namenode
Hello,

I got the following exception when starting the cluster (Hadoop 0.15.2). Any suggestions?

2009-08-21 12:12:53,463 ERROR org.apache.hadoop.dfs.NameNode: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:650)
    at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:614)
    at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:222)
    at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:76)
    at org.apache.hadoop.dfs.FSNamesystem.init(FSNamesystem.java:221)
    at org.apache.hadoop.dfs.NameNode.init(NameNode.java:130)
    at org.apache.hadoop.dfs.NameNode.init(NameNode.java:168)
    at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:804)
    at org.apache.hadoop.dfs.NameNode.main(NameNode.java:813)

Thank you,
LvZheng