unable to run wordcount example on two node cluster
Hi All,

I have the following setup (nodes 1 and 2 are Red Hat Linux 4; nodes 3 and 4 are Red Hat Linux 3):

Node1 - namenode
Node2 - job tracker
Node3 - slave (datanode)
Node4 - slave (datanode)

I was able to put some data into HDFS and confirm that it is properly stored on the datanodes:

[NODE1]$ bin/hadoop dfs -ls
Found 3 items
drwxr-xr-x   - user1 supergroup     0 2008-07-04 07:10 /user/user1/input
drwxr-xr-x   - user1 supergroup     0 2008-07-04 09:17 /user/user1/test3
-rw-r--r--   3 user1 supergroup  3951 2008-07-04 07:10 /user/user1/wordcount.jar

Now I am trying to run the WordCount example described at
http://hadoop.apache.org/core/docs/r0.17.0/mapred_tutorial.html

Steps followed:
1) [NODE1]$ javac -classpath ${HADOOP_HOME}/hadoop-0.17.1-core.jar -d wordcount_classes WordCount.java
2) [NODE1]$ jar -cvf wordcount.jar -C wordcount_classes/ .
3) [NODE1]$ bin/hadoop dfs -copyFromLocal wordcount.jar wordcount.jar
4) [NODE1]$ bin/hadoop jar wordcount.jar org.myorg.WordCount input output

The output is as follows:

[NODE1]$ bin/hadoop jar wordcount.jar org.myorg.WordCount input output2
08/07/06 03:10:23 INFO mapred.FileInputFormat: Total input paths to process : 3
08/07/06 03:10:23 INFO mapred.FileInputFormat: Total input paths to process : 3
08/07/06 03:10:24 INFO mapred.JobClient: Running job: job_200806290715_0027
08/07/06 03:10:25 INFO mapred.JobClient:  map 0% reduce 0%

*** It hangs here forever.

The log file on Node1 (the namenode side, hadoop-suravako-secondarynamenode-stapj13.out) says:

java.io.IOException: Inconsistent checkpoint fileds. LV = -16 namespaceID = 315235321 cTime = 0.
Expecting respectively: -16; 902613609; 0
    at org.apache.hadoop.dfs.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:65)
    at org.apache.hadoop.dfs.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:568)
    at org.apache.hadoop.dfs.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:464)
    at org.apache.hadoop.dfs.SecondaryNameNode.doMerge(SecondaryNameNode.java:341)
    at org.apache.hadoop.dfs.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:305)
    at org.apache.hadoop.dfs.SecondaryNameNode.run(SecondaryNameNode.java:216)

Please let me know if I am missing something, and please help me resolve the above issue. I can provide any specific log info if required.

Thank you,
Srilatha
Re: unable to run wordcount example on two node cluster
See: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster) — in particular the section on "java.io.IOException: Incompatible namespaceIDs".

us latha schrieb:
> Please let me know if I am missing something, and please help me resolve
> the above issue. I can provide any specific log info if required.
> Thank you, Srilatha
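The "Inconsistent checkpoint" / "Incompatible namespaceIDs" family of errors usually means the namenode was reformatted while the slaves (or the secondary namenode's checkpoint directory) still hold storage directories stamped with the old namespaceID. Each storage directory records that ID in a small VERSION file (a Java properties file under dfs.name.dir / dfs.data.dir). A minimal sketch of the comparison — the VERSION contents here are synthetic, taken from the error message above, not read from a real cluster:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Sketch: parse the namespaceID out of two VERSION files and compare them.
// The file layout (a properties file containing namespaceID, cTime,
// layoutVersion) is real; the comparison utility itself is illustrative,
// not part of Hadoop.
public class VersionCheck {

    // VERSION is a standard Java properties file, so Properties can parse it.
    static int namespaceId(Reader versionFile) throws IOException {
        Properties p = new Properties();
        p.load(versionFile);
        return Integer.parseInt(p.getProperty("namespaceID"));
    }

    static boolean consistent(Reader nameNodeVersion, Reader dataNodeVersion)
            throws IOException {
        return namespaceId(nameNodeVersion) == namespaceId(dataNodeVersion);
    }

    public static void main(String[] args) throws IOException {
        // Synthetic contents mirroring the IDs from the error above.
        String nn = "layoutVersion=-16\nnamespaceID=902613609\ncTime=0\n";
        String dn = "layoutVersion=-16\nnamespaceID=315235321\ncTime=0\n";
        System.out.println(consistent(new StringReader(nn), new StringReader(dn))
                ? "namespaceIDs match"
                : "namespaceIDs differ: stale storage directory");
    }
}
```

The usual remedies, per the wiki page above: stop the cluster, then either wipe the stale storage directories on the slaves (discarding their blocks) or edit the namespaceID in their VERSION files to match the namenode's.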
Hadoop 0.17.0 - lots of I/O problems and can't run small datasets?
Hi All:

I've got 0.17.0 set up on a 7-node grid (6 slaves with datanodes, 1 master running the namenode). I'm trying to process a small (180G) dataset. I've done this successfully and painlessly on 0.15.0. When I run 0.17.0 with the same data and the same code (recompiled against the 0.17.0 API changes, of course), I get a ton of failures. I've increased the number of namenode threads trying to resolve this, but that doesn't seem to help. The errors are of the following flavor:

java.io.IOException: Could not get block locations. Aborting...
java.io.IOException: All datanodes 10.2.11.2:50010 are bad. Aborting...
Exception in thread Thread-2 java.util.ConcurrentModificationException
Exception closing file /blah/_temporary/_task_200807052311_0001_r_04_0/baz/part-x

As things stand right now, I can't deploy on 0.17.0 (or 0.16.4 or 0.17.1). I am wondering if anybody can shed some light on this, or if others are having similar problems. Any thoughts, insights, etc. would be greatly appreciated.

Thanks,
C G

Here's an ugly trace:

08/07/06 01:43:29 INFO mapred.JobClient:  map 100% reduce 93%
08/07/06 01:43:29 INFO mapred.JobClient: Task Id : task_200807052311_0001_r_03_0, Status : FAILED
java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2080)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
task_200807052311_0001_r_03_0: Exception closing file /output/_temporary/_task_200807052311_0001_r_03_0/a/b/part-3
task_200807052311_0001_r_03_0: java.io.IOException: All datanodes 10.2.11.2:50010 are bad. Aborting...
task_200807052311_0001_r_03_0:     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2095)
task_200807052311_0001_r_03_0:     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
task_200807052311_0001_r_03_0:     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
task_200807052311_0001_r_03_0: Exception in thread Thread-2 java.util.ConcurrentModificationException
task_200807052311_0001_r_03_0:     at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
task_200807052311_0001_r_03_0:     at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
task_200807052311_0001_r_03_0:     at org.apache.hadoop.dfs.DFSClient.close(DFSClient.java:217)
task_200807052311_0001_r_03_0:     at org.apache.hadoop.dfs.DistributedFileSystem.close(DistributedFileSystem.java:214)
task_200807052311_0001_r_03_0:     at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1324)
task_200807052311_0001_r_03_0:     at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:224)
task_200807052311_0001_r_03_0:     at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:209)
08/07/06 01:44:32 INFO mapred.JobClient:  map 100% reduce 74%
08/07/06 01:44:32 INFO mapred.JobClient: Task Id : task_200807052311_0001_r_01_0, Status : FAILED
java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2080)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
task_200807052311_0001_r_01_0: Exception in thread Thread-2 java.util.ConcurrentModificationException
task_200807052311_0001_r_01_0:     at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
task_200807052311_0001_r_01_0:     at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
task_200807052311_0001_r_01_0:     at org.apache.hadoop.dfs.DFSClient.close(DFSClient.java:217)
task_200807052311_0001_r_01_0:     at org.apache.hadoop.dfs.DistributedFileSystem.close(DistributedFileSystem.java:214)
task_200807052311_0001_r_01_0:     at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1324)
task_200807052311_0001_r_01_0:     at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:224)
task_200807052311_0001_r_01_0:     at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:209)
08/07/06 01:44:45 INFO mapred.JobClient:  map 100% reduce 54%
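No answer appears in the thread, but symptoms like "Could not get block locations" and "All datanodes ... are bad" were often tied to datanodes exhausting transfer threads or file handles under heavy concurrent reduce output. Two datanode settings commonly raised for this in hadoop-site.xml are sketched below — the values are only illustrative starting points and this is a guess at the cause, not a confirmed fix; raising the datanode user's open-file limit (ulimit -n) usually goes with them:

```xml
<!-- hadoop-site.xml: illustrative values, not a confirmed fix -->
<property>
  <!-- note: the misspelling "xcievers" is Hadoop's own -->
  <name>dfs.datanode.max.xcievers</name>
  <value>1024</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>
</property>
```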
Re: ERROR dfs.NameNode - java.io.EOFException
Hi,

(lohit has been helping me with this and asked me to send this detailed message to the list for others to see and possibly help.)

Right, I saw the permission reference, too. I assume(d) the permission information is stored inside the edits file, and that that part of the edits is corrupt and causing the exception. I'm using 0.16.2 (with Nutch), and I'm not upgrading. Hadoop died while I wasn't watching it, while running the usual Nutch MapReduce jobs. I'm running everything as the same user, and I'm really not making any changes. Hadoop just died, and when I tried restarting it, the NN would not start.

The first errors from the logs when things went bad look like this:

Task task_200806101759_0344_r_01_0 failed to report status for 601 seconds. Killing!
Task task_200806101759_0344_r_09_0 failed to report status for 600 seconds. Killing!
Task task_200806101759_0344_r_07_0 failed to report status for 602 seconds. Killing!
Task task_200806101759_0344_r_08_0 failed to report status for 601 seconds. Killing!
Error initializing task_200806101759_0344_r_09_1:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_200806101759_0344/job.xml
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:313)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:639)
    at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1282)
    at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:923)
    at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1318)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2210)
...

From the task name, it looks like it was started on 2008-06-10. But the errors above happened on June 23rd — 13 days later, and I am quite sure my jobs were not taking 13 days to complete!

Then a little later in the log I see:

java.io.IOException: task_200806101759_0352_r_05_0 The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:260)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)

And then a little later I see this:

FSError: java.io.IOException: No space left on device
task_200806101759_0370_m_40_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.TaskRunner).
task_200806101759_0370_m_40_0: log4j:WARN Please initialize the log4j system properly.
FSError: java.io.IOException: No space left on device
FSError: java.io.IOException: No space left on device
task_200806101759_0370_m_76_0: Exception in thread SortSpillThread org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
task_200806101759_0370_m_76_0:     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:171)
task_200806101759_0370_m_76_0:     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
task_200806101759_0370_m_76_0:     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
task_200806101759_0370_m_76_0:     at java.io.DataOutputStream.write(DataOutputStream.java:90)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:339)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:141)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:124)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:58)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:36)
task_200806101759_0370_m_76_0:     at java.io.DataOutputStream.writeInt(DataOutputStream.java:183)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.io.SequenceFile$Writer.sync(SequenceFile.java:927)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:954)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:987)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spill(MapTask.java:555)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:497)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$200(MapTask.java:264)
...
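The root cause surfaces at the end of the log: "No space left on device". A full disk under mapred.local.dir makes task spills and job localization fail (matching the DiskErrorException above), and if the dfs.name.dir partition fills while the namenode is appending to the edits log, a truncated edits file left behind would fit the EOFException on restart. A trivial headroom check of this sort is cheap insurance — the path and threshold below are illustrative, not anything Hadoop provides:

```java
import java.io.File;

// Pre-flight sketch: verify that a task-local directory has headroom before
// launching jobs. In practice you would point this at each entry of
// mapred.local.dir and at dfs.name.dir on every node.
public class DiskSpaceCheck {

    // True when the directory's filesystem has at least minFreeBytes available.
    static boolean hasFreeSpace(File dir, long minFreeBytes) {
        return dir.getUsableSpace() >= minFreeBytes;
    }

    public static void main(String[] args) {
        // Illustrative path; substitute your mapred.local.dir entries.
        File localDir = new File(System.getProperty("java.io.tmpdir"));
        long oneGiB = 1L << 30;
        System.out.println(localDir + ": "
                + (hasFreeSpace(localDir, oneGiB) ? "ok" : "LOW DISK")
                + " (" + localDir.getUsableSpace() / (1 << 20) + " MB free)");
    }
}
```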
Re: Combiner is optional though it is specified?
I am guessing it is a bug in Hadoop-17, because I am able to reproduce the problem. But I am not able to figure out where exactly this can happen. Can someone please help me with this? Thanks

novice user wrote:

To my surprise, only one output value of the mapper is not reaching the combiner, and it is consistent when I repeat the experiment. The same record reaches the reducer directly without going through the combiner. I am surprised — how can this happen?

novice user wrote:

Regarding the conclusion: I am parsing the inputs in the combiner and the reducer differently. For example, the output value of the mapper is "s:d", whereas the output value of the combiner is "s,d". So in the reducer I am assuming the input is "s,d" and trying to parse it. That is where I got the exception, because the input arrived as "s:d". I am using hadoop-17. I couldn't get exactly what you meant by there being no guarantee on the number of times a combiner is run. Can you please elaborate a bit on this? Thanks

Arun C Murthy-2 wrote:

On Jul 1, 2008, at 4:04 AM, novice user wrote:

Hi all, I have a query regarding the functionality of the combiner. Is it possible for the combiner to be skipped for some of the outputs of the mapper, so that they are sent directly to the reducer even though a combiner is specified in the job configuration? I ask because I found that, when running on large amounts of data, some of the mapper output reaches the reducer directly. I am wondering how this can be possible when I have specified a combiner in the job configuration. Can anyone please let me know if this happens?

Can you elaborate on how you reached the conclusion that the output of some maps isn't going through the combiner? Also, what version of Hadoop are you using? From hadoop-0.18 onwards there are no guarantees on the number of times a combiner is run...

Arun

--
View this message in context: http://www.nabble.com/Combiner-is-optional-though-it-is-specified--tp18213887p18213887.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
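Arun's point can be restated as a contract, and it explains the parsing exception in this thread: because Hadoop treats the combiner as an optimization it is free to skip (and from 0.18 onwards may run zero or more times), (a) the combiner must emit the same key/value format it consumes, and (b) the job's result must not depend on whether the combiner ran at all. Emitting "s,d" from the combiner while the map emits "s:d" violates (a). A self-contained sketch of the property to check — plain Java, with an illustrative driver, not the Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the combiner contract: applying the combiner to each
// partition of the map output, or not applying it anywhere, must produce
// the same final result in the reducer.
public class CombinerContract {

    // Shared by combiner and reducer: sum a list of partial counts.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) total += v;
        return total;
    }

    // Reduce after combining each map's output partition first ...
    static int withCombiner(List<List<Integer>> partitions) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> p : partitions) partials.add(sum(p));
        return sum(partials);
    }

    // ... and reduce over the raw map output with no combiner at all.
    static int withoutCombiner(List<List<Integer>> partitions) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> p : partitions) all.addAll(p);
        return sum(all);
    }

    public static void main(String[] args) {
        List<List<Integer>> counts = List.of(List.of(1, 1, 1), List.of(1, 1));
        // Both paths must agree; if they don't, the combiner is unsafe.
        System.out.println(withCombiner(counts) + " == " + withoutCombiner(counts));
    }
}
```

The simplest way to satisfy the contract in a job like WordCount is to use the same class for both the combiner and the reducer, so their input and output formats cannot drift apart.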