unable to run wordcount example on two node cluster
Hi All,

I have the following setup (nodes 1 and 2 are Red Hat Linux 4; nodes 3 and 4 are Red Hat Linux 3):

Node1 - namenode
Node2 - job tracker
Node3 - slave (datanode)
Node4 - slave (datanode)

I was able to put some data into HDFS and confirm that it is properly stored on the datanodes:

[NODE1]$ bin/hadoop dfs -ls
Found 3 items
drwxr-xr-x   - user1 supergroup     0 2008-07-04 07:10 /user/user1/input
drwxr-xr-x   - user1 supergroup     0 2008-07-04 09:17 /user/user1/test3
-rw-r--r--   3 user1 supergroup  3951 2008-07-04 07:10 /user/user1/wordcount.jar

Now I am trying to run the WordCount example described at
http://hadoop.apache.org/core/docs/r0.17.0/mapred_tutorial.html

Steps followed:
1) [NODE1]$ javac -classpath ${HADOOP_HOME}/hadoop-0.17.1-core.jar -d wordcount_classes WordCount.java
2) [NODE1]$ jar -cvf wordcount.jar -C wordcount_classes/ .
3) [NODE1]$ bin/hadoop dfs -copyFromLocal wordcount.jar wordcount.jar
4) [NODE1]$ bin/hadoop jar wordcount.jar org.myorg.WordCount input output

The output is as follows:

[NODE1]$ bin/hadoop jar wordcount.jar org.myorg.WordCount input output2
08/07/06 03:10:23 INFO mapred.FileInputFormat: Total input paths to process : 3
08/07/06 03:10:23 INFO mapred.FileInputFormat: Total input paths to process : 3
08/07/06 03:10:24 INFO mapred.JobClient: Running job: job_200806290715_0027
08/07/06 03:10:25 INFO mapred.JobClient:  map 0% reduce 0%

*** It hangs here forever.

The log file on Node1 (the namenode side, hadoop-suravako-secondarynamenode-stapj13.out) says:

java.io.IOException: Inconsistent checkpoint fileds. LV = -16 namespaceID = 315235321 cTime = 0.
Expecting respectively: -16; 902613609; 0
    at org.apache.hadoop.dfs.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:65)
    at org.apache.hadoop.dfs.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:568)
    at org.apache.hadoop.dfs.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:464)
    at org.apache.hadoop.dfs.SecondaryNameNode.doMerge(SecondaryNameNode.java:341)
    at org.apache.hadoop.dfs.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:305)
    at org.apache.hadoop.dfs.SecondaryNameNode.run(SecondaryNameNode.java:216)

Please let me know if I am missing something, and please help me resolve the above issue. I can provide any specific log info if required.

Thank you,
Srilatha
Re: unable to run wordcount example on two node cluster
See: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster) — in particular the section on "java.io.IOException: Incompatible namespaceIDs".

us latha schrieb:
> Please let me know if I am missing something, and please help me resolve
> the above issue. I can provide any specific log info if required.
> Thank you, Srilatha
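The "Inconsistent checkpoint" / "Incompatible namespaceIDs" family of errors usually means the namenode was reformatted while the slaves (or the secondary namenode's checkpoint directory) still hold storage directories stamped with the old namespaceID. Each storage directory records that ID in a small VERSION file (a Java properties file under dfs.name.dir / dfs.data.dir). A minimal sketch of the comparison — the VERSION contents here are synthetic, taken from the error message above, not read from a real cluster:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Sketch: parse the namespaceID out of two VERSION files and compare them.
// The file layout (a properties file containing namespaceID, cTime,
// layoutVersion) is real; the comparison utility itself is illustrative,
// not part of Hadoop.
public class VersionCheck {

    // VERSION is a standard Java properties file, so Properties can parse it.
    static int namespaceId(Reader versionFile) throws IOException {
        Properties p = new Properties();
        p.load(versionFile);
        return Integer.parseInt(p.getProperty("namespaceID"));
    }

    static boolean consistent(Reader nameNodeVersion, Reader dataNodeVersion)
            throws IOException {
        return namespaceId(nameNodeVersion) == namespaceId(dataNodeVersion);
    }

    public static void main(String[] args) throws IOException {
        // Synthetic contents mirroring the IDs from the error above.
        String nn = "layoutVersion=-16\nnamespaceID=902613609\ncTime=0\n";
        String dn = "layoutVersion=-16\nnamespaceID=315235321\ncTime=0\n";
        System.out.println(consistent(new StringReader(nn), new StringReader(dn))
                ? "namespaceIDs match"
                : "namespaceIDs differ: stale storage directory");
    }
}
```

The usual remedies, per the wiki page above: stop the cluster, then either wipe the stale storage directories on the slaves (discarding their blocks) or edit the namespaceID in their VERSION files to match the namenode's.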
Hadoop 0.17.0 - lots of I/O problems and can't run small datasets?
Hi All:

I've got 0.17.0 set up on a 7-node grid (6 slaves with datanodes, 1 master running the namenode). I'm trying to process a small (180G) dataset. I've done this successfully and painlessly on 0.15.0. When I run 0.17.0 with the same data and the same code (recompiled against the 0.17.0 API changes, of course), I get a ton of failures. I've increased the number of namenode threads trying to resolve this, but that doesn't seem to help. The errors are of the following flavor:

java.io.IOException: Could not get block locations. Aborting...
java.io.IOException: All datanodes 10.2.11.2:50010 are bad. Aborting...
Exception in thread Thread-2 java.util.ConcurrentModificationException
Exception closing file /blah/_temporary/_task_200807052311_0001_r_04_0/baz/part-x

As things stand right now, I can't deploy on 0.17.0 (or 0.16.4 or 0.17.1). I am wondering if anybody can shed some light on this, or if others are having similar problems. Any thoughts, insights, etc. would be greatly appreciated.

Thanks,
C G

Here's an ugly trace:

08/07/06 01:43:29 INFO mapred.JobClient:  map 100% reduce 93%
08/07/06 01:43:29 INFO mapred.JobClient: Task Id : task_200807052311_0001_r_03_0, Status : FAILED
java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2080)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
task_200807052311_0001_r_03_0: Exception closing file /output/_temporary/_task_200807052311_0001_r_03_0/a/b/part-3
task_200807052311_0001_r_03_0: java.io.IOException: All datanodes 10.2.11.2:50010 are bad. Aborting...
task_200807052311_0001_r_03_0:     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2095)
task_200807052311_0001_r_03_0:     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
task_200807052311_0001_r_03_0:     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
task_200807052311_0001_r_03_0: Exception in thread Thread-2 java.util.ConcurrentModificationException
task_200807052311_0001_r_03_0:     at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
task_200807052311_0001_r_03_0:     at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
task_200807052311_0001_r_03_0:     at org.apache.hadoop.dfs.DFSClient.close(DFSClient.java:217)
task_200807052311_0001_r_03_0:     at org.apache.hadoop.dfs.DistributedFileSystem.close(DistributedFileSystem.java:214)
task_200807052311_0001_r_03_0:     at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1324)
task_200807052311_0001_r_03_0:     at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:224)
task_200807052311_0001_r_03_0:     at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:209)
08/07/06 01:44:32 INFO mapred.JobClient:  map 100% reduce 74%
08/07/06 01:44:32 INFO mapred.JobClient: Task Id : task_200807052311_0001_r_01_0, Status : FAILED
java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2080)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
task_200807052311_0001_r_01_0: Exception in thread Thread-2 java.util.ConcurrentModificationException
task_200807052311_0001_r_01_0:     at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
task_200807052311_0001_r_01_0:     at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
task_200807052311_0001_r_01_0:     at org.apache.hadoop.dfs.DFSClient.close(DFSClient.java:217)
task_200807052311_0001_r_01_0:     at org.apache.hadoop.dfs.DistributedFileSystem.close(DistributedFileSystem.java:214)
task_200807052311_0001_r_01_0:     at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1324)
task_200807052311_0001_r_01_0:     at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:224)
task_200807052311_0001_r_01_0:     at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:209)
08/07/06 01:44:45 INFO mapred.JobClient:  map 100% reduce 54%
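No answer appears in the thread, but symptoms like "Could not get block locations" and "All datanodes ... are bad" were often tied to datanodes exhausting transfer threads or file handles under heavy concurrent reduce output. Two datanode settings commonly raised for this in hadoop-site.xml are sketched below — the values are only illustrative starting points and this is a guess at the cause, not a confirmed fix; raising the datanode user's open-file limit (ulimit -n) usually goes with them:

```xml
<!-- hadoop-site.xml: illustrative values, not a confirmed fix -->
<property>
  <!-- note: the misspelling "xcievers" is Hadoop's own -->
  <name>dfs.datanode.max.xcievers</name>
  <value>1024</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>
</property>
```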
Re: ERROR dfs.NameNode - java.io.EOFException
Hi,

(lohit has been helping me with this and asked me to send this detailed message to the list for others to see and possibly help.)

Right, I saw the permission reference, too. I assume(d) the permission information is stored inside the edits file, and that that part of the edits is corrupt and causing the exception. I'm using 0.16.2 (with Nutch), and I'm not upgrading. Hadoop died while I wasn't watching it, while running the usual Nutch MapReduce jobs. I'm running everything as the same user, and I'm really not making any changes. Hadoop just died, and when I tried restarting it, the NN would not start.

The first errors from the logs when things went bad look like this:

Task task_200806101759_0344_r_01_0 failed to report status for 601 seconds. Killing!
Task task_200806101759_0344_r_09_0 failed to report status for 600 seconds. Killing!
Task task_200806101759_0344_r_07_0 failed to report status for 602 seconds. Killing!
Task task_200806101759_0344_r_08_0 failed to report status for 601 seconds. Killing!
Error initializing task_200806101759_0344_r_09_1:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_200806101759_0344/job.xml
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:313)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:639)
    at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1282)
    at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:923)
    at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1318)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2210)
...

From the task name, it looks like it was started on 2008-06-10. But the errors above happened on June 23rd — 13 days later, and I am quite sure my jobs were not taking 13 days to complete!

Then a little later in the log I see:

java.io.IOException: task_200806101759_0352_r_05_0 The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:260)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)

And then a little later I see this:

FSError: java.io.IOException: No space left on device
task_200806101759_0370_m_40_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.TaskRunner).
task_200806101759_0370_m_40_0: log4j:WARN Please initialize the log4j system properly.
FSError: java.io.IOException: No space left on device
FSError: java.io.IOException: No space left on device
task_200806101759_0370_m_76_0: Exception in thread SortSpillThread org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
task_200806101759_0370_m_76_0:     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:171)
task_200806101759_0370_m_76_0:     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
task_200806101759_0370_m_76_0:     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
task_200806101759_0370_m_76_0:     at java.io.DataOutputStream.write(DataOutputStream.java:90)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:339)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:141)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:124)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:58)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:36)
task_200806101759_0370_m_76_0:     at java.io.DataOutputStream.writeInt(DataOutputStream.java:183)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.io.SequenceFile$Writer.sync(SequenceFile.java:927)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:954)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:987)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spill(MapTask.java:555)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:497)
task_200806101759_0370_m_76_0:     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$200(MapTask.java:264)
...
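The root cause surfaces at the end of the log: "No space left on device". A full disk under mapred.local.dir makes task spills and job localization fail (matching the DiskErrorException above), and if the dfs.name.dir partition fills while the namenode is appending to the edits log, a truncated edits file left behind would fit the EOFException on restart. A trivial headroom check of this sort is cheap insurance — the path and threshold below are illustrative, not anything Hadoop provides:

```java
import java.io.File;

// Pre-flight sketch: verify that a task-local directory has headroom before
// launching jobs. In practice you would point this at each entry of
// mapred.local.dir and at dfs.name.dir on every node.
public class DiskSpaceCheck {

    // True when the directory's filesystem has at least minFreeBytes available.
    static boolean hasFreeSpace(File dir, long minFreeBytes) {
        return dir.getUsableSpace() >= minFreeBytes;
    }

    public static void main(String[] args) {
        // Illustrative path; substitute your mapred.local.dir entries.
        File localDir = new File(System.getProperty("java.io.tmpdir"));
        long oneGiB = 1L << 30;
        System.out.println(localDir + ": "
                + (hasFreeSpace(localDir, oneGiB) ? "ok" : "LOW DISK")
                + " (" + localDir.getUsableSpace() / (1 << 20) + " MB free)");
    }
}
```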
Re: Combiner is optional though it is specified?
I am guessing it is a bug in Hadoop-17, because I am able to reproduce the problem. But I am not able to figure out where exactly this can happen. Can someone please help me with this? Thanks

novice user wrote:

To my surprise, only one output value of the mapper is not reaching the combiner, and it is consistent when I repeat the experiment. The same record reaches the reducer directly without going through the combiner. I am surprised — how can this happen?

novice user wrote:

Regarding the conclusion: I am parsing the inputs in the combiner and the reducer differently. For example, the output value of the mapper is "s:d", whereas the output value of the combiner is "s,d". So in the reducer I am assuming the input is "s,d" and trying to parse it. That is where I got the exception, because the input arrived as "s:d". I am using hadoop-17. I couldn't get exactly what you meant by there being no guarantee on the number of times a combiner is run. Can you please elaborate a bit on this? Thanks

Arun C Murthy-2 wrote:

On Jul 1, 2008, at 4:04 AM, novice user wrote:

Hi all, I have a query regarding the functionality of the combiner. Is it possible for the combiner to be skipped for some of the outputs of the mapper, so that they are sent directly to the reducer even though a combiner is specified in the job configuration? I ask because I found that, when running on large amounts of data, some of the mapper output reaches the reducer directly. I am wondering how this can be possible when I have specified a combiner in the job configuration. Can anyone please let me know if this happens?

Can you elaborate on how you reached the conclusion that the output of some maps isn't going through the combiner? Also, what version of Hadoop are you using? From hadoop-0.18 onwards there are no guarantees on the number of times a combiner is run...

Arun

--
View this message in context: http://www.nabble.com/Combiner-is-optional-though-it-is-specified--tp18213887p18213887.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
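Arun's point can be restated as a contract, and it explains the parsing exception in this thread: because Hadoop treats the combiner as an optimization it is free to skip (and from 0.18 onwards may run zero or more times), (a) the combiner must emit the same key/value format it consumes, and (b) the job's result must not depend on whether the combiner ran at all. Emitting "s,d" from the combiner while the map emits "s:d" violates (a). A self-contained sketch of the property to check — plain Java, with an illustrative driver, not the Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the combiner contract: applying the combiner to each
// partition of the map output, or not applying it anywhere, must produce
// the same final result in the reducer.
public class CombinerContract {

    // Shared by combiner and reducer: sum a list of partial counts.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) total += v;
        return total;
    }

    // Reduce after combining each map's output partition first ...
    static int withCombiner(List<List<Integer>> partitions) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> p : partitions) partials.add(sum(p));
        return sum(partials);
    }

    // ... and reduce over the raw map output with no combiner at all.
    static int withoutCombiner(List<List<Integer>> partitions) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> p : partitions) all.addAll(p);
        return sum(all);
    }

    public static void main(String[] args) {
        List<List<Integer>> counts = List.of(List.of(1, 1, 1), List.of(1, 1));
        // Both paths must agree; if they don't, the combiner is unsafe.
        System.out.println(withCombiner(counts) + " == " + withoutCombiner(counts));
    }
}
```

The simplest way to satisfy the contract in a job like WordCount is to use the same class for both the combiner and the reducer, so their input and output formats cannot drift apart.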