I really don't know if there is any more I can do over email. You might want
to look at the metrics to see if anything out of the ordinary is happening on
these nodes just before or just after the error happens. Is there anything
else in the logs that looks a little bit odd compared to the other jobs? I
know 10 hours of logs is a lot to go through, but I really cannot think of
anything else that could be causing this.

--Bobby Evans

On 11/7/11 2:03 PM, "Steve Lewis" <lordjoe2...@gmail.com> wrote:

I suspect that HDFS and/or its local disk may be full or sick.

The problem occurs after a job has been running at least 10 hours. I am too
new at this to know where to look to see how bad HDFS is, and I could use
some pointers.

There are points in the job where the reducers write to HDFS, but I believe
these come later, and one reduce task owns each file written. There is a
daemon which clears out logs, but I saw the error many times, including at
times when the daemon did not run.

Any suggestions would be useful.

This is from the same machine, but not from a time when I was seeing these errors:
bin/hadoop dfsadmin -report
Configured Capacity: 23115117404160 (21.02 TB)
Present Capacity: 21918568804352 (19.93 TB)
DFS Remaining: 20307713318912 (18.47 TB)
DFS Used: 1610855485440 (1.47 TB)
DFS Used%: 7.35%
Under replicated blocks: 6
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 6 (10 total, 4 dead)

Name: 10.2.4.30:50010
Decommission Status : Normal
Configured Capacity: 3852519567360 (3.5 TB)
DFS Used: 266415693824 (248.12 GB)
Non DFS Used: 199377129472 (185.68 GB)
DFS Remaining: 3386726744064(3.08 TB)
DFS Used%: 6.92%
DFS Remaining%: 87.91%
Last contact: Mon Nov 07 11:57:35 PST 2011



On Mon, Nov 7, 2011 at 9:20 AM, Robert Evans <ev...@yahoo-inc.com> wrote:
Did you mean 0.20.2?
If so, then wow, that is a bit of a stumper. Line 200 of BZip2Codec.java is
the following:

196:    public void write(int b) throws IOException {
197:      if (needsReset) {
198:        internalReset();
199:      }
200:      this.output.write(b);
201:    }

So it must be that the output stream itself (this.output) is null (or this is
null, which would mean that Java itself has something very wrong with it). So
it looks like for some reason the output stream for the spill file is coming
back as null, but if I look at the code for IFile, where the output stream is
created:
      ...
      this.checksumOut = new IFileOutputStream(out);
      ...
      if (codec != null) {
        this.compressor = CodecPool.getCompressor(codec);
        this.compressor.reset();
        this.compressedOut = codec.createOutputStream(checksumOut, compressor);
      ...

I don't see any way that checksumOut could be null. There may have been some
sort of optimization within IFileOutputStream, but I really don't see how.
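
If it would help to rule the codec itself in or out, here is a minimal
standalone sketch (the class name, output path, and test data are made up for
illustration) that exercises the same single-byte BZip2 write path outside of
MapReduce. Running it on one of the failing nodes would at least show whether
plain codec writes succeed there.

import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.CompressionOutputStream;

public class BZip2WriteCheck {
  public static void main(String[] args) throws Exception {
    BZip2Codec codec = new BZip2Codec();
    // Hypothetical local path; point it at whatever disk the spill files use.
    OutputStream raw = new FileOutputStream("/tmp/bzip2-write-check.bz2");
    CompressionOutputStream out = codec.createOutputStream(raw);
    for (int i = 0; i < 1024 * 1024; i++) {
      out.write(i & 0xff);   // the same one-byte write(int) that hits line 200
    }
    out.finish();            // flush and finish the compressed stream
    out.close();
    System.out.println("wrote test file without an NPE");
  }
}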

You might want to look at how full the disks are on the nodes it is failing
on. You might also want to check whether any records were output by these
mappers at all; this is failing on close, so it would be very interesting to
see whether anything else was written to the IFile before this.
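
For the disk check itself, a quick sketch along these lines (the default
directory below is just a placeholder; pass the actual mapred.local.dir
entries and DataNode data directories on those nodes) prints free versus
total space:

import java.io.File;

public class DiskSpaceCheck {
  public static void main(String[] args) {
    // Placeholder default; pass the real spill/data directories as arguments.
    String[] dirs = args.length > 0 ? args : new String[] { "/tmp" };
    for (String d : dirs) {
      File dir = new File(d);
      long freeGb  = dir.getUsableSpace() / (1024L * 1024 * 1024);
      long totalGb = dir.getTotalSpace() / (1024L * 1024 * 1024);
      System.out.println(dir + ": " + freeGb + " GB free of " + totalGb + " GB");
    }
  }
}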

--Bobby Evans


On 11/7/11 10:36 AM, "Steve Lewis" <lordjoe2...@gmail.com> wrote:

0.202 and using that API  -

On Mon, Nov 7, 2011 at 8:27 AM, Robert Evans <ev...@yahoo-inc.com> wrote:
What version of Hadoop are you using?



On 11/5/11 11:09 AM, "Steve Lewis" <lordjoe2...@gmail.com> wrote:

My job is dying during a map task write. This happened in enough tasks to
kill the job, although most tasks succeeded.

Any ideas as to where to start diagnosing the problem?



Caused by: java.lang.NullPointerException
 at org.apache.hadoop.io.compress.BZip2Codec$BZip2CompressionOutputStream.write(BZip2Codec.java:200)
 at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
 at java.io.DataOutputStream.writeByte(DataOutputStream.java:136)
 at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:263)
 at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:243)
 at org.apache.hadoop.mapred.IFile$Writer.close(IFile.java:126)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1242)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)




