I suspect that HDFS and/or its local disk may be full or failing.

The problem occurs after a job has been running for at least 10 hours.
I am too new at this to know where to look to see what state HDFS is in,
and could use some pointers.

There are points in the job where the reducer writes to HDFS, but I believe
these come later, and one reduce task owns each file written. There is a
daemon which clears out logs, but I saw the error many times, and at times the
daemon did not run.

Any suggestions would be useful.

This is from the same machine, but from a time when I am not seeing these errors:
bin/hadoop dfsadmin -report
Configured Capacity: 23115117404160 (21.02 TB)
Present Capacity: 21918568804352 (19.93 TB)
DFS Remaining: 20307713318912 (18.47 TB)
DFS Used: 1610855485440 (1.47 TB)
DFS Used%: 7.35%
Under replicated blocks: 6
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 6 (10 total, 4 dead)

Name: 10.2.4.30:50010
Decommission Status : Normal
Configured Capacity: 3852519567360 (3.5 TB)
DFS Used: 266415693824 (248.12 GB)
Non DFS Used: 199377129472 (185.68 GB)
DFS Remaining: 3386726744064(3.08 TB)
DFS Used%: 6.92%
DFS Remaining%: 87.91%
Last contact: Mon Nov 07 11:57:35 PST 2011
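
The report above only covers HDFS capacity, while map spill files go to local
disk (the directories listed in mapred.local.dir), so those partitions need a
separate check. A minimal sketch of such a check, using only java.io.File;
the class name LocalDiskCheck and the choice of directories to pass in are my
own assumptions, not anything Hadoop provides:

```java
import java.io.File;

// Hypothetical helper (not part of Hadoop): reports how much space is left
// on the partition that holds each directory given on the command line.
class LocalDiskCheck {

    // Percentage of the partition holding `dir` that this JVM can still use.
    // Returns -1.0 when the path does not name a partition (for example, it
    // does not exist), because getTotalSpace() returns 0 in that case.
    static double usablePercent(File dir) {
        long total = dir.getTotalSpace();
        if (total == 0L) {
            return -1.0;
        }
        return 100.0 * dir.getUsableSpace() / total;
    }

    public static void main(String[] args) {
        for (String path : args) {
            File dir = new File(path);
            double pct = usablePercent(dir);
            if (pct < 0) {
                System.out.println(path + ": cannot read partition size");
            } else {
                System.out.printf("%s: %.2f%% usable (%d bytes)%n",
                        path, pct, dir.getUsableSpace());
            }
        }
    }
}
```

It would be run on each failing node with that node's mapred.local.dir
entries as arguments. It only shows the state at the moment it runs, so it
would have to be run while the job is failing to be of any use.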



On Mon, Nov 7, 2011 at 9:20 AM, Robert Evans <ev...@yahoo-inc.com> wrote:

>  Did you mean 0.20.2?
> If so then Wow, that is a bit of a stumper.  Line 200 of BZip2Codec.java
> is the following
>
> 196:    public void write(int b) throws IOException {
> 197:      if (needsReset) {
> 198:        internalReset();
> 199:      }
> 200:      this.output.write(b);
> 201:   }
>
> So it must be that the output stream itself (this.output) is null (or this
> is null, which would mean that Java itself has something very wrong with
> it).  So it looks like for some reason the output stream for the spill file
> is coming back as null, but if I look at the code for IFile, where the
> output stream is created
>      ...
>       this.checksumOut = new IFileOutputStream(out);
>      ...
>       if (codec != null) {
>         this.compressor = CodecPool.getCompressor(codec);
>         this.compressor.reset();
>       this.compressedOut = codec.createOutputStream(checksumOut, compressor);
>     ...
>
> I don’t see any way that checksumOut could be null.  There may have been
> some sort of an optimization within IFileOutputStream, but I really don’t
> see how.
>
> You might want to look at how full the disks are on the nodes that it is
> failing on.  You might also want to check to see if any records were output
> by these mappers at all, because this is failing on close, and it would be
> very interesting to see if anything else was output to the IFile before
> this.
>
> --Bobby Evans
>
>
> On 11/7/11 10:36 AM, "Steve Lewis" <lordjoe2...@gmail.com> wrote:
>
> 0.202 and using that API  -
>
> On Mon, Nov 7, 2011 at 8:27 AM, Robert Evans <ev...@yahoo-inc.com> wrote:
>
> What version of Hadoop are you using?
>
>
>
> On 11/5/11 11:09 AM, "Steve Lewis" <lordjoe2...@gmail.com> wrote:
>
> My job is dying during a map task write. This happened in enough tasks to
> kill the job, although most tasks succeeded.
>
> Any ideas as to where to start diagnosing the problem?
>
>
>
> Caused by: java.lang.NullPointerException
>  at org.apache.hadoop.io.compress.BZip2Codec$BZip2CompressionOutputStream.write(BZip2Codec.java:200)
>  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
>  at java.io.DataOutputStream.writeByte(DataOutputStream.java:136)
>  at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:263)
>  at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:243)
>  at org.apache.hadoop.mapred.IFile$Writer.close(IFile.java:126)
>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1242)
>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
>  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)
>
>
>
>


-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
