I suspect that HDFS and/or its local disks may be full or unhealthy. The problem occurs after a job has been running for at least 10 hours. I am too new at this to know where to look to see how healthy HDFS is, and could use some pointers.
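(For reference, this is roughly the kind of local check I had in mind; it is only a sketch, and the directory paths below are placeholders, not my actual mapred.local.dir / dfs.data.dir settings.)

    import java.io.File;

    // Quick sanity check: report free vs. total space on the directories
    // the tasktracker spills to and the datanode stores blocks in.
    public class DiskSpaceCheck {
        public static void main(String[] args) {
            // Placeholder paths -- substitute the real mapred.local.dir
            // and dfs.data.dir entries from the cluster configuration.
            String[] dirs = { "/tmp/hadoop/mapred/local", "/data/dfs/data" };
            for (String dir : dirs) {
                File f = new File(dir);
                long freeGb = f.getUsableSpace() / (1024L * 1024L * 1024L);
                long totalGb = f.getTotalSpace() / (1024L * 1024L * 1024L);
                System.out.println(dir + ": " + freeGb + " GB free of " + totalGb + " GB");
            }
        }
    }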
There are points in the job where the reducers write to HDFS, but I believe those come later, and one reduce task owns each file written. There is a daemon which clears out logs, but I saw the error many times, including at times when the daemon was not running. Any suggestions would be useful.

This report is from the same machine, but not from a time when I was seeing these errors:

bin/hadoop dfsadmin -report
Configured Capacity: 23115117404160 (21.02 TB)
Present Capacity: 21918568804352 (19.93 TB)
DFS Remaining: 20307713318912 (18.47 TB)
DFS Used: 1610855485440 (1.47 TB)
DFS Used%: 7.35%
Under replicated blocks: 6
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 6 (10 total, 4 dead)

Name: 10.2.4.30:50010
Decommission Status : Normal
Configured Capacity: 3852519567360 (3.5 TB)
DFS Used: 266415693824 (248.12 GB)
Non DFS Used: 199377129472 (185.68 GB)
DFS Remaining: 3386726744064 (3.08 TB)
DFS Used%: 6.92%
DFS Remaining%: 87.91%
Last contact: Mon Nov 07 11:57:35 PST 2011

On Mon, Nov 7, 2011 at 9:20 AM, Robert Evans <ev...@yahoo-inc.com> wrote:

> Did you mean 0.20.2?
> If so, then wow, that is a bit of a stumper. Line 200 of BZip2Codec.java
> is the following:
>
> 196:   public void write(int b) throws IOException {
> 197:     if (needsReset) {
> 198:       internalReset();
> 199:     }
> 200:     this.output.write(b);
> 201:   }
>
> So it must be that the output stream itself (this.output) is null (or "this"
> is null, which would mean that Java itself has something very wrong with
> it). So it looks like for some reason the output stream for the spill file
> is coming back as null, but if I look at the code for IFile, where the
> output stream is created:
>
>   ...
>   this.checksumOut = new IFileOutputStream(out);
>   ...
>   if (codec != null) {
>     this.compressor = CodecPool.getCompressor(codec);
>     this.compressor.reset();
>     this.compressedOut = codec.createOutputStream(checksumOut, compressor);
>   ...
>
> I don't see any way that checksumOut could be null. There may have been
> some sort of an optimization within IFileOutputStream, but I really don't
> see how.
>
> You might want to look at how full the disks are on the nodes where it is
> failing. You might also want to check whether any records were output by
> these mappers at all, because this is failing on close, and it would be
> very interesting to see if anything else was output to the IFile before
> this.
>
> --Bobby Evans
>
>
> On 11/7/11 10:36 AM, "Steve Lewis" <lordjoe2...@gmail.com> wrote:
>
> 0.202 and using that API -
>
> On Mon, Nov 7, 2011 at 8:27 AM, Robert Evans <ev...@yahoo-inc.com> wrote:
>
> What version of Hadoop are you using?
>
>
> On 11/5/11 11:09 AM, "Steve Lewis" <lordjoe2...@gmail.com> wrote:
>
> My job is dying during a map task write.
> This happened in enough tasks to kill the job, although most tasks
> succeeded. Any ideas as to where to start diagnosing the problem?
>
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.io.compress.BZip2Codec$BZip2CompressionOutputStream.write(BZip2Codec.java:200)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
>     at java.io.DataOutputStream.writeByte(DataOutputStream.java:136)
>     at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:263)
>     at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:243)
>     at org.apache.hadoop.mapred.IFile$Writer.close(IFile.java:126)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1242)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
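P.S. To try to rule out the codec itself, I sketched a rough standalone smoke test along these lines (the output path is just a placeholder, and this only exercises the plain createOutputStream path, not the exact IFile setup). If it works, it should drive the same BZip2CompressionOutputStream.write(int) that appears in the trace:

    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.io.compress.BZip2Codec;
    import org.apache.hadoop.io.compress.CompressionOutputStream;

    // Standalone smoke test for the bzip2 output path, outside of MapReduce.
    public class BZip2Smoke {
        public static void main(String[] args) throws Exception {
            BZip2Codec codec = new BZip2Codec();
            // Placeholder output file.
            OutputStream raw = new FileOutputStream("/tmp/bzip2-smoke.bz2");
            CompressionOutputStream out = codec.createOutputStream(raw);
            for (int i = 0; i < 4096; i++) {
                out.write(i & 0xff);  // same write(int) as BZip2Codec.java:200
            }
            // In the trace the NPE surfaced while IFile$Writer.close() was
            // writing trailing bytes, so closing cleanly here is part of the test.
            out.close();
            System.out.println("bzip2 write/close completed without an NPE");
        }
    }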