Looks like it is something to do with the new checksum patch (HADOOP-928). I may be wrong, but I think it is worth taking a look at that patch.
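For what it's worth, the NPE comes out of FSDataInputStream$Buffer.seek() while the reduce-side merge (SequenceFile$Sorter) is re-reading a segment, which looks like the merger seeking into a stream whose underlying buffer has already been dropped. Here is a minimal, self-contained Java sketch of that kind of failure mode; it is not Hadoop code, and the SeekableWrapper class (and the way its close() nulls out the delegate) is only an assumption for illustration:

// Toy demo, not Hadoop code: a seekable wrapper that drops its delegate on
// close(). Any later seek() dereferences null and throws NullPointerException,
// which is one way the FSDataInputStream$Buffer.seek NPE in the trace below
// could arise if a merge segment's stream is closed (or never opened) before
// the merger seeks into it.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SeekAfterCloseDemo {

    /** Toy stand-in for a buffered, seekable input stream wrapper. */
    static class SeekableWrapper extends InputStream {
        private ByteArrayInputStream in;   // becomes null after close()

        SeekableWrapper(byte[] data) {
            this.in = new ByteArrayInputStream(data);
        }

        void seek(long pos) throws IOException {
            in.reset();                    // NPE here if close() already ran
            long skipped = in.skip(pos);
            if (skipped != pos) {
                throw new IOException("could not seek to " + pos);
            }
        }

        @Override
        public int read() {
            return in.read();
        }

        @Override
        public void close() {
            in = null;                     // delegate discarded, like a released buffer
        }
    }

    public static void main(String[] args) throws IOException {
        SeekableWrapper s = new SeekableWrapper(new byte[]{1, 2, 3, 4});
        s.seek(2);                         // fine while the stream is open
        System.out.println("read: " + s.read());
        s.close();
        s.seek(0);                         // throws NullPointerException, as in the trace
    }
}

If something like that is happening, the question would be whether the checksum code from HADOOP-928 can leave a segment's stream closed, or never opened, before nextRawValue() seeks into it. That is only a guess until someone looks at the patch.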
> -----Original Message-----
> From: Devaraj Das [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, March 01, 2007 9:26 AM
> To: '[email protected]'
> Subject: RE: some reducers stuck in copying stage
>
> Weird! This looks like some other problem which happened while merging the
> outputs at the Reduce task. The copying stage went through fine. This
> requires some more analysis.
>
> > -----Original Message-----
> > From: Mike Smith [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, March 01, 2007 3:44 AM
> > To: [email protected]
> > Subject: Re: some reducers stuck in copying stage
> >
> > Devaraj,
> >
> > After applying patch 1043 the copying problem is solved. However, I am
> > getting new exceptions; the affected tasks finish after being reassigned
> > to another tasktracker, so the job gets done eventually. But I never had
> > this exception before applying this patch (or could it be because of
> > changing the back-off time to 5 sec?):
> >
> > java.lang.NullPointerException
> >   at org.apache.hadoop.fs.FSDataInputStream$Buffer.seek(FSDataInputStream.java:74)
> >   at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:121)
> >   at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:217)
> >   at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:163)
> >   at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
> >   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> >   at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> >   at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> >   at java.io.DataInputStream.readFully(DataInputStream.java:178)
> >   at java.io.DataInputStream.readFully(DataInputStream.java:152)
> >   at org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:427)
> >   at org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$700(SequenceFile.java:414)
> >   at org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:1665)
> >   at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:2579)
> >   at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2351)
> >   at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2226)
> >   at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2442)
> >   at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2164)
> >   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:270)
> >   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1444)
> >
> > java.lang.NullPointerException
> >   at org.apache.hadoop.fs.FSDataInputStream$Buffer.seek(FSDataInputStream.java:74)
> >   at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:121)
> >   at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:217)
> >   at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:163)
> >   at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
> >   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> >   at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> >   at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> >   at java.io.DataInputStream.readFully(DataInputStream.java:178)
> >   at java.io.DataInputStream.readFully(DataInputStream.java:152)
> >   at org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:427)
> >   at org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$700(SequenceFile.java:414)
> >   at org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:1665)
> >   at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:2579)
> >   at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2351)
> >   at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2226)
> >   at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2442)
> >   at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2164)
> >   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:270)
> >   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1444)
> >
> > On 2/28/07, Mike Smith <[EMAIL PROTECTED]> wrote:
> > >
> > > Thanks Devaraj, patch 1042 seems to be already committed. Also, the
> > > system never recovered even after 1 min / 300 sec; it was stuck there
> > > for hours. I will try patch 1043 and also decrease the back-off time
> > > to see if those help.
> > >
