[ http://issues.apache.org/jira/browse/HADOOP-18?page=comments#action_12368835 ]
Rod Taylor commented on HADOOP-18:
----------------------------------

Finally figured it out. One temp directory was filling up (different sizes) and the fetch was aborting, BUT I didn't see this in the logs for the longest time.

The patch in NUTCH-143 helped track down the issue because it caused the error to be noticed by the scripts driving the code, close to where the error took place rather than several steps (possibly many hours) later.

Please close this bug.

> Crash with multiple temp directories
> ------------------------------------
>
>          Key: HADOOP-18
>          URL: http://issues.apache.org/jira/browse/HADOOP-18
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Reporter: Rod Taylor
>     Priority: Critical
>
> A brief read of the code indicated it may be possible to use multiple local
> directories using something like the below:
> <property>
>   <name>mapred.local.dir</name>
>   <value>/local,/local1,/local2</value>
>   <description>The local directory where MapReduce stores intermediate
>   data files.
>   </description>
> </property>
> This failed with the below exception during either the generate or update
> phase (not entirely sure which).
> java.lang.ArrayIndexOutOfBoundsException
>         at java.util.zip.CRC32.update(CRC32.java:51)
>         at org.apache.nutch.fs.NFSDataInputStream$Checker.read(NFSDataInputStream.java:92)
>         at org.apache.nutch.fs.NFSDataInputStream$PositionCache.read(NFSDataInputStream.java:156)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>         at java.io.DataInputStream.readFully(DataInputStream.java:176)
>         at org.apache.nutch.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:55)
>         at org.apache.nutch.io.DataOutputBuffer.write(DataOutputBuffer.java:89)
>         at org.apache.nutch.io.SequenceFile$Reader.next(SequenceFile.java:378)
>         at org.apache.nutch.io.SequenceFile$Reader.next(SequenceFile.java:301)
>         at org.apache.nutch.io.SequenceFile$Reader.next(SequenceFile.java:323)
>         at org.apache.nutch.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:60)
>         at org.apache.nutch.segment.SegmentReader$InputFormat$1.next(SegmentReader.java:80)
>         at org.apache.nutch.mapred.MapTask$2.next(MapTask.java:106)
>         at org.apache.nutch.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.nutch.mapred.MapTask.run(MapTask.java:116)
>         at org.apache.nutch.mapred.TaskTracker$Child.main(TaskTracker.java:604)
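
The root cause reported in the comment (one of several local directories filling up without anyone noticing) is the kind of condition a driving script could check for before each step. The following is a minimal sketch of such a check, not part of Hadoop or Nutch: the class name LocalDirCheck, its two methods, and the 1 GiB threshold are all hypothetical. It only assumes that the mapred.local.dir value is a comma-separated list of paths, as in the configuration quoted above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a Hadoop or Nutch class.
class LocalDirCheck {

    // Split a comma-separated mapred.local.dir value into directories,
    // ignoring empty entries and surrounding whitespace.
    static List<File> parseDirs(String value) {
        List<File> dirs = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(new File(trimmed));
            }
        }
        return dirs;
    }

    // Return the entries that are not directories or that have fewer
    // than minFreeBytes of usable space.
    static List<File> findLowSpace(List<File> dirs, long minFreeBytes) {
        List<File> low = new ArrayList<>();
        for (File dir : dirs) {
            if (!dir.isDirectory() || dir.getUsableSpace() < minFreeBytes) {
                low.add(dir);
            }
        }
        return low;
    }

    public static void main(String[] args) {
        // Default mirrors the value from the report; pass another as args[0].
        String value = args.length > 0 ? args[0] : "/local,/local1,/local2";
        long minFreeBytes = 1024L * 1024L * 1024L; // assumed threshold: 1 GiB

        List<File> low = findLowSpace(parseDirs(value), minFreeBytes);
        for (File dir : low) {
            System.err.println("Low or missing local dir: " + dir);
        }
        // Exit non-zero so a driving script can stop at this step
        // instead of failing several steps later.
        if (!low.isEmpty()) {
            System.exit(1);
        }
    }
}
```

Run before each generate, fetch, or update step, a check like this surfaces a full directory at the point where it matters, which is the same effect the NUTCH-143 patch had for the reporter.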