Lately I've been getting this error while running Fetcher2:

java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at java.io.DataInputStream.readFully(DataInputStream.java:152)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1360)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1283)
    at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2866)
    at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2683)
    at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2392)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:552)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:607)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:193)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

I'm using trunk, and thus Hadoop 0.15.0. All of my nodes have the same number of map/reduce tasks configured. This only happens on about 1/10 of the nodes during a 100M-page crawl, and once the job restarts it usually finishes. Could this be a problem with HDFS, or is there a config setting I should modify?
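
In case it helps narrow things down, here is a rough sketch of the kind of check I could run against a suspect file to see whether it's truncated (the path argument is a placeholder; this just walks the file with the public SequenceFile.Reader API and reports where reading stops):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class ScanSeqFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Use FileSystem.getLocal(conf) instead to check local map-output spill files.
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path(args[0]);  // placeholder: path to the file to scan
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
    Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
    Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
    long records = 0;
    try {
      // Read until the end of the file; a truncated file should
      // surface the same EOFException seen in the task logs.
      while (reader.next(key, value)) {
        records++;
      }
      System.out.println("OK: read " + records + " records");
    } catch (java.io.EOFException e) {
      System.out.println("Truncated after " + records + " records");
    } finally {
      reader.close();
    }
  }
}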

Thanks,
Ned
