Hi, I am running a MapReduce job on my Hadoop cluster.
I am processing about 10 gigabytes of data, and one tiny failed task crashes the whole job. The job gets to 98% complete, so throwing away all the finished output seems like an awful waste. I'd like to keep the finished output and rerun only the failed tasks (the remaining 2%). Is there any way to figure out which range of input splits failed? I went to "localhost:50030" to see if I could find any useful information, but I must be looking in the wrong places. Could somebody help me with this problem?

Below is the log of a failed task. Is there any information in it that I can use?

*syslog logs*

Records R/W=41707/41639
2010-06-30 07:35:30,530 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=41776/41726
2010-06-30 07:35:40,554 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=41865/41804
2010-06-30 07:35:50,559 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=41970/41932
2010-06-30 07:36:00,637 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42073/42065
2010-06-30 07:36:10,772 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42258/42196
2010-06-30 07:36:20,785 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42318/42274
2010-06-30 07:36:30,985 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42378/42351
2010-06-30 07:36:41,005 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42442/42419
2010-06-30 07:36:51,149 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42499/42484
2010-06-30 07:37:01,235 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42559/42547
2010-06-30 07:37:11,242 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42626/42611
2010-06-30 07:37:21,485 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42769/42704
2010-06-30 07:37:31,617 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42845/42782
2010-06-30 07:37:41,725 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42915/42875
2010-06-30 07:37:51,733 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42986/42949
2010-06-30 07:38:01,795 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=43070/43051
2010-06-30 07:38:11,849 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=43138/43136
2010-06-30 07:38:22,398 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=43258/43200
2010-06-30 07:38:31,642 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2010-06-30 07:38:31,643 INFO org.apache.hadoop.streaming.PipeMapRed: MROutputThread done
2010-06-30 07:38:31,765 INFO org.apache.hadoop.streaming.PipeMapRed: log:null
R/W/S=43335/43271/0 in:7=43335/5885 [rec/s] out:7=43271/5885 [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=hadoop
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |[...@d22860|
Date: Wed Jun 30 07:38:31 KST 2010
java.io.IOException: Broken pipe
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:260)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.streaming.PipeMapRed.write(PipeMapRed.java:635)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:105)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
2010-06-30 07:38:31,766 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
2010-06-30 07:38:31,766 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
2010-06-30 07:38:32,028 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
2010-06-30 07:38:32,029 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task