Task attempt failed to report status

steven zhuang Fri, 05 Mar 2010 01:12:36 -0800

hi, all,

     when I import data into the HTable with a Map/Reduce job, the task runs
smoothly until the last reducer failed 6 times to report its status.
     In my program I use batchupdate to collect for every 1000 cells, and
update the status. I don't think the normal inserting will cost 10 minutes,
coz the first 99% of the job was smoothly done, only the very last reducer
will get the "fail to report status" error.


     I doubt the problem is caused by regoinserver is way too busy, which
causes the "output.collect(k, bu);" takes too much time to return. but I am
not sure coz I don't know which regionserver is actually committing the
update.

    So which log should I dig into? any hint is appreciated.

    My code's  submitting portion is as follows(just copied from some online
source and changed a little):

        public void reduce(ImmutableBytesWritable k,
                Iterator<HbaseMapWritable<byte[], byte[]>> v,
                OutputCollector<ImmutableBytesWritable, BatchUpdate> output,
                Reporter r) throws IOException {
            while (v.hasNext()) {
                r.setStatus("Reducer begin committing row: " + new
String(k.get(), HConstants.UTF8_ENCODING) + "  Time:"+ new Date());
                BatchUpdate bu = new BatchUpdate(k.get());
                int cellCnt = 0;
                while (v.hasNext()) {
                    HbaseMapWritable<byte[], byte[]> hmw = v.next();
                    Iterator<Entry<byte[], byte[]>> iter =
 hmw.entrySet().iterator();
                    while(iter.hasNext()){
                        Entry<byte[], byte[]> e = iter.next();
                        bu.put(e.getKey(), e.getValue());
                        //System.err.println("now add cell: "+e+" cell
count: " + cellCnt + new Date());
                        if(++cellCnt>1000){
*                            output.collect(k, bu); //this line coz timeout.
*
*                            r.setStatus("Reducer done committing " + new
String(e.getKey(), HConstants.UTF8_ENCODING) + ":"+new String(e.getValue(),
HConstants.UTF8_ENCODING)+ "  Time:"+ new Date());*
                            bu = new BatchUpdate(k.get());
                            cellCnt = 0;
                        }
                    }
                }
                if(cellCnt>0){
                    output.collect(k, bu);
                }
            }


the regionserver log is as follows:

2010-03-05 08:24:07,268 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.33333334% reduce > sort
2010-03-05 08:24:07,788 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6666667% reduce > reduce
2010-03-05 08:24:10,795 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.66668% Reducer done committing
queries:tempurpedic complaints:3:3:3:3  Time:Fri Mar 05 08:24:10 GMT 2010 >
reduce
2010-03-05 08:24:13,806 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:modeling agencies:1:1:1:1  Time:Fri Mar 05 08:24:13 GMT 2010 >
reduce
2010-03-05 08:24:16,810 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:polaris accessories:2:2:2:2  Time:Fri Mar 05 08:24:16 GMT 2010 >
reduce
2010-03-05 08:24:19,832 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:what was martin luther king jr s pants size:1:1:1:1  Time:Fri Mar 05
08:24:19 GMT 2010 > reduce
2010-03-05 08:24:22,838 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:nj npr stations:2:2:2:2  Time:Fri Mar 05 08:24:22 GMT 2010 > reduce
2010-03-05 08:24:25,842 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:stacked hair gallery:5:5:5:5  Time:Fri Mar 05 08:24:25 GMT 2010 >
reduce
2010-03-05 08:24:28,877 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:school bus endorsement test:2:2:2:2  Time:Fri Mar 05 08:24:28 GMT
2010 > reduce
2010-03-05 08:24:31,919 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:spectacular from pretty ricky pictures:2:2:2:2  Time:Fri Mar 05
08:24:31 GMT 2010 > reduce
2010-03-05 08:24:34,926 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:sample tax return:2:2:2:2  Time:Fri Mar 05 08:24:34 GMT 2010 >
reduce
2010-03-05 08:24:37,929 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.7544105% Reducer begin committing
row: cp_2010/01/18  Time:Fri Mar 05 08:24:37 GMT 2010 > reduce
2010-03-05 08:24:40,933 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.7544105% Reducer begin committing
row: cp_2010/01/18  Time:Fri Mar 05 08:24:37 GMT 2010 > reduce
2010-03-05 08:34:43,448 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0: Task
attempt_201002112137_0858_r_000066_0 failed to report status for 602
seconds. Killing!
2010-03-05 08:34:43,458 INFO org.apache.hadoop.mapred.TaskTracker: Process
Thread Dump: lost task

Task attempt failed to report status

Reply via email to