hi, all,
when I import data into the HTable with a Map/Reduce job, the task runs
smoothly until the last reducer failed 6 times to report its status.
In my program I use batchupdate to collect for every 1000 cells, and
update the status. I don't think the normal inserting will cost 10 minutes,
coz the first 99% of the job was smoothly done, only the very last reducer
will get the "fail to report status" error.
I doubt the problem is caused by regoinserver is way too busy, which
causes the "output.collect(k, bu);" takes too much time to return. but I am
not sure coz I don't know which regionserver is actually committing the
update.
So which log should I dig into? any hint is appreciated.
My code's submitting portion is as follows(just copied from some online
source and changed a little):
public void reduce(ImmutableBytesWritable k,
Iterator<HbaseMapWritable<byte[], byte[]>> v,
OutputCollector<ImmutableBytesWritable, BatchUpdate> output,
Reporter r) throws IOException {
while (v.hasNext()) {
r.setStatus("Reducer begin committing row: " + new
String(k.get(), HConstants.UTF8_ENCODING) + " Time:"+ new Date());
BatchUpdate bu = new BatchUpdate(k.get());
int cellCnt = 0;
while (v.hasNext()) {
HbaseMapWritable<byte[], byte[]> hmw = v.next();
Iterator<Entry<byte[], byte[]>> iter =
hmw.entrySet().iterator();
while(iter.hasNext()){
Entry<byte[], byte[]> e = iter.next();
bu.put(e.getKey(), e.getValue());
//System.err.println("now add cell: "+e+" cell
count: " + cellCnt + new Date());
if(++cellCnt>1000){
* output.collect(k, bu); //this line coz timeout.
*
* r.setStatus("Reducer done committing " + new
String(e.getKey(), HConstants.UTF8_ENCODING) + ":"+new String(e.getValue(),
HConstants.UTF8_ENCODING)+ " Time:"+ new Date());*
bu = new BatchUpdate(k.get());
cellCnt = 0;
}
}
}
if(cellCnt>0){
output.collect(k, bu);
}
}
the regionserver log is as follows:
2010-03-05 08:24:07,268 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.33333334% reduce > sort
2010-03-05 08:24:07,788 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6666667% reduce > reduce
2010-03-05 08:24:10,795 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.66668% Reducer done committing
queries:tempurpedic complaints:3:3:3:3 Time:Fri Mar 05 08:24:10 GMT 2010 >
reduce
2010-03-05 08:24:13,806 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:modeling agencies:1:1:1:1 Time:Fri Mar 05 08:24:13 GMT 2010 >
reduce
2010-03-05 08:24:16,810 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:polaris accessories:2:2:2:2 Time:Fri Mar 05 08:24:16 GMT 2010 >
reduce
2010-03-05 08:24:19,832 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:what was martin luther king jr s pants size:1:1:1:1 Time:Fri Mar 05
08:24:19 GMT 2010 > reduce
2010-03-05 08:24:22,838 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:nj npr stations:2:2:2:2 Time:Fri Mar 05 08:24:22 GMT 2010 > reduce
2010-03-05 08:24:25,842 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:stacked hair gallery:5:5:5:5 Time:Fri Mar 05 08:24:25 GMT 2010 >
reduce
2010-03-05 08:24:28,877 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:school bus endorsement test:2:2:2:2 Time:Fri Mar 05 08:24:28 GMT
2010 > reduce
2010-03-05 08:24:31,919 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:spectacular from pretty ricky pictures:2:2:2:2 Time:Fri Mar 05
08:24:31 GMT 2010 > reduce
2010-03-05 08:24:34,926 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing
queries:sample tax return:2:2:2:2 Time:Fri Mar 05 08:24:34 GMT 2010 >
reduce
2010-03-05 08:24:37,929 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.7544105% Reducer begin committing
row: cp_2010/01/18 Time:Fri Mar 05 08:24:37 GMT 2010 > reduce
2010-03-05 08:24:40,933 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0 0.7544105% Reducer begin committing
row: cp_2010/01/18 Time:Fri Mar 05 08:24:37 GMT 2010 > reduce
2010-03-05 08:34:43,448 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002112137_0858_r_000066_0: Task
attempt_201002112137_0858_r_000066_0 failed to report status for 602
seconds. Killing!
2010-03-05 08:34:43,458 INFO org.apache.hadoop.mapred.TaskTracker: Process
Thread Dump: lost task