Mefa,

On 10-Jan-2012, at 6:38 PM, Mefa Grut wrote:

> Two cleanup related questions:
> Can I execute context.write from the reduce/map cleanup phase?

If by cleanup you mean the mapper/reducer cleanup() methods, then the answer is 
yes. This has been asked before: http://search-hadoop.com/m/jzO0k18XoNW1 has 
some additional details if you're interested.

(You probably do not even need the cleanup method; see my last paragraph.)
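For what it's worth, here is a minimal sketch of writing output from a reducer's 
cleanup() method. The class name WordTotalReducer and the buffer-in-a-map 
approach are purely illustrative, not something you have to follow:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative reducer: accumulate per-key sums during reduce() and
// emit them all from cleanup(), which receives the same Context and
// can call context.write() just like reduce() does.
public class WordTotalReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  private final Map<String, Integer> totals = new HashMap<String, Integer>();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context) {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    totals.put(key.toString(), sum);
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    // Runs once at the end of the task; writes from here land in the
    // task's regular output, same as writes from reduce().
    for (Map.Entry<String, Integer> e : totals.entrySet()) {
      context.write(new Text(e.getKey()), new IntWritable(e.getValue()));
    }
  }
}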

> Should I expect cleanup to be killed when a task fail or killed(speculative 
> execution)?

I'm not sure I fully understand this question, but here is what happens in both 
cases:

If your task fails, it fails right there: your cleanup() method won't even be 
called, since the task exits with whatever error it ran into. And kills 
(user-issued or speculative) are outright kills, so your task may die 
immediately when such a signal arrives.

> The idea is to update HBase counters from within a mapreduce job (kind of an 
> alternative to the built-in mapreduce counters that can scale to millions of 
> counters). 
> Since a task can fail and run again, or be duplicated and killed, events can be 
> incremented too many times. How does Hadoop work around this problem with the 
> generic counters? 

In Hadoop, the counters are aggregated only from successful task attempts (i.e. 
attempts that have been 'committed' by the framework, via the OutputCommitter).

I think, for your case, it'd be better to do the final committing with a custom 
implementation of OutputCommitter. Unfortunately the output stream is not 
available inside the FileOutputCommitter (FOC), so you'd probably have to hack 
around a bit to get your outputs into HBase at the end. There may well be 
other, possibly better solutions :)
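If you go that route, something along these lines might be a starting point. 
This is only a rough sketch: CountersOnCommit is a made-up class name, and 
HBaseCounterSink stands in for whatever code of yours buffers per-attempt 
increments and writes them to HBase; it is not a real Hadoop or HBase API.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;

// Sketch of a committer that pushes counter increments to HBase only
// when the framework commits a successful task attempt.
public class CountersOnCommit extends FileOutputCommitter {

  public CountersOnCommit(Path outputPath, TaskAttemptContext context)
      throws IOException {
    super(outputPath, context);
  }

  @Override
  public void commitTask(TaskAttemptContext context) throws IOException {
    // Promote the attempt's regular output first.
    super.commitTask(context);
    // Only now flush the buffered increments to HBase, so a failed or
    // speculatively killed attempt never double-counts.
    HBaseCounterSink.flush(context.getTaskAttemptID());
  }

  // Hypothetical placeholder for whatever buffers per-attempt increments
  // and writes them to HBase; not a real Hadoop or HBase API.
  static class HBaseCounterSink {
    static void flush(TaskAttemptID attempt) throws IOException {
      // e.g. issue HTable.increment(...) for everything buffered under
      // this attempt, then clear the buffer.
    }
  }
}

You'd typically wire this in by subclassing your OutputFormat and returning the 
committer from getOutputCommitter(); you'd probably also want to discard the 
buffered increments when a task attempt is aborted.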

It would also be a good idea to ask about this specific issue on the HBase user 
list, so you reach the right audience.
