On Fri, Apr 10, 2009 at 12:31 PM, jason hadoop <jason.had...@gmail.com> wrote:
> Hi Sagar!
>
> There is no reason for the body of your reduce method to do more than
> copy and queue the key/value set into an execution pool.

Agreed. You probably want to use either a bounded queue on your execution
pool, or even a SynchronousQueue to hand tasks off to the executor threads.
Otherwise your reducer will churn through all of its inputs at I/O rate,
potentially fill up RAM, and report "100%" complete well before it is
actually complete. Something like:

  BlockingQueue<Runnable> queue = new SynchronousQueue<Runnable>();
  ThreadPoolExecutor threadPool = new ThreadPoolExecutor(
      WORKER_THREAD_COUNT, WORKER_THREAD_COUNT,
      5, TimeUnit.SECONDS, queue);

> The close method will need to wait until all of the items finish
> execution, and potentially keep the heartbeat up with the task tracker
> by periodically reporting something. Sadly, right now the reporter has
> to be grabbed from the reduce method, as configure and close do not get
> an instance.

+1. You probably want to call reporter.progress() after each item is
processed by the worker threads.

> I believe the key and value objects are reused by the framework on the
> next call to reduce, so making a copy before queuing them into your
> thread pool is important.

+1 here too. You will definitely run into issues if you don't make a deep
copy.

-Todd
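One caveat on the snippet above: with a SynchronousQueue and a fixed-size pool, a submit that finds every worker busy is rejected (the default AbortPolicy throws RejectedExecutionException) rather than blocking. Here is a self-contained sketch of the handoff pattern using plain java.util.concurrent, with CallerRunsPolicy as one way to make the producer wait; HandoffDemo, runDemo, and WORKER_THREAD_COUNT are made-up names, and the squaring task is just a stand-in for per-record work:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class HandoffDemo {
    static final int WORKER_THREAD_COUNT = 4; // hypothetical pool size

    static int runDemo() throws Exception {
        // A SynchronousQueue has no capacity: each submit must hand its
        // task directly to an idle worker, so the producer cannot queue
        // up unbounded work and race ahead of the workers.
        BlockingQueue<Runnable> queue = new SynchronousQueue<Runnable>();
        ThreadPoolExecutor threadPool = new ThreadPoolExecutor(
                WORKER_THREAD_COUNT, WORKER_THREAD_COUNT,
                5, TimeUnit.SECONDS, queue,
                // Without a handler, a submit that finds no idle worker
                // throws RejectedExecutionException. CallerRunsPolicy
                // instead runs the task in the submitting thread, which
                // throttles the producer to the workers' pace.
                new ThreadPoolExecutor.CallerRunsPolicy());

        List<Future<Integer>> results = new ArrayList<Future<Integer>>();
        for (int i = 0; i < 20; i++) {
            final int n = i;
            results.add(threadPool.submit(new Callable<Integer>() {
                public Integer call() {
                    return n * n; // stand-in for per-record work
                }
            }));
        }

        // Analogous to waiting in close(): drain all pending work.
        int sum = 0;
        for (Future<Integer> f : results) {
            sum += f.get();
        }
        threadPool.shutdown();
        threadPool.awaitTermination(30, TimeUnit.SECONDS);
        return sum;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runDemo()); // prints 2470 (sum of squares 0..19)
    }
}
```

In a real reducer the Future.get() loop would live in close(), interleaved with reporter.progress() calls to keep the task tracker's heartbeat alive.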
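The object-reuse pitfall is easy to demonstrate without Hadoop at all. In the sketch below (ReuseDemo is a made-up name), a StringBuilder stands in for the framework's reused key/value object: queue the object itself and every queued entry ends up showing the last value; queue a copy and each entry keeps its own. In a real reducer the copy would be a deep clone of the Writable, e.g. something like WritableUtils.clone, if memory serves:

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    static List<CharSequence> queueSharedBuffer() {
        // Framework-style pattern: one mutable object, refilled per call.
        StringBuilder value = new StringBuilder();
        List<CharSequence> queued = new ArrayList<CharSequence>();
        for (String s : new String[] {"a", "b", "c"}) {
            value.setLength(0);
            value.append(s);
            queued.add(value); // BUG: queues the same shared buffer each time
        }
        return queued;
    }

    static List<String> queueCopies() {
        StringBuilder value = new StringBuilder();
        List<String> queued = new ArrayList<String>();
        for (String s : new String[] {"a", "b", "c"}) {
            value.setLength(0);
            value.append(s);
            queued.add(value.toString()); // copy before queueing
        }
        return queued;
    }

    public static void main(String[] args) {
        System.out.println(queueSharedBuffer()); // [c, c, c]
        System.out.println(queueCopies());       // [a, b, c]
    }
}
```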