On Fri, Apr 10, 2009 at 12:31 PM, jason hadoop <jason.had...@gmail.com> wrote:
> Hi Sagar!
>
> There is no reason for the body of your reduce method to do more than
> copy and queue the key/value set into an execution pool.

Agreed. You probably want to use either a bounded queue on your execution
pool, or even a SynchronousQueue to hand tasks off to the executor threads.
Otherwise your reducer will churn through all of its inputs at I/O rate,
potentially fill up RAM, and report "100%" complete well before it is
actually complete. Something like:

  BlockingQueue<Runnable> queue = new SynchronousQueue<Runnable>();
  ThreadPoolExecutor threadPool = new ThreadPoolExecutor(
      WORKER_THREAD_COUNT, WORKER_THREAD_COUNT,
      5, TimeUnit.SECONDS, queue);

> The close method will need to wait until all of the items finish
> execution, and potentially keep the heartbeat up with the task tracker
> by periodically reporting something. Sadly, right now the reporter has
> to be grabbed from the reduce method, as configure and close do not get
> an instance.

+1. You probably want to call reporter.progress() after each item is
processed by the worker threads.

> I believe the key and value objects are reused by the framework on the
> next call to reduce, so making a copy before queuing them into your
> thread pool is important.

+1 here too. You will definitely run into issues if you don't make a deep
copy.

-Todd
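One caveat on the snippet above: with a SynchronousQueue and a fixed-size pool, a submit that finds every worker busy is rejected (the default AbortPolicy throws RejectedExecutionException) rather than blocking. Here is a self-contained sketch of the handoff pattern using plain java.util.concurrent, with CallerRunsPolicy as one way to make the producer wait; HandoffDemo, runDemo, and WORKER_THREAD_COUNT are made-up names, and the squaring task is just a stand-in for per-record work:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class HandoffDemo {
    static final int WORKER_THREAD_COUNT = 4; // hypothetical pool size

    static int runDemo() throws Exception {
        // A SynchronousQueue has no capacity: each submit must hand its
        // task directly to an idle worker, so the producer cannot queue
        // up unbounded work and race ahead of the workers.
        BlockingQueue<Runnable> queue = new SynchronousQueue<Runnable>();
        ThreadPoolExecutor threadPool = new ThreadPoolExecutor(
                WORKER_THREAD_COUNT, WORKER_THREAD_COUNT,
                5, TimeUnit.SECONDS, queue,
                // Without a handler, a submit that finds no idle worker
                // throws RejectedExecutionException. CallerRunsPolicy
                // instead runs the task in the submitting thread, which
                // throttles the producer to the workers' pace.
                new ThreadPoolExecutor.CallerRunsPolicy());

        List<Future<Integer>> results = new ArrayList<Future<Integer>>();
        for (int i = 0; i < 20; i++) {
            final int n = i;
            results.add(threadPool.submit(new Callable<Integer>() {
                public Integer call() {
                    return n * n; // stand-in for per-record work
                }
            }));
        }

        // Analogous to waiting in close(): drain all pending work.
        int sum = 0;
        for (Future<Integer> f : results) {
            sum += f.get();
        }
        threadPool.shutdown();
        threadPool.awaitTermination(30, TimeUnit.SECONDS);
        return sum;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runDemo()); // prints 2470 (sum of squares 0..19)
    }
}
```

In a real reducer the Future.get() loop would live in close(), interleaved with reporter.progress() calls to keep the task tracker's heartbeat alive.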
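The object-reuse pitfall is easy to demonstrate without Hadoop at all. In the sketch below (ReuseDemo is a made-up name), a StringBuilder stands in for the framework's reused key/value object: queue the object itself and every queued entry ends up showing the last value; queue a copy and each entry keeps its own. In a real reducer the copy would be a deep clone of the Writable, e.g. something like WritableUtils.clone, if memory serves:

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    static List<CharSequence> queueSharedBuffer() {
        // Framework-style pattern: one mutable object, refilled per call.
        StringBuilder value = new StringBuilder();
        List<CharSequence> queued = new ArrayList<CharSequence>();
        for (String s : new String[] {"a", "b", "c"}) {
            value.setLength(0);
            value.append(s);
            queued.add(value); // BUG: queues the same shared buffer each time
        }
        return queued;
    }

    static List<String> queueCopies() {
        StringBuilder value = new StringBuilder();
        List<String> queued = new ArrayList<String>();
        for (String s : new String[] {"a", "b", "c"}) {
            value.setLength(0);
            value.append(s);
            queued.add(value.toString()); // copy before queueing
        }
        return queued;
    }

    public static void main(String[] args) {
        System.out.println(queueSharedBuffer()); // [c, c, c]
        System.out.println(queueCopies());       // [a, b, c]
    }
}
```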