Re: Coordination between Mapper tasks

Owen O'Malley Fri, 20 Mar 2009 07:56:29 -0700

On Thu, Mar 19, 2009 at 6:42 PM, Stuart White <stuart.whi...@gmail.com> wrote:


>
> My process requires a large dictionary of terms (~ 2GB when loaded
> into RAM).  The terms are looked up very frequently, so I want the
> terms memory-resident.
>
> So, the problem is, I want 3 processes (to utilize CPU), but each
> process requires ~2GB, but my nodes don't have enough memory to each
> have their own copy of the 2GB of data.  So, I need to somehow share
> the 2GB between the processes.


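I would recommend using the multi-threaded map runner. Run one map task per
node with three worker threads that all consume that task's input, so the
2GB dictionary is loaded once per node and shared by the threads. The only
disadvantage is that a single record reader feeds all three map threads, so
it works best for CPU-heavy loads (or maps that spend their time waiting on
something else, such as crawling) rather than maps that are bound by
reading input.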
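
To make the shape of that concrete, here is a rough sketch of the pattern
in plain Java. It is not Hadoop's implementation, and the class and method
names (SharedDictionaryDemo, countHits) are made up for illustration. One
iterator stands in for the single record reader, and the workers share one
read-only in-memory set standing in for the dictionary. In a real job I
believe you would instead set the map runner class on the JobConf to
org.apache.hadoop.mapred.lib.MultithreadedMapRunner and set
mapred.map.multithreadedrunner.threads to 3, and your mapper would need to
be thread-safe.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class: one shared dictionary, one reader, N workers.
class SharedDictionaryDemo {

    // Counts the records found in the dictionary using `threads` workers.
    // The dictionary is only read, never modified, so a single copy can be
    // shared by all workers without locking.
    public static long countHits(Set<String> dictionary,
                                 Iterator<String> reader,
                                 int threads) throws InterruptedException {
        AtomicLong hits = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                while (true) {
                    String record;
                    // Only one reader exists, so fetching the next record
                    // is serialized. This is the bottleneck noted above.
                    synchronized (reader) {
                        if (!reader.hasNext()) {
                            return;
                        }
                        record = reader.next();
                    }
                    // The per-record work (the "map") runs in parallel.
                    if (dictionary.contains(record)) {
                        hits.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return hits.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> dictionary =
            new HashSet<>(Arrays.asList("hadoop", "mapper", "reducer"));
        List<String> input =
            Arrays.asList("hadoop", "pig", "mapper", "hive", "hadoop");
        System.out.println(countHits(dictionary, input.iterator(), 3)); // prints 3
    }
}
```

The lookup here is trivially cheap, so the lock on the reader would
dominate. The approach pays off only when the work done per record is
large compared with the cost of reading it.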

In the longer term, it might make sense to enable parallel JVM reuse in
addition to serial JVM reuse.

-- Owen
