On Thu, Mar 19, 2009 at 6:42 PM, Stuart White <stuart.whi...@gmail.com> wrote:

>
> My process requires a large dictionary of terms (~ 2GB when loaded
> into RAM).  The terms are looked-up very frequently, so I want the
> terms memory-resident.
>
> So, the problem is, I want 3 processes (to utilize CPU), but each
> process requires ~2GB, but my nodes don't have enough memory to each
> have their own copy of the 2GB of data.  So, I need to somehow share
> the 2GB between the processes.


I would recommend using the multi-threaded map runner. Run one map task per
node and use 3 worker threads that all consume the input. Because the
threads live in the same JVM, the 2GB dictionary is loaded only once per
node. The only disadvantage is that it works best for CPU-heavy loads (or
maps that are doing crawling, etc.), since you only have one record reader
for all three of the map threads.
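
To make the idea concrete, here is a rough plain-Java sketch of the pattern,
not Hadoop's actual runner: one in-memory dictionary, one synchronized
reader, and three worker threads pulling records from it. The names
SharedDictionaryDemo, SharedReader and countHits are made up for
illustration, and the "map" work is just a dictionary lookup.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; this is not Hadoop code.
public class SharedDictionaryDemo {

    // Stands in for the single record reader: every worker pulls from it,
    // so access is serialized. Returns null when the input is exhausted.
    static final class SharedReader {
        private final Iterator<String> it;

        SharedReader(Iterator<String> it) {
            this.it = it;
        }

        synchronized String next() {
            return it.hasNext() ? it.next() : null;
        }
    }

    // Counts how many records appear in the dictionary, using `threads`
    // workers that share one dictionary instance and one reader.
    static long countHits(Set<String> dictionary, List<String> records, int threads) {
        SharedReader reader = new SharedReader(records.iterator());
        AtomicLong hits = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                String record;
                while ((record = reader.next()) != null) {
                    // Read-only lookup against the single shared copy.
                    if (dictionary.contains(record)) {
                        hits.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits.get();
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("hadoop", "map", "reduce");
        List<String> records = List.of("map", "shuffle", "reduce", "sort", "map");
        System.out.println(countHits(dictionary, records, 3));
    }
}
```

The synchronized next() is where the single-reader limitation shows up: if
the per-record work is cheap, the threads mostly wait on each other for
input, which is why this setup pays off mainly for CPU-heavy maps.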

In the longer term, it might make sense to enable parallel JVM reuse in
addition to serial JVM reuse.

-- Owen
