On Thu, Mar 19, 2009 at 6:42 PM, Stuart White <stuart.whi...@gmail.com> wrote:
>
> My process requires a large dictionary of terms (~ 2GB when loaded
> into RAM). The terms are looked-up very frequently, so I want the
> terms memory-resident.
>
> So, the problem is, I want 3 processes (to utilize CPU), but each
> process requires ~2GB, but my nodes don't have enough memory to each
> have their own copy of the 2GB of data. So, I need to somehow share
> the 2GB between the processes.

I would recommend using the multi-threaded map runner. Have 1 map/node and
just use 3 worker threads that all consume the input. The only disadvantage
is that it works best for cpu-heavy loads (or maps that are doing crawling,
etc.), since you only have one record reader for all three of the map
threads.

In the longer term, it might make sense to enable parallel jvm reuse in
addition to serial jvm reuse.

-- Owen
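
For illustration only, here is a minimal plain-Java sketch of the idea behind the suggestion above: one JVM holds a single read-only copy of the dictionary, one loop plays the role of the lone record reader, and 3 worker threads share the lookups. It is not Hadoop code. The class name `SharedDictionaryDemo`, the method `countHits`, and the tiny sample dictionary are hypothetical stand-ins, and the sketch assumes Java 9 or later for `Set.of` and `List.of`. In Hadoop's old `mapred` API the real counterpart would be `MultithreadedMapRunner`; check the documentation for your version for how to set its thread count.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a single map task; not part of Hadoop.
class SharedDictionaryDemo {

    // Counts how many records appear in the dictionary. All worker threads
    // read the same in-memory dictionary instance, so it is loaded only once.
    static long countHits(Set<String> dictionary, List<String> records, int threads) {
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        AtomicLong hits = new AtomicLong();

        // Single reader: this loop is the only place records are pulled
        // from the input, mirroring the one record reader per map task.
        for (String record : records) {
            workers.submit(() -> {
                if (dictionary.contains(record)) {
                    hits.incrementAndGet();
                }
            });
        }

        // Stop accepting work and wait for the queued lookups to finish.
        workers.shutdown();
        try {
            if (!workers.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return hits.get();
    }

    public static void main(String[] args) {
        // Immutable set: safe to share across threads without locking.
        Set<String> dictionary = Set.of("hadoop", "mapper", "reducer");
        List<String> records = List.of("hadoop", "spark", "mapper", "hive");
        System.out.println(countHits(dictionary, records, 3));
    }
}
```

Because the dictionary is immutable after loading, the threads need no locking to read it. The single feeding loop is also why this arrangement suits work where the per-record processing, not the reading, dominates.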