The OOM is thrown while loading it into the config of the Reducer, so it's not 
likely the vector, but it could be.

Once we went back to unigrams, the OOM in that spot went away.
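For context, the approach Robin describes below is to chunk the dictionary so a 
reducer only ever holds one slice of it in memory, emitting partial vectors per 
chunk. A minimal sketch of that idea (class and method names are hypothetical, 
not Mahout's actual API):

```java
import java.util.*;

public class DictionaryChunking {
    // Hypothetical sketch: split a term dictionary into fixed-size chunks so
    // each pass only holds one chunk in memory. Term ids stay globally stable.
    static List<Map<String, Integer>> chunkDictionary(List<String> terms, int chunkSize) {
        List<Map<String, Integer>> chunks = new ArrayList<>();
        Map<String, Integer> current = new LinkedHashMap<>();
        int id = 0; // global term id, continues across chunk boundaries
        for (String term : terms) {
            current.put(term, id++);
            if (current.size() == chunkSize) {
                chunks.add(current);
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }

    // One pass per chunk: count only the document terms present in this chunk,
    // producing a partial (termId -> count) vector to be merged later.
    static Map<Integer, Integer> partialVector(Map<String, Integer> chunk, List<String> docTerms) {
        Map<Integer, Integer> vec = new TreeMap<>();
        for (String t : docTerms) {
            Integer termId = chunk.get(t);
            if (termId != null) {
                vec.merge(termId, 1, Integer::sum);
            }
        }
        return vec;
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("a", "b", "c", "d", "e");
        List<Map<String, Integer>> chunks = chunkDictionary(dict, 2);

        // Merging the partial vectors from each chunk reconstructs the full
        // document vector without ever loading the whole dictionary at once.
        List<String> doc = Arrays.asList("a", "c", "a", "e");
        Map<Integer, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> chunk : chunks) {
            merged.putAll(partialVector(chunk, doc));
        }
        System.out.println(chunks.size()); // 3 chunks (sizes 2, 2, 1)
        System.out.println(merged);        // {0=2, 2=1, 4=1}
    }
}
```

With unigrams the dictionary chunks stay small; bigrams over 11M terms blow up 
the dictionary size per chunk, which fits the OOM pattern we saw.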

On Nov 7, 2012, at 12:00 PM, Robin Anil wrote:

> I haven't seen the code in a while, but AFAIR the reducer is not loading any
> dictionary. We chunk the dictionary to create partial vectors. I think you
> just have a huge vector.
> On Nov 7, 2012 10:50 AM, "Sean Owen" <[email protected]> wrote:
> 
>> It's a trie? Yeah, that could be a big win. It gets tricky with Unicode, but
>> I imagine there is a lot of gain even so.
>> "Bigrams over 11M terms" jumped out too as a place to start.
>> (I don't see any particular backwards compatibility issue with Lucene 3 to
>> even worry about.)
>> 

--------------------------------------------
Grant Ingersoll
http://www.lucidworks.com



