I haven't seen the code in a while, but AFAIR the reducer does not load any dictionary. We chunk the dictionary to create partial vectors. I think you just have a huge vector.

On Nov 7, 2012 10:50 AM, "Sean Owen" <sro...@gmail.com> wrote:
> It's a trie? Yeah, that could be a big win. It gets tricky with Unicode,
> but I imagine there is a lot of gain even so.
> "Bigrams over 11M terms" jumped out too as a place to start.
> (I don't see any particular backwards-compatibility issue with Lucene 3
> to even worry about.)
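For anyone following along, the chunking idea mentioned above can be sketched roughly as follows. This is a hypothetical, simplified illustration, not the actual Mahout reducer code: the class name `DictionaryChunking`, the chunk size, and the toy terms are all made up. The point is just that no single pass needs the full term dictionary in memory; each chunk yields a partial vector, and the partials are merged afterwards.

```java
import java.util.*;

// Hypothetical sketch: split a term -> index dictionary into fixed-size
// chunks, build a partial sparse vector per chunk, then merge the partials.
public class DictionaryChunking {

    // Split the dictionary into chunks of at most chunkSize entries.
    static List<Map<String, Integer>> chunkDictionary(Map<String, Integer> dictionary,
                                                      int chunkSize) {
        List<Map<String, Integer>> chunks = new ArrayList<>();
        Map<String, Integer> current = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : dictionary.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == chunkSize) {
                chunks.add(current);
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) chunks.add(current);
        return chunks;
    }

    // Partial sparse vector for one document against one dictionary chunk:
    // only terms present in this chunk contribute entries (index -> count).
    static Map<Integer, Integer> partialVector(List<String> docTerms,
                                               Map<String, Integer> chunk) {
        Map<Integer, Integer> partial = new TreeMap<>();
        for (String term : docTerms) {
            Integer idx = chunk.get(term);
            if (idx != null) partial.merge(idx, 1, Integer::sum);
        }
        return partial;
    }

    // Merge the partial vectors into the full document vector; chunks are
    // disjoint, so entries never collide.
    static Map<Integer, Integer> merge(List<Map<Integer, Integer>> partials) {
        Map<Integer, Integer> full = new TreeMap<>();
        for (Map<Integer, Integer> p : partials) full.putAll(p);
        return full;
    }

    public static void main(String[] args) {
        // Toy dictionary of 6 terms (made up for illustration).
        Map<String, Integer> dict = new LinkedHashMap<>();
        String[] terms = {"apache", "mahout", "vector", "trie", "bigram", "lucene"};
        for (int i = 0; i < terms.length; i++) dict.put(terms[i], i);

        List<Map<String, Integer>> chunks = chunkDictionary(dict, 2);
        List<String> doc = Arrays.asList("mahout", "vector", "mahout", "lucene");

        List<Map<Integer, Integer>> partials = new ArrayList<>();
        for (Map<String, Integer> chunk : chunks) {
            partials.add(partialVector(doc, chunk));
        }
        System.out.println(chunks.size());   // 3
        System.out.println(merge(partials)); // {1=2, 2=1, 5=1}
    }
}
```

With 11M terms the real win is that each pass only keeps one chunk of the dictionary resident, at the cost of re-reading the documents once per chunk.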