It's a trie? Yeah, that could be a big win. It gets tricky with Unicode, but I imagine there is a lot of gain even so. "Bigrams over 11M terms" jumped out at me too as a place to start. (I don't see any particular backwards-compatibility issue with Lucene 3 to even worry about.)
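To make the idea concrete, here's a minimal sketch of a trie-based term dictionary mapping terms to int ids, the kind of structure that could replace a term-to-id hash map by sharing key prefixes. This is purely illustrative, not Mahout's or Lucene's actual implementation, and the class and method names are made up for the example:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative trie-based term dictionary (term -> int id).
// Memory savings come from storing shared prefixes once instead of
// per-key, as a flat hash map of Strings would.
public class TermTrie {
    private static final int NO_ID = -1;

    private static class Node {
        final Map<Character, Node> children = new HashMap<>();
        int id = NO_ID; // id of the term ending at this node, if any
    }

    private final Node root = new Node();
    private int nextId = 0;

    // Returns the existing id for the term, or assigns the next free one.
    // Note the Unicode wrinkle mentioned above: keys here are UTF-16 code
    // units, so supplementary characters span two nodes.
    public int getOrAdd(String term) {
        Node node = root;
        for (int i = 0; i < term.length(); i++) {
            node = node.children.computeIfAbsent(term.charAt(i), c -> new Node());
        }
        if (node.id == NO_ID) {
            node.id = nextId++;
        }
        return node.id;
    }

    public static void main(String[] args) {
        TermTrie trie = new TermTrie();
        System.out.println(trie.getOrAdd("lucene")); // new term -> 0
        System.out.println(trie.getOrAdd("lucid"));  // shares "luc" -> 1
        System.out.println(trie.getOrAdd("lucene")); // already present -> 0
    }
}
```

A production version would want a more compact child representation (e.g. sorted arrays or a DAWG) rather than a HashMap per node, but the prefix-sharing principle is the same.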
- Vectorization, dictionary size, OpenObjectIntHashMap and O... Grant Ingersoll
