351MB isn't so bad.

I do think the next-best idea to explore is a trie, which could use a
char->Object map data structure provided by our new collections
module? To the extent this data is more compact when encoded in UTF-8,
it will be *much* more compact encoded in a trie.

Sean

On Sat, Jan 16, 2010 at 2:29 PM, Robin Anil <robin.a...@gmail.com> wrote:
> In this specific scenario. Ability to handle bigger dictionary per node
> where the dictionary is load once is a big win for the dictionary
> vectorizer. This in turn reduces the number of partial vector generation
> passes.
>

Reply via email to