351MB isn't so bad. I do think the next-best idea to explore is a trie, which could use a char->Object map data structure provided by our new collections module? To the extent this data is more compact when encoded in UTF-8, it will be *much* more compact encoded in a trie.
Sean On Sat, Jan 16, 2010 at 2:29 PM, Robin Anil <robin.a...@gmail.com> wrote: > In this specific scenario. Ability to handle bigger dictionary per node > where the dictionary is load once is a big win for the dictionary > vectorizer. This in turn reduces the number of partial vector generation > passes. >