+1 for better dictionaries. BTW, on a related note, this is an awesome book:
http://www.amazon.com/Algorithms-Strings-Trees-Sequences-Computational/dp/0521585198

On Wed, Jul 6, 2011 at 9:41 AM, Jörn Kottmann <[email protected]> wrote:

> On 7/6/11 4:38 PM, [email protected] wrote:
>
>> but it also consume less memory after loading. This LGPL dictionary library
>> uses a FSA data structure that requires less memory than Hashtable to store
>> 500k words, and also is fast enough during runtime.
>
> Yeah, it would be nice to have a better dictionary in OpenNLP, we also
> discussed the usage of bloom-filters, which I believe might be good
> enough for feature generation anyway in many cases.
>
> Jörn

--
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
