You're welcome. It seems to me that storing your dictionary in a trie is a good idea anyway. Tries are compact and very fast for many kinds of string matching queries.

Incidentally, here's an excellent implementation of tries, by the great OMNI folks:

OFTrie - Implementation of the popular trie data structure
http://www.omnigroup.com/developer/

If you can't, or would rather not, store the entire dictionary in memory, you can always split it into, for instance, 26 separate files, each containing all the words that start with a given letter. Then, once you have your query word, you can load into memory only the dictionary corresponding to the word's first letter, build a trie from that one dictionary, and do the search in logarithmic time (or not build a trie, but load that dictionary into an array, and do the search in linear time). Of course, building the trie takes time too, but if you're doing repeated queries within the same sub-dictionary, you only have to build the trie for that sub-dictionary once.

Actually, you might want to do something a little more refined than that since the number of English words that start with, say, A, is much different than the number of words that start with, say, V. If you want your sub-dictionaries to have a relatively uniform size, you could split your full dictionary into more (or less) than 26 sub- dictionaries, depending on your specific needs.

I would suggest that you write you application in a way that's independent of the specifics of how you're going to store the dictionary and do the searches. Then you're free to try different implementations and see which one is best for your needs.

Wagner

On Apr 12, 2009, at 11:55 PM, Miles wrote:

Thanks Wagner. I'm actually just needing to verify if a word is in the
dictionary, but maybe this is still a good choice? If I don't need to search for shared prefixes would I be better off using sqlite so i don't have to
load it all into memory?



On Sun, Apr 12, 2009 at 2:34 PM, WT <jrca...@gmail.com> wrote:

I you're trying to search for shared prefixes, you might want to store your data in a data structure better suited for such tasks, such as a Trie <
http://en.wikipedia.org/wiki/Trie>. Tries are fast and compact.

Implementing a Trie isn't very difficult and, since it's such a commonly used data structure for string-related tasks, you'll be able to find good
implementations on the web.

Wagner


_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com

Reply via email to