You know, Miles, I've been thinking about something else you asked earlier, about storing the trie on disk and loading it that way, rather than load the data first and build the trie afterwards.

A trie is a tree structure, and so is a plist, so I think you could combine both and save time in the loading, in the processing after loading, and in the searching. Build the trie off-line and save it as a (binary? xml?) plist, then load it directly with NSArray's arrayWithContentsOfFile.

More specifically, using the example from <http://en.wikipedia.org/wiki/Trie >, you'd create a plist off-line with the contents:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd ">
<plist version="1.0">
<array>
    <dict>
        <key>t</key>
        <array>
            <dict>
                <key>o</key>
                <string>to</string>  <-- you could drop this (see text)
            </dict>
            <dict>
                <key>e</key>
                <array>
                    <dict>
                        <key>a</key>
<string>tea</string> <-- you could drop this (see text)
                    </dict>
                    <dict>
                        <key>d</key>
<string>ted</string> <-- you could drop this (see text)
                    </dict>
                    <dict>
                        <key>n</key>
<string>ten</string> <-- you could drop this (see text)
                    </dict>
                </array>
            </dict>
        </array>
    </dict>
    <dict>
        <key>A</key>
        <string>A</string>  <-- you could drop this (see text)
    </dict>
    <dict>
        <key>i</key>
        <array>
            <dict>
                <key>n</key>
                <array>
                    <dict>
                        <key>n</key>
<string>inn</string> <-- you could drop this (see text)
                    </dict>
                </array>
            </dict>
        </array>
    </dict>
</array>
</plist>

The rationale is that each node is an array of dictionaries. Each dictionary's key is a letter and the associated value is either a string or another array (representing another node). Thus, in the example from the Wikipedia page, the root node is an array of 3 dictionaries. The first has the key "t" and its value is an array, the second has the key "A" and its value is the string "A", and the third dictionary has the key "i" and its value is another array. And so on.

(You might even not have the string values at all, in which case each key is either associated with an array, or with a boolean value that tells whether the key in question terminates a valid word). The actual strings stored in the dictionary are then the concatenation of keys. This is more uniform and saves space, which means it will save on loading time too. Also, since the 26 letters will be cached by the OS, it won't matter how many keys you have. The cost of storing them is just the cost of storing their pointers.)

Once you create this plist off-line for your entire dictionary, you can load it with NSArray's arrayWithContentsOfFile and you have your trie already in memory. All you need then is a search function, which essentially traverses the tree, concatenating keys and comparing them with the target word.

I am not completely sure that this will save you time in the end, and holding the entire trie in memory might be a problem, but it might be worth a try. Only you can decide.

You might also want to use OMNI's trie implementation <http://www.omnigroup.com/developer/ >. That will save you both coding and debugging time.

Wagner
_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com

Reply via email to