On 2007-05-11, Terry Reedy <[EMAIL PROTECTED]> wrote: > > "Neil Cerutti" <[EMAIL PROTECTED]> wrote in message > news:[EMAIL PROTECTED] >| Every node is a tuple of its letter, a list of its children, and >| a list of its words. So the two 'pelin' nodes would be (with 'e' >| referenced in the 'h' node): >| >| ('h', [('e', [], ['pelin'])], ['pelin']) >| >| That would in turn be "stored" in the t, n, i and s nodes. > [snip] > > At the outer level, I would use a list in order to build the > structure in pieces, one for each letter, and then add them in. > > At the top level, the letters do not need explicit storage. > The first subtree is for words starting with 'a', etc. In > other words, the position of each subtree indicates its > starting letter. For most letters, this can be carried on > another level.
Here's a proof of concept prototype of the dictionary searching algorithm. The dictionary I found to test with has roughly 5,000 words, and I didn't attempt any optimizations at all. I'm guessing it can't be sped up very much, even though I wrote the simplest possible implementation. Lookup in the worst case is O(N), where N is the number of letters in your prefix. First, a sample of input/output. lookup: pe people: la gente[Noun] pepper: pepe[Noun] peppercorn: grano di pepe[Noun] peppermint: menta peperita[Noun] peppery: pepato, acre, pungente[Adjective] pepsin: pepsina[Noun] permutation: permutazione[Noun] permutations: permutazioni[Noun] permute: permutare[Verb] perorate: perorare perpendicular: perpendicolare[Adjective] perpendicularly: perpendicolarmente[Adverb] perpendiculars: perpendicolari[Noun] perpetrate: perpetrare[Verb] perpetrated: perpetrato[Verb] perpetual: perpetuo[Adjective] petard: petardo[Noun] petroleum: petrolio[Noun] Now the source. # Each node is a tuple of (letter, node list, word list). A word # list is just a list of strings. root = ('', [], []) def insert(node, key, value): if len(key) == 0: node[2].append(value) else: first = key[0] rest = key[1:] for n in node[1]: if n[0] == first: insert(n, rest, value) return node[1].append((first, [], [])) insert(node, key, value) def lookup(node, key): if len(key) == 0: return node else: first = key[0] rest = key[1:] for v in node[1]: if v[0] == first: return lookup(v, rest) return None def word_list(node, word): for v in node[2]: print '%s: %s' % (word, v) for n in node[1]: word_list(n, word+n[0]) # Italian.txt from http://www.june29.com/IDP/ source = open('Italian.txt', 'r') # Build tree for line in source: line = line.strip() if line[0] == '#': continue key, value = line.split('\t') insert(root, key, value) # Look up a prefix x = raw_input("lookup: ") tree = lookup(root, x) if tree: word_list(tree, x) else: print "Not found." -- Neil Cerutti -- http://mail.python.org/mailman/listinfo/python-list