On May 11, 3:12 am, Neil Cerutti <[EMAIL PROTECTED]> wrote: > On 2007-05-10, Gordon Airporte <[EMAIL PROTECTED]> wrote: > > > > > > >> For the above (abrideged) dictionary, you would generate (use a > >> fixed-width "programmers" font so the tree looks good): > > >> a > >> | > >> b > >> | > >> s > >> / \ > >> i o > >> / / \ > >> n l r > >> / / \ \ > >> t u v b->(absorbirati, crpisti) > >> / | | > >> (pelin)<-h t e->(odrije?iti, osloboditi) > >> | | > >> (pelin)<-e e->(apsolutan, apsolutni kod) > > >> As the user enter letters, you just march down the tree, printing > >> all the words held in leaf nodes held in the current node. > > > Call me dense, but how does one do this in Python - which > > doesn't have pointers? Dictionaries with dictionaries within > > dictionaries... (with each letter as the key and the its > > children as values) is going to be extremely space inefficient, > > right? > > Unfortunately, I don't know the space tradeoffs in Python > offhand. Lists and tuples make excellent trees. > > The above might be stored as follows: > > Every node is a tuple of its letter, a list of its children, and > a list of its words. So the two 'pelin' nodes would be (with 'e' > referenced in the 'h' node): > > ('h', [('e', [], ['pelin'])], ['pelin']) > > That would in turn be "stored" in the t, n, i and s nodes. > > ('s', > [('i', > [('n', > [('t', > [('h', [('e', [], ['pelin'])], ['pelin']) > [])] > [])] > []), ('o' trie (thanks Terry) omitted for my sanity)]) > > It's a lot harder to write by hand than it would be to use. > > My intuition says it shouldn't be terribly hard on resources for > for a 180K dictionary, but I could be wrong. I'm too lazy to > measure. ;) > > If it does turn out to be unreasonably huge, then you'd fall back > on solving the problem completely with a binary search returning > a range (I'm not sure of the name), which would be more expensive > at run time, but might be fast enough, and would use a minimal > amount of 'resources'. > > -- > Neil Cerutti
The problem that I see here is that each time user types a letter, all the leaf nodes in the subtree would have to be traversed again. This computation was already done for all the letters user typed before the last one. My suggestion would be to have two data structures. The first one similar to the tree mentioned above, but leaf nodes need not have Croatian translations, and each node should also have total number of leafs in both left siblings and right siblings of that node. The second data structure would be list of English:Croatian word pairs. As the user types a new letter, we just have to remove(not show), the number of leaf elements in the left and/or right siblings, from left and/or right of the second data structure. So we just have to keep track of the node we r currently at(in the tree) and the start, end index for the second data structure(basically the list to be shown to the user). By the way, both data structures could be implemented as tuple in python, for I suppose, if only lookup is needed tuple gives better performance than list. ciju -- http://mail.python.org/mailman/listinfo/python-list