[EMAIL PROTECTED] wrote:
> I've tested that sorting just the strings instead of the tuples (and
> removing the stripping) reduces the running time enough:
>
> def __init__(self):
>     numbers = '22233344455566677778889999'
>     conv = string.maketrans(string.lowercase, numbers)
>     lines = file("/usr/share/dict/words").read().lower().splitlines()
>     # lines = map(str.strip, lines)
>     lines.sort()
>     self.dict = [(word.translate(conv), word) for word in lines]
>
> If the words file is already sorted you can skip the sorting line.
> If the file contains extraneous spaces, you can strip them uncommenting
> that line.
>
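The quoted code is Python 2. `string.maketrans`, `string.lowercase` and `file()` are gone in Python 3, so here is a minimal sketch of the same preparation step using `str.maketrans`. It takes the word list's text as a string instead of opening /usr/share/dict/words. The `build_index` name and the grouping into a dict keyed by digit string are illustrative additions, not part of the quoted code. Because the plain strings are sorted before translation, each bucket ends up in alphabetical order without any tuples being sorted.

```python
from collections import defaultdict

# Keypad digit for each letter a..z (2=abc, 3=def, ..., 7=pqrs, 9=wxyz),
# the same table as the quoted `numbers` string.
NUMBERS = "22233344455566677778889999"
LETTERS = "abcdefghijklmnopqrstuvwxyz"
CONV = str.maketrans(LETTERS, NUMBERS)  # Python 3 replacement for string.maketrans


def build_index(text):
    """Map each digit key to its words, in alphabetical order.

    `text` is the raw contents of a word list, one word per line
    (what file("/usr/share/dict/words").read() returned in the quote).
    """
    # Lowercase, strip stray spaces and drop blank lines.
    lines = [w.strip() for w in text.lower().splitlines()]
    lines = [w for w in lines if w]
    lines.sort()  # sort the plain strings, not (key, word) tuples
    index = defaultdict(list)
    for word in lines:
        # Appending in sorted order keeps every bucket alphabetical.
        index[word.translate(CONV)].append(word)
    return index
```

For example, `build_index("if\nhe\nid\n")["43"]` gives `['he', 'id', 'if']`; with a real system word list you would pass `open("/usr/share/dict/words").read()`, assuming that file exists on your machine.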
1. Wouldn't it be a good idea to process the raw dictionary *once* and cPickle the result?

2. All responses so far seem to have missed a major point in the research paper quoted by the OP: each word has a *frequency* associated with it. When there are multiple choices (e.g. "43" -> ["he", "if", "id", ...]), the user is presented with the choices in descending frequency order. Note that if one of the sort keys is (-frequency), the actual frequency doesn't need to be retained in the prepared dictionary.

3. Anyone interested in the techniques & heuristics involved in this type of exercise might like to look at input methods for languages like Chinese -- instead of 26 letters mapped to 8 digits, you have tens of thousands of characters of wildly varying frequency mapped to e.g. 400+ Pinyin "words" entered on a "standard" keyboard.

Cheers,
John

-- 
http://mail.python.org/mailman/listinfo/python-list
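Points 1 and 2 above can be combined in a short Python 3 sketch. It assumes the frequencies arrive as (word, frequency) pairs from some corpus; the thread does not say where the paper's figures come from, and `prepare`, `save` and `load` are hypothetical names. Sorting on (digits, -frequency, word) puts the most frequent candidate first, and only the words are kept, so the frequency itself is not stored in the prepared dictionary. In Python 3 the plain `pickle` module uses the C implementation that used to be `cPickle` when it is available.

```python
import pickle

# Keypad digit for each letter a..z (2=abc, ..., 9=wxyz).
NUMBERS = "22233344455566677778889999"
LETTERS = "abcdefghijklmnopqrstuvwxyz"
CONV = str.maketrans(LETTERS, NUMBERS)


def prepare(word_freqs):
    """Build {digit_key: [words, most frequent first]}.

    `word_freqs` is an iterable of (word, frequency) pairs; their
    source is an assumption, not something the thread specifies.
    """
    rows = sorted(
        (w.lower().translate(CONV), -freq, w.lower())
        for w, freq in word_freqs
    )
    index = {}
    for key, _neg_freq, word in rows:
        # -frequency only drives the sort order; it is not kept.
        index.setdefault(key, []).append(word)
    return index


def save(index, path):
    """Pickle the prepared index once so later runs skip the build."""
    with open(path, "wb") as f:
        pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)


def load(path):
    """Reload a previously pickled index."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

With made-up illustrative counts, `prepare([("if", 900), ("he", 5000), ("id", 40)])["43"]` gives `['he', 'if', 'id']`. The expensive sort happens only in `prepare`; each later session just calls `load` on the cached file.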