Tom Carrick <[EMAIL PROTECTED]> writes: > In my attempted learning of python, I've decided to recode an old > anagram solving program I made in C++. The C++ version runs in less > than a second, while the python takes 30 seconds.
Indeed, your program can be improved to run about ten times as fast, which (on my system, with 96274 entries in /usr/share/dict/words) is below a second. In general you should try to move the loops into C code, i.e. use built-in functions instead of long 'for' blocks. Some comments on your version: > import string > > # Need a function to convert a string to a list to be > # able to use the sort() function > def string2list(s): [snipped] list() achieves the same thing a lot faster. > words = [] You do not need to initialize 'words' here, as you're overwriting it a few lines afterwards. > found = [] > > anagram = raw_input("Find anagrams of word: ") > > f = open('words.txt', 'r') > file = f.read() > f.close() > > words = file.splitlines() Try to avoid assigning to the names of built-in functions if you can. Names like 'file', 'list', 'dict', 'map' etc. are often an obvious choice, but overwriting them means that you don't "just" know what a later use refers to. > sorted_anagram = anagram.lower() > sorted_anagram = string2list(anagram) > sorted_anagram.sort(lambda x, y: cmp(x, y)) Unless you *really* have to, don't use comparison functions with sort(), as they slow the operation considerably. In this (as in most) cases, a plain sorted_anagram.sort() does the trick, and in version 2.4 you can achieve custom sort orders with the optional 'key' argument. The sorted() built-in also comes in handy here. > while words: > if len(words[0]) == len(sorted_anagram): > wordlist = string2list(words[0]) > wordlist.sort(lambda x, y: cmp(x, y)) > sorted_wordlist = wordlist > if sorted_anagram == sorted_wordlist: > found.append(words[0]) > del words[0] Avoid this style of looping at all times! Removing the first element of a list is O(n), so looping through the whole list as above is O(n**2). In most cases you should use a for loop: for word in words: # do something which is O(n) of course. If you do have to loop destructively, pop() from the end (which is the default) like so: while words: word = words.pop() # do something This is also O(n), because removing the *last* element of a list is O(1) (amortized; I suppose the implementation will occasionally shrink the underlying array at linear cost). > print "Anagrams of " + anagram + ": " > while found: > print found[0] + " " > del found[0] I assume you meant not to print a newline between the words, which 'print' does by default. The best solution in that case is " ".join(found). A better version (2.4+ only): -- 8< -- 8< -- anagram = raw_input("Find anagrams of word: ") words = open('words.txt', 'r') sorted_anagram = sorted(anagram.lower()) found = [] for word in words.read().splitlines(): if len(word) == len(anagram) and sorted(word) == sorted_anagram: found.append(word) print "Anagrams of %s: %s" % (anagram, ' '.join(found)) -- >8 -- >8 -- Interestingly, the length comparison makes quite a difference! I removed it at first, thinking it was unnecessary. Here are some timings: * Your original version (for comparison): $ time echo stop | python2.4 anagram_slow.py [...] real 0m9.090s user 0m8.790s sys 0m0.013s * Your version, but with the O(n**2) loop replaced by an O(n) 'for': $ time echo stop | python2.4 anagram_forloop.py [...] real 0m0.221s user 0m0.134s sys 0m0.014s * My version but with the length comparison removed: $ time echo stop | python2.4 anagram_no_lencmp.py [...] real 0m0.408s user 0m0.353s sys 0m0.010s * My version as above: $ time echo stop | python2.4 anagram_fast.py [...] real 0m0.144s user 0m0.099s sys 0m0.008s Hope that helps :-) - Thomas -- If you want to reply by mail, substitute my first and last name for 'foo' and 'bar', respectively, and remove '.invalid'. -- http://mail.python.org/mailman/listinfo/python-list