On 10 Nov 2005 10:43:04 -0800, [EMAIL PROTECTED] wrote:

>This can be faster, it avoids doing the same things more times:
>
>from string import maketrans, ascii_lowercase, ascii_uppercase
>
>def create_words(afile):
>    stripper = """'[",;<>{}_&?!():[]\.=+-*\t\n\r^%0123456789/"""
>    mapper = maketrans(stripper + ascii_uppercase,
>                       " "*len(stripper) + ascii_lowercase)
good way to prepare for split
>    countDict = {}
>    for line in afile:
>        for w in line.translate(mapper).split():
>            if w:
I suspect it's not possible to get '' in the list from somestring.split()
>                if w in countDict:
>                    countDict[w] += 1
>                else:
>                    countDict[w] = 1
does that beat the try and get versions? I.e., (untested)
                 try: countDict[w] += 1
                 except KeyError: countDict[w] = 1
or
                 countDict[w] = countDict.get(w, 0) + 1

>    word_freq = countDict.items()
>    word_freq.sort()
>    for word, freq in word_freq:
>        print word, freq
>
>create_words(file("test.txt"))
>
>
>If you can load the whole file in memory then it can be made a little
>faster...
>
>Bear hugs,
>bearophile
>

Regards,
Bengt Richter
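
Both questions raised inline above (whether split() with no arguments can ever return '', and which of the three counting idioms is fastest) are easy to check empirically. What follows is only a rough sketch, not a benchmark from the thread: the function names count_if, count_try and count_get and the sample sentence are made up for illustration, and it is written so that it also runs on Python 3, whereas the quoted code is Python 2. No timings are shown because they depend on the interpreter version and on how often words repeat in the input.

```python
import timeit


def count_if(words):
    """Membership test first, as in the quoted code."""
    counts = {}
    for w in words:
        if w in counts:
            counts[w] += 1
        else:
            counts[w] = 1
    return counts


def count_try(words):
    """Optimistic increment, falling back on KeyError for new words."""
    counts = {}
    for w in words:
        try:
            counts[w] += 1
        except KeyError:
            counts[w] = 1
    return counts


def count_get(words):
    """dict.get with a default of 0."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts


# split() with no separator drops leading, trailing and repeated
# whitespace, so it never produces an empty string.
assert "" not in "  a   b \t\n c  ".split()
assert "   ".split() == []

# Hypothetical sample input: few distinct words, many repeats.
words = ("the quick brown fox jumps over the lazy dog " * 1000).split()
assert count_if(words) == count_try(words) == count_get(words)

if __name__ == "__main__":
    for fn in (count_if, count_try, count_get):
        seconds = timeit.timeit(lambda: fn(words), number=20)
        print("%-10s %.4f s" % (fn.__name__, seconds))
```

The try version only pays for an exception the first time each word is seen, so its relative cost should depend on the ratio of distinct words to total words; it is worth re-running the timing on text that resembles the real input before drawing conclusions.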