On Mon, 5 Mar 2007, Steve Litt apparently wrote: 
> In preparation to create my index for my book, I created a Ruby program to 
> list every word in a file (in this case the .lyx file).  
...
> My program, which is written in Ruby, is licensed GNU GPL 
> version 2, and is included as the remainder of the body of 
> this document. Have fun with it. 

There are two obvious reasons not to GPL this.
1. Existing public domain code does the same thing.
2. Even if that were false, it is unfriendly to encumber
   trivial scripts with licensing restrictions of any sort.

Therefore I hope we can agree that the code belongs in the 
public domain.

In any case, just for purposes of comparison, I wondered 
what a Python equivalent would look like.  Cleaner I think.
(See below. I tried to mimic the design decisions.)

Cheers,
Alan Isaac

 
=============================================================

import sys

punct=set([",", ".", "/", "<", ">", "?", ";", "'", ":", '"', "[", "]", "{", 
"}", "|", "`", "~", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "+"])

word_hash = dict()
for line in sys.stdin:
        line.strip()
        temparr = line.split()
        for word in temparr:
                while len(word) > 0 and word[0] in punct:
                        word = word[1:]
                while len(word) > 0 and word[-1] in punct:
                        word = word[:-1] 
                word_hash[word] = word_hash.get(word,0) + 1

print "================================================="
print "=============== ALPHA ORDER ====================="
print "================================================="

keys = sorted(word_hash.iterkeys())
for key in keys:
        print "%24s %6d\n"%(key, word_hash[key])

print "================================================="
print "================ FREQ ORDER ====================="
print "================================================="

temparray = sorted(word_hash.iteritems(), cmp=lambda a,b: 
cmp((-a[1],a[0].lower()),(-b[1],b[0].lower())))
for word_freq in temparray:
        print "%7d   %s\n"%( word_freq[1], word_freq[0] )




Reply via email to