On 1/22/2010 4:47 PM, Chris Jones wrote:
I was writing a script that counts occurrences of characters in source code 
files:

#!/usr/bin/python
import codecs
tcounters = {}
f = codecs.open('/home/gavron/git/screen/src/screen.c', 'r', "utf-8")
for uline in f:
   lline = []
   for char in uline[:-1]:
     lline += [char]

Same as lline.append(char), but slower. Either way, this loop just uselessly copies uline[:-1] into a list; the string itself can be iterated, and it already supports set() and .count().
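A minimal sketch of that point. The function name chars_of is hypothetical. A Python string is already a sequence of characters, so list() replaces the += loop. rstrip('\n') is used here instead of [:-1] on the assumption that the last line of a file may lack a trailing newline.

```python
def chars_of(line):
    """Return the characters of line, without its trailing newline, as a list."""
    # A str is already iterable one character at a time, so list() does in
    # one call what the += [char] loop does character by character.
    # rstrip('\n') removes only trailing newlines; line[:-1] would instead
    # chop a real character off a final line that has no newline.
    return list(line.rstrip('\n'))

print(chars_of("int x;\n"))  # ['i', 'n', 't', ' ', 'x', ';']
```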

   counters = {}
   for i in set(lline):
     counters[i] = lline.count(i)

This is a slow way to do it: lline.count(i) rescans the whole line once for every distinct character.
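A sketch contrasting the quoted approach with a single-pass count. count_slow and count_fast are hypothetical names. collections.Counter, in the standard library since Python 2.7 and 3.1, does the same single-pass tally.

```python
from collections import Counter

def count_slow(chars):
    # The approach quoted above: one full scan of chars per distinct
    # character, roughly O(len(chars) * number of distinct characters).
    counts = {}
    for c in set(chars):
        counts[c] = chars.count(c)
    return counts

def count_fast(chars):
    # One pass over chars, bumping a tally per character: O(len(chars)).
    counts = {}
    for c in chars:
        counts[c] = counts.get(c, 0) + 1
    return counts

sample = list("if (x == y) x++;")
assert count_slow(sample) == count_fast(sample) == dict(Counter(sample))
```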

   for c in counters.keys():
     if c in tcounters:
       tcounters[c] += counters[c]
     else:
       tcounters.update({c: counters[c]})

I do not see the reason for the intermediate dict; each count could go straight into tcounters.
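A sketch of accumulating directly into the running total. count_lines is a hypothetical helper; it accepts any iterable of lines, such as the file object f.

```python
def count_lines(lines):
    """Tally every character of every line into one dict, newlines excluded."""
    tcounters = {}
    for line in lines:
        for c in line.rstrip('\n'):
            # Add directly to the running total; no per-line dict to merge.
            tcounters[c] = tcounters.get(c, 0) + 1
    return tcounters

totals = count_lines(["int main(void)\n", "{\n", "}\n"])
print('%d %d' % (totals['i'], totals['{']))  # 3 1
```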

   counters = {}

Duplicate line: counters is already reset at the top of the loop.

for c in tcounters.keys():
   print c, '\t', tcounters[c]
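Putting the points together, one possible whole-file version is sketched below. It assumes Python 2.7+ or 3 (for collections.Counter). The names count_file and report are hypothetical. io.open stands in for codecs.open; in text mode it also normalizes '\r\n' line endings to '\n'.

```python
import io
from collections import Counter

def count_file(path, encoding='utf-8'):
    """Count every character in a text file, excluding line endings."""
    totals = Counter()
    # io.open decodes bytes to text much like codecs.open in the original.
    with io.open(path, 'r', encoding=encoding) as f:
        for line in f:
            # Counter.update with a string adds one for each of its characters.
            totals.update(line.rstrip('\n'))
    return totals

def report(totals):
    # One "char<TAB>count" row per character, most frequent first; repr()
    # makes spaces and tabs visible in the output.
    for c, n in totals.most_common():
        print('%r\t%d' % (c, n))
```

Usage would be along the lines of report(count_file('/home/gavron/git/screen/src/screen.c')).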

To count only ASCII characters, as should suffice for C code:

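achars = [0]*95   # one slot per printable ASCII char, ' ' (32) through '~' (126)
for line in open('xxx'):
  for c in line:
    i = ord(c) - 32
    if 0 <= i < 95:   # skip '\n', tabs, other control chars and non-ASCII
      achars[i] += 1

for i, n in enumerate(achars):
  print('%s\t%d' % (chr(i + 32), n))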

or sum subsets as desired.
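For example, a sketch of summing subsets. ascii_counts and subset_total are hypothetical helpers, and the groups chosen are only illustrations. string.digits and string.ascii_letters come from the standard library.

```python
import string

def ascii_counts(text):
    # Same layout as achars: slot i counts chr(i + 32), from ' ' to '~'.
    achars = [0] * 95
    for c in text:
        i = ord(c) - 32
        if 0 <= i < 95:
            achars[i] += 1
    return achars

def subset_total(achars, chars):
    # Sum the slots for a chosen group of printable ASCII characters.
    return sum(achars[ord(c) - 32] for c in chars)

counts = ascii_counts("x = a[10] + b[2];\n")
print(subset_total(counts, string.digits))         # 3
print(subset_total(counts, string.ascii_letters))  # 3
print(subset_total(counts, "()[]{}"))              # 4
```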

Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list