On 12/02/2010 01:49 PM, MRAB wrote:
On 02/12/2010 19:01, chris wrote:
I would like to parse many thousands of files and aggregate the counts
for the field entries related to every id.

extract_field pulls the value of the named field out of a line with a regex.

result = [{extract_field("id", line): [extract_field("field1", line),
                                       extract_field("field2", line)]}
          for line in FILE]

I'd like to aggregate them for every line (or maybe every file) and,
after the complete parsing procedure, get

{'a': {'0':2, '84':2}}
{'b': {'1000':1,'83':1,'84':1} }

I'm not sure what happened to b['0'] based on your initial data, but assuming that was an oversight...

from collections import defaultdict

# map each id to a dict of field-value -> count
aggregates = defaultdict(lambda: defaultdict(int))
for entry in result:
    for key, values in entry.items():
        for v in values:
            aggregates[key][v] += 1
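
If you want the result to print exactly like the plain dicts shown
above, you can convert the nested defaultdicts at the end (purely
cosmetic; defaultdicts already behave like dicts):

  plain = dict((key, dict(counts)) for key, counts in aggregates.items())
  print plain
  # e.g. {'a': {'0': 2, '84': 2}, 'b': {'1000': 1, '83': 1, '84': 1}}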

Or, if you don't need the intermediate result, you can tweak MRAB's solution and just iterate over the file(s):

  aggregates = defaultdict(lambda: defaultdict(int))
  for line in FILE:
      key = extract_field("id", line)
      aggregates[key][extract_field("field1", line)] += 1
      aggregates[key][extract_field("field2", line)] += 1

Or, if you're using an older Python (< 2.5) that doesn't provide defaultdict, you could do something like:

  aggregates = {}
  for line in FILE:
      key = extract_field("id", line)
      # setdefault creates the nested dict the first time a key is seen
      d = aggregates.setdefault(key, {})
      for fieldname in ('field1', 'field2'):
          value = extract_field(fieldname, line)
          d[value] = d.get(value, 0) + 1
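
All of these assume your extract_field helper. In case a concrete
version is useful, here is a minimal regex-based sketch; the name=value
line format is only a guess, so adjust the pattern to your actual data:

  import re

  def extract_field(name, line):
      # hypothetical format: fields appear as name=value,
      # whitespace-separated
      m = re.search(r'\b%s=(\S+)' % re.escape(name), line)
      if m:
          return m.group(1)
      return None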


-tkc


