On 12/02/2010 01:49 PM, MRAB wrote:
> On 02/12/2010 19:01, chris wrote:
>> I would like to parse many thousands of files and aggregate the
>> counts for the field entries related to every id.
>>
>> extract_field greps the identifier for the fields with a regex:
>>
>> result = [{extract_field("id", line): [extract_field("field1", line),
>>                                        extract_field("field2", line)]}
>>           for line in FILE]
>>
>> I'd like to aggregate them per line (or maybe per file) and, after
>> the complete parsing procedure, end up with:
>>
>> {'a': {'0': 2, '84': 2}}
>> {'b': {'1000': 1, '83': 1, '84': 1}}
I'm not sure what happened to b['0'] based on your initial data,
but assuming that was an oversight...
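You didn't show extract_field, so just to make the examples below
self-contained, here's a minimal sketch of what I'm assuming it does
(a regex search for name=value pairs; adjust to your actual format):

import re

def extract_field(name, line):
    # hypothetical stand-in: pull the value out of a "name=value"
    # pair somewhere in the line, or None if the field is absent
    match = re.search(r'\b%s=(\S+)' % re.escape(name), line)
    if match:
        return match.group(1)
    return None

With something like that in place, the aggregation itself: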
from collections import defaultdict

aggregates = defaultdict(lambda: defaultdict(int))
for entry in result:
    for key, values in entry.items():
        for v in values:
            aggregates[key][v] += 1
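To make that concrete, feeding the loop above some made-up entries
(not your real data) would give:

result = [{'a': ['0', '84']},
          {'a': ['0', '84']},
          {'b': ['1000', '83']},
          {'b': ['84', '0']}]

# afterwards, aggregates holds a per-id count of every field value:
#   {'a': {'0': 2, '84': 2},
#    'b': {'1000': 1, '83': 1, '84': 1, '0': 1}}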
Or, if you don't need the intermediate result, you can tweak
MRAB's solution and just iterate over the file(s):
aggregates = defaultdict(lambda: defaultdict(int))
for line in FILE:
    key = extract_field("id", line)
    aggregates[key][extract_field("field1", line)] += 1
    aggregates[key][extract_field("field2", line)] += 1
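Since you're talking about many thousands of files, the stdlib
fileinput module will chain them all into one stream of lines, so
the same loop works unchanged; something like this (untested, and
filenames here stands in for your own list of paths):

import fileinput
from collections import defaultdict

aggregates = defaultdict(lambda: defaultdict(int))
# fileinput.input() yields the lines of each named file in turn,
# as if they were one big file
for line in fileinput.input(filenames):
    key = extract_field("id", line)
    aggregates[key][extract_field("field1", line)] += 1
    aggregates[key][extract_field("field2", line)] += 1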
Or, if you're using an older Python (< 2.5) that doesn't provide
defaultdict, you could do something like:
aggregates = {}
for line in FILE:
    key = extract_field("id", line)
    d = aggregates.setdefault(key, {})
    for fieldname in ('field1', 'field2'):
        value = extract_field(fieldname, line)
        d[value] = d.get(value, 0) + 1
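And if you're already on 2.7, collections.Counter (new in that
version) can stand in for the inner defaultdict(int); just a sketch,
but it also gets you most_common() for free:

from collections import defaultdict, Counter

aggregates = defaultdict(Counter)
for line in FILE:
    key = extract_field("id", line)
    aggregates[key][extract_field("field1", line)] += 1
    aggregates[key][extract_field("field2", line)] += 1

# aggregates['a'].most_common(5) would then give the five most
# common field values seen for id 'a'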
-tkc