On Thu, Dec 2, 2010 at 11:01 AM, chris <oz...@web.de> wrote:
> Hi,
>
> I would like to parse many thousand files and aggregate the counts for
> the field entries related to every id.
>
> extract_field greps the identifier for the fields with a regex.
>
> result = [{extract_field("id", line): [extract_field("field1", line),
>            extract_field("field2", line)]} for line in FILE]
>
> result gives me:
>
> {'a': ['0', '84']},
> {'a': ['0', '84']},
> {'b': ['1000', '83']},
> {'b': ['0', '84']},
>
> I would like to aggregate them for every line (or maybe file) and, after
> the complete parsing procedure, be able to count the number of ids
> having > 0 entries in '83':
>
> {'a': {'0': 2, '84': 2}}
> {'b': {'1000': 1, '83': 1, '84': 1}}
Er, what happened to the '0' for 'b'?

> My current solution with mysql is really slow.

Untested:

# requires Python 2.7+ due to Counter
from collections import defaultdict, Counter

FIELDS = ["field1", "field2"]
id2counter = defaultdict(Counter)
for line in FILE:
    identifier = extract_field("id", line)
    counter = id2counter[identifier]
    for field_name in FIELDS:
        field_val = int(extract_field(field_name, line))
        counter[field_val] += 1

print(id2counter)
print(sum(1 for counter in id2counter.values() if counter[83]))

Cheers,
Chris
--
http://blog.rebertia.com
--
http://mail.python.org/mailman/listinfo/python-list
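[Editor's note: the snippet above depends on the OP's FILE and extract_field, which are not shown. Here is a self-contained sketch of the same defaultdict(Counter) aggregation; the regex-based extract_field and the "name=value" sample lines are hypothetical stand-ins for the OP's real parser and input, used only to make the sketch runnable.]

```python
import re
from collections import defaultdict, Counter

def extract_field(name, line):
    # Hypothetical parser: assumes fields appear as "name=value" in each line.
    return re.search(r'\b%s=(\w+)' % name, line).group(1)

# Stand-in for the OP's input files: four lines matching the example data.
LINES = [
    "id=a field1=0 field2=84",
    "id=a field1=0 field2=84",
    "id=b field1=1000 field2=83",
    "id=b field1=0 field2=84",
]

FIELDS = ["field1", "field2"]

# One Counter per id; a missing id gets a fresh empty Counter automatically.
id2counter = defaultdict(Counter)
for line in LINES:
    identifier = extract_field("id", line)
    counter = id2counter[identifier]
    for field_name in FIELDS:
        # Field values become int keys, so later lookups use counter[83].
        counter[int(extract_field(field_name, line))] += 1

print(dict(id2counter))
# a -> {0: 2, 84: 2}; b -> {1000: 1, 83: 1, 0: 1, 84: 1}

# Number of ids with a nonzero count for field value 83.
# Counter returns 0 (not KeyError) for missing keys, so this is safe.
print(sum(1 for c in id2counter.values() if c[83]))  # 1
```

Note that, unlike the OP's desired output, 'b' correctly keeps the count for '0' here, which is the point of the "Er, what happened to the '0' for 'b'?" remark above.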