Thus spoke Preben Randhol (on 2006-06-17 23:25):

> The code is a very good starting point for me! I already
> managed to change it and I see I need to make it a bit more robust.

I think the only thing you have to watch is that the regex-based
filter rules and the text actually fit together.

Suppose you have a text:

   Apples 34
   23 Apples, 234 Lemons 4 Eggs

(note the comma!) and some rules:

   ...
   '(apples) Apples',
   'Apples (apples)',
   ...

The former program would handle that all right after changing the
variable assignment from:

   if k.group(1): varname[k.group(1)] = <value>

to

   if k.group(1): varname[k.group(1)] += <value>

(note the +=), and the result would be: 'apples = 57'. It would add
up all values corresponding to one variable.

Ehm ... that would be so in Perl, but Python throws another stone
at you:

 - you have to explicitly instantiate a dictionary value (with 0)
   if/before you want to add to it in place (why is that?)

 - you can't have a return value from a regex object auto-converted
   to a number, even if it _is_ a number and smells like a number (???)

With these two nasty surprises, our extractor loop looks like this:

   for rule in filter:
       k = re.search(r'\((.+)\)', rule)        # pull out variable names -> k
       if k.group(1):                          # pull their values from text
           if k.group(1) not in varname:       # no implicit 0 for a new key
               varname[k.group(1)] = 0
           varname[k.group(1)] += float(       # group() returns a string
               re.search(re.sub(r'\((.+)\)', varscanner, rule),
                         example).group(1))    # use regex in modified 'rule'

whereas in the Perl loop, only = to += would change:

   for (@filter) {
       $v = $1 if s/\((.+)\)/$varscanner/;       # pull out variable names -> $1
       $varname{$v} += $1 if $example =~ /$_/;   # pull their values from text
   }

I'll add the complete Python program, which handles all the cases
you mentioned.

Regards

Mirco

==>
DATA = '''
An example text file:
-----------
Some text that can span some lines.

Apples 34
23 Apples, 234 Lemons 4 Eggs
56 Ducks

Some more text.
0.5 g butter
----------------------------------'''          # data must show up before usage

filter = [                          # define filter table
    '(apples) Apples',
    'Apples (apples)',
    '(ducks) Ducks',
    '(lemons) Lemons',
    '(eggs) Eggs',
    '(butter) g butter',
]

varname = {}                        # variable names to be found in filter
varscanner = r'\\b(\\S+?)\\b'       # expression used to extract values; the
                                    # backslashes are doubled because re.sub
                                    # treats it as a replacement template
example = DATA                      # read the appended example text

import re

for rule in filter:                 # iterate over filter rules, rules will be in 'rule'
    k = re.search(r'\((.+)\)', rule)        # pull out variable names -> k
    if k.group(1):                          # pull their values from text
        if k.group(1) not in varname:       # no implicit 0 for a new key
            varname[k.group(1)] = 0
        varname[k.group(1)] += float(       # group() returns a string
            re.search(re.sub(r'\((.+)\)', varscanner, rule),
                      example).group(1))    # use regex in modified 'rule'

for key, val in varname.items():
    print("%s\t= %s" % (key, val))          # print what's found
<==

-- 
http://mail.python.org/mailman/listinfo/python-list