I am trying to parse a set of files that have a simple syntax using
RE. I'm interested in counting '$' expansions in the files, with one
minor consideration. A line becomes a comment if the first
non-whitespace character is a semicolon.
For example, tests 1 and 2 should be ignored:
sInput = """
; $1 test1
; test2 $2
test3 ; $3 $3 $3
test4
$5 test5
$6
test7 $7 test7
"""
Required output: ['$3', '$3', '$3', '$5', '$6', '$7']
Two things matter here: comment lines and "dollar-something" expansions.
>>> import re
>>> r_comment = re.compile(r'\s*;')
>>> r_dollar = re.compile(r'\$\d+')
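It may help to see why r_comment only flags lines whose first non-whitespace character is a semicolon. The sketch below reuses the same pattern; the sample strings are taken from or modelled on the input above, and the is_comment_* variable names are made up for illustration.

```python
import re

r_comment = re.compile(r'\s*;')

# match() only succeeds at the start of the string, so a semicolon that
# appears later in the line does not turn the line into a comment.
is_comment_indented = bool(r_comment.match('   ; $1 test1'))
is_comment_trailing = bool(r_comment.match('test3 ; $3 $3 $3'))
is_comment_blank = bool(r_comment.match(''))

print(is_comment_indented, is_comment_trailing, is_comment_blank)
```

Using search() instead of match() would also flag the test3 line, which is not what the question asks for.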
Then remove comment lines and find the matching '$' expansions:
>>> [r_dollar.findall(line) for line in sInput.splitlines()
...  if not r_comment.match(line)]
[[], ['$3', '$3', '$3'], [], ['$5'], ['$6'], ['$7']]
Finally, roll each line's results into a single list by slightly
abusing sum():
>>> sum((r_dollar.findall(line) for line in sInput.splitlines()
...      if not r_comment.match(line)), [])
['$3', '$3', '$3', '$5', '$6', '$7']
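If the sum() trick feels too hacky, or the files are large enough that repeated list concatenation becomes slow, itertools.chain.from_iterable is one alternative for flattening. This is only a sketch of the same idea as a standalone script; find_dollars is a hypothetical helper name, not something from the original code.

```python
import re
from itertools import chain

sInput = """
; $1 test1
; test2 $2
test3 ; $3 $3 $3
test4
$5 test5
$6
test7 $7 test7
"""

r_comment = re.compile(r'\s*;')
r_dollar = re.compile(r'\$\d+')

def find_dollars(text):
    """Return every '$' expansion found on non-comment lines of text."""
    per_line = (r_dollar.findall(line)
                for line in text.splitlines()
                if not r_comment.match(line))
    # chain.from_iterable flattens the per-line lists lazily
    return list(chain.from_iterable(per_line))

result = find_dollars(sInput)
print(result)
```

Taking len(result) then gives the count of expansions.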
Adjust r_dollar if your variable pattern differs (for example,
revert to your previous r'\$.' pattern if you prefer, or use
r'\$\w+' for multi-character variables).
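To see how the three candidate patterns differ, here is a small comparison on a made-up sample line (the line and the variable names are assumptions for illustration only):

```python
import re

line = 'copy $src to $dest1 and $2'  # made-up sample line

digits_only = re.findall(r'\$\d+', line)  # '$' followed by one or more digits
single_char = re.findall(r'\$.', line)    # '$' followed by any one character
word_chars = re.findall(r'\$\w+', line)   # '$' followed by letters, digits, or '_'

print(digits_only)
print(single_char)
print(word_chars)
```

Only the \w+ version picks up the full names $src and $dest1; the digit-only version skips them entirely.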
-tkc