On Monday 07 November 2005 17:31, Kent Johnson wrote:
> James Stroud wrote:
> > On Monday 07 November 2005 16:18, [EMAIL PROTECTED] wrote:
> >>Ya, for some reason your non-greedy "?" doesn't seem to be taking.
> >>This works:
> >>
> >>re.sub('(.*)(00.*?01) target_mark', r'\2', your_string)
> >
> > The non-greedy is actually acting as expected. This is because non-greedy
> > operators are "forward looking", not "backward looking". So the
> > non-greedy finds the start of the first start-of-the-match it comes
> > across and then finds the first occurrence of '01' that makes the
> > complete match, otherwise the greedy operator would match .* as much as
> > it could, gobbling up all '01's before the last because these match '.*'.
> > For example:
> >
> > py> rgx = re.compile(r"(00.*01) target_mark")
> > py> rgx.findall('00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01')
> > ['00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01']
> > py> rgx = re.compile(r"(00.*?01) target_mark")
> > py> rgx.findall('00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01')
> > ['00 noise1 01 noise2 00 target 01', '00 dowhat 01']
>
> ??? not in my Python:
> >>> rgx = re.compile(r"(00.*01) target_mark")
> >>> rgx.findall('00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01')
> ['00 noise1 01 noise2 00 target 01']
> >>> rgx = re.compile(r"(00.*?01) target_mark")
> >>> rgx.findall('00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01')
> ['00 noise1 01 noise2 00 target 01']
>
> Since target_mark only occurs once in the string the greedy and non-greedy
> match is the same in this case.

Somehow my cutting and pasting got messed up. It should be:

py> rgx = re.compile(r"(00.*?01) target_mark")
py> rgx.findall('00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01 target_mark')
['00 noise1 01 noise2 00 target 01', '00 dowhat 01']
py> rgx = re.compile(r"(00.*01) target_mark")
py> rgx.findall('00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01 target_mark')
['00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01']

Sorry about that.

James

-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
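
For anyone who wants to run the comparison as a script, here is a minimal sketch of the behaviour discussed in this thread. The variable names (one_mark, two_marks, lazy, greedy, prefixed, tempered) are made up for illustration. The first three patterns come from the messages above; the last one, which uses a negative lookahead to keep the match from running across a second '00', is not from the thread and is only one possible way to pick up the shortest '00 ... 01' span before each target_mark.

```python
import re

# Test strings from the thread: one with a single "target_mark", one with two.
one_mark = '00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01'
two_marks = '00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01 target_mark'

# Non-greedy .*? still anchors at the FIRST '00' it finds; it only
# stops extending as soon as '01 target_mark' can follow.
lazy = re.findall(r"(00.*?01) target_mark", two_marks)

# Greedy .* extends as far as it can, so the match ends at the LAST
# '01 target_mark' in the string.
greedy = re.findall(r"(00.*01) target_mark", two_marks)

# The greedy-prefix idea from the re.sub suggestion quoted above: a leading
# .* consumes as much as possible, which pushes the '00' of the group
# to the right.
prefixed = re.search(r".*(00.*?01) target_mark", one_mark).group(1)

# Assumption (not from the thread): forbid another '00' inside the group
# with a negative lookahead, so each match starts at the nearest '00'.
tempered = re.findall(r"(00(?:(?!00).)*?01) target_mark", two_marks)

print(lazy)
print(greedy)
print(prefixed)
print(tempered)
```

The greedy prefix only isolates the span before the last target_mark in the string, so the lookahead form may be the better fit when several marks are present.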