On 3/11/2010 3:55 AM, John Bond wrote:
Could you please reconsider how you would
work with this new one and see if my steps
are correct? If you agree with my 7-step
execution for the new regex, then
we have finally found a real bug in re.findall:
re.findall('((.a.)*)*', 'Mary has a lamb')
[('', 'Mar'), ('', ''), ('', ''), ('', 'lam'), ('', ''), ('', '')]
Cheers,
Yingjie
Nope, I'm afraid it is a lack of understanding again.
The outer capturing group that you've added is matching the entirety
of what's matched by the inner one (which is six matches, as you now
accept). Because a repeated group only returns the last thing it
matched, it returns one thing - an empty string (that being the last
thing that the inner group matched). Findall is simply returning that
in each of the six return values it needs to return because of the
inner one.
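One way to look at the six results individually is to walk them with re.finditer and print where each one starts and ends. This is only a sketch: the variable names are mine, and the only values I annotate are the whole-match text and spans, since the exact group values are best read off your own interpreter.

```python
import re

pattern = re.compile(r'((.a.)*)*')
text = 'Mary has a lamb'

# finditer walks the same matches that findall reports, but yields
# match objects, so the span and whole-match text are visible too.
matches = list(pattern.finditer(text))
for m in matches:
    print(m.span(), repr(m.group(0)), m.groups())

# Text consumed by each of the six matches, in order.
whole = [m.group(0) for m in matches]
spans = [m.span() for m in matches]
# whole == ['Mar', '', '', 'has a lam', '', '']
```

Two of the six matches consume text ('Mar' and 'has a lam') and the other four are empty matches at positions where no '.a.' can start.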
You just need to accept that findall (like all of re) works fine, and
if it doesn't seem to do what you expect, it's because the expectation
is wrong.
Cheers, JB
Just to clarify - findall is returning:
[ (only match in outer group, 1st match in inner group)
, (only match in outer group, 2nd match in inner group)
, (only match in outer group, 3rd match in inner group)
, (only match in outer group, 4th match in inner group)
, (only match in outer group, 5th match in inner group)
, (only match in outer group, 6th match in inner group)
]
Where "only match in outer group" = "6th match in inner group" owing to
the way that capturing groups with repetition only return the last thing
they matched.
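The "last thing they matched" rule is easier to see with the single repeated group on its own. A minimal sketch, using just the 'has a lam' part of the sentence (my choice of substring, not something from the original post):

```python
import re

# (.a.)* repeats three times here: 'has', ' a ', 'lam'.
m = re.match(r'(.a.)*', 'has a lam')

whole_match = m.group(0)   # everything the repeated group consumed
last_capture = m.group(1)  # only the final repetition is kept

print(repr(whole_match))   # 'has a lam'
print(repr(last_capture))  # 'lam'
```

The earlier repetitions ('has' and ' a ') are part of the overall match but are not retained by the group.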
Cheers, JB
--
http://mail.python.org/mailman/listinfo/python-list