On 3/11/2010 3:55 AM, John Bond wrote:

Could you please reconsider how you would
work through this new one and check whether my
steps are correct? If you agree with my 7-step
execution for the new regex, then:

We have finally found a real bug in re.findall:

>>> re.findall('((.a.)*)*', 'Mary has a lamb')
[('', 'Mar'), ('', ''), ('', ''), ('', 'lam'), ('', ''), ('', '')]


Cheers,

Yingjie




Nope, I'm afraid it's a lack of understanding again.

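The outer capturing group that you've added is itself repeated by the trailing '*', and a capturing group with repetition only returns the last thing it matched. re.findall still finds the same six matches that you now accept, and returns one tuple per match. In the first match, the outer group's first pass consumes 'Mar' (which is what the inner group reports); it then makes one more pass in which (.a.)* matches nothing, and that empty pass is the last thing the outer group matched, so it reports ''. The fourth match works the same way with 'has a lam', and the other four matches are empty anyway. That's why every tuple starts with ''.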
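The last-iteration rule is easy to check in isolation. Here is a minimal sketch; the inputs 'abc' and 'has a lamb' are chosen for illustration and are not taken from the session above:

```python
import re

# A repeated capturing group remembers only its final iteration.
m = re.match(r'(\w)+', 'abc')
whole_abc = m.group(0)   # 'abc': everything the match covered
last_abc = m.group(1)    # 'c': only the last pass of (\w)
print(whole_abc, last_abc)

# The inner group from the thread behaves the same way. Starting at 'has',
# (.a.) repeats three times ('has', ' a ', 'lam'), but the group keeps
# only the last of those passes.
m = re.match(r'(.a.)*', 'has a lamb')
whole_run = m.group(0)   # 'has a lam'
last_run = m.group(1)    # 'lam'
print(whole_run, last_run)
```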

You just need to accept that findall (like all of re) works fine, and if it doesn't seem to do what you expect, it's because the expectation is wrong.
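As an illustration of adjusting the expectation rather than the library, here is a sketch that assumes (the thread doesn't say) that the goal was the stretch of text each match covers. It uses a non-capturing group (?:...) so that findall returns whole matches instead of group tuples, and '+' instead of the thread's '*' so that empty matches are not reported:

```python
import re

text = 'Mary has a lamb'

# With no capturing groups, findall returns the whole text of each match.
# (?:...) groups the pattern without creating a capture.
runs = re.findall(r'(?:.a.)+', text)
print(runs)  # ['Mar', 'has a lam']

# The same thing via finditer: group(0) is the full text of each match.
runs_again = [m.group(0) for m in re.finditer(r'(?:.a.)+', text)]
print(runs_again)  # ['Mar', 'has a lam']
```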

Cheers, JB

Just to clarify - findall is returning:

[ (outer group in 1st match, inner group in 1st match)
, (outer group in 2nd match, inner group in 2nd match)
, (outer group in 3rd match, inner group in 3rd match)
, (outer group in 4th match, inner group in 4th match)
, (outer group in 5th match, inner group in 5th match)
, (outer group in 6th match, inner group in 6th match)
]

Where each entry is the last thing that group matched within that one match, owing to the way that capturing groups with repetition only return the last thing they matched. For the outer group that last thing is always an empty string, which is why the first element of every tuple is ''.
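One way to see this per-match structure directly is to walk the same search with re.finditer, which yields one match object per match; each findall tuple corresponds to one of these objects' groups, with '' for a group that took no part. A sketch follows. The spans and values printed by the loop may depend on the Python version's handling of empty matches, so no output is claimed for it; the smaller (a)|(b) case is included because its result is easy to predict:

```python
import re

text = 'Mary has a lamb'
pattern = r'((.a.)*)*'  # the pattern from the thread

# finditer yields one match object per match; findall builds each tuple
# from the groups of that single match, using '' for a group that did not
# take part in it.
for m in re.finditer(pattern, text):
    print(m.span(), m.groups(default=''))

# A smaller case: each tuple describes exactly one match, and the group
# that did not participate in that match comes back as ''.
pairs = re.findall(r'(a)|(b)', 'ab')
print(pairs)  # [('a', ''), ('', 'b')]
```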

Cheers, JB


--
http://mail.python.org/mailman/listinfo/python-list
