On 08/10/10 20:30, MRAB wrote:
Tim Chase wrote:
   r = re.compile(r'((.)\1*)')
   #r = re.compile(r'((\w)\1*)')

That should be \2, not \1.

Alternatively:

      r = re.compile(r'(.)\1*')
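Why \1 is wrong in the first pattern: group 1 is the outer group, and it is still open when the backreference is reached, so it cannot refer to itself. Group 2 (the single character) is the one that should repeat. A quick check, assuming a reasonably recent Python 3, where the re module rejects a reference to an open group at compile time:

```python
import re

s = 'spppammmmegggssss'

# The mistranscribed pattern: \1 points at the enclosing group, which is
# still open at that point, so re.compile() raises re.error.
try:
    re.compile(r'((.)\1*)')
    broken_compiles = True
except re.error as exc:
    broken_compiles = False
    print('re.error:', exc)

# Either fix repeats the single captured character instead.
fixed_outer = re.compile(r'((.)\2*)')  # group 1 = whole run, group 2 = the char
fixed_plain = re.compile(r'(.)\1*')    # group 1 = the char, group(0) = whole run

runs_outer = [m.group(1) for m in fixed_outer.finditer(s)]
runs_plain = [m.group(0) for m in fixed_plain.finditer(s)]
print(runs_outer)                # ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
print(runs_outer == runs_plain)  # True
```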

Doh, I had played with both and mis-transcribed the combination of them into one malfunctioning regexp. My original trouble with the 2nd one was that r.findall() (not .finditer()) returned only the first letter of each run: when a pattern contains a capturing group, findall() returns the group's contents rather than the whole match. Wrapping the pattern in an extra set of parens and using "\2" returned the actual runs, as the first element of each tuple:

>>> s = 'spppammmmegggssss'
>>> import re
>>> r = re.compile(r'(.)\1*')
>>> r.findall(s) # no repeated text, just the initial letter
['s', 'p', 'a', 'm', 'e', 'g', 's']
>>> [m.group(0) for m in r.finditer(s)]
['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
>>> r = re.compile(r'((.)\2*)')
>>> r.findall(s)
[('s', 's'), ('ppp', 'p'), ('a', 'a'), ('mmmm', 'm'), ('e', 'e'), ('ggg', 'g'), ('ssss', 's')]
>>> [m.group(0) for m in r.finditer(s)]
['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']

Switching to .finditer() and taking m.group(0) then made both patterns work the way I wanted.
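Wrapped up as a function, the finditer() approach might look like the sketch below. The helper names are made up for illustration, and re.DOTALL is an addition not in the thread's patterns: without it '.' skips newlines, so repeated newlines would not come back as runs. For comparison, itertools.groupby from the standard library gives the same runs without a regex.

```python
import re
from itertools import groupby

# Hypothetical helper names; the pattern is the r'(.)\1*' form from the thread.
# re.DOTALL lets '.' match '\n' too, so newline runs are kept.
_RUN = re.compile(r'(.)\1*', re.DOTALL)

def runs(text):
    """Return each maximal run of a repeated character as a string."""
    return [m.group(0) for m in _RUN.finditer(text)]

def runs_groupby(text):
    """Same result without a regex: groupby() yields consecutive equal items."""
    return [''.join(group) for _, group in groupby(text)]

s = 'spppammmmegggssss'
print(runs(s))                     # ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
print(runs_groupby(s) == runs(s))  # True
```

The regex version stays closest to what was discussed here; the groupby version avoids regex subtleties such as group numbering and newline handling altogether.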

Thanks for catching my mistranscription.

-tkc



--
http://mail.python.org/mailman/listinfo/python-list
