Re: Splitting a sequence into pieces with identical elements

Tim Chase Tue, 10 Aug 2010 19:36:20 -0700

On 08/10/10 20:30, MRAB wrote:

Tim Chase wrote:

   r = re.compile(r'((.)\1*)')
   #r = re.compile(r'((\w)\1*)')


That should be \2, not \1.

Alternatively:

      r = re.compile(r'(.)\1*')

Doh, I had played with both and mis-transcribed the combination of them into one malfunctioning regexp. My original trouble with the 2nd one was that r.findall() (not .finditer) was only returning the first letter of each, because that's all the single group captured. Wrapping it in the extra set of parens and using "\2" returned the actual data in sub-tuples:


>>> s = 'spppammmmegggssss'
>>> import re
>>> r = re.compile(r'(.)\1*')
>>> r.findall(s) # no repeated text, just the initial letter
['s', 'p', 'a', 'm', 'e', 'g', 's']
>>> [m.group(0) for m in r.finditer(s)]
['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
>>> r = re.compile(r'((.)\2*)')
>>> r.findall(s)
[('s', 's'), ('ppp', 'p'), ('a', 'a'), ('mmmm', 'm'), ('e', 'e'), ('ggg', 'g'), ('ssss', 's')]
>>> [m.group(0) for m in r.finditer(s)]
['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']

By then changing to .finditer() it made them both work the way I wanted.
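
For reuse, the finditer() variant can be wrapped in a small function. This is only a sketch: the name split_runs is made up for illustration, and the re.DOTALL flag is an addition not used in the session above (without it, "." does not match newlines, so runs of newlines would be skipped).

```python
import re

def split_runs(s):
    """Split s into runs of identical characters (hypothetical helper)."""
    # (.) captures one character; \1* matches any further copies of it.
    # re.DOTALL is an assumption added here so newlines are grouped too.
    pattern = re.compile(r'(.)\1*', re.DOTALL)
    # group(0) is the whole match, not just the captured first character,
    # which is why finditer() gives the full runs where findall() did not.
    return [m.group(0) for m in pattern.finditer(s)]

print(split_runs('spppammmmegggssss'))
# ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
```

Since the pattern needs at least one character to match, an empty string simply gives an empty list.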


Thanks for catching my mistranscription.

-tkc



--
http://mail.python.org/mailman/listinfo/python-list
