On Mon, Jan 20, 2014 at 9:44 PM, km <srikrishnamo...@gmail.com> wrote:
> >>> p = re.compile('(CAA)+?(TCT)+?(TA)+?')
> >>> p.findall('CAACAACAATCTTCTTCTTCTTATATA')
> [('CAA', 'TCT', 'TA')]
>
> But I instead find only one instance of the CAA/TCT/TA in that order.
> How can I get 3 matches of CAA, followed by four matches of TCT followed by
> 2 matches of TA ?
> Well these patterns (CAA/TCT/TA) can occur any number of times and atleast
> once so I have to use + in the regex.

You're capturing a single instance, not the repeated run. The regex
does match all three CAA units, but a repeated capturing group keeps
only the last repetition it matched, so you get just one 'CAA' back.
Try this:

>>> p = re.compile('((?:CAA)+)((?:TCT)+)((?:TA)+)')
>>> p.findall('CAACAACAATCTTCTTCTTCTTATATA')
[('CAACAACAA', 'TCTTCTTCTTCT', 'TATATA')]

This groups "CAA" with non-capturing parentheses (?:regex), applies
the + to that group, and wraps the whole repeated run in an outer
capturing group.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
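
If what you are really after is the number of repeats of each unit rather than the matched text, one possible sketch is to capture each whole run and divide its length by the length of the unit. The names PATTERN, UNITS and count_repeats below are made up for illustration and are not part of the re module:

```python
import re

# Outer groups capture each whole run; the inner (?:...) groups only repeat.
PATTERN = re.compile(r'((?:CAA)+)((?:TCT)+)((?:TA)+)')
UNITS = ('CAA', 'TCT', 'TA')

def count_repeats(seq):
    """Return [(unit, count), ...] for the first match in seq, or None."""
    m = PATTERN.search(seq)
    if m is None:
        return None
    # Each captured run is a whole number of units, so // is exact here.
    return [(unit, len(run) // len(unit))
            for unit, run in zip(UNITS, m.groups())]

print(count_repeats('CAACAACAATCTTCTTCTTCTTATATA'))
```

For the sample string this reports 3, 4 and 3 repeats: the string ends in three TA units, not two.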