[EMAIL PROTECTED] wrote:
> the next step of my job is to set limits on the length of the interposed
> sequences (if someone can help me with this I'd appreciate it a lot)
> thanks everyone.

Kent Johnson had the right approach, with regular expressions.
For a bit of optimization, use non-greedy quantifiers (a "?" after
the repeat).  Each gap then matches as few characters as possible,
which gives you shorter matches.
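
As a quick illustration of the difference, here is a minimal sketch
on a made-up sequence (the string and variable names are mine, chosen
only for the example).  Both patterns allow up to 10 characters
between "a" and "t", but only the non-greedy one stops at the first "t".

```python
import re

seq = "acgtacgt"

# Greedy: .{,10} grabs as much as it can, so the match runs to the last "t".
greedy = re.search(r"a.{,10}t", seq)
# Non-greedy: .{,10}? grabs as little as it can, so the match stops
# at the first "t".
lazy = re.search(r"a.{,10}?t", seq)

print(greedy.group(0), greedy.span())  # acgtacgt (0, 8)
print(lazy.group(0), lazy.span())      # acgt (0, 4)
```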

Suppose you want no more than 10 bases between terms.  You could
use this pattern:

    a.{,10}?t.{,10}?c.{,10}?g.{,10}?


>>> import re
>>> pat = re.compile('a.{,10}?t.{,10}?c.{,10}?g.{,10}?')
>>> m = pat.search("tcgaacccgtaaaaagctaatcg")
>>> m.group(0), m.start(0), m.end(0)
('aacccgtaaaaagctaatcg', 3, 23)
>>> 

>>> pat.search("tcgaacccgtaaaaagctaatttttttg")
<_sre.SRE_Match object at 0x9b950>
>>> pat.search("tcgaacccgtaaaaagctaattttttttg")
>>> 
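
To isolate where the cutoff falls, here is a small sketch on a
synthetic sequence.  The helper name has_hit and the filler bases are
my own invention for the example; the pattern is the one above without
the trailing gap.  A gap of exactly 10 bases before the "g" still
matches, and a gap of 11 does not.

```python
import re

pat = re.compile(r"a.{,10}?t.{,10}?c.{,10}?g")

def has_hit(gap):
    # Build "atc", then `gap` filler bases, then the final "g".
    seq = "atc" + "t" * gap + "g"
    return pat.search(seq) is not None

print(has_hit(10))  # True
print(has_hit(11))  # False
```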

If you want to know the location of each of the bases, and
you'll have fewer than 100 of them (I think that's the limit),
then you can use groups in the regular expression language:

>>> def make_pattern(s, limit = None):
...     if limit is None:
...         t = ".*?"
...     else:
...         t = ".{,%d}?" % (limit,)
...     text = []
...     for c in s:
...         text.append("(%s)%s" % (c, t))
...     return "".join(text)
... 
>>> make_pattern("atcg")
'(a).*?(t).*?(c).*?(g).*?'
>>> make_pattern("atcg", 10)
'(a).{,10}?(t).{,10}?(c).{,10}?(g).{,10}?'
>>> pat = re.compile(make_pattern("atcg", 10))
>>> m = pat.search("tcgaacccgtaaaaagctaatttttttg")
>>> m
<_sre.SRE_Match object at 0x8ea70>
>>> m.groups()
('a', 't', 'c', 'g')
>>> for i in range(1, len("atcg")+1):
...     print(m.group(i), m.start(i), m.end(i))
... 
a 3 4
t 9 10
c 16 17
g 27 28
>>> 



                                Andrew
                                [EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list
