Re: [Tutor] Python re without string consumption

2007-01-25 Thread Terry Carroll
On Thu, 25 Jan 2007, Jacob Abraham wrote:

>    I would like to thank you for the solution and
> the helper function that I have written is as follows.

That's very similar to a solution I coded up for a friend, who was doing 
searches in genetic sequences, and had exactly the same problem  you did.  
My solution:

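def myfindall(regex, seq):
    resultlist = []
    pos = 0

    while True:
        result = regex.search(seq, pos)
        if result is None:
            break
        resultlist.append(seq[result.start():result.end()])
        # Resume one character past the START of this match, so that
        # matches overlapping it are still found.
        pos = result.start() + 1
    return resultlist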

Using it:

>>> rexp=re.compile("B.B")
>>> sequence="BABBEBIB"
>>> print myfindall(rexp,sequence)
['BAB', 'BEB', 'BIB']

> But I do hope that future versions of Python include a regular
> expression syntax to handle such cases...

My tongue-in-cheek recommendation was that re.findall be renamed to 
re.findmost.
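
For what it's worth, the existing regular expression syntax can already express this: a zero-width lookahead wrapped around a capturing group matches without consuming any text. The following is only a sketch, assuming Python 3; the helper name find_overlapping is made up for illustration and is not part of the re module.

```python
import re

def find_overlapping(pattern, text):
    """Return every match of pattern in text, overlapping ones included.

    find_overlapping is a hypothetical helper name, not an re function.
    """
    # (?=...) is a zero-width lookahead: it succeeds without consuming
    # characters, so the scan moves forward one position at a time.
    # The outer group (group 1) captures what the lookahead saw.
    lookahead = re.compile("(?=(%s))" % pattern)
    return [m.group(1) for m in lookahead.finditer(text)]

results = find_overlapping("B.B", "BABBEBIB")
```

Like myfindall above, this reports at most one match per starting position.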



___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Python re without string consumption

2007-01-25 Thread Kent Johnson
Jacob Abraham wrote:
> Hi Danny Yoo,
> 
>I would like to thank you for the solution and
> the helper function that I have written is as follows. But I do hope
> that future versions of Python include a regular expression syntax to
> handle such cases simply because this method seems very process and
> memory intensive. I also notice that fall_back_len is a very crude
> solution.
> 
> def searchall(expr, text, fall_back_len=0):
>     while True:
>         match = re.search(expr, text)
>         if not match:
>             break
>         yield match
>         end = match.end()
>         text = text[end-fall_back_len:]
> 
> for match in searchall("abca", "abcabcabca", 1):
>     print match.group()

The string slicing is not needed. The search() method of a compiled re 
has an optional pos parameter that tells it where to start the search. You 
can start the next search at the position just after the *start* of the 
previous successful match, so fall_back_len is not needed. How about this:

import re

def searchall(expr, text):
    searchRe = re.compile(expr)
    match = searchRe.search(text)
    while match:
        yield match
        # Restart just past the start of this match, not past its end.
        match = searchRe.search(text, match.start() + 1)

Also, if you are just finding plain text, you don't need regular 
expressions at all; you can use str.find(). Note that this version yields 
the start position of each hit rather than a match object:

def searchall(expr, text):
    pos = text.find(expr)
    while pos != -1:
        yield pos
        pos = text.find(expr, pos+1)

(inspired by this recipe: 
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/499314)
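
As a quick check that the two versions agree on plain text, here is a sketch assuming Python 3. The functions are renamed searchall_re and searchall_str only so that both can live in one file; those names are not from the original code.

```python
import re

def searchall_re(expr, text):
    """Yield a match object for every match, overlapping ones included."""
    search_re = re.compile(expr)
    match = search_re.search(text)
    while match:
        yield match
        # pos is the second argument of search() on a compiled pattern.
        match = search_re.search(text, match.start() + 1)

def searchall_str(needle, text):
    """Yield the start index of every occurrence of a plain substring."""
    pos = text.find(needle)
    while pos != -1:
        yield pos
        pos = text.find(needle, pos + 1)

re_starts = [m.start() for m in searchall_re("abca", "abcabcabca")]
str_starts = list(searchall_str("abca", "abcabcabca"))
```

Both lists should hold the same three start offsets, since "abca" contains no regular expression metacharacters.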

Kent



Re: [Tutor] Python re without string consumption

2007-01-25 Thread Jacob Abraham
Hi Danny Yoo,

   I would like to thank you for the solution and
the helper function that I have written is as follows. But I do hope
that future versions of Python include a regular expression syntax to
handle such cases simply because this method seems very process and
memory intensive. I also notice that fall_back_len is a very crude
solution.

def searchall(expr, text, fall_back_len=0):
    while True:
        match = re.search(expr, text)
        if not match:
            break
        yield match
        end = match.end()
        text = text[end-fall_back_len:]

for match in searchall("abca", "abcabcabca", 1):
    print match.group()

Thanks Again.

Jacob Abraham




 



Re: [Tutor] Python re without string consumption

2007-01-24 Thread Luke Paireepinart

>
> def searchall(expr, text, fall_back_len=0):
> [snip]
>
> text = text[end-fallbacklen:]
oops, those look like two different variables to me.


Re: [Tutor] Python re without string consumption

2007-01-24 Thread Jacob Abraham
Hi Danny Yoo,

   I would like to thank you for the solution and the helper function that I 
have written is as follows. But I do hope that future versions of Python 
include a regular expression syntax to handle such cases simply because this 
method seems very process and memory intensive. I also notice that 
fall_back_len is a very crude solution.

def searchall(expr, text, fall_back_len=0):
    while True:
        match = re.search(expr, text)
        if not match:
            break
        yield match
        end = match.end()
        text = text[end-fallbacklen:]

for match in searchall("abca", "abcabcabca", 1):
    print match.group()

Thanks Again.

Jacob Abraham










 



Re: [Tutor] Python re without string consumption

2007-01-24 Thread Danny Yoo


On Wed, 24 Jan 2007, Jacob Abraham wrote:

> >>> import re
> >>> re.findall("abca", "abcabcabca")
> ["abca", "abca"]
>
> While I am expecting.
>
> ["abca", "abca", "abca"]


Hi Jacob,

Just to make sure: do you understand, though, why findall() won't give you 
the results you want?  The documentation on findall() says:

""" Return a list of all non-overlapping matches of pattern in string. If 
one or more groups are present in the pattern, return a list of groups; 
this will be a list of tuples if the pattern has more than one group. 
Empty matches are included in the result unless they touch the beginning 
of another match. New in version 1.5.2. Changed in version 2.4: Added the 
optional flags argument. """

It's designed not to return overlapping items.
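
To see where the middle occurrence goes, one can look at the start offsets findall works from. This is a small sketch, assuming Python 3; the variable names are only illustrative.

```python
import re

text = "abcabcabca"

# findall resumes scanning at the END of each match, so the "a" at
# index 3 is consumed by the first match and cannot begin another one.
non_overlapping = re.findall("abca", text)

# finditer follows the same non-overlapping rule and exposes the offsets.
starts = [m.start() for m in re.finditer("abca", text)]
```

The occurrence beginning at index 3 is skipped because its first character was already used up by the match that began at index 0.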



> How do I modify my regular expression to do the same.

We can just write our own helper function to restart the match.  That is, 
we can explicitly call search() ourselves, and pass in a new string 
that's a fragment of the old one.


Concretely,

#
>>> import re
>>> text = "abcabcabca"
>>> re.search("abca", text)
<_sre.SRE_Match object at 0x50d40>
>>> re.search("abca", text).start()
0
#

Ok, so we know the first match starts at 0.  So let's just restart the 
search, skipping that position.

##
>>> re.search("abca", text[1:])
<_sre.SRE_Match object at 0x785d0>
>>> re.search("abca", text[1:]).start()
2
##

There's our second match: offset 2 in text[1:], which is index 3 in the 
original text.  Let's continue.  We have to be careful, though, to skip the 
right number of characters, so the next slice starts at index 4:

###
>>> re.search("abca", text[4:]).start()
2
>>> re.search("abca", text[7:])
>>> 
###

And there are no matches after this point.


You can try writing this helper function yourself.  If you need help doing 
so, please feel free to ask the list for suggestions.
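
One possible shape for such a helper, following the slicing steps above, is sketched below. It assumes Python 3, and the name overlapping_starts is made up for illustration. Keep in mind that each slice copies the string, and that anchors such as ^ are evaluated against the slice rather than the original text.

```python
import re

def overlapping_starts(pattern, text):
    """Return the index in the original text where each match begins.

    overlapping_starts is a hypothetical helper name, not an re function.
    """
    starts = []
    offset = 0  # number of characters sliced off the front so far
    # The bound stops the loop for patterns that can match the empty string.
    while offset <= len(text):
        match = re.search(pattern, text[offset:])
        if match is None:
            break
        # match.start() is relative to the slice; convert it to an index
        # into the original text.
        absolute = offset + match.start()
        starts.append(absolute)
        # Skip just past the START of this match, not past its end.
        offset = absolute + 1
    return starts

found = overlapping_starts("abca", "abcabcabca")
```

For this input the slices searched are text[0:], text[1:], text[4:] and text[7:], the same ones used in the session above.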