Re: [Tutor] regular expression question

2009-04-28 Thread Kent Johnson
On Tue, Apr 28, 2009 at 4:03 AM, Kelie  wrote:
> Hello,
>
> The following code returns 'abc123abc45abc789jk'. How do I revise the pattern
> so that the return value will be 'abc789jk'? In other words, I want to find
> the pattern 'abc' that is closest to 'jk'. Here the strings '123', '45' and
> '789' are just examples. They are actually quite different in the string that
> I'm working with.
>
> import re
> s = 'abc123abc45abc789jk'
> p = r'abc.+jk'
> lst = re.findall(p, s)
> print lst[0]

re.findall() won't work because it finds non-overlapping matches.

If there is a character in the initial match which cannot occur in the
middle section, change .+ to exclude that character. For example,
r'abc[^a]+jk' works with your example.
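Here is a quick check of the [^a] idea on the sample string. It is a minimal sketch that relies on 'a' never appearing between 'abc' and 'jk'. That holds for this sample but may not hold for real data:

```python
import re

s = 'abc123abc45abc789jk'

# [^a]+ cannot cross the 'a' of a later 'abc', so every attempt that starts
# at an earlier 'abc' fails before it reaches 'jk'.
matches = re.findall(r'abc[^a]+jk', s)
print(matches)  # ['abc789jk']
```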

Another possibility is to look for the match starting at different
locations, something like this:
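import re

s = 'abc123abc45abc789jk'
p = re.compile(r'abc.+jk')
lastMatch = None
i = 0
while i < len(s):
    m = p.search(s, i)    # first match starting at or after position i
    if m is None:
        break
    lastMatch = m.group()
    i = m.start() + 1     # try again just past where this match started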
print lastMatch
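A different technique gives the same result without an explicit loop. It was not suggested anywhere in this thread. You put the pattern inside a lookahead, so findall() tries a match at every position and the matches are allowed to overlap. This is a sketch that assumes the input fits comfortably in memory:

```python
import re

s = 'abc123abc45abc789jk'

# A zero-width lookahead consumes nothing, so findall() attempts a match at
# every position; the capturing group records what matched from there.
candidates = re.findall(r'(?=(abc.+jk))', s)
print(candidates)  # ['abc123abc45abc789jk', 'abc45abc789jk', 'abc789jk']

# The last candidate is the one that starts latest, i.e. at the 'abc'
# nearest the final 'jk'.
closest = candidates[-1] if candidates else None
print(closest)     # abc789jk
```

Because .+ is greedy, every candidate runs to the last 'jk' in the string. If there are several 'jk's, you may want the lazy .+? inside the lookahead instead.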

Kent
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] regular expression question

2009-04-28 Thread Kent Johnson
2009/4/28 Marek spociń...@go2.pl,Poland :

>> import re
>> s = 'abc123abc45abc789jk'
>> p = r'abc.+jk'
>> lst = re.findall(p, s)
>> print lst[0]
>
> I suggest using r'abc.+?jk' instead.
>
> the additional ? makes the preceding '.+' non-greedy so instead of matching
> as long a string as it can, it matches as short a string as possible.

Did you try it? It doesn't do what you expect; it still matches at the
beginning of the string.

The re engine searches for a match at a location and returns the first
one it finds. A non-greedy match doesn't mean "Find the shortest
possible match anywhere in the string", it means, "find the shortest
possible match starting at this location."
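Here is a small demonstration of this point. It uses the sample string plus a second, made-up string containing two 'jk's, which is only there for illustration:

```python
import re

s = 'abc123abc45abc789jk'

# The engine tries position 0 first and finds a match there, so the lazy
# quantifier never gets a chance to prefer a later starting point.
lazy = re.search(r'abc.+?jk', s)
print(lazy.start())   # 0
print(lazy.group())   # abc123abc45abc789jk

# What .+? does change is where the match ends.
t = 'abc1jk2jk'
greedy_t = re.search(r'abc.+jk', t).group()
lazy_t = re.search(r'abc.+?jk', t).group()
print(greedy_t)       # abc1jk2jk
print(lazy_t)         # abc1jk
```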

Kent


Re: [Tutor] regular expression question

2009-04-28 Thread Kelie
spir  free.fr> writes:

> To avoid that, use non-grouping parens (?:...). This also avoids the need for
> parens around the whole pattern:
> p = Pattern(r'abc(?:(?!abc).)+jk')
> print p.findall(s)
> ['abc789jk']
> 
> Denis


This one works! Thank you Denis. I'll try it out on the actual much longer
(multiline) string and see what happens.
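There is one thing to check on a multiline string. By default '.' does not match a newline, so the pattern quoted above cannot span line breaks unless re.DOTALL is passed. Here is a sketch with a made-up sample text:

```python
import re

text = 'abc one\nabc two\nmiddle\njk'   # hypothetical multiline sample
pattern = r'abc(?:(?!abc).)+jk'

# Default: '.' stops at '\n', so no match can reach 'jk' on a later line.
plain = re.findall(pattern, text)
print(plain)    # []

# re.DOTALL lets '.' match '\n' too.
dotall = re.findall(pattern, text, re.DOTALL)
print(dotall)   # ['abc two\nmiddle\njk']
```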



Re: [Tutor] regular expression question

2009-04-28 Thread Kelie
Andre Engels  gmail.com> writes:

> 
> 2009/4/28 Marek Spociński  go2.pl,Poland  10g.pl>:

> > I suggest using r'abc.+?jk' instead.
> >

> 
> That was my first idea too, but it does not work for this case,
> because Python will still try to _start_ the match as soon as
> possible. 

Yeah, I tried the '?' as well and realized it would not work.




Re: [Tutor] regular expression question

2009-04-28 Thread spir
On Tue, 28 Apr 2009 11:06:16 +0200,
Marek spociń...@go2.pl, Poland wrote:

> I suggest using r'abc.+?jk' instead.
> 
> the additional ? makes the preceding '.+' non-greedy so instead of
> matching as long a string as it can, it matches as short a string as possible.

Non-greedy repetition will not work in this case, I guess:

from re import compile as Pattern
s = 'abc123abc45abc789jk'
p = Pattern(r'abc.+?jk')
print p.match(s).group()
==>
abc123abc45abc789jk

(Someone explain why?)

My solution would be to explicitly exclude 'abc' from the sequence of chars
matched by '.+'. To do this, use a negative lookahead (?!...) before '.':
p = Pattern(r'(abc((?!abc).)+jk)')
print p.findall(s)
==>
[('abc789jk', '9')]

But it's not exactly what you want, because findall treats the inner () needed
to express the exclusion as a group to be returned, so you also get the last
char matched by it.
To avoid that, use non-grouping parens (?:...). This also avoids the need for
parens around the whole pattern:
p = Pattern(r'abc(?:(?!abc).)+jk')
print p.findall(s)
==>
['abc789jk']
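Here are the two patterns from this message, runnable side by side. The comparison shows why findall() returns tuples when a pattern has more than one capturing group:

```python
import re

s = 'abc123abc45abc789jk'

# Two capturing groups: findall() returns one tuple per match, and the
# repeated inner group keeps only its last repetition ('9' here).
with_groups = re.findall(r'(abc((?!abc).)+jk)', s)
print(with_groups)   # [('abc789jk', '9')]

# No capturing groups: findall() returns the whole matched text.
no_groups = re.findall(r'abc(?:(?!abc).)+jk', s)
print(no_groups)     # ['abc789jk']
```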

Denis
--
life is strange


Re: [Tutor] regular expression question

2009-04-28 Thread Marek Spociński , Poland
On 28 April 2009 at 11:16, Andre Engels wrote:
> 2009/4/28 Marek spociń...@go2.pl,Poland :
> > I suggest using r'abc.+?jk' instead.
> >
> > the additional ? makes the preceding '.+' non-greedy so instead of
> > matching as long a string as it can, it matches as short a string as possible.
> 
> That was my first idea too, but it does not work for this case,
> because Python will still try to _start_ the match as soon as
> possible. To use .+? one would have to reverse the string, then use the
> reverse regular expression on the result, which looks like a rather
> roundabout way of doing things.

I don't have access to Python right now, so I cannot test my ideas,
and I don't want to give you the wrong idea either.


Re: [Tutor] regular expression question

2009-04-28 Thread Andre Engels
2009/4/28 Marek spociń...@go2.pl,Poland :
> I suggest using r'abc.+?jk' instead.
>
> the additional ? makes the preceding '.+' non-greedy so instead of matching
> as long a string as it can, it matches as short a string as possible.

That was my first idea too, but it does not work for this case,
because Python will still try to _start_ the match as soon as
possible. To use .+? one would have to reverse the string, then use the
reverse regular expression on the result, which looks like a rather
roundabout way of doing things.
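Here is the reverse-the-string idea, sketched for this particular pattern. It assumes the pattern is built from literal text, so it can be reversed by hand ('abc' becomes 'cba', 'jk' becomes 'kj'). The re module has no built-in way to reverse an arbitrary regex:

```python
import re

s = 'abc123abc45abc789jk'

# In the reversed text the 'jk' comes first, so the lazy .+? now stops at
# the nearest 'cba', i.e. the 'abc' closest to 'jk' in the original.
m = re.search(r'kj.+?cba', s[::-1])
result = m.group()[::-1] if m else None
print(result)  # abc789jk
```

This anchors on the last 'jk' in the string and then finds the nearest 'abc' before it. That is a slightly different question from the forward search if 'jk' occurs more than once.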



-- 
André Engels, andreeng...@gmail.com


Re: [Tutor] regular expression question

2009-04-28 Thread Marek Spociński
I suggest using r'abc.+?jk' instead.

the additional ? makes the preceding '.+' non-greedy so instead of matching as
long a string as it can, it matches as short a string as possible.




Re: [Tutor] regular expression question

2005-04-07 Thread Danny Yoo


> I wonder if anyone can help me with an RE. I also wonder if there is an
> RE mailing list anywhere - I haven't managed to find one.

Hi Debbie,

I haven't found one either.  There appear to be a lot of good resources
here:

http://dmoz.org/Computers/Programming/Languages/Regular_Expressions/

> I'm trying to use this regular expression to delete particular strings
> from a file before tokenising it.

Why not tokenize the file first, and then drop the strings with a period?
You may not need to do all your tokenization at once.  Can you do it in
phases?
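Here is a rough sketch of the tokenize-first approach. It splits on whitespace and drops words that contain a full stop away from their edges. has_internal_period() is a hypothetical helper, and the rules it encodes are only one reading of the requirements in this thread:

```python
def has_internal_period(word):
    """Hypothetical helper: True if '.' appears away from the word's edges.

    Trailing closing brackets are stripped first, so 'foo.]' is treated
    like 'foo.' (a full stop at the end of the word).
    """
    core = word.rstrip(')]}')
    return '.' in core[1:-1]

text = "I saved fileX.doc (this is in .jpg format) last Thursday. See [foo.] bar."
kept = [w for w in text.split() if not has_internal_period(w)]
print(' '.join(kept))
# I saved (this is in .jpg format) last Thursday. See [foo.] bar.
```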


> I want to delete all strings that have a full stop (period) when it is
> not at the beginning or end of a word, and also when it is not followed
> by a closing bracket.

Let's make sure we're using the same concepts.  By "string", do you mean
"word"?  That is, if we have something like:

 "I went home last Thursday."

do you expect the regular expression to match against the whole thing?

 "I went home last Thursday."

Or do you expect it to match against the specific end word?

 "Thursday."

I'm just trying to make sure we're using the same terms.  How specific
do you want your regular expression to be?



Going back to your question:

> I want to delete all strings that have a full stop (period) when it is
> not at the beginning or end of a word, and also when it is not followed
> by a closing bracket.

At first glance, I think you're looking for a "lookahead assertion":

http://www.amk.ca/python/howto/regex/regex.html#SECTION00054
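Here is a minimal sketch of a negative lookahead for this problem. It assumes "word" means a run of non-whitespace characters, and the sample text is made up:

```python
import re

# A word containing a '.' that is NOT followed by whitespace, a closing
# bracket, or the end of the text; (?!...) checks the next character
# without consuming it.
internal_dot = re.compile(r'\S+\.(?![\s)\]}]|$)\S*')

text = "fileX.doc is in .jpg format, see foo.] and Thursday."
hits = internal_dot.findall(text)
print(hits)  # ['fileX.doc']
```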




> I want to delete file names (eg. fileX.doc), and websites (when www/http
> not given) but not file extensions (eg. this is in .jpg format). I also
> don't want to delete the last word of each sentence just because it
> precedes a fullstop, or if there's a fullstop followed by a closing
> bracket.

Does this need to be part of the same regular expression?



There are a lot of requirements here: can we encode this in some kind of
test class, so that we're sure we're hitting all your requirements?

Here's what I think you're looking for so far, written in terms of a unit
test:


##
import re
import unittest

class DebbiesRegularExpressionTest(unittest.TestCase):
    def setUp(self):
        self.fullstopRe = re.compile("... fill me in")

    def testRecognizingEndWord(self):
        self.assertEqual(
            ["Thursday."],
            self.fullstopRe.findall("I went home last Thursday."))

    def testEndWordWithBracket(self):
        self.assertEqual(
            ["bar."],
            self.fullstopRe.findall("[this is foo.] bar. licious"))

if __name__ == '__main__':
    unittest.main()
##

If these tests don't match with what you want, please feel free to edit
and add more to them so that we can be more clear about what you want.



Best of wishes to you!



Re: [Tutor] regular expression question

2005-04-07 Thread Kent Johnson
D Elliott wrote:
> I wonder if anyone can help me with an RE. I also wonder if there is an
> RE mailing list anywhere - I haven't managed to find one.
>
> I'm trying to use this regular expression to delete particular strings
> from a file before tokenising it.
>
> I want to delete all strings that have a full stop (period) when it is
> not at the beginning or end of a word, and also when it is not followed
> by a closing bracket. I want to delete file names (eg. fileX.doc), and
> websites (when www/http not given) but not file extensions (eg. this is
> in .jpg format). I also don't want to delete the last word of each
> sentence just because it precedes a fullstop, or if there's a fullstop
> followed by a closing bracket.
>
> fullstopRe = re.compile (r'\S+\.[^)}]]+')
There are two problems with this:
- The ] inside the [] group must be escaped like this: [^)}\]]
- [^)}\]] also matches whitespace, so a full stop at the end of a word still matches
It's not clear from your description if the closing bracket must immediately follow the full stop or 
if it can be anywhere after it. If you want it to follow immediately then use
\S+\.[^)}\]\s]\S*

If you want to allow the bracket anywhere after the stop, you must force the match to go to a word
boundary; otherwise you will match foo.bar when the word is foo.bar]. I think this works:
(\S+\.[^)}\]\s]+)(\s)

but you have to include the second group in your substitution string.
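Here are both patterns tried on a made-up sample. The second one is used with re.sub() and r'\2' as the replacement, so the captured whitespace is put back:

```python
import re

# First pattern: the character right after '.' must not be a closing
# bracket or whitespace, but a bracket later in the word is allowed.
immediate = re.compile(r'\S+\.[^)}\]\s]\S*')
immediate_hits = immediate.findall('see foo.bar] here')
print(immediate_hits)   # ['foo.bar]']

# Second pattern: stop at the next whitespace, which is captured in group 2
# and restored by the replacement string.
to_word_end = re.compile(r'(\S+\.[^)}\]\s]+)(\s)')
text = 'Open fileX.doc now, not foo.bar] or Thursday. ok'
cleaned = to_word_end.sub(r'\2', text)
print(cleaned)          # Open  now, not foo.bar] or Thursday. ok
```

The second pattern needs a whitespace character after the word, so a file name at the very end of the text is left alone. In this sample the removal also leaves two spaces behind.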
BTW C:\Python23\pythonw.exe C:\Python24\Tools\Scripts\redemo.py is very helpful with questions like 
this...

Kent


Re: [Tutor] regular expression question

2005-03-09 Thread Kent Johnson
Mike Hall wrote:
> A simple example will show what I mean:
> >>> import re
> >>> x = re.compile(r"(A) | (B)")
> >>> s = "X R A Y B E"
> >>> r = x.sub("13", s)
> >>> print r
> X R 13Y13 E
> ...so unless I'm understanding it wrong, "B" is supposed to be ignored
> if "A" is matched, yet I get both matched.  I get the same result if I
> put "A" and "B" within the same group.
The problem is with your use of sub(), not with |.
By default, re.sub() substitutes *all* matches. If you just want to substitute the first match,
include the optional count parameter:

 >>> import re
 >>> s = "X R A Y B E"
 >>> re.sub(r"(A) | (B)", '13', s)
'X R 13Y13 E'
 >>> re.sub(r"(A) | (B)", '13', s, 1)
'X R 13Y B E'
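One more thing is worth checking in this example. Without re.VERBOSE, the spaces around | are literal characters, so the alternatives are really "A " and " B". That is why the spaces disappear from the output. A quick comparison:

```python
import re

s = "X R A Y B E"

# No VERBOSE flag: the pattern's spaces are literal, so the alternatives
# are "A " and " B" and the neighbouring spaces get replaced too.
literal = re.sub(r"(A) | (B)", "13", s)
print(literal)    # X R 13Y13 E

# With re.VERBOSE, whitespace in the pattern is ignored.
verbose = re.sub(r"(A) | (B)", "13", s, flags=re.VERBOSE)
print(verbose)    # X R 13 Y 13 E

# Same as writing the pattern without spaces.
compact = re.sub(r"A|B", "13", s)
print(compact)    # X R 13 Y 13 E
```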
BTW, there is a very handy interactive regex tester that comes with Python. On Windows, it is 
installed at
C:\Python23\Tools\Scripts\redemo.py

Kent


Re: [Tutor] regular expression question

2005-03-09 Thread Mike Hall
> ... but yeah, it seems you're expecting it to examine the string as a whole.

I guess I was, good point.



Re: [Tutor] regular expression question

2005-03-09 Thread Liam Clarke
Oops 

I mean
for i in range(len(k)):
    if k[i] == 'A' or k[i] == 'B':
        k[i] = 13





-- 
'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.


Re: [Tutor] regular expression question

2005-03-09 Thread Liam Clarke
Actually, you should get that anyway...

"""
|
Alternation, or the ``or'' operator. If A and B are regular
expressions, A|B will match any string that matches either "A" or "B".
| has very low precedence in order to make it work reasonably when
you're alternating multi-character strings. Crow|Servo will match
either "Crow" or "Servo", not "Cro", a "w" or an "S", and "ervo".
"""

So, for each letter in that string, it's checking to see if any letter
matches 'A' or 'B' ...
the engine steps through one character at a time.
sorta like - 

for letter in s:
    if letter == 'A':
        pass  # do some string stuff
    elif letter == 'B':
        pass  # do some string stuff


i.e. 

k = ['A','B', 'C', 'B']

for i in range(len(k)):
    if k[i] == 'A' or k[i]=='B':
        k[i]==13

print k

[13, 13, 'C', 13]

You can limit substitutions using an optional argument, but yeah, it
seems you're expecting it to examine the string as a whole.
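You can see this with re.finditer(): the engine tries the alternation again at each position, and every success is a separate match. A short sketch:

```python
import re

s = "X R A Y B E"

# Each item is one separate match of the alternation, with its position.
spans = [(m.group(), m.start()) for m in re.finditer(r"A|B", s)]
print(spans)        # [('A', 4), ('B', 8)]

# sub() replaces all of them by default; count=1 stops after the first.
first_only = re.sub(r"A|B", "13", s, count=1)
print(first_only)   # X R 13 Y B E
```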


Check out the example here - 
http://www.amk.ca/python/howto/regex/regex.html#SECTION00032

Also

http://www.regular-expressions.info/alternation.html

Regards, 

Liam Clarke




Re: [Tutor] regular expression question

2005-03-09 Thread Mike Hall
But I only want to ignore "B" if "A" is a match. If "A" is not a match, 
I'd like it to advance on to "B".
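A note on the (A) | (^B) suggestion quoted below: in a pattern, ^B anchors B to the start of the string (or line, with re.MULTILINE) rather than excluding it. Inside a class, [^B] matches any single character other than B. Neither expresses "use A if it matches anywhere, otherwise B". One way to get that is to try the patterns one at a time. Here is a minimal sketch; sub_first_matching is a hypothetical helper, not part of re:

```python
import re

def sub_first_matching(patterns, repl, text, count=0):
    """Hypothetical helper: apply only the first pattern that matches.

    Patterns are tried in order; as soon as one makes at least one
    substitution, that result is returned and later patterns are ignored.
    """
    for pattern in patterns:
        new_text, n = re.subn(pattern, repl, text, count=count)
        if n:
            return new_text
    return text  # nothing matched

print(sub_first_matching(["A", "B"], "13", "X R A Y B E"))  # X R 13 Y B E
print(sub_first_matching(["A", "B"], "13", "X R Y B E"))    # X R Y 13 E
```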

On Mar 9, 2005, at 12:07 PM, Marcos Mendonça wrote:
Hi
Not a regexp expert. But it seems to me that if you want to ignore
"B" then it should be
(A) | (^B)
Hope it helps!


Re: [Tutor] regular expression question

2005-03-09 Thread Mike Hall
Indeed I do:
>>> import re
>>> x = re.compile('A|B')
>>> s = " Q A R B C"
>>> r = x.sub("13", s)
>>> print r
 Q 13 R 13 C

On Mar 9, 2005, at 12:09 PM, Liam Clarke wrote:
> Hi Mike,
> Do you get the same results for a search pattern of 'A|B'?


Re: [Tutor] regular expression question

2005-03-09 Thread Liam Clarke
Hi Mike, 

Do you get the same results for a search pattern of 'A|B'?


On Wed, 9 Mar 2005 11:11:57 -0800, Mike Hall
<[EMAIL PROTECTED]> wrote:
> I'm having some strange results using the "or" operator.  In every test
> I do I'm matching both sides of the "|" metacharacter, not one or the
> other as all documentation says it should be (the parser supposedly
> scans left to right, using the first match it finds and ignoring the
> rest). It should only go beyond the "|" if there was no match found
> before it, no?
> 
> Correct me if I'm wrong, but your regex is saying "match dog, unless
> it's followed by cat. if it is followed by cat there is no match on
> this side of the "|" at which point we advance past it and look at the
> alternative expression which says to match in front of cat."
> 
> However, if I run a .sub using your regex on a string contain both dog
> and cat, both will be replaced.
> 
> A simple example will show what I mean:
> 
>  >>> import re
>  >>> x = re.compile(r"(A) | (B)")
>  >>> s = "X R A Y B E"
>  >>> r = x.sub("13", s)
>  >>> print r
> X R 13Y13 E
> 
> ...so unless I'm understanding it wrong, "B" is supposed to be ignored
> if "A" is matched, yet I get both matched.  I get the same result if I
> put "A" and "B" within the same group.
> 
> 
> On Mar 8, 2005, at 6:47 PM, Danny Yoo wrote:
> 
> >
> >
> >>
> >> Regular expressions are a little evil at times; here's what I think
> >> you're
> >> thinking of:
> >>
> >> ###
> > import re
> > pattern = re.compile(r"""dog(?!cat)
> >> ...| (?<=dogcat)""", re.VERBOSE)
> > pattern.match('dogman').start()
> >> 0
> > pattern.search('dogcatcher').start()
> >
> >
> >
> > Hi Mike,
> >
> > Gaaah, bad copy-and-paste.  The example with 'dogcatcher' actually does
> > come up with a result:
> >
> > ###
>  pattern.search('dogcatcher').start()
> > 6
> > ###
> >
> > Sorry about that!
> >
> 
> ___
> Tutor maillist  -  Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
> 


-- 
'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.


Re: [Tutor] regular expression question

2005-03-09 Thread Mike Hall
I'm having some strange results using the "or" operator.  In every test 
I do I'm matching both sides of the "|" metacharacter, not one or the 
other as all documentation says it should be (the parser supposedly 
scans left to right, using the first match it finds and ignoring the 
rest). It should only go beyond the "|" if there was no match found 
before it, no?

Correct me if I'm wrong, but your regex is saying "match dog, unless 
it's followed by cat. if it is followed by cat there is no match on 
this side of the "|" at which point we advance past it and look at the 
alternative expression which says to match in front of cat."

However, if I run a .sub using your regex on a string containing both dog 
and cat, both will be replaced.

A simple example will show what I mean:
>>> import re
>>> x = re.compile(r"(A) | (B)")
>>> s = "X R A Y B E"
>>> r = x.sub("13", s)
>>> print r
X R 13Y13 E
...so unless I'm understanding it wrong, "B" is supposed to be ignored 
if "A" is matched, yet I get both matched.  I get the same result if I 
put "A" and "B" within the same group.
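Two details explain this output. `|` only chooses between alternatives at a single starting position (the first alternative that succeeds there wins), but `sub()` keeps scanning and replaces every non-overlapping match in the whole string, so both `A` and `B` are replaced unless `count=1` is passed. Also, without `re.VERBOSE` the spaces around `|` are part of the pattern, so it really matches `"A "` or `" B"`. A minimal Python 3 sketch reusing the sample string above:

```python
import re

s = "X R A Y B E"

# Spaces are literal here: the alternatives are "(A) " and " (B)".
spaced = re.compile(r"(A) | (B)")
print(spaced.sub("13", s))          # X R 13Y13 E

# Without the stray spaces each alternative is a single letter.
plain = re.compile(r"A|B")
print(plain.sub("13", s))           # X R 13 Y 13 E  (every match replaced)
print(plain.sub("13", s, count=1))  # X R 13 Y B E   (only the first match)

# "First match wins" applies at one position: the earlier alternative is
# taken even though a longer one would also match here.
print(re.match(r"A|AB", "AB").group())  # A
```

So "ignore B if A matched" only holds for a single match attempt; `search()` or `sub(..., count=1)` gives that behavior, plain `sub()` does not.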

On Mar 8, 2005, at 6:47 PM, Danny Yoo wrote:

> > Regular expressions are a little evil at times; here's what I think
> > you're thinking of:
> >
> > ###
> > >>> import re
> > >>> pattern = re.compile(r"""dog(?!cat)
> > ...| (?<=dogcat)""", re.VERBOSE)
> > >>> pattern.match('dogman').start()
> > 0
> > >>> pattern.search('dogcatcher').start()
>
> Hi Mike,
> Gaaah, bad copy-and-paste.  The example with 'dogcatcher' actually does
> come up with a result:
> ###
> >>> pattern.search('dogcatcher').start()
> 6
> ###
> Sorry about that!


Re: [Tutor] regular expression question

2005-03-08 Thread Mike Hall
Sorry, my last reply crossed this one (and yes, I forgot again to CC 
the list).
I'm experimenting now with your use of the "or" operator ("|") between 
two expressions, thanks.


On Mar 8, 2005, at 6:42 PM, Danny Yoo wrote:

> On Tue, 8 Mar 2005, Mike Hall wrote:
> > Yes, my existing regex is using a look behind assertion:
> > (?<=dog)
> > ...it's also checking the existence of "Cat":
> > (?!Cat)
> > ...what I'm stuck on is how to essentially use a lookbehind on "Cat",
> > but only if it exists.
> Hi Mike,
>
> [Note: Please do a reply-to-all next time, so that everyone can help
> you.]
>
> Regular expressions are a little evil at times; here's what I think
> you're thinking of:
>
> ###
> >>> import re
> >>> pattern = re.compile(r"""dog(?!cat)
> ...| (?<=dogcat)""", re.VERBOSE)
> >>> pattern.match('dogman').start()
> 0
> >>> pattern.search('dogcatcher').start()
> >>> pattern.search('dogman').start()
> 0
> >>> pattern.search('catwoman')
> ###
> but I can't be sure without seeing some of the examples you'd like the
> regular expression to match against.
> Best of wishes to you!


Re: [Tutor] regular expression question

2005-03-08 Thread Danny Yoo


>
> Regular expressions are a little evil at times; here's what I think you're
> thinking of:
>
> ###
> >>> import re
> >>> pattern = re.compile(r"""dog(?!cat)
> ...| (?<=dogcat)""", re.VERBOSE)
> >>> pattern.match('dogman').start()
> 0
> >>> pattern.search('dogcatcher').start()



Hi Mike,

Gaaah, bad copy-and-paste.  The example with 'dogcatcher' actually does
come up with a result:

###
>>> pattern.search('dogcatcher').start()
6
###

Sorry about that!



Re: [Tutor] regular expression question

2005-03-08 Thread Mike Hall
This will match the position in front of "dog":
(?<=dog)
This will match the position in front of "cat":
(?<=cat)
This will not match in front of "dog" if "dog" is followed by "cat":
(?<=dog)\b (?!cat)
Now my question is how to get this:
(?<=cat)
...but ONLY if "cat" is following "dog." If "dog" does not have "cat"  
following it, then I simply want this:

(?<=dog)

...if that makes sense :) thanks.
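Since what is wanted here is a position rather than text, one option (a sketch, not something proposed in the thread) is to make both branches zero-width: `(?<=dog)(?!cat)` for the spot after "dog" when "cat" does not follow, and `(?<=dogcat)` for the spot after "dogcat". The helper `mark_spots` and the `|` marker are hypothetical, just to make the positions visible. This assumes "cat" directly follows "dog": Python's lookbehind must be fixed-width, so something like `(?<=dog\s*cat)` is rejected, and a whitespace-tolerant version needs a consuming pattern plus `m.end()` instead.

```python
import re

# Both branches are zero-width, so every match is a position, not text:
#   (?<=dog)(?!cat)  right after dog, unless cat comes next
#   (?<=dogcat)      right after dogcat
spot = re.compile(r"(?<=dog)(?!cat)|(?<=dogcat)")

def mark_spots(text, marker="|"):
    """Hypothetical helper: insert marker at every matching position."""
    return spot.sub(marker, text)

print(mark_spots("dogman"))      # dog|man
print(mark_spots("dogcatcher"))  # dogcat|cher
print(mark_spots("dog dogcat"))  # dog| dogcat|
print([m.start() for m in spot.finditer("dog dogcat")])  # [3, 10]
```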

On Mar 8, 2005, at 6:05 PM, Danny Yoo wrote:

> On Tue, 8 Mar 2005, Mike Hall wrote:
> > I'd like to get a match for a position in a string preceded by a
> > specified word (let's call it "Dog"), unless that spot in the string
> > (after "Dog") is directly followed by a specific word (let's say
> > "Cat"), in which case I want my match to occur directly after "Cat",
> > and not "Dog."
> Hi Mike,
> You may want to look at "lookahead" assertions.  These are patterns of
> the form '(?=...)' or '(?!...)'.  The documentation mentions them here:
>
>    http://www.python.org/doc/lib/re-syntax.html
> and AMK's excellent "Regular Expression HOWTO" covers how one might use
> them:
> http://www.amk.ca/python/howto/regex/regex.html#SECTION00054



Re: [Tutor] regular expression question

2005-03-08 Thread Danny Yoo


On Tue, 8 Mar 2005, Mike Hall wrote:

> Yes, my existing regex is using a look behind assertion:
>
> (?<=dog)
>
> ...it's also checking the existence of "Cat":
>
> (?!Cat)
>
> ...what I'm stuck on is how to essentially use a lookbehind on "Cat",
> but only if it exists.

Hi Mike,



[Note: Please do a reply-to-all next time, so that everyone can help you.]

Regular expressions are a little evil at times; here's what I think you're
thinking of:

###
>>> import re
>>> pattern = re.compile(r"""dog(?!cat)
...| (?<=dogcat)""", re.VERBOSE)
>>> pattern.match('dogman').start()
0
>>> pattern.search('dogcatcher').start()
>>> pattern.search('dogman').start()
0
>>> pattern.search('catwoman')
>>>
###

but I can't be sure without seeing some of the examples you'd like the
regular expression to match against.
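The pattern above can be checked in Python 3 as follows. In `re.VERBOSE` mode whitespace and `#` comments inside the pattern are ignored, so the spaces around `|` are harmless here (unlike in a non-verbose pattern). Note that the first branch consumes "dog" while the second is zero-width, so `start()` means different things per branch; `end()` gives "the spot after dog or after dogcat" in both cases. The test strings are the ones used in this message.

```python
import re

# In re.VERBOSE mode whitespace and # comments are ignored, so this is
# equivalent to r"dog(?!cat)|(?<=dogcat)".
pattern = re.compile(r"""
      dog(?!cat)     # the text dog, when cat does not come right after it
    | (?<=dogcat)    # or the empty position just after dogcat
""", re.VERBOSE)

for text in ["dogman", "dogcatcher", "catwoman"]:
    m = pattern.search(text)
    # end() is the spot after dog (branch 1) or after dogcat (branch 2).
    print(text, None if m is None else (m.start(), m.end()))
```

This prints `(0, 3)` for `dogman`, `(6, 6)` for `dogcatcher` and `None` for `catwoman`.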


Best of wishes to you!



Re: [Tutor] regular expression question

2005-03-08 Thread Sean Perry
Mike Hall wrote:
> First, thanks for the response. Using your re:
> my_re = re.compile(r'(dog)(cat)?')
>
> ...I seem to simply be matching the pattern "Dog".  Example:
> >>> str1 = "The dog chased the car"
> >>> str2 = "The dog cat parade was under way"
> >>> x1 = re.compile(r'(dog)(cat)?')
> >>> rep1 = x1.sub("REPLACE", str1)
> >>> rep2 = x1.sub("REPLACE", str2)
> >>> print rep1
> The REPLACE chased the car
> >>> print rep2
> The REPLACE cat parade was under way
> ...what I'm looking for is a match for the position in front of "Cat",
> should it exist.

Because my regex says 'look for the word "dog" and remember where you 
found it. If you also find the word "cat", remember that too'. Nowhere 
does it say "watch out for whitespace".

r'(dog)\s*(cat)?' says match 'dog' followed by zero or more whitespace 
(spaces, tabs, etc.) and maybe 'cat'.
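A Python 3 sketch of this suggestion applied to Mike's two strings. One caveat worth checking: because `\s*` sits outside the optional group, it also eats the space after "dog" when "cat" is absent. The variant `dog(?:\s*cat)?` is an assumption of mine, not from the thread; it keeps the whitespace inside the optional group so it is only consumed together with "cat".

```python
import re

str1 = "The dog chased the car"
str2 = "The dog cat parade was under way"

suggested = re.compile(r"(dog)\s*(cat)?")
print(suggested.sub("REPLACE", str1))  # The REPLACEchased the car (space eaten)
print(suggested.sub("REPLACE", str2))  # The REPLACE parade was under way

# Hypothetical variant: whitespace is only consumed when "cat" follows.
adjusted = re.compile(r"dog(?:\s*cat)?")
print(adjusted.sub("REPLACE", str1))   # The REPLACE chased the car
print(adjusted.sub("REPLACE", str2))   # The REPLACE parade was under way

# For the position just after "cat" (or after "dog"), use end():
print(adjusted.search(str2).end())     # 11
```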

There is a wonderful O'Reilly book called "Mastering Regular 
Expressions" or as Danny points out the AMK howto is good.


Re: [Tutor] regular expression question

2005-03-08 Thread Mike Hall
First, thanks for the response. Using your re:
my_re = re.compile(r'(dog)(cat)?')
...I seem to simply be matching the pattern "Dog".  Example:
>>> str1 = "The dog chased the car"
>>> str2 = "The dog cat parade was under way"
>>> x1 = re.compile(r'(dog)(cat)?')
>>> rep1 = x1.sub("REPLACE", str1)
>>> rep2 = x1.sub("REPLACE", str2)
>>> print rep1
The REPLACE chased the car
>>> print rep2
The REPLACE cat parade was under way
...what I'm looking for is a match for the position in front of "Cat", 
should it exist.


On Mar 8, 2005, at 5:54 PM, Sean Perry wrote:
> Mike Hall wrote:
> > I'd like to get a match for a position in a string preceded by a
> > specified word (let's call it "Dog"), unless that spot in the string
> > (after "Dog") is directly followed by a specific word (let's say
> > "Cat"), in which case I want my match to occur directly after "Cat",
> > and not "Dog."
> > I can easily get the spot after "Dog," and I can also get it to
> > ignore this spot if "Dog" is followed by "Cat." But what I'm having
> > trouble with is how to match the spot after "Cat" if this word does
> > indeed exist in the string.
> . >>> import re
> . >>> my_re = re.compile(r'(dog)(cat)?') # the ? means "find one or zero of these", in other words cat is optional.
> . >>> m = my_re.search("This is a nice dog is it not?")
> . >>> dir(m)
> ['__copy__', '__deepcopy__', 'end', 'expand', 'group', 'groupdict',
> 'groups', 'span', 'start']
> . >>> m.span()
> (15, 18)
> . >>> m = my_re.search("This is a nice dogcat is it not?")
> . >>> m.span()
> (15, 21)
>
> If m is None then no match was found. span returns the (start, end)
> indices of the match in the string, where end is one past the last
> matched character. So in the dogcat sentence the match ends at index 21.
>
> . >>> "This is a nice dogcat is it not?"[21:]
> ' is it not?'
> Hope that helps.


Re: [Tutor] regular expression question

2005-03-08 Thread Danny Yoo


On Tue, 8 Mar 2005, Mike Hall wrote:

> I'd like to get a match for a position in a string preceded by a
> specified word (let's call it "Dog"), unless that spot in the string
> (after "Dog") is directly followed by a specific word (let's say "Cat"),
> in which case I want my match to occur directly after "Cat", and not
> "Dog."

Hi Mike,

You may want to look at "lookahead" assertions.  These are patterns of the
form '(?=...)' or '(?!...)'.  The documentation mentions them here:

   http://www.python.org/doc/lib/re-syntax.html

and AMK's excellent "Regular Expression HOWTO" covers how one might use
them:

http://www.amk.ca/python/howto/regex/regex.html#SECTION00054
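A small Python 3 illustration of these assertions on a made-up sample string. `(?=...)` succeeds only if the following text matches, `(?!...)` only if it does not, and neither consumes any characters. The lookbehind counterpart `(?<=...)`, used elsewhere in this thread, checks the text just before the current position instead.

```python
import re

text = "dogcat dogman hotdog"

# Positive lookahead: "dog" only where "cat" comes next.
print([m.start() for m in re.finditer(r"dog(?=cat)", text)])  # [0]
# Negative lookahead: "dog" only where "cat" does NOT come next.
print([m.start() for m in re.finditer(r"dog(?!cat)", text)])  # [7, 17]
# The assertion is zero-width: the matched text is just "dog".
print(re.search(r"dog(?=cat)", text).group())                 # dog
# Lookbehind: "dog" only where "hot" comes right before it.
print([m.start() for m in re.finditer(r"(?<=hot)dog", text)])  # [17]
```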



Re: [Tutor] regular expression question

2005-03-08 Thread Sean Perry
Mike Hall wrote:
I'd like to get a match for a position in a string preceded by a 
specified word (let's call it "Dog"), unless that spot in the string 
(after "Dog") is directly followed by a specific word (let's say "Cat"), 
in which case I want my match to occur directly after "Cat", and not "Dog."

I can easily get the spot after "Dog," and I can also get it to ignore 
this spot if "Dog" is followed by "Cat." But what I'm having trouble 
with is how to match the spot after "Cat" if this word does indeed exist 
in the string.


. >>> import re
. >>> my_re = re.compile(r'(dog)(cat)?') # the ? means "find one or zero of these", in other words cat is optional.
. >>> m = my_re.search("This is a nice dog is it not?")
. >>> dir(m)
['__copy__', '__deepcopy__', 'end', 'expand', 'group', 'groupdict', 
'groups', 'span', 'start']
. >>> m.span()
(15, 18)
. >>> m = my_re.search("This is a nice dogcat is it not?")
. >>> m.span()
(15, 21)

If m is None then no match was found. span returns the (start, end) indices 
of the match in the string, where end is one past the last matched 
character. So in the dogcat sentence the match ends at index 21.

. >>> "This is a nice dogcat is it not?"[21:]
' is it not?'
Hope that helps.
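The same idea in Python 3, wrapped so it returns the spot directly. `spot_after` is a hypothetical helper name, not from the thread; it returns `m.end()`, which is just past "dog" or just past "dogcat", and `group(2)` tells whether the optional "cat" was present.

```python
import re

my_re = re.compile(r"(dog)(cat)?")  # the ? makes the "cat" group optional

def spot_after(sentence):
    """Hypothetical helper: index just past 'dog' or 'dogcat', else None."""
    m = my_re.search(sentence)
    return None if m is None else m.end()

for s in ["This is a nice dog is it not?", "This is a nice dogcat is it not?"]:
    m = my_re.search(s)
    # span() is (start, end) with end exclusive; group(2) is None without "cat".
    print(m.span(), m.group(2), repr(s[m.end():]))

print(spot_after("This is a nice dogcat is it not?"))  # 21
print(spot_after("no pets here"))                      # None
```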