Re: [Tutor] stopping greedy matches

2005-03-20 Thread Joerg Woelke
On Fri, Mar 18, 2005 at 12:27:35PM -0500, Christopher Weimann wrote:

> So this [^\s]+ means match one or more of any char that
> isn't whitespace.

Could be just \S+
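
A quick way to check this suggestion is to run both spellings against the example string used in this thread. This is only a minimal sketch; the variable names are illustrative and not from the original posts.

```python
import re

s = "only the word in CAPS should be matched"

# [^\s] is a negated class: any character that is not whitespace.
negated_class = re.compile(r"\bin ([^\s]+)")
# \S is the built-in shorthand for the same set of characters.
shorthand = re.compile(r"\bin (\S+)")

class_result = negated_class.findall(s)
shorthand_result = shorthand.findall(s)
print(class_result, shorthand_result)
```

Both patterns should capture the same word, so the shorter form is purely a readability choice.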

Greetings, Jo!

-- 
Reply hazy, ask again later.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] stopping greedy matches

2005-03-19 Thread Christopher Weimann
On 03/17/2005-10:15AM, Mike Hall wrote:
 
> applause Very nice sir. I'm interested in what you're doing here with
> the caret metacharacter. For one thing, why enclose it and the
> whitespace flag within a character class?

A caret as the first character in a class is a negation.
So this [^\s]+ means match one or more of any char that
isn't whitespace.

> Does this not traditionally
> mean you want to strip a metacharacter of its special meaning?
 

That would be \


Re: [Tutor] stopping greedy matches

2005-03-18 Thread Christopher Weimann
On 03/18/2005-10:35AM, Mike Hall wrote:
   
>> A caret as the first character in a class is a negation.
>> So this [^\s]+ means match one or more of any char that
>> isn't whitespace.
>
> Ok, so the context of metas changes within a class. That makes sense,
> but I'm unclear on the discrepancy below.
 

The ^ means beginning of line EXCEPT inside a character class. There it
means NOT for the entire class, and it only means that if it is the very
first character. I suppose you could consider that there are two
separate types of char classes. One is started with [ and the other is
started with [^.
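
The three roles of the caret described above can be seen side by side in a short sketch. The sample text and variable names are made up for illustration.

```python
import re

text = "a^b c"

# Outside a class, ^ anchors the match at the start of the string.
at_start = re.findall(r"^\S+", text)
# As the first character of a class, ^ negates the whole class.
non_space_runs = re.findall(r"[^\s]+", text)
# Anywhere else in a class, ^ is just a literal caret.
b_or_caret = re.findall(r"[b^]", text)

print(at_start, non_space_runs, b_or_caret)
```

Only the first pattern is tied to a position; the other two match characters, which is why the same symbol can safely mean different things.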

 
   
>> That would be \
>
> Here's where I'm confused. From the Python docs:
>
> Special characters are not active inside sets. For example, [akm$] will
> match any of the characters a, k, m, or $
 

And the next paragraph says...

  You can match the characters not within a range by complementing the
  set. This is indicated by including a ^ as the first character of the
  class; ^ elsewhere will simply match the ^ character. For example,
  [^5] will match any character except 5.
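
Both quoted paragraphs can be checked directly. The sketch below uses the sets from the documentation excerpts with some made-up input strings.

```python
import re

# $ normally anchors at the end of the string, but inside a set it is a literal.
dollar_in_set = re.findall(r"[akm$]", "make $5")
# A leading ^ complements the set: every character except 5.
not_five = re.findall(r"[^5]", "1525")
# A ^ that is not first is an ordinary member of the set.
five_or_caret = re.findall(r"[5^]", "5^2")

print(dollar_in_set, not_five, five_or_caret)
```

One exception worth keeping in mind: the backslash keeps its role inside a set, which is why [^\s] works as "not whitespace".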



Re: [Tutor] stopping greedy matches

2005-03-18 Thread Mike Hall
On Mar 18, 2005, at 1:02 PM, Christopher Weimann wrote:
> On 03/18/2005-10:35AM, Mike Hall wrote:
>>> A caret as the first character in a class is a negation.
>>> So this [^\s]+ means match one or more of any char that
>>> isn't whitespace.
>>
>> Ok, so the context of metas changes within a class. That makes sense,
>> but I'm unclear on the discrepancy below.
>
> The ^ means beginning of line EXCEPT inside a character class. There it
> means NOT for the entire class, and it only means that if it is the very
> first character. I suppose you could consider that there are two
> separate types of char classes. One is started with [ and the other is
> started with [^.

Got it, thanks.

>>> That would be \
>>
>> Here's where I'm confused. From the Python docs:
>>
>> Special characters are not active inside sets. For example, [akm$] will
>> match any of the characters a, k, m, or $
>
> And the next paragraph says...
>
>   You can match the characters not within a range by complementing the
>   set. This is indicated by including a ^ as the first character of the
>   class; ^ elsewhere will simply match the ^ character. For example,
>   [^5] will match any character except 5.


The sad thing is I have read that paragraph before (but obviously 
hadn't absorbed the significance). I'm new to this, it'll sink in. 
Thanks.



Re: [Tutor] stopping greedy matches

2005-03-17 Thread Mike Hall
applause Very nice sir. I'm interested in what you're doing here with 
the caret metacharacter. For one thing, why enclose it and the 
whitespace flag within a character class? Does this not traditionally 
mean you want to strip a metacharacter of its special meaning?

On Mar 16, 2005, at 8:00 PM, Christopher Weimann wrote:
> On 03/16/2005-12:12PM, Mike Hall wrote:
>> I'm having trouble getting re to stop matching after it's consumed
>> what I want it to.  Using this string as an example, the goal is to
>> match CAPS:
>>
>> s = "only the word in CAPS should be matched"
>
> jet% python
> Python 2.4 (#2, Jan  5 2005, 15:59:52)
> [GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> s = "only the word in CAPS should be matched"
> >>> x = re.compile(r"\bin ([^\s]+)")
> >>> x.findall(s)
> ['CAPS']




Re: [Tutor] stopping greedy matches

2005-03-17 Thread Mike Hall
On Mar 16, 2005, at 8:32 PM, Kent Johnson wrote:
> "in (.*?)\b" will match against "in " because you use .* which will
> match an empty string. Try "in (.+?)\b" (or "(?<=\bin)..+?\b") to
> require one character after the space.

Another working example, excellent. I'm not too clear on why the
back-to-back ".." in "(?<=\bin)..+?\b" makes the regex work, but it does.

> You can't import it, you have to run it from the command line. I don't
> know if it is installed under Mac OSX though. You might be interested
> in RegexPlor:
> http://python.net/~gherman/RegexPlor.html

RegexPlor looks fantastic, will be downloading. Thanks.


Re: [Tutor] stopping greedy matches

2005-03-17 Thread Mike Hall
I don't have that script on my system, but I may put pythoncard on here 
and run it through that:

http://pythoncard.sourceforge.net/samples/redemo.html
Although regexPlor looks like it has the same functionality, so I may 
just go with that. Thanks.

On Mar 17, 2005, at 1:31 AM, Michael Dunn wrote:
> As Kent said, redemo.py is a script that you run (e.g. from the
> command line), rather than something to import into the python
> interpreter. On my OSX machine it's located in the directory:
> /Applications/MacPython-2.3/Extras/Tools/scripts
> Cheers, Michael


Re: [Tutor] stopping greedy matches

2005-03-17 Thread Kent Johnson
Mike Hall wrote:
> On Mar 16, 2005, at 8:32 PM, Kent Johnson wrote:
>> "in (.*?)\b" will match against "in " because you use .* which will
>> match an empty string. Try "in (.+?)\b" (or "(?<=\bin)..+?\b") to
>> require one character after the space.
>
> Another working example, excellent. I'm not too clear on why the
> back-to-back ".." in "(?<=\bin)..+?\b" makes the regex work, but it does.

The first one matches the space after 'in'. Without it the .+? will match the single space, then \b
matches the *start* of the next word.
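
The effect of the extra dot can be reproduced with a small sketch. It assumes the pattern under discussion begins with the lookbehind (?<=\bin), since the archive appears to have lost some characters; the variable names are illustrative.

```python
import re

s = "only the word in CAPS should be matched"

# One dot: the lazy .+? is satisfied by the space alone, because \b
# already matches between that space and the "C" of CAPS.
one_dot = re.search(r"(?<=\bin).+?\b", s).group()

# Two dots: the first dot consumes the space, so .+? has to start inside
# the word and keep growing until \b matches at the end of it.
two_dots = re.search(r"(?<=\bin)..+?\b", s).group()

print(repr(one_dot), repr(two_dots))
```

Note that the second match still includes the leading space; a capturing group around .+? would return the word on its own.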

Kent


Re: [Tutor] stopping greedy matches

2005-03-17 Thread Mike Hall
On Mar 17, 2005, at 11:11 AM, Kent Johnson wrote:
> The first one matches the space after 'in'. Without it the .+? will
> match the single space, then \b matches the *start* of the next word.

I think I understand. Basically the first dot advances the pattern
forward in order to perform a non-greedy match on the following
word.(?) Very nice.



Re: [Tutor] stopping greedy matches

2005-03-17 Thread Kent Johnson
Mike Hall wrote:
> On Mar 17, 2005, at 11:11 AM, Kent Johnson wrote:
>> The first one matches the space after 'in'. Without it the .+? will
>> match the single space, then \b matches the *start* of the next word.
>
> I think I understand. Basically the first dot advances the pattern
> forward in order to perform a non-greedy match on the following word.(?)
> Very nice.

That's right. The first dot could just as well be a space or \s or maybe even \s+ (to match any
amount of white space). I actually used the dot because I thought it would be clearer than a space :-)
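
As a rough check of the alternatives mentioned above, the sketch below tries each spelling on the example string and then uses \s+ on a variant with extra spaces. The lookbehind form (?<=\bin), the second test string and the capturing group are assumptions added for illustration.

```python
import re

s = "only the word in CAPS should be matched"

# Four ways to step over the single space that follows "in".
variants = [
    r"(?<=\bin)..+?\b",
    r"(?<=\bin) .+?\b",
    r"(?<=\bin)\s.+?\b",
    r"(?<=\bin)\s+.+?\b",
]
results = [re.search(p, s).group() for p in variants]

# With several spaces, \s+ absorbs all of them, and a capturing
# group returns the word without the leading whitespace.
spaced = "only the word in   CAPS should be matched"
word = re.search(r"(?<=\bin)\s+(.+?)\b", spaced).group(1)

print(results, word)
```

On the single-space string all four behave the same; the \s+ form is simply the most tolerant of irregular spacing.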

Kent


Re: [Tutor] stopping greedy matches

2005-03-16 Thread Mike Hall
Liam, re.compile("in (.*?)\b") will not find any match in the example
string I provided. I have had little luck with these non-greedy
matchers.

I don't appear to have redemo.py on my system (on OSX), as an import 
returns an error. I will look into finding this module, thanks for 
pointing me towards it :)

On Mar 16, 2005, at 2:36 PM, Liam Clarke wrote:
>> x=re.compile(r"(?<=\bin).+\b")
>
> Try
>
> x = re.compile("in (.*?)\b")
>
> .*? is a non-greedy matcher I believe.
>
> Are you using python24/tools/scripts/redemo.py? Use that to test
> regexes.
>
> Regards,
>
> Liam Clarke
>
> On Wed, 16 Mar 2005 12:12:32 -0800, Mike Hall
> [EMAIL PROTECTED] wrote:
>> I'm having trouble getting re to stop matching after it's consumed what
>> I want it to.  Using this string as an example, the goal is to match
>> CAPS:
>>
>> s = "only the word in CAPS should be matched"
>>
>> So let's say I want to specify when to begin my pattern by using a
>> lookbehind:
>>
>> x = re.compile(r"(?<=\bin)")  # this will simply match the spot in front of "in"
>>
>> So that's straightforward, but let's say I don't want to use a
>> lookahead to specify the end of my pattern, I simply want it to stop
>> after it has combed over the word following "in". I would expect this
>> to work, but it doesn't:
>>
>> x=re.compile(r"(?<=\bin).+\b")  # this will consume everything past "in" all the way to the end of the string
>>
>> In the above example I would think that the word boundary flag \b
>> would indicate a stopping point. Is .+\b not saying, "keep matching
>> characters until a word boundary has been reached"?
>>
>> Even stranger are the results I get from:
>>
>> x=re.compile(r"(?<=\bin).+\s")  # keep matching characters until a whitespace has been reached(?)
>> r = x.sub([EMAIL PROTECTED], s)
>> print r
>> only the word [EMAIL PROTECTED]
>>
>> For some reason there it's decided to consume three words instead of
>> one.
>>
>> My question is simply this:  after specifying a start point,  how do I
>> make a match stop after it has found one word, and one word only? As
>> always, all help is appreciated.
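
The behaviour described in the question follows from greedy matching plus backtracking, which the sketch below tries to make visible. It assumes the patterns began with the lookbehind (?<=\bin); the replacement string "#" is a hypothetical stand-in, because the original one is obscured in the archive.

```python
import re

s = "only the word in CAPS should be matched"

# Greedy .+ runs to the end of the string first; \b already matches
# there, so nothing is given back.
greedy_to_boundary = re.search(r"(?<=\bin).+\b", s).group()

# Greedy .+ again runs to the end, then backtracks only as far as the
# LAST whitespace, which is why three words are consumed.
greedy_to_space = re.search(r"(?<=\bin).+\s", s).group()
substituted = re.sub(r"(?<=\bin).+\s", "#", s)

# Lazy .+? grows one character at a time and stops at the first \b
# that follows at least one character of the word.
one_word = re.search(r"(?<=\bin)\s(.+?)\b", s).group(1)

print(repr(greedy_to_boundary), repr(greedy_to_space), substituted, one_word)
```

In other words, \b and \s do mark a stopping point, but a greedy quantifier picks the last such point it can reach rather than the first.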


--
'There is only one basic human right, and that is to do as you damn 
well please.
And with it comes the only basic human duty, to take the consequences.



Re: [Tutor] stopping greedy matches

2005-03-16 Thread Mike Hall
On Mar 16, 2005, at 5:32 PM, Sean Perry wrote:
> I know this does not directly help, but I have never successfully used
> \b in my regexes. I always end up writing something like foo\s+bar or
> something more intense.

I've had luck with the boundary flag in relation to lookbehinds. For
example, if I wanted to only match after "int" (and not "print"),
(?<=\bint) seems to work fine. I'm a bit frustrated at not being able
to find a simple way to have a search stop after eating up one word.
You'd think the \b would do it, but nope.
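
The "int" versus "print" case can be sketched as follows, assuming the lookbehind is written (?<=\bint). The sample text and the trailing \s+(\w+) are made up for illustration.

```python
import re

text = "print foo int bar"

# Without \b the lookbehind is also satisfied by the "int" inside "print".
without_boundary = re.findall(r"(?<=int)\s+(\w+)", text)

# With \b the "i" must start a word, so only the standalone "int" counts.
with_boundary = re.findall(r"(?<=\bint)\s+(\w+)", text)

print(without_boundary, with_boundary)
```

Here \w+ (or \S+) stops after one word on its own, without needing a trailing \b.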




Re: [Tutor] stopping greedy matches

2005-03-16 Thread Christopher Weimann
On 03/16/2005-12:12PM, Mike Hall wrote:
> I'm having trouble getting re to stop matching after it's consumed
> what I want it to.  Using this string as an example, the goal is to
> match CAPS:
>
>  s = "only the word in CAPS should be matched"

jet% python
Python 2.4 (#2, Jan  5 2005, 15:59:52)
[GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = "only the word in CAPS should be matched"
>>> x = re.compile(r"\bin ([^\s]+)")
>>> x.findall(s)
['CAPS']




Re: [Tutor] stopping greedy matches

2005-03-16 Thread Kent Johnson
Mike Hall wrote:
> Liam, re.compile("in (.*?)\b") will not find any match in the example
> string I provided. I have had little luck with these non-greedy matchers.

"in (.*?)\b" will match against "in " because you use .* which will match an empty string. Try
"in (.+?)\b" (or "(?<=\bin)..+?\b") to require one character after the space.
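
A small sketch of the difference between .*? and .+? on the example string. The third call is an aside: whether the suggested pattern was written as a raw string cannot be recovered from the archive, but if it was not, Python reads "\b" as a backspace character and nothing matches at all.

```python
import re

s = "only the word in CAPS should be matched"

# .*? may match nothing: \b is already satisfied right after the space,
# so the group comes back empty.
lazy_star = re.findall(r"in (.*?)\b", s)

# .+? must take at least one character, which forces it into the word.
lazy_plus = re.findall(r"in (.+?)\b", s)

# Same pattern text in a NON-raw string: "\b" is a backspace, not a
# word boundary, so there is no match.
non_raw = re.findall("in (.*?)\b", s)

print(lazy_star, lazy_plus, non_raw)
```

Using raw strings for every pattern avoids the third case entirely.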

The non-greedy match is very useful, if you can't get it to work ask for help.

> I don't appear to have redemo.py on my system (on OSX), as an import
> returns an error. I will look into finding this module, thanks for
> pointing me towards it :)

You can't import it, you have to run it from the command line. I don't know if it is installed under
Mac OSX though. You might be interested in RegexPlor:
http://python.net/~gherman/RegexPlor.html

Kent