Re: [Tutor] stopping greedy matches
On Fri, Mar 18, 2005 at 12:27:35PM -0500, Christopher Weimann wrote:
> So this [^\s]+ means match one or more of any char that isn't whitespace.

Could be just \S+

Greetings, Jo!

-- 
Reply hazy, ask again later.
___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
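Here is a quick check that the negated class and the shorthand really are interchangeable. It targets a modern Python 3 interpreter rather than the 2.4 used in this thread, and it runs both spellings against the thread's example string:

```python
import re

s = "only the word in CAPS should be matched"

# [^\s] is a negated character class: any single character that is not whitespace.
negated_class = re.compile(r"\bin ([^\s]+)")

# \S is the built-in shorthand for that same set of characters.
shorthand = re.compile(r"\bin (\S+)")

print(negated_class.findall(s))  # ['CAPS']
print(shorthand.findall(s))      # ['CAPS']
```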
Re: [Tutor] stopping greedy matches
On 03/17/2005-10:15AM, Mike Hall wrote:
> applause Very nice sir. I'm interested in what you're doing here with the caret metacharacter. For one thing, why enclose it and the whitespace flag within a character class?

A caret as the first character in a class is a negation. So this [^\s]+ means match one or more of any char that isn't whitespace.

> Does this not traditionally mean you want to strip a metacharacter of its special meaning?

That would be \
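The sketch below contrasts the two jobs the caret has been given in this exchange. It is written for Python 3, and the sample string `"x^2 + y^2"` is just an illustration. A backslash makes `^` literal, while a leading `^` inside `[...]` negates the class. `re.escape()` is the standard-library helper that inserts such backslashes for you.

```python
import re

text = "x^2 + y^2"

# A backslash strips ^ of its special meaning, so \^ matches a literal caret.
literal_caret = re.findall(r"\^", text)

# As the first character inside [...], ^ negates the class instead:
# [^\s^] matches runs of characters that are neither whitespace nor a caret
# (the second ^ is not first, so it is just a literal caret).
negated = re.findall(r"[^\s^]+", text)

# re.escape() adds the backslash for you when building a pattern from text.
escaped = re.escape("^")

print(literal_caret)  # ['^', '^']
print(negated)        # ['x', '2', '+', 'y', '2']
print(escaped)        # \^
```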
Re: [Tutor] stopping greedy matches
On 03/18/2005-10:35AM, Mike Hall wrote:
>> A caret as the first character in a class is a negation. So this [^\s]+ means match one or more of any char that isn't whitespace.
>
> Ok, so the context of metas change within a class. That makes sense, but I'm unclear on the discrepancy below.

The ^ means beginning of line EXCEPT inside a character class. There it means NOT for the entire class, and it only means that if it is the very first character. I suppose you could consider that there are two separate types of char classes: one is started with [ and the other is started with [^.

>> That would be \
>
> Here's where I'm confused. From the Python docs:
>
> Special characters are not active inside sets. For example, [akm$] will match any of the characters a, k, m, or $

And the next paragraph says...

You can match the characters not within a range by complementing the set. This is indicated by including a ^ as the first character of the class; ^ elsewhere will simply match the ^ character. For example, [^5] will match any character except 5.
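The two documentation paragraphs quoted above can be exercised directly. The following is a small Python 3 sketch with made-up sample strings. Inside a set, `$` is an ordinary character. A leading `^` complements the set. A `^` anywhere else in the set is just a caret.

```python
import re

# Inside a set, $ loses its "end of string" meaning and is just a dollar sign.
in_set = re.findall(r"[akm$]", "spam costs $5")

# ^ as the first character complements the set: anything except 5.
complemented = re.findall(r"[^5]", "1525")

# ^ anywhere else in the set is an ordinary caret.
literal_in_set = re.findall(r"[5^]", "2^5")

print(in_set)          # ['a', 'm', '$']
print(complemented)    # ['1', '2']
print(literal_in_set)  # ['^', '5']
```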
Re: [Tutor] stopping greedy matches
On Mar 18, 2005, at 1:02 PM, Christopher Weimann wrote:
> On 03/18/2005-10:35AM, Mike Hall wrote:
>>> A caret as the first character in a class is a negation. So this [^\s]+ means match one or more of any char that isn't whitespace.
>>
>> Ok, so the context of metas change within a class. That makes sense, but I'm unclear on the discrepancy below.
>
> The ^ means beginning of line EXCEPT inside a character class. There it means NOT for the entire class, and it only means that if it is the very first character. I suppose you could consider that there are two separate types of char classes: one is started with [ and the other is started with [^.

Got it, thanks.

>>> That would be \
>>
>> Here's where I'm confused. From the Python docs:
>>
>> Special characters are not active inside sets. For example, [akm$] will match any of the characters a, k, m, or $
>
> And the next paragraph says...
>
> You can match the characters not within a range by complementing the set. This is indicated by including a ^ as the first character of the class; ^ elsewhere will simply match the ^ character. For example, [^5] will match any character except 5.

The sad thing is I have read that paragraph before (but obviously hadn't absorbed its significance). I'm new to this; it'll sink in. Thanks.
Re: [Tutor] stopping greedy matches
applause Very nice sir. I'm interested in what you're doing here with the caret metacharacter. For one thing, why enclose it and the whitespace flag within a character class? Does this not traditionally mean you want to strip a metacharacter of its special meaning?

On Mar 16, 2005, at 8:00 PM, Christopher Weimann wrote:
> On 03/16/2005-12:12PM, Mike Hall wrote:
>> I'm having trouble getting re to stop matching after it's consumed what I want it to. Using this string as an example, the goal is to match CAPS:
>>
>> s = "only the word in CAPS should be matched"
>
> jet% python
> Python 2.4 (#2, Jan 5 2005, 15:59:52)
> [GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> s = "only the word in CAPS should be matched"
> >>> x=re.compile(r"\bin ([^\s]+)")
> >>> x.findall(s)
> ['CAPS']
Re: [Tutor] stopping greedy matches
On Mar 16, 2005, at 8:32 PM, Kent Johnson wrote:
> "in (.*?)\b" will match against "in " because you use .* which will match an empty string. Try "in (.+?)\b" (or "(?<=\bin)..+?\b") to require one character after the space.

Another working example, excellent. I'm not too clear on why the back-to-back .. in "(?<=\bin)..+?\b" makes the regex work, but it does.

> You can't import it, you have to run it from the command line. I don't know if it is installed under Mac OSX though. You might be interested in RegexPlor: http://python.net/~gherman/RegexPlor.html

RegexPlor looks fantastic, will be downloading. Thanks.
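Kent's first two suggestions differ only in `*?` versus `+?`. The sketch below, written for Python 3 with the pattern as a raw string, shows why that single character matters.

```python
import re

s = "only the word in CAPS should be matched"

# .*? may match zero characters, and \b is already satisfied between the
# space and the "C" of CAPS, so the lazy group captures an empty string.
empty = re.findall(r"in (.*?)\b", s)

# .+? must consume at least one character, so the earliest boundary it can
# stop at is the one after CAPS.
word = re.findall(r"in (.+?)\b", s)

print(empty)  # ['']
print(word)   # ['CAPS']
```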
Re: [Tutor] stopping greedy matches
I don't have that script on my system, but I may put PythonCard on here and run it through that: http://pythoncard.sourceforge.net/samples/redemo.html

Although RegexPlor looks like it has the same functionality, so I may just go with that. Thanks.

On Mar 17, 2005, at 1:31 AM, Michael Dunn wrote:
> As Kent said, redemo.py is a script that you run (e.g. from the command line), rather than something to import into the Python interpreter. On my OSX machine it's located in the directory:
>
> /Applications/MacPython-2.3/Extras/Tools/scripts
>
> Cheers,
> Michael
Re: [Tutor] stopping greedy matches
Mike Hall wrote:
> On Mar 16, 2005, at 8:32 PM, Kent Johnson wrote:
>> "in (.*?)\b" will match against "in " because you use .* which will match an empty string. Try "in (.+?)\b" (or "(?<=\bin)..+?\b") to require one character after the space.
>
> Another working example, excellent. I'm not too clear on why the back-to-back .. in "(?<=\bin)..+?\b" makes the regex work, but it does.

The first one matches the space after 'in'. Without it the .+? will match the single space, then \b matches the *start* of the next word.

Kent
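Kent's explanation can be checked by running both versions side by side. This is a Python 3 sketch. It assumes the lookbehind the archive shows as `(?=\bin)` was originally `(?<=\bin)`, since the surrounding messages call it a lookbehind and only that form behaves as described.

```python
import re

s = "only the word in CAPS should be matched"

# Without the extra dot, .+? consumes just the space after "in", and \b is
# then satisfied at the start of "CAPS", so the whole match is one space.
without_dot = re.search(r"(?<=\bin).+?\b", s).group()

# With the extra dot, the first . eats the space, .+? must take at least one
# letter, and the earliest following \b is at the end of "CAPS".
with_dot = re.search(r"(?<=\bin)..+?\b", s).group()

print(repr(without_dot))  # ' '
print(repr(with_dot))     # ' CAPS'
```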
Re: [Tutor] stopping greedy matches
On Mar 17, 2005, at 11:11 AM, Kent Johnson wrote:
> The first one matches the space after 'in'. Without it the .+? will match the single space, then \b matches the *start* of the next word.

I think I understand. Basically the first dot advances the pattern forward in order to perform a non-greedy match on the following word. (?) Very nice.
Re: [Tutor] stopping greedy matches
Mike Hall wrote:
> On Mar 17, 2005, at 11:11 AM, Kent Johnson wrote:
>> The first one matches the space after 'in'. Without it the .+? will match the single space, then \b matches the *start* of the next word.
>
> I think I understand. Basically the first dot advances the pattern forward in order to perform a non-greedy match on the following word. (?) Very nice.

That's right. The first dot could just as well be a space or \s or maybe even \s+ (to match any amount of white space). I actually used the dot because I thought it would be clearer than a space :-)

Kent
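This Python 3 sketch tries Kent's four choices for the first token. It assumes the `(?<=\bin)` lookbehind form, and the `spaced` string with three spaces after "in" is a hypothetical input added for contrast. All four agree on the thread's string, but only `\s+` still reaches the word when there is more than one space.

```python
import re

s = "only the word in CAPS should be matched"
spaced = "only the word in   CAPS should be matched"  # hypothetical: three spaces

patterns = {
    "dot": r"(?<=\bin)..+?\b",
    "space": r"(?<=\bin) .+?\b",
    "\\s": r"(?<=\bin)\s.+?\b",
    "\\s+": r"(?<=\bin)\s+.+?\b",
}

# strip() drops the leading whitespace each pattern consumes before the word.
results = {
    name: (re.search(p, s).group().strip(), re.search(p, spaced).group().strip())
    for name, p in patterns.items()
}

for name, (single, multiple) in results.items():
    print(name, repr(single), repr(multiple))
```

With three spaces, the single-character versions end up matching only whitespace, because `.+?` stops at the boundary in front of "CAPS". The greedy `\s+` swallows all the spaces first.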
Re: [Tutor] stopping greedy matches
Liam,

re.compile("in (.*?)\b") will not find any match in the example string I provided. I have had little luck with these non-greedy matchers.

I don't appear to have redemo.py on my system (on OSX), as an import returns an error. I will look into finding this module, thanks for pointing me towards it :)

On Mar 16, 2005, at 2:36 PM, Liam Clarke wrote:
>> x=re.compile(r"(?<=\bin).+\b")
>
> Try
>
> x = re.compile("in (.*?)\b")
>
> .*? is a non-greedy matcher I believe.
>
> Are you using python24/tools/scripts/redemo.py? Use that to test regexes.
>
> Regards,
>
> Liam Clarke
>
> On Wed, 16 Mar 2005 12:12:32 -0800, Mike Hall [EMAIL PROTECTED] wrote:
>> I'm having trouble getting re to stop matching after it's consumed what I want it to. Using this string as an example, the goal is to match CAPS:
>>
>> s = "only the word in CAPS should be matched"
>>
>> So let's say I want to specify when to begin my pattern by using a lookbehind:
>>
>> x = re.compile(r"(?<=\bin)") #this will simply match the spot in front of "in"
>>
>> So that's straightforward, but let's say I don't want to use a lookahead to specify the end of my pattern; I simply want it to stop after it has combed over the word following "in". I would expect this to work, but it doesn't:
>>
>> x=re.compile(r"(?<=\bin).+\b") #this will consume everything past "in" all the way to the end of the string
>>
>> In the above example I would think that the word boundary flag \b would indicate a stopping point. Is .+\b not saying, keep matching characters until a word boundary has been reached?
>>
>> Even stranger are the results I get from:
>>
>> x=re.compile(r"(?<=\bin).+\s") #keep matching characters until a whitespace has been reached(?)
>> r = x.sub([EMAIL PROTECTED], s)
>> print r
>> only the word [EMAIL PROTECTED]
>>
>> For some reason it has decided to consume three words instead of one.
>>
>> My question is simply this: after specifying a start point, how do I make a match stop after it has found one word, and one word only?
>>
>> As always, all help is appreciated.
>
> -- 
> 'There is only one basic human right, and that is to do as you damn well please. And with it comes the only basic human duty, to take the consequences.
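Mike's two surprises both come from the greedy `+`. The Python 3 sketch below assumes the `(?<=\bin)` lookbehind form of his patterns and compares them with a lazy `+?`. The greedy versions run to the end of the string and then give back only as much as the rest of the pattern needs. The lazy version stops at the first place the rest of the pattern can match.

```python
import re

s = "only the word in CAPS should be matched"

# Greedy .+ first runs to the end of the string; \b is already true there
# (between the final "d" and the end), so nothing needs to be given back.
greedy_boundary = re.search(r"(?<=\bin).+\b", s).group()

# Greedy .+ followed by \s backtracks only as far as the LAST whitespace,
# which is why three words ("CAPS should be") were consumed.
greedy_space = re.search(r"(?<=\bin).+\s", s).group()

# Lazy .+? stops at the FIRST position where \b can match, after "CAPS".
lazy = re.search(r"(?<=\bin)\s(.+?)\b", s).group(1)

print(repr(greedy_boundary))  # ' CAPS should be matched'
print(repr(greedy_space))     # ' CAPS should be '
print(repr(lazy))             # 'CAPS'
```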
Re: [Tutor] stopping greedy matches
On Mar 16, 2005, at 5:32 PM, Sean Perry wrote:
> I know this does not directly help, but I have never successfully used \b in my regexes. I always end up writing something like foo\s+bar or something more intense.

I've had luck with the boundary flag in relation to lookbehinds. For example, if I wanted to only match after "int" (and not "print"), (?<=\bint) seems to work fine. I'm a bit frustrated at not being able to find a simple way to have a search stop after eating up one word. You'd think the \b would do it, but nope.
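Here is a small Python 3 illustration of the "int but not print" case, using made-up sample strings. It also stops after one word by capturing `\w+`. That is a swapped-in way to grab a single word, not the `.+?\b` approach discussed elsewhere in the thread.

```python
import re

# The \b inside the lookbehind requires a word boundary right before "int",
# so the "int" buried inside "print" does not count.
after_int = re.compile(r"(?<=\bint)\s+(\w+)")

declared = after_int.findall("int count")
printed = after_int.findall("print count")

print(declared)  # ['count']
print(printed)   # []
```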
Re: [Tutor] stopping greedy matches
On 03/16/2005-12:12PM, Mike Hall wrote:
> I'm having trouble getting re to stop matching after it's consumed what I want it to. Using this string as an example, the goal is to match CAPS:
>
> s = "only the word in CAPS should be matched"

jet% python
Python 2.4 (#2, Jan 5 2005, 15:59:52)
[GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = "only the word in CAPS should be matched"
>>> x=re.compile(r"\bin ([^\s]+)")
>>> x.findall(s)
['CAPS']
Re: [Tutor] stopping greedy matches
Mike Hall wrote:
> Liam,
>
> re.compile("in (.*?)\b") will not find any match in the example string I provided. I have had little luck with these non-greedy matchers.

"in (.*?)\b" will match against "in " because you use .* which will match an empty string. Try "in (.+?)\b" (or "(?<=\bin)..+?\b") to require one character after the space.

The non-greedy match is very useful; if you can't get it to work, ask for help.

> I don't appear to have redemo.py on my system (on OSX), as an import returns an error. I will look into finding this module, thanks for pointing me towards it :)

You can't import it; you have to run it from the command line. I don't know if it is installed under Mac OSX though. You might be interested in RegexPlor: http://python.net/~gherman/RegexPlor.html

Kent
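One further pitfall is worth ruling out when `\b` seems to do nothing. The archive lost the quote marks, so it can't be told whether the pattern Mike tried had an `r` prefix. Without one, `"\b"` in an ordinary Python string literal is a backspace character, not the regex word-boundary escape. The pattern then cannot match this string at all. Here is a Python 3 sketch of the difference:

```python
import re

s = "only the word in CAPS should be matched"

# Without the r prefix, "\b" in a normal string literal is a backspace
# character (\x08), not the regex word-boundary escape.
plain = "in (.*?)\b"
raw = r"in (.*?)\b"

plain_is_backspace = plain == "in (.*?)\x08"
plain_matches = re.findall(plain, s)
raw_matches = re.findall(raw, s)

print(plain_is_backspace)  # True
print(plain_matches)       # []
print(raw_matches)         # ['']
```

Writing patterns as raw strings (`r"..."`) avoids this whole class of surprise.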