Re: [Tutor] regular expression question
On Tue, Apr 28, 2009 at 4:03 AM, Kelie wrote:
> Hello,
>
> The following code returns 'abc123abc45abc789jk'. How do I revise the pattern
> so that the return value will be 'abc789jk'? In other words, I want to find
> the pattern 'abc' that is closest to 'jk'. Here the strings '123', '45' and
> '789' are just examples. They are actually quite different in the string
> that I'm working with.
>
> import re
> s = 'abc123abc45abc789jk'
> p = r'abc.+jk'
> lst = re.findall(p, s)
> print lst[0]

re.findall() won't work because it finds non-overlapping matches.

If there is a character in the initial match which cannot occur in the middle section, change .+ to exclude that character. For example, r'abc[^a]+jk' works with your example.

Another possibility is to look for the match starting at different locations, something like this:

    p = re.compile(r'abc.+jk')
    lastMatch = None
    i = 0
    while i < len(s):
        m = p.search(s, i)
        if m is None:
            break
        lastMatch = m.group()
        i = m.start() + 1

    print lastMatch

Kent
___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
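For readers on Python 3, the search loop Kent describes might be sketched as below. The helper name last_match is made up for illustration, and print is the Python 3 function rather than the statement used in the thread.

```python
import re

def last_match(pattern, text):
    """Return the text of the match with the right-most start, or None.

    Hypothetical helper: it restarts the search one character past the
    start of each match, so overlapping candidates are all considered.
    """
    compiled = re.compile(pattern)
    last = None
    pos = 0
    while pos < len(text):
        m = compiled.search(text, pos)
        if m is None:
            break
        last = m.group()
        pos = m.start() + 1  # move just past where this match began
    return last

s = 'abc123abc45abc789jk'
print(last_match(r'abc.+jk', s))
```

The loop costs one search per candidate start, which should be fine for short strings but may be slow on very long input.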
Re: [Tutor] regular expression question
2009/4/28 Marek spociń...@go2.pl,Poland :
>> import re
>> s = 'abc123abc45abc789jk'
>> p = r'abc.+jk'
>> lst = re.findall(p, s)
>> print lst[0]
>
> I suggest using r'abc.+?jk' instead.
>
> The additional ? makes the preceding '.+' non-greedy, so instead of matching
> as long a string as it can, it matches as short a string as possible.

Did you try it? It doesn't do what you expect; it still matches at the beginning of the string. The re engine searches for a match at a location and returns the first one it finds. A non-greedy match doesn't mean "find the shortest possible match anywhere in the string", it means "find the shortest possible match starting at this location."

Kent
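A small Python 3 sketch of Kent's point, using the string from the thread plus a second string (s2, invented here) where greedy and non-greedy matching actually differ:

```python
import re

s = 'abc123abc45abc789jk'

# Both searches start at the leftmost 'abc'. Since there is only one 'jk',
# the greedy and the non-greedy version return the same text.
greedy = re.search(r'abc.+jk', s)
lazy = re.search(r'abc.+?jk', s)
print(greedy.group(), greedy.start())
print(lazy.group(), lazy.start())

# Non-greedy only changes where the match ends, not where it begins.
s2 = 'abc1jk abc2jk'
print(re.search(r'abc.+jk', s2).group())
print(re.search(r'abc.+?jk', s2).group())
```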
Re: [Tutor] regular expression question
spir free.fr> writes:
> To avoid that, use non-grouping parens (?:...). This also avoids the need
> for parens around the whole format:
>
> p = Pattern(r'abc(?:(?!abc).)+jk')
> print p.findall(s)
> ['abc789jk']
>
> Denis

This one works! Thank you Denis. I'll try it out on the actual, much longer (multiline) string and see what happens.
Re: [Tutor] regular expression question
Andre Engels gmail.com> writes:
> 2009/4/28 Marek Spociński go2.pl,Poland 10g.pl>:
> > I suggest using r'abc.+?jk' instead.
>
> That was my first idea too, but it does not work for this case,
> because Python will still try to _start_ the match as soon as
> possible.

Yeah, I tried the '?' as well and realized it would not work.
Re: [Tutor] regular expression question
On Tue, 28 Apr 2009 11:06:16 +0200, Marek spociń...@go2.pl, Poland wrote:
> > import re
> > s = 'abc123abc45abc789jk'
> > p = r'abc.+jk'
> > lst = re.findall(p, s)
> > print lst[0]
>
> I suggest using r'abc.+?jk' instead.
>
> The additional ? makes the preceding '.+' non-greedy, so instead of
> matching as long a string as it can, it matches as short a string as possible.

Non-greedy repetition will not work in this case, I guess:

    from re import compile as Pattern
    s = 'abc123abc45abc789jk'
    p = Pattern(r'abc.+?jk')
    print p.match(s).group()
    ==> abc123abc45abc789jk

(Someone explain why?)

My solution would be to explicitly exclude 'abc' from the sequence of chars matched by '.+'. To do this, use a negative lookahead (?!...) before '.':

    p = Pattern(r'(abc((?!abc).)+jk)')
    print p.findall(s)
    ==> [('abc789jk', '9')]

But that's not exactly what you want, because the inner () needed to express the exclusion is treated by findall as a group to be returned, so you also get the last char matched in there. To avoid that, use non-grouping parens (?:...). This also avoids the need for parens around the whole format:

    p = Pattern(r'abc(?:(?!abc).)+jk')
    print p.findall(s)
    ==> ['abc789jk']

Denis
--
la vita e estrany
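Denis's final pattern, sketched for Python 3. The multiline sample and the use of re.DOTALL are assumptions added here for the case where the text between 'abc' and 'jk' spans several lines. They are not part of Denis's post.

```python
import re

s = 'abc123abc45abc789jk'

# (?:(?!abc).)+ consumes one character at a time, but only while the
# text at that point does not begin another 'abc'.
p = re.compile(r'abc(?:(?!abc).)+jk')
print(p.findall(s))

# Assumption: for multiline input, '.' must also match newlines, which
# re.DOTALL enables. Without the flag this sample gives no match.
multiline = 'abc1\nabc2\nxjk'
p_dotall = re.compile(r'abc(?:(?!abc).)+jk', re.DOTALL)
print(p.findall(multiline))
print(p_dotall.findall(multiline))
```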
Re: [Tutor] regular expression question
On 28 April 2009 at 11:16, Andre Engels wrote:
> 2009/4/28 Marek spociń...@go2.pl,Poland :
> > I suggest using r'abc.+?jk' instead.
> >
> > The additional ? makes the preceding '.+' non-greedy, so instead of
> > matching as long a string as it can, it matches as short a string as possible.
>
> That was my first idea too, but it does not work for this case,
> because Python will still try to _start_ the match as soon as
> possible. To use .+? one would have to reverse the string, then use the
> reversed regular expression on the result, which looks like a rather
> roundabout way of doing things.

I don't have access to Python right now, so I cannot test my ideas, and I don't want to give you a wrong idea either.
Re: [Tutor] regular expression question
2009/4/28 Marek spociń...@go2.pl,Poland :
>> import re
>> s = 'abc123abc45abc789jk'
>> p = r'abc.+jk'
>> lst = re.findall(p, s)
>> print lst[0]
>
> I suggest using r'abc.+?jk' instead.
>
> The additional ? makes the preceding '.+' non-greedy, so instead of matching
> as long a string as it can, it matches as short a string as possible.

That was my first idea too, but it does not work for this case, because Python will still try to _start_ the match as soon as possible. To use .+? one would have to reverse the string, then use the reversed regular expression on the result, which looks like a rather roundabout way of doing things.

--
André Engels, andreeng...@gmail.com
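The roundabout route André mentions could look roughly like this in Python 3. It is only a sketch: the reversed pattern r'kj.+?cba' is written by hand here, and doing that mechanically for an arbitrary pattern would be harder.

```python
import re

s = 'abc123abc45abc789jk'

# Reverse the text so that 'jk' comes first, then let the non-greedy .+?
# stop at the first 'cba', which is the 'abc' nearest to 'jk' in the original.
m = re.search(r'kj.+?cba', s[::-1])
result = m.group()[::-1] if m else None
print(result)
```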
Re: [Tutor] regular expression question
> Hello,
>
> The following code returns 'abc123abc45abc789jk'. How do I revise the pattern
> so that the return value will be 'abc789jk'? In other words, I want to find
> the pattern 'abc' that is closest to 'jk'. Here the strings '123', '45' and
> '789' are just examples. They are actually quite different in the string
> that I'm working with.
>
> import re
> s = 'abc123abc45abc789jk'
> p = r'abc.+jk'
> lst = re.findall(p, s)
> print lst[0]

I suggest using r'abc.+?jk' instead.

The additional ? makes the preceding '.+' non-greedy, so instead of matching as long a string as it can, it matches as short a string as possible.
Re: [Tutor] regular expression question
> I wonder if anyone can help me with an RE. I also wonder if there is an
> RE mailing list anywhere - I haven't managed to find one.

Hi Debbie,

I haven't found one either. There appear to be a lot of good resources here:

    http://dmoz.org/Computers/Programming/Languages/Regular_Expressions/

> I'm trying to use this regular expression to delete particular strings
> from a file before tokenising it.

Why not tokenize the file first, and then drop the strings with a period? You may not need to do all your tokenization at once. Can you do it in phases?

> I want to delete all strings that have a full stop (period) when it is
> not at the beginning or end of a word, and also when it is not followed
> by a closing bracket.

Let's make sure we're using the same concepts. By "string", do you mean "word"? That is, if we have something like:

    "I went home last Thursday."

do you expect the regular expression to match against the whole thing?

    "I went home last Thursday."

Or do you expect it to match against the specific end word?

    "Thursday."

I'm just trying to make sure we're using the same terms. How specific do you want your regular expression to be?

Going back to your question:

> I want to delete all strings that have a full stop (period) when it is
> not at the beginning or end of a word, and also when it is not followed
> by a closing bracket.

At first glance, I think you're looking for a "lookahead assertion":

    http://www.amk.ca/python/howto/regex/regex.html#SECTION00054

> I want to delete file names (eg. fileX.doc), and websites (when www/http
> not given) but not file extensions (eg. this is in .jpg format). I also
> don't want to delete the last word of each sentence just because it
> precedes a fullstop, or if there's a fullstop followed by a closing
> bracket.

Does this need to be part of the same regular expression? There are a lot of requirements here: can we encode this in some kind of test class, so that we're sure we're hitting all your requirements?

Here's what I think you're looking for so far, written in terms of a unit test:

##
import re
import unittest

class DebbiesRegularExpressionTest(unittest.TestCase):
    def setUp(self):
        # Placeholder pattern: replace it with the real expression.
        self.fullstopRe = re.compile("... fill me in")

    def testRecognizingEndWord(self):
        self.assertEqual(
            ["Thursday."],
            self.fullstopRe.findall("I went home last Thursday."))

    def testEndWordWithBracket(self):
        self.assertEqual(
            ["bar."],
            self.fullstopRe.findall("[this is foo.] bar. licious"))

if __name__ == '__main__':
    unittest.main()
##

If these tests don't match what you want, please feel free to edit and add more to them so that we can be clearer about what you want.

Best of wishes to you!
Re: [Tutor] regular expression question
D Elliott wrote:
> I wonder if anyone can help me with an RE. I also wonder if there is an
> RE mailing list anywhere - I haven't managed to find one.
>
> I'm trying to use this regular expression to delete particular strings
> from a file before tokenising it.
>
> I want to delete all strings that have a full stop (period) when it is
> not at the beginning or end of a word, and also when it is not followed
> by a closing bracket. I want to delete file names (eg. fileX.doc), and
> websites (when www/http not given) but not file extensions (eg. this is
> in .jpg format). I also don't want to delete the last word of each
> sentence just because it precedes a fullstop, or if there's a fullstop
> followed by a closing bracket.
>
> fullstopRe = re.compile (r'\S+\.[^)}]]+')

There are two problems with this:
- The ] inside the [] group must be escaped like this: [^)}\]]
- [^)}\]] matches any whitespace, so it will match on the ends of words

It's not clear from your description if the closing bracket must immediately follow the full stop or if it can be anywhere after it. If you want it to follow immediately then use

    \S+\.[^)}\]\s]\S*

If you want to allow the bracket anywhere after the stop, you must force the match to go to a word boundary; otherwise you will match foo.bar when the word is foo.bar]. I think this works:

    (\S+\.[^)}\]\s]+)(\s)

but you have to include the second group in your substitution string.

BTW

    C:\Python23\pythonw.exe C:\Python24\Tools\Scripts\redemo.py

is very helpful with questions like this...

Kent
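To see how Kent's two patterns behave, here is a short Python 3 sketch. The sample sentences are invented for illustration, and the variable names immediate and anywhere are made up.

```python
import re

# Kent's first pattern: the full stop must be followed directly by a
# character that is neither a closing bracket nor whitespace.
immediate = re.compile(r'\S+\.[^)}\]\s]\S*')

text = "See fileX.doc and example.com now. This is in .jpg format (done.) End."
print(immediate.findall(text))

# Kent's second pattern: the match has to run up to whitespace, and the
# second group puts that whitespace back in the substitution.
anywhere = re.compile(r'(\S+\.[^)}\]\s]+)(\s)')
cleaned = anywhere.sub(r'\2', 'see foo.bar] and baz.qux now')
print(repr(cleaned))
```

With these samples the file name and the bare domain are picked up, while ".jpg", sentence-final words, "(done.)" and "foo.bar]" are left alone. Note that the second pattern needs trailing whitespace, so a candidate at the very end of the text would be missed.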
Re: [Tutor] regular expression question
Mike Hall wrote:
> A simple example will show what I mean:
>
> >>> import re
> >>> x = re.compile(r"(A) | (B)")
> >>> s = "X R A Y B E"
> >>> r = x.sub("13", s)
> >>> print r
> X R 13Y13 E
>
> ...so unless I'm understanding it wrong, "B" is supposed to be ignored
> if "A" is matched, yet I get both matched. I get the same result if I
> put "A" and "B" within the same group.

The problem is with your use of sub(), not with |. By default, re.sub() substitutes *all* matches. If you just want to substitute the first match, include the optional count parameter:

    >>> import re
    >>> s = "X R A Y B E"
    >>> re.sub(r"(A) | (B)", '13', s)
    'X R 13Y13 E'
    >>> re.sub(r"(A) | (B)", '13', s, 1)
    'X R 13Y B E'

BTW, there is a very handy interactive regex tester that comes with Python. On Windows, it is installed at

    C:\Python23\Tools\Scripts\redemo.py

Kent
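The same experiment as a Python 3 sketch. Passing count by keyword is a choice made here for readability. The last line is an added observation: without re.VERBOSE, the spaces around | in Mike's pattern are literal characters, which is why the spaces next to the letters vanish in the output.

```python
import re

s = "X R A Y B E"
pattern = r"(A) | (B)"   # matches "A " or " B", spaces included

all_replaced = re.sub(pattern, "13", s)             # every match is replaced
first_only = re.sub(pattern, "13", s, count=1)      # only the first match
no_spaces = re.sub(r"A|B", "13", s)                 # pattern without the literal spaces

print(all_replaced)
print(first_only)
print(no_spaces)
```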
Re: [Tutor] regular expression question
On Mar 9, 2005, at 12:28 PM, Liam Clarke wrote:
> You can limit substitutions using an optional argument, but yeah, it
> seems you're expecting it to examine the string as a whole.

I guess I was, good point.
Re: [Tutor] regular expression question
Oops, I mean:

    for i in range(len(k)):
        if k[i] == 'A' or k[i] == 'B':
            k[i] = 13

On Thu, 10 Mar 2005 09:28:59 +1300, Liam Clarke <[EMAIL PROTECTED]> wrote:
> k = ['A','B', 'C', 'B']
>
> for i in range(len(k)):
>     if k[i] == 'A' or k[i]=='B':
>         k[i]==13
>
> print k
>
> [13, 13, 'C', 13]

--
'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.
Re: [Tutor] regular expression question
Actually, you should get that anyway...

"""
|
    Alternation, or the ``or'' operator. If A and B are regular
    expressions, A|B will match any string that matches either "A" or "B".
    | has very low precedence in order to make it work reasonably when
    you're alternating multi-character strings. Crow|Servo will match
    either "Crow" or "Servo", not "Cro", a "w" or an "S", and "ervo".
"""

So, for each letter in that string, it's checking to see if any letter matches 'A' or 'B'... the engine steps through one character at a time, sort of like:

    for letter in s:
        if letter == 'A':
            #Do some string stuff
        elif letter == 'B':
            #do some string stuff

i.e.

    k = ['A','B', 'C', 'B']

    for i in range(len(k)):
        if k[i] == 'A' or k[i]=='B':
            k[i]==13

    print k

    [13, 13, 'C', 13]

You can limit substitutions using an optional argument, but yeah, it seems you're expecting it to examine the string as a whole.

Check out the example here -
http://www.amk.ca/python/howto/regex/regex.html#SECTION00032

Also

http://www.regular-expressions.info/alternation.html

Regards,

Liam Clarke

On Thu, 10 Mar 2005 09:09:13 +1300, Liam Clarke <[EMAIL PROTECTED]> wrote:
> Hi Mike,
>
> Do you get the same results for a search pattern of 'A|B'?

--
'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.
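A few Python 3 one-liners may help separate the two ideas mixed up in this exchange: the order of alternatives only matters at a single position, while the scan through the string always prefers the leftmost position. The sample strings and variable names are invented, apart from Mike's ' Q A R B C'.

```python
import re

# Leftmost position wins: 'A' occurs earlier in the string, so it is found
# first even though 'B' is listed first in the pattern.
leftmost = re.search(r'B|A', 'X A Y B').group()

# At one and the same position, alternatives are tried left to right,
# so 'ab' is accepted before 'abc' is ever attempted.
ordered = re.search(r'ab|abc', 'abc').group()

# findall() and sub() keep scanning after each match, so both letters show up.
found = re.findall(r'A|B', ' Q A R B C')
replaced = re.sub(r'A|B', '13', ' Q A R B C')

print(leftmost, ordered, found, repr(replaced))
```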
Re: [Tutor] regular expression question
But I only want to ignore "B" if "A" is a match. If "A" is not a match, I'd like it to advance on to "B".

On Mar 9, 2005, at 12:07 PM, Marcos Mendonça wrote:
> Hi
>
> I'm not a regexp expert, but it seems to me that if you want to ignore "B"
> then it should be
>
> (A) | (^B)
>
> Hope it helps!
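One way to get the behaviour Mike asks for here (use "B" only when "A" is absent from the whole string) is to run two separate searches instead of one alternation. This is a sketch under that reading of his request, and the function name prefer_first is hypothetical.

```python
import re

def prefer_first(text, preferred, fallback):
    """Search for `preferred`; only if it is absent, search for `fallback`.

    Hypothetical helper. Match objects are always truthy, so `or` falls
    through to the second search only when the first returned None.
    """
    return re.search(preferred, text) or re.search(fallback, text)

for sample in ("X R A Y B E", "X R Y B E", "X Y"):
    m = prefer_first(sample, r'A', r'B')
    print(sample, '->', m.group() if m else None)
```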
Re: [Tutor] regular expression question
Indeed I do:

    >>> import re
    >>> x = re.compile('A|B')
    >>> s = " Q A R B C"
    >>> r = x.sub("13", s)
    >>> print r
     Q 13 R 13 C

On Mar 9, 2005, at 12:09 PM, Liam Clarke wrote:
> Hi Mike,
>
> Do you get the same results for a search pattern of 'A|B'?
Re: [Tutor] regular expression question
Hi Mike,

Do you get the same results for a search pattern of 'A|B'?

On Wed, 9 Mar 2005 11:11:57 -0800, Mike Hall <[EMAIL PROTECTED]> wrote:
> A simple example will show what I mean:
>
> >>> import re
> >>> x = re.compile(r"(A) | (B)")
> >>> s = "X R A Y B E"
> >>> r = x.sub("13", s)
> >>> print r
> X R 13Y13 E
>
> ...so unless I'm understanding it wrong, "B" is supposed to be ignored
> if "A" is matched, yet I get both matched. I get the same result if I
> put "A" and "B" within the same group.

--
'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.
Re: [Tutor] regular expression question
I'm having some strange results using the "or" operator. In every test I do I'm matching both sides of the "|" metacharacter, not one or the other as all documentation says it should be (the parser supposedly scans left to right, using the first match it finds and ignoring the rest). It should only go beyond the "|" if there was no match found before it, no?

Correct me if I'm wrong, but your regex is saying "match dog, unless it's followed by cat. If it is followed by cat, there is no match on this side of the "|", at which point we advance past it and look at the alternative expression, which says to match in front of cat."

However, if I run a .sub using your regex on a string containing both dog and cat, both will be replaced.

A simple example will show what I mean:

    >>> import re
    >>> x = re.compile(r"(A) | (B)")
    >>> s = "X R A Y B E"
    >>> r = x.sub("13", s)
    >>> print r
    X R 13Y13 E

...so unless I'm understanding it wrong, "B" is supposed to be ignored if "A" is matched, yet I get both matched. I get the same result if I put "A" and "B" within the same group.

On Mar 8, 2005, at 6:47 PM, Danny Yoo wrote:
> > Regular expressions are a little evil at times; here's what I think
> > you're thinking of:
> >
> > ###
> > >>> import re
> > >>> pattern = re.compile(r"""dog(?!cat)
> > ...                    | (?<=dogcat)""", re.VERBOSE)
> > >>> pattern.match('dogman').start()
> > 0
> > >>> pattern.search('dogcatcher').start()
>
> Hi Mike,
>
> Gaaah, bad copy-and-paste. The example with 'dogcatcher' actually does
> come up with a result:
>
> ###
> >>> pattern.search('dogcatcher').start()
> 6
> ###
>
> Sorry about that!
Re: [Tutor] regular expression question
Sorry, my last reply crossed this one (and yes, I forgot again to CC the list). I'm experimenting now with your use of the "or" operator ("|") between two expressions, thanks.

On Mar 8, 2005, at 6:42 PM, Danny Yoo wrote:

> On Tue, 8 Mar 2005, Mike Hall wrote:
>
> > Yes, my existing regex is using a look behind assertion:
> >
> > (?<=dog)
> >
> > ...it's also checking the existence of "Cat":
> >
> > (?!Cat)
> >
> > ...what I'm stuck on is how to essentially use a lookbehind on "Cat",
> > but only if it exists.
>
> Hi Mike,
>
> [Note: Please do a reply-to-all next time, so that everyone can help
> you.]
>
> Regular expressions are a little evil at times; here's what I think
> you're thinking of:
>
> ###
> >>> import re
> >>> pattern = re.compile(r"""dog(?!cat)
> ...                    | (?<=dogcat)""", re.VERBOSE)
> >>> pattern.match('dogman').start()
> 0
> >>> pattern.search('dogcatcher').start()
> >>> pattern.search('dogman').start()
> 0
> >>> pattern.search('catwoman')
> >>>
> ###
>
> but I can't be sure without seeing some of the examples you'd like the
> regular expression to match against.
>
> Best of wishes to you!
Re: [Tutor] regular expression question
> Regular expressions are a little evil at times; here's what I think you're
> thinking of:
>
> ###
> >>> import re
> >>> pattern = re.compile(r"""dog(?!cat)
> ...                    | (?<=dogcat)""", re.VERBOSE)
> >>> pattern.match('dogman').start()
> 0
> >>> pattern.search('dogcatcher').start()

Hi Mike,

Gaaah, bad copy-and-paste. The example with 'dogcatcher' actually does come up with a result:

###
>>> pattern.search('dogcatcher').start()
6
###

Sorry about that!
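For anyone who wants to rerun the session, here is a small self-contained sketch of the same pattern, assuming Python 3. The helper name `first_start` is made up for this illustration and is not part of `re`. Under `re.VERBOSE`, whitespace and `#` comments inside the pattern are ignored.

```python
import re

pattern = re.compile(r"""dog(?!cat)      # "dog" that is not followed by "cat"
                       | (?<=dogcat)     # or the empty spot right after "dogcat"
                      """, re.VERBOSE)

def first_start(text):
    """Return the start index of the first match, or None if there is none."""
    m = pattern.search(text)
    return None if m is None else m.start()

print(first_start('dogman'))      # match starts at "dog"
print(first_start('dogcatcher'))  # match is the zero-width spot after "dogcat"
print(first_start('catwoman'))    # no "dog" at all, so no match
```

Note that the two branches report different kinds of positions: the first one starts at "dog" itself, while the second is an empty match located after "dogcat".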
Re: [Tutor] regular expression question
This will match the position in front of "dog":

(?<=dog)

This will match the position in front of "cat":

(?<=cat)

This will not match in front of "dog" if "dog" is followed by "cat":

(?<=dog)\b (?!cat)

Now my question is how to get this:

(?<=cat)

...but ONLY if "cat" is following "dog." If "dog" does not have "cat" following it, then I simply want this:

(?<=dog)

...if that makes sense :) Thanks.

On Mar 8, 2005, at 6:05 PM, Danny Yoo wrote:

> On Tue, 8 Mar 2005, Mike Hall wrote:
>
> > I'd like to get a match for a position in a string preceded by a
> > specified word (let's call it "Dog"), unless that spot in the string
> > (after "Dog") is directly followed by a specific word (let's say "Cat"),
> > in which case I want my match to occur directly after "Cat", and not
> > "Dog."
>
> Hi Mike,
>
> You may want to look at "lookahead" assertions. These are patterns of
> the form '(?=...)' or '(?!...)'. The documentation mentions them here:
>
> http://www.python.org/doc/lib/re-syntax.html
>
> and AMK's excellent "Regular Expression HOWTO" covers how one might use
> them:
>
> http://www.amk.ca/python/howto/regex/regex.html#SECTION00054
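One way the requirement above might be written, as a sketch only: both branches are zero-width, so the match is always a position rather than text. This assumes lowercase "dog" and "cat" with nothing between them, and the names `spot` and `spot_index` are hypothetical.

```python
import re

# Either the spot right after "dog" when "cat" does not follow,
# or the spot right after "dogcat".
spot = re.compile(r"(?<=dog)(?!cat)|(?<=dogcat)")

def spot_index(text):
    """Return the index of the first matching position, or None."""
    m = spot.search(text)
    return None if m is None else m.start()

print(spot_index("dogman"))
print(spot_index("dogcatcher"))
print(spot_index("catwoman"))

# Because the match is empty, sub() inserts text at that position
# instead of replacing anything.
print(spot.sub("|", "dogcatcher"))
```

Python's `re` requires lookbehind patterns to have a fixed width, which is why each branch spells out its own literal rather than using something like an optional "cat" inside a single lookbehind.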
Re: [Tutor] regular expression question
On Tue, 8 Mar 2005, Mike Hall wrote:

> Yes, my existing regex is using a look behind assertion:
>
> (?<=dog)
>
> ...it's also checking the existence of "Cat":
>
> (?!Cat)
>
> ...what I'm stuck on is how to essentially use a lookbehind on "Cat",
> but only if it exists.

Hi Mike,

[Note: Please do a reply-to-all next time, so that everyone can help you.]

Regular expressions are a little evil at times; here's what I think you're thinking of:

###
>>> import re
>>> pattern = re.compile(r"""dog(?!cat)
...                    | (?<=dogcat)""", re.VERBOSE)
>>> pattern.match('dogman').start()
0
>>> pattern.search('dogcatcher').start()
>>> pattern.search('dogman').start()
0
>>> pattern.search('catwoman')
>>>
###

but I can't be sure without seeing some of the examples you'd like the regular expression to match against.

Best of wishes to you!
Re: [Tutor] regular expression question
Mike Hall wrote:
> First, thanks for the response. Using your re:
>
> my_re = re.compile(r'(dog)(cat)?')
>
> ...I seem to simply be matching the pattern "Dog". Example:
>
> >>> str1 = "The dog chased the car"
> >>> str2 = "The dog cat parade was under way"
> >>> x1 = re.compile(r'(dog)(cat)?')
> >>> rep1 = x1.sub("REPLACE", str1)
> >>> rep2 = x1.sub("REPLACE", str2)
> >>> print rep1
> The REPLACE chased the car
> >>> print rep2
> The REPLACE cat parade was under way
>
> ...what I'm looking for is a match for the position in front of "Cat",
> should it exist.

Because my regex says 'look for the word "dog" and remember where you found it. If you also find the word "cat", remember that too'. Nowhere does it say "watch out for whitespace".

r'(dog)\s*(cat)?'

says match 'dog' followed by zero or more whitespace characters (spaces, tabs, etc.) and maybe 'cat'.

There is a wonderful O'Reilly book called "Mastering Regular Expressions", or, as Danny points out, the AMK howto is good.
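A hedged sketch of the whitespace point, assuming Python 3. One caveat with `r'(dog)\s*(cat)?'` is that `\s*` still consumes the space after "dog" when "cat" is absent, so the match end moves past it. The variant below, which is an illustration and not the pattern from the thread, makes the whitespace part of the optional "cat" piece; `dog_then_cat` and `mark_position` are made-up names.

```python
import re

# "dog", optionally followed by whitespace and "cat" taken together as a unit.
# Bare whitespace after "dog" is not consumed when "cat" is absent.
dog_then_cat = re.compile(r"dog(?:\s*cat)?")

def mark_position(text, marker="<HERE>"):
    """Insert marker right after "dog", or after "cat" if it follows "dog"."""
    m = dog_then_cat.search(text)
    if m is None:
        return text  # nothing to mark
    return text[:m.end()] + marker + text[m.end():]

str1 = "The dog chased the car"
str2 = "The dog cat parade was under way"
print(mark_position(str1))
print(mark_position(str2))
```

Using `m.end()` rather than `sub` keeps both words in place and only reports where the wanted position is.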
Re: [Tutor] regular expression question
First, thanks for the response. Using your re:

my_re = re.compile(r'(dog)(cat)?')

...I seem to simply be matching the pattern "Dog". Example:

>>> str1 = "The dog chased the car"
>>> str2 = "The dog cat parade was under way"
>>> x1 = re.compile(r'(dog)(cat)?')
>>> rep1 = x1.sub("REPLACE", str1)
>>> rep2 = x1.sub("REPLACE", str2)
>>> print rep1
The REPLACE chased the car
>>> print rep2
The REPLACE cat parade was under way

...what I'm looking for is a match for the position in front of "Cat", should it exist.

On Mar 8, 2005, at 5:54 PM, Sean Perry wrote:

> Mike Hall wrote:
> > I'd like to get a match for a position in a string preceded by a
> > specified word (let's call it "Dog"), unless that spot in the string
> > (after "Dog") is directly followed by a specific word (let's say "Cat"),
> > in which case I want my match to occur directly after "Cat", and not
> > "Dog."
> >
> > I can easily get the spot after "Dog," and I can also get it to ignore
> > this spot if "Dog" is followed by "Cat." But what I'm having trouble
> > with is how to match the spot after "Cat" if this word does indeed
> > exist in the string.
>
> >>> import re
> >>> my_re = re.compile(r'(dog)(cat)?')  # the ? means "find one or zero of these"; in other words, cat is optional.
> >>> m = my_re.search("This is a nice dog is it not?")
> >>> dir(m)
> ['__copy__', '__deepcopy__', 'end', 'expand', 'group', 'groupdict', 'groups', 'span', 'start']
> >>> m.span()
> (15, 18)
> >>> m = my_re.search("This is a nice dogcat is it not?")
> >>> m.span()
> (15, 21)
>
> If m is None then no match was found. span returns the locations in the
> string where the match occurred. So in the dogcat sentence the match
> ends at index 21.
>
> >>> "This is a nice dogcat is it not?"[21:]
> ' is it not?'
>
> Hope that helps.
Re: [Tutor] regular expression question
On Tue, 8 Mar 2005, Mike Hall wrote:

> I'd like to get a match for a position in a string preceded by a
> specified word (let's call it "Dog"), unless that spot in the string
> (after "Dog") is directly followed by a specific word (let's say "Cat"),
> in which case I want my match to occur directly after "Cat", and not
> "Dog."

Hi Mike,

You may want to look at "lookahead" assertions. These are patterns of the form '(?=...)' or '(?!...)'. The documentation mentions them here:

http://www.python.org/doc/lib/re-syntax.html

and AMK's excellent "Regular Expression HOWTO" covers how one might use them:

http://www.amk.ca/python/howto/regex/regex.html#SECTION00054
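As a quick illustration of the two lookahead forms mentioned above (a sketch assuming Python 3; the sample text and variable names are invented for this example): `(?=...)` requires that something follows, `(?!...)` requires that it does not, and neither consumes any characters.

```python
import re

text = "dogcat dogman"

# Positive lookahead: "dog" only where "cat" comes next.
followed_by_cat = [m.start() for m in re.finditer(r"dog(?=cat)", text)]

# Negative lookahead: "dog" only where "cat" does NOT come next.
not_followed_by_cat = [m.start() for m in re.finditer(r"dog(?!cat)", text)]

# The lookahead is not part of the match: only "dog" is returned.
matched_text = re.search(r"dog(?=cat)", text).group()

print(followed_by_cat)
print(not_followed_by_cat)
print(matched_text)
```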
Re: [Tutor] regular expression question
Mike Hall wrote:
> I'd like to get a match for a position in a string preceded by a
> specified word (let's call it "Dog"), unless that spot in the string
> (after "Dog") is directly followed by a specific word (let's say "Cat"),
> in which case I want my match to occur directly after "Cat", and not
> "Dog."
>
> I can easily get the spot after "Dog," and I can also get it to ignore
> this spot if "Dog" is followed by "Cat." But what I'm having trouble
> with is how to match the spot after "Cat" if this word does indeed
> exist in the string.

>>> import re
>>> my_re = re.compile(r'(dog)(cat)?')  # the ? means "find one or zero of these"; in other words, cat is optional.
>>> m = my_re.search("This is a nice dog is it not?")
>>> dir(m)
['__copy__', '__deepcopy__', 'end', 'expand', 'group', 'groupdict', 'groups', 'span', 'start']
>>> m.span()
(15, 18)
>>> m = my_re.search("This is a nice dogcat is it not?")
>>> m.span()
(15, 21)

If m is None then no match was found. span returns the locations in the string where the match occurred. So in the dogcat sentence the match ends at index 21.

>>> "This is a nice dogcat is it not?"[21:]
' is it not?'

Hope that helps.
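The same idea wrapped in a function, as a rough sketch assuming Python 3. It adds the `None` check described above and uses `m.end()` so that index 21 does not have to be typed by hand; `rest_after_match` is a made-up name.

```python
import re

my_re = re.compile(r'(dog)(cat)?')  # "dog", then "cat" if it is there

def rest_after_match(text):
    """Return the text following "dog" (or "dogcat"), or None if no match."""
    m = my_re.search(text)
    if m is None:
        return None  # search() found nothing
    return text[m.end():]  # m.end() is the second number in m.span()

print(repr(rest_after_match("This is a nice dog is it not?")))
print(repr(rest_after_match("This is a nice dogcat is it not?")))
print(repr(rest_after_match("This is a nice bird is it not?")))
```

When "cat" is absent, the second group simply does not take part in the match, so `m.group(2)` is `None` and `m.end()` falls right after "dog".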