Re: [Tutor] RE Silliness
Omer wrote:

    Bob, I tried your way.

    >>> import re
    >>> urlMask = r"http://[\w\Q./\?=\R]+()?"
    >>> text=u"Not working examplehttp://this.is.a/url?header=nullAnd another linehttp://and.another.url";
    >>> re.findall(urlMask,text)
    [u'', u'']

Oops, I failed to notice you were using findall. Kent explained it. Another way to fix it is to make () a non-group: (?:)

--
Bob Gailer
Chapel Hill NC
919-636-4239
___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
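To see that (?:) really is a "non-group", one can inspect the groups attribute of a compiled pattern. This is only a sketch: the literal suffix is missing from the archived messages, so "<br>" below is a placeholder assumption, and the \Q and \R escapes are left out of the character class because recent Python 3 releases reject unknown letter escapes.

```python
import re

# "<br>" is a hypothetical stand-in for the optional ending (not from the thread).
capturing = re.compile(r"http://[\w./?=]+(<br>)?")
non_capturing = re.compile(r"http://[\w./?=]+(?:<br>)?")

# .groups reports how many capturing groups the pattern defines.
print(capturing.groups)      # 1
print(non_capturing.groups)  # 0
```

With zero capturing groups, re.findall() has nothing to return except the whole match.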
Re: [Tutor] RE Silliness
On Mon, Jan 5, 2009 at 11:16 AM, Omer wrote:
> Bob, I tried your way.
>
> import re
> urlMask = r"http://[\w\Q./\?=\R]+()?"
> text=u"Not working examplehttp://this.is.a/url?header=nullAnd another linehttp://and.another.url";
> re.findall(urlMask,text)
> [u'', u'']
>
> spir, I did understand it. What I'm not understanding is why isn't this
> working.

There is a bit of a gotcha in re.findall(): its behaviour changes depending on whether there are groups in the re. If the re contains groups, re.findall() only returns the matches for the groups. If you enclose the entire re in parentheses (making it a group) you get a better result:

In [2]: urlMask = r"(http://[\w\Q./\?=\R]+()?)"

In [3]: text=u"Not working examplehttp://this.is.a/url?header=nullAnd another linehttp://and.another.url";

In [4]: re.findall(urlMask,text)
Out[4]:
[(u'http://this.is.a/url?header=null', u''),
 (u'http://and.another.url', u'')]

You can also use non-grouping parentheses around the :

In [5]: urlMask = r"http://[\w\Q./\?=\R]+(?:)?"

In [6]: re.findall(urlMask,text)
Out[6]: [u'http://this.is.a/url?header=null', u'http://and.another.url']

Kent
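The findall gotcha described above can be reproduced side by side. This is a minimal sketch, not the exact session from the thread: the sample text is made up, "<br>" is a placeholder for the optional ending (the real one did not survive in the archive), and the character class omits \Q and \R, which recent Python 3 versions refuse to compile.

```python
import re

# Hypothetical sample text; "<br>" is an assumed placeholder suffix.
text = "see http://this.is.a/url?header=null<br> and http://and.another.url for more"

base = r"http://[\w./?=]+"

# No groups: findall returns the whole matches.
no_group = re.findall(base, text)

# One capturing group: findall returns only that group ('' when it did not take part).
one_group = re.findall(base + r"(<br>)?", text)

# Whole pattern wrapped in a second group: findall returns one tuple per match.
wrapped = re.findall("(" + base + r"(<br>)?)", text)

# Non-capturing group: no groups again, so whole matches come back.
non_capturing = re.findall(base + r"(?:<br>)?", text)

print(no_group)       # ['http://this.is.a/url?header=null', 'http://and.another.url']
print(one_group)      # ['<br>', '']
print(wrapped)        # [('http://this.is.a/url?header=null<br>', '<br>'), ('http://and.another.url', '')]
print(non_capturing)  # ['http://this.is.a/url?header=null<br>', 'http://and.another.url']
```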
Re: [Tutor] RE Silliness
Bob, I tried your way.

>>> import re
>>> urlMask = r"http://[\w\Q./\?=\R]+()?"
>>> text=u"Not working examplehttp://this.is.a/url?header=nullAnd another linehttp://and.another.url";
>>> re.findall(urlMask,text)
[u'', u'']

spir, I did understand it. What I'm not understanding is why this isn't working.

(Whereas,

>>> OldurlMask = r"http://[\w\Q./\?=\R]+"; #Not f-ing working.
>>> re.findall(OldurlMask,text)
['http://this.is.a/url?header=null', 'http://and.another.url']

does work. Which is what had me frowning.
Also, this ugly url mask is working:

>>> UglyUrlMask = r"(http://[\w\Q./\?=\R]+|http://[\w\Q./\?=\R]+)"
>>> re.findall(UglyUrlMask,text)
['http://this.is.a/url?header=null', 'http://and.another.url']

Anyone?)

On Mon, Jan 5, 2009 at 12:08 AM, spir wrote:
> On Sun, 04 Jan 2009 14:09:53 -0500
> bob gailer wrote:
>
> > Omer wrote:
> > > I'm sorry, burrowed into the reference until my eyes bled.
> > >
> > > What I want is to have a regular expression with an optional ending of
> > > ""
> > >
> > > (For those interested,
> > > urlMask = r"http://[\w\Q./\?=\R]+";
> > > is the version w/o the optional ending.)
> > >
> > > I can't seem to make a string optional, only a single character via
> > > []s. For some reason I thought it would be ()s, but no help there: it
> > > just returns only the . Anybody?
> > >
> > urlMask = r"http://[\w\Q./\?=\R]+()?"
> >
> > From the docs: ? Causes the resulting RE to match 0 or 1 repetitions of
> > the preceding RE. ab? will match either 'a' or 'ab'.
>
> Maybe Omer had not noted that a sub-expression can be grouped in () so that
> an operator (?+*) applies on the whole group.
> Denis
>
> --
> la vida e estranya
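A likely reason the "ugly" mask works is that its single pair of parentheses spans the entire pattern, so the one group findall reports is identical to the whole match. If the groups are needed for other reasons, re.finditer() with group(0) returns the full match no matter how many groups exist. The sketch below uses made-up sample text and a character class without \Q and \R (recent Python 3 versions reject them); the two-group pattern is purely illustrative.

```python
import re

# Hypothetical sample text, not the text from the thread.
text = "first http://this.is.a/url?header=null then http://and.another.url"
url = r"http://[\w./?=]+"

# One group around the whole alternation: group 1 equals the full match.
wrapped = re.findall("(" + url + "|" + url + ")", text)

# Two groups: findall now returns tuples of the pieces...
two_groups = r"(http://)([\w./?=]+)"
pieces = re.findall(two_groups, text)

# ...while finditer + group(0) still gives the complete match each time.
whole = [m.group(0) for m in re.finditer(two_groups, text)]

print(wrapped)  # ['http://this.is.a/url?header=null', 'http://and.another.url']
print(pieces)   # [('http://', 'this.is.a/url?header=null'), ('http://', 'and.another.url')]
print(whole)    # ['http://this.is.a/url?header=null', 'http://and.another.url']
```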
Re: [Tutor] RE Silliness
On Sun, 04 Jan 2009 14:09:53 -0500
bob gailer wrote:

> Omer wrote:
> > I'm sorry, burrowed into the reference until my eyes bled.
> >
> > What I want is to have a regular expression with an optional ending of
> > ""
> >
> > (For those interested,
> > urlMask = r"http://[\w\Q./\?=\R]+";
> > is the version w/o the optional ending.)
> >
> > I can't seem to make a string optional, only a single character via
> > []s. For some reason I thought it would be ()s, but no help there: it
> > just returns only the . Anybody?
> >
> urlMask = r"http://[\w\Q./\?=\R]+()?"
>
> From the docs: ? Causes the resulting RE to match 0 or 1 repetitions of
> the preceding RE. ab? will match either 'a' or 'ab'.

Maybe Omer had not noted that a sub-expression can be grouped in () so that an operator (?, +, *) applies to the whole group.

Denis
--
la vida e estranya
Re: [Tutor] RE Silliness
Omer wrote:

    I'm sorry, burrowed into the reference until my eyes bled.

    What I want is to have a regular expression with an optional ending of ""

    (For those interested,
    urlMask = r"http://[\w\Q./\?=\R]+";
    is the version w/o the optional ending.)

    I can't seem to make a string optional, only a single character via []s. For some reason I thought it would be ()s, but no help there: it just returns only the . Anybody?

urlMask = r"http://[\w\Q./\?=\R]+()?"

From the docs: ? Causes the resulting RE to match 0 or 1 repetitions of the preceding RE. ab? will match either 'a' or 'ab'.

--
Bob Gailer
Chapel Hill NC
919-636-4239
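The quoted rule can be checked directly: ? binds to the single item just before it, so making a multi-character string optional requires grouping it first. A small sketch follows; the helper name matches is made up for this example, and the patterns are toy ones rather than the URL mask.

```python
import re

def matches(pattern, s):
    """Return True if the whole string s matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

# ab? : only the 'b' is optional.
print(matches(r"ab?", "a"), matches(r"ab?", "ab"))            # True True

# abc? : only the 'c' is optional, so 'a' alone fails.
print(matches(r"abc?", "a"), matches(r"abc?", "ab"))          # False True

# a(?:bc)? : the grouped string 'bc' is optional as a unit.
print(matches(r"a(?:bc)?", "a"), matches(r"a(?:bc)?", "ab"))  # True False
print(matches(r"a(?:bc)?", "abc"))                            # True
```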
[Tutor] RE Silliness
I'm sorry, burrowed into the reference until my eyes bled.

What I want is to have a regular expression with an optional ending of ""

(For those interested,
urlMask = r"http://[\w\Q./\?=\R]+";
is the version w/o the optional ending.)

I can't seem to make a string optional, only a single character via []s. For some reason I thought it would be ()s, but no help there: it just returns only the . Anybody?

Thx,
Omer.