Re: Need help in Python regular expression
2009/6/12 meryl silverburgh.me...@gmail.com:
> On Jun 11, 9:41 pm, Mark Tolonen metolone+gm...@gmail.com wrote:
>> meryl silverburgh.me...@gmail.com wrote in message
>>> I have this regular expression
> ...
> I try adding .* at the end , but it ends up just matching the second one.

If there can be more matches in a line, maybe the non-greedy quantifier .*? and a lookahead assertion can help. You can try something like:

    (?m)Render(?:Block|Table) (?:\(\w+\)|{\w+})(.+?(?=$|RenderBlock))?

(?m) - multiline flag, so that $ also matches at the end of each line
.+? - any character (except a newline), one or more, non-greedy, i.e. as few as possible
(?=$|RenderBlock) - the lookahead assertion: a condition on the text that follows, which is not part of the match; here the end of the line/string or RenderBlock

I guess that if you need to add more possibilities or conditions depending on your source data, it might get too complex for a single regular expression to match effectively.

hth
vbr
--
http://mail.python.org/mailman/listinfo/python-list
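The lookahead suggestion above can be tried out with a minimal sketch like the following. The sample text and the variable names are assumptions made up for illustration; only the pattern itself comes from the message. Note that the lookahead stops only at the end of a line or at RenderBlock, so trailing text followed by RenderTable would not be cut off.

```python
import re

# Pattern from the message: an optional non-greedy tail that stops
# before the end of the line or before the next "RenderBlock".
pat = re.compile(
    r"(?m)Render(?:Block|Table) (?:\(\w+\)|{\w+})(.+?(?=$|RenderBlock))?"
)

# Hypothetical sample data, not taken from the thread.
sample = (
    "RenderTable {TABLE} [text a=1] RenderBlock (CENTER)\n"
    "RenderBlock {CENTER}"
)

# group(0) is the whole match; the tail may carry trailing whitespace,
# so it is stripped here for readability.
found = [m.group(0).rstrip() for m in pat.finditer(sample)]
print(found)
```

Using finditer with group(0) rather than findall matters here: with a capturing group in the pattern, findall would return only the captured tail.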
Re: Need help in Python regular expression
To the OP: if you haven't tried Kodos yet, I suggest you get it here: http://kodos.sourceforge.net/. It's a Python regexp debugger, a lifesaver.

Jean-Michel

John S wrote:
> On Jun 11, 10:30 pm, meryl silverburgh.me...@gmail.com wrote:
>> Hi,
>> I have this regular expression
>>     blockRE = re.compile(".*RenderBlock {\w+}")
>> it works if my source is RenderBlock {CENTER}.
>> But I want it to work with
>> 1. RenderTable {TABLE}
>> So i change the regexp to re.compile(".*Render[Block|Table] {\w+}"), but that breaks everything
>> 2. RenderBlock (CENTER)
>> So I change the regexp to re.compile(".*RenderBlock {|\(\w+}|\)"), that also breaks everything
>> Can you please tell me how to change my reg exp so that I can support all 3 cases:
>>     RenderTable {TABLE}
>>     RenderBlock (CENTER)
>>     RenderBlock {CENTER}
>> Thank you.
>
> Short answer:
>
>     r = re.compile(r"Render(?:Block|Table)\s+[({](?:TABLE|CENTER)[})]")
>     s = "blah blah blah blah blah blah RenderBlock {CENTER} blah blah RenderBlock {CENTER} blah blah blah RenderTable {TABLE} blah blah RenderBlock (CENTER) blah blah blah"
>     print(r.findall(s))
>
> output:
>     ['RenderBlock {CENTER}', 'RenderBlock {CENTER}', 'RenderTable {TABLE}', 'RenderBlock (CENTER)']
>
> Note that [] only encloses characters, not strings; [foo|bar] matches 'f','o','|','b','a', or 'r', not foo or bar.
> Use (foo|bar) to match foo or bar; (?:xxx) matches xxx without making a backreference (i.e., without capturing text).
>
> HTH
> -- John Strickler
Re: Need help in Python regular expression
On Fri, 12 Jun 2009 06:20:24 +0100, meryl silverburgh.me...@gmail.com wrote:
> On Jun 11, 9:41 pm, Mark Tolonen metolone+gm...@gmail.com wrote:
>> meryl silverburgh.me...@gmail.com wrote in message
>>> Hi,
>>> I have this regular expression
>>>     blockRE = re.compile(".*RenderBlock {\w+}")
>>> it works if my source is RenderBlock {CENTER}.

[snip]

>> ---code--
>> import re
>> pat = re.compile(r'Render(?:Block|Table) (?:\(\w+\)|{\w+})')
>>
>> testdata = '''\
>> RenderTable {TABLE}
>> RenderBlock (CENTER)
>> RenderBlock {CENTER}
>> RenderTable {TABLE)      #shouldn't match
>> '''
>>
>> print(pat.findall(testdata))
>> ---
>>
>> Result:
>>
>> ['RenderTable {TABLE}', 'RenderBlock (CENTER)', 'RenderBlock {CENTER}']
>>
>> -Mark
>
> Thanks for both of your help. How can i modify the RegExp so that both
>     RenderTable {TABLE}
> and
>     RenderTable {TABLE} [text with a-zA-Z=SPACE0-9]
> will match
>
> I try adding .* at the end , but it ends up just matching the second one.

Curious, it should work (and match rather more than you want, but that's another matter). Try adding this instead:

    '(?: \[[a-zA-Z= 0-9]*\])?'

Personally I'd replace all those spaces with \s* or \s+, but I'm paranoid when it comes to whitespace.

--
Rhodri James *-* Wildebeest Herder to the Masses
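A minimal sketch of the suggestion above, assuming the optional suffix is simply appended to the pattern posted earlier in the thread. The test strings and the names base, suffix and loose are illustrative assumptions, not from the original messages.

```python
import re

# Pattern from the earlier reply, plus the optional bracketed suffix.
base = r"Render(?:Block|Table) (?:\(\w+\)|{\w+})"
suffix = r"(?: \[[a-zA-Z= 0-9]*\])?"
pat = re.compile(base + suffix)

# Hypothetical input: one line without and one with the bracketed text.
testdata = "RenderTable {TABLE}\nRenderTable {TABLE} [text a=1 B2]\n"
matches = pat.findall(testdata)
print(matches)

# Whitespace-tolerant variant (an assumption about what replacing the
# literal spaces with \s+ could look like).
loose = re.compile(
    r"Render(?:Block|Table)\s+(?:\(\w+\)|{\w+})(?:\s+\[[a-zA-Z=\s0-9]*\])?"
)
loose_match = loose.search("RenderTable   {TABLE}\t[a=1]")
print(loose_match.group(0))
```

One caveat with the looser variant: \s also matches newlines, so a bracketed block on the following line would be pulled into the match.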
Re: Need help in Python regular expression
meryl silverburgh.me...@gmail.com wrote in message news:2d4d8624-043b-4f5f-ae2d-bf73bca3d...@p6g2000pre.googlegroups.com...
> Hi,
> I have this regular expression
>     blockRE = re.compile(".*RenderBlock {\w+}")
> it works if my source is RenderBlock {CENTER}.
> But I want it to work with
> 1. RenderTable {TABLE}
> So i change the regexp to re.compile(".*Render[Block|Table] {\w+}"), but that breaks everything
> 2. RenderBlock (CENTER)
> So I change the regexp to re.compile(".*RenderBlock {|\(\w+}|\)"), that also breaks everything
> Can you please tell me how to change my reg exp so that I can support all 3 cases:
>     RenderTable {TABLE}
>     RenderBlock (CENTER)
>     RenderBlock {CENTER}

[abcd] syntax matches a single character from the set. Use non-grouping parentheses instead:

---code--
import re
pat = re.compile(r'Render(?:Block|Table) (?:\(\w+\)|{\w+})')

testdata = '''\
RenderTable {TABLE}
RenderBlock (CENTER)
RenderBlock {CENTER}
RenderTable {TABLE)      #shouldn't match
'''

print(pat.findall(testdata))
---

Result:

['RenderTable {TABLE}', 'RenderBlock (CENTER)', 'RenderBlock {CENTER}']

-Mark
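The "#shouldn't match" case above is worth a closer look. The following sketch contrasts the alternation of complete bracket pairs with a hypothetical character-class variant (the name either and its pattern are assumptions for illustration) that accepts any opening bracket with any closing one.

```python
import re

# Alternation of complete pairs, as in the message: (...) or {...}.
paired = re.compile(r"Render(?:Block|Table) (?:\(\w+\)|{\w+})")

# Hypothetical variant: any opener followed by any closer.
either = re.compile(r"Render(?:Block|Table) [({]\w+[})]")

mismatched = "RenderTable {TABLE)"

paired_result = paired.search(mismatched)   # brackets must pair up
either_result = either.search(mismatched)   # mismatched pair slips through

print(paired_result)
print(either_result.group(0))
```

If mismatched brackets can occur in the input and must be rejected, spelling out each pair in the alternation is the safer choice.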
Re: Need help in Python regular expression
On Jun 11, 10:30 pm, meryl silverburgh.me...@gmail.com wrote:
> Hi,
> I have this regular expression
>     blockRE = re.compile(".*RenderBlock {\w+}")
> it works if my source is RenderBlock {CENTER}.
> But I want it to work with
> 1. RenderTable {TABLE}
> So i change the regexp to re.compile(".*Render[Block|Table] {\w+}"), but that breaks everything
> 2. RenderBlock (CENTER)
> So I change the regexp to re.compile(".*RenderBlock {|\(\w+}|\)"), that also breaks everything
> Can you please tell me how to change my reg exp so that I can support all 3 cases:
>     RenderTable {TABLE}
>     RenderBlock (CENTER)
>     RenderBlock {CENTER}
> Thank you.

Short answer:

    r = re.compile(r"Render(?:Block|Table)\s+[({](?:TABLE|CENTER)[})]")
    s = "blah blah blah blah blah blah RenderBlock {CENTER} blah blah RenderBlock {CENTER} blah blah blah RenderTable {TABLE} blah blah RenderBlock (CENTER) blah blah blah"
    print(r.findall(s))

output:
    ['RenderBlock {CENTER}', 'RenderBlock {CENTER}', 'RenderTable {TABLE}', 'RenderBlock (CENTER)']

Note that [] only encloses characters, not strings; [foo|bar] matches 'f','o','|','b','a', or 'r', not foo or bar.
Use (foo|bar) to match foo or bar; (?:xxx) matches xxx without making a backreference (i.e., without capturing text).

HTH
-- John Strickler
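The difference between a character class, a capturing group and a non-capturing group shows up clearly in what findall returns. A minimal sketch, with a made-up sample string and illustrative variable names:

```python
import re

text = "RenderBlock {CENTER} RenderTable {TABLE}"

# Character class: "Render" plus ONE character out of B,l,o,c,k,|,T,a,b,e.
char_class = re.findall(r"Render[Block|Table]", text)

# Capturing group: findall returns only the captured alternative.
capturing = re.findall(r"Render(Block|Table)", text)

# Non-capturing group: findall returns the whole match.
non_capturing = re.findall(r"Render(?:Block|Table)", text)

print(char_class)      # ['RenderB', 'RenderT']
print(capturing)       # ['Block', 'Table']
print(non_capturing)   # ['RenderBlock', 'RenderTable']
```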
Re: Need help in Python regular expression
On Jun 11, 9:41 pm, Mark Tolonen metolone+gm...@gmail.com wrote:
> meryl silverburgh.me...@gmail.com wrote in message news:2d4d8624-043b-4f5f-ae2d-bf73bca3d...@p6g2000pre.googlegroups.com...
>> Hi,
>> I have this regular expression
>>     blockRE = re.compile(".*RenderBlock {\w+}")
>> it works if my source is RenderBlock {CENTER}.
>> But I want it to work with
>> 1. RenderTable {TABLE}
>> So i change the regexp to re.compile(".*Render[Block|Table] {\w+}"), but that breaks everything
>> 2. RenderBlock (CENTER)
>> So I change the regexp to re.compile(".*RenderBlock {|\(\w+}|\)"), that also breaks everything
>> Can you please tell me how to change my reg exp so that I can support all 3 cases:
>>     RenderTable {TABLE}
>>     RenderBlock (CENTER)
>>     RenderBlock {CENTER}
>
> [abcd] syntax matches a single character from the set. Use non-grouping parentheses instead:
>
> ---code--
> import re
> pat = re.compile(r'Render(?:Block|Table) (?:\(\w+\)|{\w+})')
>
> testdata = '''\
> RenderTable {TABLE}
> RenderBlock (CENTER)
> RenderBlock {CENTER}
> RenderTable {TABLE)      #shouldn't match
> '''
>
> print(pat.findall(testdata))
> ---
>
> Result:
>
> ['RenderTable {TABLE}', 'RenderBlock (CENTER)', 'RenderBlock {CENTER}']
>
> -Mark

Thanks for both of your help. How can I modify the RegExp so that both
    RenderTable {TABLE}
and
    RenderTable {TABLE} [text with a-zA-Z=SPACE0-9]
will match?

I tried adding .* at the end, but it ends up just matching the second one.

Thanks again.
Re: Need help with a regular expression
On 19 Dec, 13:08, Sharun [EMAIL PROTECTED] wrote:
> I am trying to find the substring starting with 'aaa', and ending with ddd OR fff.
> If ddd is found shouldnt the search stop?
> Shouldn't re5.search(str5).group(0) return 'aaa bbb\r\n ccc ddd' ?

The documentation for the re module (http://docs.python.org/lib/re-syntax.html) tells you that the *, +, and ? qualifiers are all greedy; they match as much text as possible. What you are looking for are the non-greedy qualifiers *?, +?, ??. Your regex pattern might look like this:

    aaa.*?(ddd|fff)

Regards,
Marek
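The greedy and non-greedy behaviour can be compared side by side on the string from the question. This is a small sketch; the variable names greedy, lazy, g_match and l_match are illustrative, and re.S is kept from the original pattern so that . also matches the line breaks.

```python
import re

str5 = 'aaa bbb\r\n ccc ddd\r\n eee fff'

# Greedy: .* runs to the end, then backtracks to the LAST ddd or fff.
greedy = re.compile(r'aaa.*(ddd|fff)', re.S)
# Non-greedy: .*? stops at the FIRST ddd or fff.
lazy = re.compile(r'aaa.*?(ddd|fff)', re.S)

g_match = greedy.search(str5)
l_match = lazy.search(str5)

print(repr(g_match.group(0)), g_match.group(1))
print(repr(l_match.group(0)), l_match.group(1))
```

The greedy pattern returns the whole string with 'fff' in group 1, while the non-greedy one returns 'aaa bbb\r\n ccc ddd' with 'ddd' in group 1.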
Re: Need help with a regular expression
On Dec 19, 12:08 pm, Sharun [EMAIL PROTECTED] wrote:
> Python newbie here. I am not clear about how the matching is taking place when I do the following
>
>     >>> str5 = 'aaa bbb\r\n ccc ddd\r\n eee fff'
>     >>> re5 = re.compile('aaa.*(ddd|fff)', re.S)
>     >>> re5.search(str5).group(0)
>     'aaa bbb\r\n ccc ddd\r\n eee fff'
>     >>> re5.search(str5).group(1)
>     'fff'
>
> I am trying to find the substring starting with 'aaa', and ending with ddd OR fff.
> If ddd is found shouldnt the search stop?
> Shouldn't re5.search(str5).group(0) return 'aaa bbb\r\n ccc ddd' ?
>
> Thanks

Have an RE problem in Python? Get Kodos! (http://kodos.sourceforge.net/)

- Paddy.
Re: Need help with a regular expression
Thanks Marek!