Re: Need help in Python regular expression

2009-06-12 Thread Vlastimil Brom
2009/6/12 meryl silverburgh.me...@gmail.com:
 On Jun 11, 9:41 pm, Mark Tolonen metolone+gm...@gmail.com wrote:
 meryl silverburgh.me...@gmail.com wrote in message

  I have this regular expression

...

 I tried adding .* at the end, but it ends up matching only the second
 one.

If there can be several matches in a line, the non-greedy
quantifier .*? and a lookahead assertion may help.
You can try something like:
(?m)Render(?:Block|Table) (?:\(\w+\)|{\w+})(.+?(?=$|RenderBlock))?

(?m) multiline flag - $ then also matches at the end of each line
.+? any character, one or more times (non-greedy, i.e. as few as possible)
(?=$|RenderBlock) the lookahead assertion - a condition on the
following text that is not part of the match - here the end of line/string
or RenderBlock

I guess, if you need to add more possibilities or conditions depending
on your source data, it might get too complex for a single regular
expression to match effectively.
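
To illustrate, here is a minimal sketch of the pattern above run under
Python 3. The sample text is made up (it is not the original poster's
data), and finditer() is used because findall() would return only the
captured group rather than the whole match:

```python
import re

# Pattern from the message above; (?m) lets $ match at every line end.
pat = re.compile(
    r'(?m)Render(?:Block|Table) (?:\(\w+\)|{\w+})(.+?(?=$|RenderBlock))?'
)

# Hypothetical sample input, invented for this illustration.
sample = (
    "RenderBlock {CENTER} [a=1] RenderBlock (LEFT)\n"
    "RenderTable {TABLE} [b=2]\n"
    "RenderTable {TABLE}\n"
)

# group(0) is the whole match, including the optional trailing text.
matches = [m.group(0) for m in pat.finditer(sample)]
for text in matches:
    print(repr(text))
# 'RenderBlock {CENTER} [a=1] '
# 'RenderBlock (LEFT)'
# 'RenderTable {TABLE} [b=2]'
# 'RenderTable {TABLE}'
```

The trailing text stops either at the end of the line or just before the
next RenderBlock, which is what the lookahead is for.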

hth
  vbr
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Need help in Python regular expression

2009-06-12 Thread Jean-Michel Pichavant

To the OP,

If you don't have Kodos yet, I suggest getting it here:
http://kodos.sourceforge.net/

It's a Python regexp debugger, a lifesaver.

Jean-Michel

John S wrote:

On Jun 11, 10:30 pm, meryl silverburgh.me...@gmail.com wrote:
  

Hi,

I have this regular expression
blockRE = re.compile(".*RenderBlock {\w+}")

it works if my source is RenderBlock {CENTER}.

But I want it to work with
1. RenderTable {TABLE}

So I change the regexp to re.compile(".*Render[Block|Table] {\w+}"),
but that breaks everything

2. RenderBlock (CENTER)

So I change the regexp to re.compile(".*RenderBlock {|\(\w+}|\)"),
that also breaks everything

Can you please tell me how to change my reg exp so that I can support
all 3 cases:
RenderTable {TABLE}
RenderBlock (CENTER)
RenderBlock {CENTER}

Thank you.



Short answer:

import re

r = re.compile(r"Render(?:Block|Table)\s+[({](?:TABLE|CENTER)[})]")

s = """
blah blah blah
blah blah blah RenderBlock {CENTER} blah blah RenderBlock {CENTER}
blah blah blah RenderTable {TABLE} blah blah RenderBlock (CENTER)
blah blah blah
"""

print(r.findall(s))



output:
['RenderBlock {CENTER}', 'RenderBlock {CENTER}', 'RenderTable {TABLE}', 'RenderBlock (CENTER)']



Note that [] only encloses characters, not strings; [foo|bar] matches
'f','o','|','b','a', or 'r', not foo or bar.
Use (foo|bar) to match foo or bar; (?:xxx) matches xxx without
creating a capturing group (i.e., without capturing text for a backreference).

HTH

-- John Strickler
  




Re: Need help in Python regular expression

2009-06-12 Thread Rhodri James
On Fri, 12 Jun 2009 06:20:24 +0100, meryl silverburgh.me...@gmail.com  
wrote:

On Jun 11, 9:41 pm, Mark Tolonen metolone+gm...@gmail.com wrote:

meryl silverburgh.me...@gmail.com wrote in message

 Hi,

 I have this regular expression
 blockRE = re.compile(".*RenderBlock {\w+}")

 it works if my source is RenderBlock {CENTER}.


[snip]


---code--
import re
pat = re.compile(r'Render(?:Block|Table) (?:\(\w+\)|{\w+})')

testdata = '''\
RenderTable {TABLE}
RenderBlock (CENTER)
RenderBlock {CENTER}
RenderTable {TABLE)      #shouldn't match
'''

print(pat.findall(testdata))
---

Result:

['RenderTable {TABLE}', 'RenderBlock (CENTER)', 'RenderBlock {CENTER}']

-Mark


Thanks to both of you for your help. How can I modify the regexp so that
both
RenderTable {TABLE}
and
RenderTable {TABLE} [text with a-zA-Z=SPACE0-9]
will match?

I tried adding .* at the end, but it ends up matching only the second
one.


Curious, it should work (and match rather more than you want, but
that's another matter).  Try adding this instead:

'(?: \[[a-zA-Z= 0-9]*\])?'

Personally I'd replace all those spaces with \s* or \s+, but I'm
paranoid when it comes to whitespace.
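
For what it's worth, a minimal sketch of that suggestion under Python 3:
Mark's pattern with the optional suffix appended and the literal spaces
replaced by \s+ and \s*. The test lines are invented for illustration, and
note that \s also matches newlines, so this is only one way to do it:

```python
import re

# Mark's base pattern plus the optional "[...]" suffix suggested above,
# with whitespace matched by \s+ / \s* instead of a single literal space.
pat = re.compile(
    r'Render(?:Block|Table)\s+(?:\(\w+\)|{\w+})'
    r'(?:\s*\[[a-zA-Z= 0-9]*\])?'
)

# Hypothetical test lines, invented for this illustration.
testdata = (
    "RenderTable {TABLE}\n"
    "RenderTable {TABLE} [text with x=1]\n"
    "RenderBlock   (CENTER)   [abc]\n"
)

found = pat.findall(testdata)
print(found)
# ['RenderTable {TABLE}', 'RenderTable {TABLE} [text with x=1]',
#  'RenderBlock   (CENTER)   [abc]']

# For comparison, a trailing .* swallows everything up to the end of line.
loose = re.findall(
    r'Render(?:Block|Table) (?:\(\w+\)|{\w+}).*',
    "RenderTable {TABLE} and unrelated text",
)
print(loose)
# ['RenderTable {TABLE} and unrelated text']
```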

--
Rhodri James *-* Wildebeest Herder to the Masses


Re: Need help in Python regular expression

2009-06-11 Thread Mark Tolonen
meryl silverburgh.me...@gmail.com wrote in message 
news:2d4d8624-043b-4f5f-ae2d-bf73bca3d...@p6g2000pre.googlegroups.com...

Hi,

I have this regular expression
blockRE = re.compile(".*RenderBlock {\w+}")

it works if my source is RenderBlock {CENTER}.

But I want it to work with
1. RenderTable {TABLE}

So I change the regexp to re.compile(".*Render[Block|Table] {\w+}"),
but that breaks everything

2. RenderBlock (CENTER)

So I change the regexp to re.compile(".*RenderBlock {|\(\w+}|\)"),
that also breaks everything

Can you please tell me how to change my reg exp so that I can support
all 3 cases:
RenderTable {TABLE}
RenderBlock (CENTER)
RenderBlock {CENTER}


[abcd] syntax matches a single character from the set.  Use non-grouping 
parentheses instead:


---code--
import re
pat = re.compile(r'Render(?:Block|Table) (?:\(\w+\)|{\w+})')

testdata = '''\
RenderTable {TABLE}
RenderBlock (CENTER)
RenderBlock {CENTER}
RenderTable {TABLE)  #shouldn't match
'''

print(pat.findall(testdata))
---

Result:

['RenderTable {TABLE}', 'RenderBlock (CENTER)', 'RenderBlock {CENTER}']

-Mark




Re: Need help in Python regular expression

2009-06-11 Thread John S
On Jun 11, 10:30 pm, meryl silverburgh.me...@gmail.com wrote:
 Hi,

 I have this regular expression
 blockRE = re.compile(".*RenderBlock {\w+}")

 it works if my source is RenderBlock {CENTER}.

 But I want it to work with
 1. RenderTable {TABLE}

 So I change the regexp to re.compile(".*Render[Block|Table] {\w+}"),
 but that breaks everything

 2. RenderBlock (CENTER)

 So I change the regexp to re.compile(".*RenderBlock {|\(\w+}|\)"),
 that also breaks everything

 Can you please tell me how to change my reg exp so that I can support
 all 3 cases:
 RenderTable {TABLE}
 RenderBlock (CENTER)
 RenderBlock {CENTER}

 Thank you.

Short answer:

import re

r = re.compile(r"Render(?:Block|Table)\s+[({](?:TABLE|CENTER)[})]")

s = """
blah blah blah
blah blah blah RenderBlock {CENTER} blah blah RenderBlock {CENTER}
blah blah blah RenderTable {TABLE} blah blah RenderBlock (CENTER)
blah blah blah
"""

print(r.findall(s))



output:
['RenderBlock {CENTER}', 'RenderBlock {CENTER}', 'RenderTable {TABLE}', 'RenderBlock (CENTER)']



Note that [] only encloses characters, not strings; [foo|bar] matches
'f','o','|','b','a', or 'r', not foo or bar.
Use (foo|bar) to match foo or bar; (?:xxx) matches xxx without
creating a capturing group (i.e., without capturing text for a backreference).
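
A small sketch of the difference, assuming Python 3 and a made-up input
string; it is only meant to show why the character-class version breaks:

```python
import re

# A character class matches exactly ONE character from the set
# B, l, o, c, k, |, T, a, b, e.
char_class = re.compile(r'Render[Block|Table]')
# A non-capturing group with | matches one of the whole alternatives.
alternation = re.compile(r'Render(?:Block|Table)')

text = "RenderBlock RenderTable"  # hypothetical input

class_hits = char_class.findall(text)
alt_hits = alternation.findall(text)
print(class_hits)  # ['RenderB', 'RenderT']
print(alt_hits)    # ['RenderBlock', 'RenderTable']

# With ' {\w+}' appended, the class version needs a space right after
# that single character, so it finds nothing in 'RenderBlock {CENTER}'.
broken = re.search(r'Render[Block|Table] {\w+}', 'RenderBlock {CENTER}')
print(broken)      # None
```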

HTH

-- John Strickler


Re: Need help in Python regular expression

2009-06-11 Thread meryl
On Jun 11, 9:41 pm, Mark Tolonen metolone+gm...@gmail.com wrote:
 meryl silverburgh.me...@gmail.com wrote in message

 news:2d4d8624-043b-4f5f-ae2d-bf73bca3d...@p6g2000pre.googlegroups.com...





  Hi,

  I have this regular expression
  blockRE = re.compile(".*RenderBlock {\w+}")

  it works if my source is RenderBlock {CENTER}.

  But I want it to work with
  1. RenderTable {TABLE}

  So I change the regexp to re.compile(".*Render[Block|Table] {\w+}"),
  but that breaks everything

  2. RenderBlock (CENTER)

  So I change the regexp to re.compile(".*RenderBlock {|\(\w+}|\)"),
  that also breaks everything

  Can you please tell me how to change my reg exp so that I can support
  all 3 cases:
  RenderTable {TABLE}
  RenderBlock (CENTER)
  RenderBlock {CENTER}

 [abcd] syntax matches a single character from the set.  Use non-grouping
 parentheses instead:

 ---code--
 import re
 pat = re.compile(r'Render(?:Block|Table) (?:\(\w+\)|{\w+})')

 testdata = '''\
 RenderTable {TABLE}
 RenderBlock (CENTER)
 RenderBlock {CENTER}
 RenderTable {TABLE)      #shouldn't match
 '''

 print(pat.findall(testdata))
 ---

 Result:

 ['RenderTable {TABLE}', 'RenderBlock (CENTER)', 'RenderBlock {CENTER}']

 -Mark

Thanks to both of you for your help. How can I modify the regexp so that
both
RenderTable {TABLE}
and
RenderTable {TABLE} [text with a-zA-Z=SPACE0-9]
will match?

I tried adding .* at the end, but it ends up matching only the second
one.

Thanks again.


Re: Need help with a regular expression

2007-12-19 Thread marek . rocki
On 19 Dec, 13:08, Sharun [EMAIL PROTECTED] wrote:
 I am trying to find the substring starting with 'aaa', and ending with
 "ddd" OR "fff". If "ddd" is found, shouldn't the search stop? Shouldn't
 re5.search(str5).group(0) return 'aaa bbb\r\n ccc ddd'?

The documentation for the re module
(http://docs.python.org/lib/re-syntax.html) tells you that the *, +, and ?
qualifiers are all greedy; they match as much text as possible. What you
are looking for are the non-greedy qualifiers *?, +?, and ??. Your regex
pattern might look like this: "aaa.*?(ddd|fff)".
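
A quick sketch comparing the two, assuming Python 3 and reusing the string
and flags from the original question (the variable names greedy and lazy
are mine):

```python
import re

str5 = 'aaa bbb\r\n ccc ddd\r\n eee fff'

# Greedy: .* grabs as much as it can, then backs off to the LAST ddd/fff.
greedy = re.compile(r'aaa.*(ddd|fff)', re.S)
# Non-greedy: .*? grows one character at a time, stopping at the FIRST one.
lazy = re.compile(r'aaa.*?(ddd|fff)', re.S)

greedy_match = greedy.search(str5).group(0)
lazy_match = lazy.search(str5).group(0)

print(repr(greedy_match))  # 'aaa bbb\r\n ccc ddd\r\n eee fff'
print(repr(lazy_match))    # 'aaa bbb\r\n ccc ddd'
```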

Regards,
Marek


Re: Need help with a regular expression

2007-12-19 Thread Paddy
On Dec 19, 12:08 pm, Sharun [EMAIL PROTECTED] wrote:
 Python newbie here. I am not clear about how the matching is taking
 place when I do the following

 str5 = 'aaa bbb\r\n ccc ddd\r\n eee fff'
 re5=re.compile('aaa.*(ddd|fff)',re.S);
 re5.search(str5).group(0)

 'aaa bbb\r\n ccc ddd\r\n eee fff'

 re5.search(str5).group(1)

 'fff'

 I am trying to find the substring starting with 'aaa', and ending with
 "ddd" OR "fff". If "ddd" is found, shouldn't the search stop? Shouldn't
 re5.search(str5).group(0) return 'aaa bbb\r\n ccc ddd'?

 Thanks

Have an RE problem in Python?

Get Kodos! (http://kodos.sourceforge.net/)

- Paddy.


Re: Need help with a regular expression

2007-12-19 Thread Sharun
Thanks Marek!