Re: Python's regular expression help

2010-04-29 Thread goldtech
On Apr 29, 11:49 am, Tim Chase  wrote:
> On 04/29/2010 01:00 PM, goldtech wrote:
>
> > Trying to start out with simple things but apparently there's some
> > basics I need help with. This works OK:
>  import re
>  p = re.compile('(ab*)(sss)')
>  m = p.match( 'absss' )
>
>  f=r'abss'
>  f
> > 'abss'
>  m = p.match( f )
>  m.group(0)
> > Traceback (most recent call last):
> >    File "", line 1, in
> >      m.group(0)
> > AttributeError: 'NoneType' object has no attribute 'group'
>
> 'absss' != 'abss'
>
> Your regexp looks for 3 "s", your "f" contains only 2.  So the
> regexp object doesn't, well, match.  Try
>
>    f = 'absss'
>
> and it will work.  As an aside, using raw-strings for this text
> doesn't change anything, but if you want, you _can_ write it as
>
>    f = r'absss'
>
> if it will make you feel better :)
>
> > How do I implement a regex on a multiline string?  I thought this
> > might work but there's problem:
>
>  p = re.compile('(ab*)(sss)', re.S)
>  m = p.match( 'ab\nsss' )
>  m.group(0)
> > Traceback (most recent call last):
> >    File "", line 1, in
> >      m.group(0)
> > AttributeError: 'NoneType' object has no attribute 'group'
>
> Well, it depends on what you want to do -- regexps are fairly
> precise, so if you want to allow whitespace between the two, you
> can use
>
>    r = re.compile(r'(ab*)\s*(sss)')
>
> If you want to allow whitespace anywhere, it gets uglier, and
> your capture/group results will contain that whitespace:
>
>    r'(a\s*b*)\s*(s\s*s\s*s)'
>
> Alternatively, if you don't want to allow arbitrary whitespace
> but only newlines, you can use "\n*" instead of "\s*"
>
> -tkc

Yes, most of my problem is w/my patterns not w/any python re syntax.

I thought re.S would take a multiline string with any spaces or
newlines and make it appear as one line to the regex. Make "\n" be
ignored in a way...still playing w/it. Thanks for the help!
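
For anyone following along, here is a minimal sketch of what re.S (an alias
for re.DOTALL) changes. It only lets '.' match a newline; it does not make
the engine skip newlines. The variable names are made up for illustration.

```python
import re

text = 'ab\nsss'

# re.S changes only what '.' matches. This pattern contains no '.',
# so the flag makes no difference and there is still no match.
no_dot = re.compile(r'(ab*)(sss)', re.S)
print(no_dot.match(text))              # None

# With an explicit '.', the flag matters: '.' may now consume the newline.
with_dot = re.compile(r'(ab*).(sss)', re.S)
print(with_dot.match(text).groups())   # ('ab', 'sss')

# Without re.S the same pattern fails, because '.' stops at '\n'.
print(re.match(r'(ab*).(sss)', text))  # None
```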
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python's regular expression help

2010-04-29 Thread Tim Chase

On 04/29/2010 01:00 PM, goldtech wrote:

Trying to start out with simple things but apparently there's some
basics I need help with. This works OK:

import re
p = re.compile('(ab*)(sss)')
m = p.match( 'absss' )



f=r'abss'
f

'abss'

m = p.match( f )
m.group(0)

Traceback (most recent call last):
   File "", line 1, in
 m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'


'absss' != 'abss'

Your regexp looks for 3 "s", your "f" contains only 2.  So the 
regexp object doesn't, well, match.  Try


  f = 'absss'

and it will work.  As an aside, using raw-strings for this text 
doesn't change anything, but if you want, you _can_ write it as


  f = r'absss'

if it will make you feel better :)


How do I implement a regex on a multiline string?  I thought this
might work but there's problem:


p = re.compile('(ab*)(sss)', re.S)
m = p.match( 'ab\nsss' )
m.group(0)

Traceback (most recent call last):
   File "", line 1, in
 m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'


Well, it depends on what you want to do -- regexps are fairly 
precise, so if you want to allow whitespace between the two, you 
can use


  r = re.compile(r'(ab*)\s*(sss)')

If you want to allow whitespace anywhere, it gets uglier, and 
your capture/group results will contain that whitespace:


  r'(a\s*b*)\s*(s\s*s\s*s)'

Alternatively, if you don't want to allow arbitrary whitespace 
but only newlines, you can use "\n*" instead of "\s*"
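
A small sketch of the three variants above, written for Python 3. The sample
strings and variable names are assumptions chosen for illustration.

```python
import re

between = re.compile(r'(ab*)\s*(sss)')        # any whitespace between the groups
newlines_only = re.compile(r'(ab*)\n*(sss)')  # only newlines between the groups
anywhere = re.compile(r'(a\s*b*)\s*(s\s*s\s*s)')

print(between.match('ab\nsss').groups())          # ('ab', 'sss')
print(between.match('ab \t sss').groups())        # ('ab', 'sss')
print(newlines_only.match('ab\n\nsss').groups())  # ('ab', 'sss')
print(newlines_only.match('ab sss'))              # None: a space is not a newline

# The captured groups keep whatever whitespace they swallowed.
print(anywhere.match('a b s\ns s').groups())      # ('a b', 's\ns s')
```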


-tkc





Re: Python's regular expression help

2010-04-29 Thread MRAB

goldtech wrote:

Hi,
Trying to start out with simple things but apparently there's some
basics I need help with. This works OK:

import re
p = re.compile('(ab*)(sss)')
m = p.match( 'absss' )
m.group(0)

'absss'

m.group(1)

'ab'

m.group(2)

'sss'
...
But two questions:

How can I operate a regex on a string variable?
I'm doing something wrong here:


f=r'abss'
f

'abss'

m = p.match( f )
m.group(0)

Traceback (most recent call last):
  File "", line 1, in 
m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'


Look closely: the regex contains 3 letter 's', but the string referred
to by f has only 2.


How do I implement a regex on a multiline string?  I thought this
might work but there's problem:


p = re.compile('(ab*)(sss)', re.S)
m = p.match( 'ab\nsss' )
m.group(0)

Traceback (most recent call last):
  File "", line 1, in 
m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'

Thanks for the newbie regex help, Lee


The string contains a newline between the 'b' and the 's', but the regex
isn't expecting any newline (or any other character) between the 'b' and
the 's', hence no match.
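
Both tracebacks come from calling .group() on None. A minimal sketch of
checking the result first, written for Python 3; the describe() helper is
hypothetical and only there to illustrate the check.

```python
import re

p = re.compile(r'(ab*)(sss)')

def describe(s):
    """Return the matched groups, or a message when nothing matched."""
    m = p.match(s)  # m is None when the pattern does not match
    if m is None:
        return 'no match for %r' % s
    return m.groups()

print(describe('absss'))    # ('ab', 'sss')
print(describe('abss'))     # no match for 'abss'
print(describe('ab\nsss'))  # no match for 'ab\nsss'
```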


Re: Python's regular expression help

2010-04-29 Thread Dodo

Le 29/04/2010 20:00, goldtech a écrit :

Hi,
Trying to start out with simple things but apparently there's some
basics I need help with. This works OK:

import re
p = re.compile('(ab*)(sss)')
m = p.match( 'absss' )
m.group(0)

'absss'

m.group(1)

'ab'

m.group(2)

'sss'
...
But two questions:

How can I operate a regex on a string variable?
I'm doing something wrong here:


f=r'abss'
f

'abss'

m = p.match( f )
m.group(0)

Traceback (most recent call last):
   File "", line 1, in
 m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'

How do I implement a regex on a multiline string?  I thought this
might work but there's problem:


p = re.compile('(ab*)(sss)', re.S)
m = p.match( 'ab\nsss' )
m.group(0)

Traceback (most recent call last):
   File "", line 1, in
 m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'




Thanks for the newbie regex help, Lee


for multiline, I use re.DOTALL

I do not know match(), findall is pretty efficient :
my = "<a>LINK</a>"
res = re.findall(">(.*?)<",my)
>>> res
['LINK']

Dorian


Python's regular expression help

2010-04-29 Thread goldtech
Hi,
Trying to start out with simple things but apparently there's some
basics I need help with. This works OK:
>>> import re
>>> p = re.compile('(ab*)(sss)')
>>> m = p.match( 'absss' )
>>> m.group(0)
'absss'
>>> m.group(1)
'ab'
>>> m.group(2)
'sss'
...
But two questions:

How can I operate a regex on a string variable?
I'm doing something wrong here:

>>> f=r'abss'
>>> f
'abss'
>>> m = p.match( f )
>>> m.group(0)
Traceback (most recent call last):
  File "", line 1, in 
m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'

How do I implement a regex on a multiline string?  I thought this
might work but there's problem:

>>> p = re.compile('(ab*)(sss)', re.S)
>>> m = p.match( 'ab\nsss' )
>>> m.group(0)
Traceback (most recent call last):
  File "", line 1, in 
m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'
>>>

Thanks for the newbie regex help, Lee


Re: A bug in Python's regular expression engine?

2007-11-27 Thread MonkeeSage
On Nov 27, 10:52 am, MonkeeSage <[EMAIL PROTECTED]> wrote:
> On Nov 27, 10:19 am, "Just Another Victim of the Ambient Morality"
>
> <[EMAIL PROTECTED]> wrote:
> > That is funny.  Thank you for your help...
> > Just for clarification, what does the "r" in your code do?
>
> It means a "raw" string (as you know ruby, think of it like %w{}):
>
> This page explains about string literal prefixes (see especially the
> end-notes):
>
> http://docs.python.org/ref/strings.html
>
> HTH,
> Jordan

Arg! %w{} should have said %q{}


Re: A bug in Python's regular expression engine?

2007-11-27 Thread MonkeeSage
On Nov 27, 10:19 am, "Just Another Victim of the Ambient Morality"
<[EMAIL PROTECTED]> wrote:


> That is funny.  Thank you for your help...
> Just for clarification, what does the "r" in your code do?

It means a "raw" string (as you know ruby, think of it like %w{}):

This page explains about string literal prefixes (see especially the
end-notes):

http://docs.python.org/ref/strings.html

HTH,
Jordan


Re: A bug in Python's regular expression engine?

2007-11-27 Thread Just Another Victim of the Ambient Morality

"Paul Hankin" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> On Nov 27, 3:48 pm, "Just Another Victim of the Ambient Morality"
> <[EMAIL PROTECTED]> wrote:
>> This won't compile for me:
>>
>> regex = re.compile('(.*\\).*')
>>
>> I get the error:
>>
>> sre_constants.error: unbalanced parenthesis
>>
>> I'm running Python 2.5 on WinXP.  I've tried this expression with
>> another RE engine in another language and it works just fine which leads 
>> me
>> to believe the problem is Python.  Can anyone confirm or deny this bug?
>
> Your code is equivalent to:
> regex = re.compile(r'(.*\).*')
>
> Written like this, it's easier to see that you've started a regular
> expression group with '(', but it's never closed since your closed
> parenthesis is escaped (which causes it to match a literal ')' when
> used). Hence the reported error (which isn't a bug).
>
> Perhaps you meant this?
> regex = re.compile(r'(.*\\).*')
>
> This matches any number of characters followed by a backslash (group
> 1), and then any number of characters. If you're using this for path
> splitting filenames under Windows, you should look at os.path.split
> instead of writing your own.

Indeed, I did end up using os.path functions, instead.
I think I see what's going on.  Backslash has special meaning in both 
the regular expression and Python string declarations.  So, my version 
should have been something like this:


regex = re.compile('(.*\\\\).*')


That is funny.  Thank you for your help...
Just for clarification, what does the "r" in your code do?





Re: A bug in Python's regular expression engine?

2007-11-27 Thread Paul Hankin
On Nov 27, 3:48 pm, "Just Another Victim of the Ambient Morality"
<[EMAIL PROTECTED]> wrote:
> This won't compile for me:
>
> regex = re.compile('(.*\\).*')
>
> I get the error:
>
> sre_constants.error: unbalanced parenthesis
>
> I'm running Python 2.5 on WinXP.  I've tried this expression with
> another RE engine in another language and it works just fine which leads me
> to believe the problem is Python.  Can anyone confirm or deny this bug?

Your code is equivalent to:
regex = re.compile(r'(.*\).*')

Written like this, it's easier to see that you've started a regular
expression group with '(', but it's never closed since your closed
parenthesis is escaped (which causes it to match a literal ')' when
used). Hence the reported error (which isn't a bug).

Perhaps you meant this?
regex = re.compile(r'(.*\\).*')

This matches any number of characters followed by a backslash (group
1), and then any number of characters. If you're using this for path
splitting filenames under Windows, you should look at os.path.split
instead of writing your own.
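
A rough sketch comparing the two approaches, written for Python 3. The path
is a made-up example, the second group in the regex is added here so the
file name can be printed, and ntpath is used (rather than os.path) so the
Windows rules apply on any platform.

```python
import ntpath  # the Windows flavour of os.path, importable on any platform
import re

path = r'C:\temp\data\report.txt'  # hypothetical example path

# Regex version: group 1 is everything up to and including the last backslash.
m = re.match(r'(.*\\)(.*)', path)
print(m.group(1))  # C:\temp\data\ (trailing backslash included)
print(m.group(2))  # report.txt

# Library version: no escaping to get wrong.
head, tail = ntpath.split(path)
print(head)        # C:\temp\data
print(tail)        # report.txt
```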

HTH
--
Paul Hankin


Re: A bug in Python's regular expression engine?

2007-11-27 Thread Neil Cerutti
On 2007-11-27, Just Another Victim of the Ambient Morality
<[EMAIL PROTECTED]> wrote:
> This won't compile for me:
>
>
> regex = re.compile('(.*\\).*')
>
> I get the error:
>  sre_constants.error: unbalanced parenthesis

Hint 1: Always assume that errors are in your own code. Blaming
library code and language implementations will get you nowhere
most of the time.

Hint 2: regular expressions and Python strings use the same
escape character.

Hint 3: Consult the Python documentation about raw strings, and
what they are meant for.
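
To make hints 2 and 3 concrete, a minimal sketch for Python 3 showing the
two layers of escaping. The variable names and the 'dir\file' sample are
made up for illustration.

```python
import re

plain = '(.*\\).*'   # Python collapses \\ to a single backslash
raw = r'(.*\).*'     # raw string: backslashes reach the regex engine untouched
print(plain == raw)  # True: both are the 7 characters (.*\).*

try:
    # The regex engine reads \) as a literal ')', so the '(' is never closed.
    re.compile(plain)
except re.error as exc:
    print('re.error:', exc)

# At the regex level, \\ means one literal backslash.
fixed = re.compile(r'(.*\\).*')
print(fixed.match(r'dir\file').group(1))  # dir\ (the backslash is in group 1)
```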

-- 
Neil Cerutti


Re: A bug in Python's regular expression engine?

2007-11-27 Thread Diez B. Roggisch
Just Another Victim of the Ambient Morality wrote:

> This won't compile for me:
> 
> 
> regex = re.compile('(.*\\).*')
> 
> 
> I get the error:
> 
> 
> sre_constants.error: unbalanced parenthesis
> 
> 
> I'm running Python 2.5 on WinXP.  I've tried this expression with
> another RE engine in another language and it works just fine which leads
> me
> to believe the problem is Python.  Can anyone confirm or deny this bug?

It pretty much says what the problem is - you escaped the closing
parenthesis, resulting in an invalid rex.

Either use raw-strings or put the proper amount of backslashes in your
string:

regex = re.compile(r'(.*\\).*') # raw string literal

regex = re.compile('(.*\\\\).*') # two consecutive \\s, each meaning one
escaped backslash

Diez


A bug in Python's regular expression engine?

2007-11-27 Thread Just Another Victim of the Ambient Morality
This won't compile for me:


regex = re.compile('(.*\\).*')


I get the error:


sre_constants.error: unbalanced parenthesis


I'm running Python 2.5 on WinXP.  I've tried this expression with 
another RE engine in another language and it works just fine which leads me 
to believe the problem is Python.  Can anyone confirm or deny this bug?
Thank you...





Re: Python's regular expression?

2006-05-10 Thread Dave Hughes
Dave Hansen wrote:

> On Wed, 10 May 2006 06:44:27 GMT in comp.lang.python, Edward Elliott
> <[EMAIL PROTECTED]> wrote:
> 
> 
> > 
> > Would I recommend perl for readable, maintainable code?  No, not
> > when better options like Python are available.  But it can be done
> > with some effort.
> 
> I'm reminded of a comment made a few years ago by John Levine,
> moderator of comp.compilers.  He said something like "It's clearly
> possible to write good code in C++.  It's just that no one does."

Reminds me of the quote that used to appear on the front page of the
ViewCVS project (seems to have gone now that they've moved and renamed
themselves to ViewVC). Can't recall the attribution off the top of my
head:

"[Perl] combines the power of C with the readability of PostScript"

Scathing ... but very funny :-)


Dave.

-- 



Re: Python's regular expression?

2006-05-10 Thread Dave Hansen
On Wed, 10 May 2006 06:44:27 GMT in comp.lang.python, Edward Elliott
<[EMAIL PROTECTED]> wrote:


>
>Would I recommend perl for readable, maintainable code?  No, not when better
>options like Python are available.  But it can be done with some effort.

I'm reminded of a comment made a few years ago by John Levine,
moderator of comp.compilers.  He said something like "It's clearly
possible to write good code in C++.  It's just that no one does."

Regards,
-=Dave

-- 
Change is inevitable, progress is not.


Re: Python's regular expression?

2006-05-10 Thread Edward Elliott
bruno at modulix wrote:
> From a readability/maintenance POV, Perl is a perfect nightmare.

It's certainly true that perl lacks the eminently readable quality of
python.  But then so do C, C++, Java, and a lot of other languages.

And I'll grant you that perl is more susceptible to the 'executable
line-noise' style than most other languages.  This results from its
heritage as a quick-and-dirty awk/sed type text processing language.

But perl doesn't *have* to look that way, and not every perl program is a
'perfect nightmare'.  If you follow good practices like turning on strict
checking, using readable variable names, avoiding $_, etc, you can produce
pretty readable and maintainable code.  It takes some discipline, but it's
very doable.  I've worked with some perl programs for over 5 years without
any trouble.  About the only thing you can't avoid are the sigils
everywhere.

Would I recommend perl for readable, maintainable code?  No, not when better
options like Python are available.  But it can be done with some effort.



Re: Python's regular expression?

2006-05-09 Thread Duncan Booth
Mirco Wahab wrote:

> If you wouldn't need dictionary lookup and
> get away with associated categories, all
> you'd have to do would be this:
> 
>$PATTERN = qr/
>(blue |white |red)(?{'Colour'})
>   |   (socks|tights)(?{'Garment'})
>   |   (boot |shoe  |trainer)(?{'Footwear'})
>/x;
> 
>$t = 'blue socks and red shoes';
>print "$^R: $^N\n" while( $t=~/$PATTERN/g );
> 
> What's the point of all that? IMHO, Python's
> Regex support is quite good and useful, but
> won't give you an edge over Perl's in the end.

If you are desperate to collapse the code down to a single print statement 
you can do that easily in Python as well:

>>> PATTERN = '''
  (?P<Colour>blue |white |red)
  |   (?P<Garment>socks|tights)
  |   (?P<Footwear>boot |shoe  |trainer)
  '''
>>> t = 'blue socks and red shoes'
>>> print '\n'.join("%s:%s" % (match.lastgroup,
        match.group(match.lastgroup))
        for match in re.finditer(PATTERN, t, re.VERBOSE))
Colour:blue
Garment:socks
Colour:red
Footwear:shoe
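
For readers on Python 3, a sketch of the same idea with print() as a
function. The pattern is compiled up front here, which is a stylistic
choice and not part of the original one-liner.

```python
import re

PATTERN = re.compile(r'''
      (?P<Colour>blue|white|red)
  |   (?P<Garment>socks|tights)
  |   (?P<Footwear>boot|shoe|trainer)
''', re.VERBOSE)

t = 'blue socks and red shoes'

# lastgroup is the name of the one named group that took part in each match.
lines = ['%s:%s' % (m.lastgroup, m.group(m.lastgroup))
         for m in PATTERN.finditer(t)]
print('\n'.join(lines))
# Colour:blue
# Garment:socks
# Colour:red
# Footwear:shoe
```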


Re: Python's regular expression?

2006-05-09 Thread bruno at modulix
Davy wrote:
> Hi all,
> 
(snip)
> Does Python support robust regular expression like Perl?

Yes.

> And Python and Perl's File content manipulation, which is better?

From a raw perf and write-only POV, Perl clearly beats Python (regarding
I/O, Perl is faster than C - or at least it was the last time I benched
it on a Linux box).

From a readability/maintenance POV, Perl is a perfect nightmare.

> Any suggestions will be appreciated!

http://pythonology.org/success&story=esr


-- 
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in '[EMAIL PROTECTED]'.split('@')])"


Re: Python's regular expression?

2006-05-09 Thread Mirco Wahab
Hi Duncan

> Nick Craig-Wood wrote:
>> Which translates to
>>   match = re.search('(blue|white|red)', t)
>>   if match:
>>   else:
>>  if match:
>>  else:
>> if match:
> 
> This of course gives priority to colours and only looks for garments or 
> footwear if it hasn't matched on a prior pattern. If you actually 
> wanted to match the first occurrence of any of these (or if the condition 
> was re.match instead of re.search) then named groups can be a nice way of 
> simplifying the code:

A good point. And a good example of when to use named
capture group references. This is easily extended
for 'spitting out' all other occurring categories
(see below).

> PATTERN = '''
> (?P<c>blue|white|red)
> ...

This is one nice thing in Python's Regex Syntax,
you have to emulate the ?P-thing in other
Regex-Systems more or less 'awk'-wardly ;-)

> For something this simple the titles and group names could be the 
> same, but I'm assuming real code might need a bit more.
No no, this is quite good because it involves
some math-generated table-code lookup.

I managed somehow to extend your example in order
to spit out all matches and their corresponding
category:

  import re

  PATTERN = '''
  (?P<c>blue |white |red)
  |   (?P<g>socks|tights)
  |   (?P<f>boot |shoe  |trainer)
  '''

  PATTERN = re.compile(PATTERN , re.VERBOSE)
  TITLES = { 'c': 'Colour', 'g': 'Garment', 'f': 'Footwear' }

  t = 'blue socks and red shoes'
  for match in PATTERN.finditer(t):
      grp = match.lastgroup
      print "%s: %s" %( TITLES[grp], match.group(grp) )

which writes out the expected:
   Colour: blue
   Garment: socks
   Colour: red
   Footwear: shoe

The corresponding Perl-program would look like this:

   $PATTERN = qr/
   (blue |white |red)(?{'c'})
   |   (socks|tights)(?{'g'})
   |   (boot |shoe  |trainer)(?{'f'})
   /x;

   %TITLES = (c =>'Colour', g =>'Garment', f =>'Footwear');

   $t = 'blue socks and red shoes';
   print "$TITLES{$^R}: $^N\n" while( $t=~/$PATTERN/g );

and prints the same:
   Colour: blue
   Garment: socks
   Colour: red
   Footwear: shoe

You don't have nice named match references (?P<..>)
in Perl-5, so you have to emulate this by an ordinary
code assertion (?{..}) and set some value ($^R) on
the fly - which is not that bad in the end (imho).

(?{..}) means "zero-width code assertion";
this sets the Perl-predefined $^R to the value
evaluated from the {...}

As you can see, the pattern matching related part
reduces from 4 lines to one line.

If you wouldn't need dictionary lookup and
get away with associated categories, all
you'd have to do would be this:

   $PATTERN = qr/
   (blue |white |red)(?{'Colour'})
   |   (socks|tights)(?{'Garment'})
   |   (boot |shoe  |trainer)(?{'Footwear'})
   /x;

   $t = 'blue socks and red shoes';
   print "$^R: $^N\n" while( $t=~/$PATTERN/g );

What's the point of all that? IMHO, Python's
Regex support is quite good and useful, but
won't give you an edge over Perl's in the end.

Thanks & Regards

Mirco



Re: Python's regular expression?

2006-05-08 Thread Duncan Booth
Nick Craig-Wood wrote:

> Which translates to
> 
>   match = re.search('(blue|white|red)', t)
>   if match:
>      print "Colour:", match.group(1)
>   else:
>      match = re.search('(socks|tights)', t)
>      if match:
>         print "Garment:", match.group(1)
>      else:
>         match = re.search('(boot|shoe|trainer)', t)
>         if match:
>            print "Footwear:", match.group(1)
>            # indented ad infinitum!

This of course gives priority to colours and only looks for garments or 
footwear if it hasn't matched on a prior pattern. If you actually 
wanted to match the first occurrence of any of these (or if the condition 
was re.match instead of re.search) then named groups can be a nice way of 
simplifying the code:

PATTERN = '''
(?P<c>blue|white|red)
|   (?P<g>socks|tights)
|   (?P<f>boot|shoe|trainer)
'''
PATTERN = re.compile(PATTERN, re.VERBOSE)
TITLES = { 'c': 'Colour', 'g': 'Garment', 'f': 'Footwear' }

match = PATTERN.search(t)
if match:
    grp = match.lastgroup
    print "%s: %s" % (TITLES[grp], match.group(grp))

For something this simple the titles and group names could be the same, but 
I'm assuming real code might need a bit more.
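
A minimal sketch of the difference between the two strategies, written for
Python 3. The input string is a made-up case where the garment appears
before the colour, and the loop stands in for the nested if/else chain.

```python
import re

t = 'socks in blue'  # hypothetical input: the garment comes first

# Chained searches: colours are tried first, wherever they occur in the text.
chained = None
for title, pat in [('Colour', 'blue|white|red'),
                   ('Garment', 'socks|tights'),
                   ('Footwear', 'boot|shoe|trainer')]:
    m = re.search(pat, t)
    if m:
        chained = (title, m.group(0))
        break

# One combined pattern: the leftmost match in the text wins.
COMBINED = re.compile(
    r'(?P<c>blue|white|red)|(?P<g>socks|tights)|(?P<f>boot|shoe|trainer)')
TITLES = {'c': 'Colour', 'g': 'Garment', 'f': 'Footwear'}
m = COMBINED.search(t)
combined = (TITLES[m.lastgroup], m.group(m.lastgroup))

print(chained)   # ('Colour', 'blue')
print(combined)  # ('Garment', 'socks')
```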


Re: Python's regular expression?

2006-05-08 Thread Nick Craig-Wood
Mirco Wahab <[EMAIL PROTECTED]> wrote:
>  After some minutes in this NG I start to get
>  the picture. So I narrowed the above regex-question
>  down to a nice equivalence between Perl and Python:
> 
>  Python:
> 
> import re
> 
> t = 'blue socks and red shoes'
> if re.match('blue|white|red', t):
>    print t
> 
> t = 'blue socks and red shoes'
> if re.search('blue|white|red', t):
>print t
> 
>  Perl:
> 
> use Acme::Pythonic;
> 
> $t = 'blue socks and red shoes'
> if $t =~ /blue|white|red/:
>   print $t
> 
>  And Python Regexes eventually lost (for me) some of
>  their (what I believed) 'clunky appearance' ;-)

If you are used to perl regexes there is one clunkiness of python
regexpes which you'll notice eventually...

Let's make the above example a bit more "real world", ie use the
matched item in some way...

Perl:

   $t = 'blue socks and red shoes';
   if ( $t =~ /(blue|white|red)/ )
   {
  print "Colour: $1\n";
   }
   
Which prints

  Colour: blue

In python you have to express this like

  import re

  t = 'blue socks and red shoes'
  match = re.search('(blue|white|red)', t)
  if match:
 print "Colour:", match.group(1)

Note the extra variable "match".  You can't do assignment in an
expression in python which makes for the extra verbosity, and you
need a variable to store the result of the match in (since python
doesn't have the magic $1..$9 variables).

This becomes particularly frustrating when you have to do a series of
regexp matches, eg

   if ( $t =~ /(blue|white|red)/ )
   {
  print "Colour: $1\n";
   }
   elsif ( $t =~ /(socks|tights)/)
   {
  print "Garment: $1\n";
   }
   elsif ( $t =~ /(boot|shoe|trainer)/)
   {
  print "Footwear: $1\n";
   }

Which translates to

  match = re.search('(blue|white|red)', t)
  if match:
     print "Colour:", match.group(1)
  else:
     match = re.search('(socks|tights)', t)
     if match:
        print "Garment:", match.group(1)
     else:
        match = re.search('(boot|shoe|trainer)', t)
        if match:
           print "Footwear:", match.group(1)
           # indented ad infinitum!

You can use a helper class to get over this frustration like this

import re

class Matcher:
    def search(self, r, s):
        self.value = re.search(r, s)
        return self.value
    def __getitem__(self, i):
        return self.value.group(i)

m = Matcher()
t = 'blue socks and red shoes'

if m.search(r'(blue|white|red)', t):
    print "Colour:", m[1]
elif m.search(r'(socks|tights)', t):
    print "Garment:", m[1]
elif m.search(r'(boot|shoe|trainer)', t):
    print "Footwear:", m[1]

Having made the transition from perl to python a couple of years ago,
I find myself using regexpes much less.  In perl everything looks like
it needs a regexp, but python has a much richer set of string methods,
eg .startswith, .endswith, good subscripting and the nice "in"
operator for strings.
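
A sketch for readers on newer interpreters, assuming Python 3.8 or later
(which postdates this thread): assignment expressions allow the if/elif
chain without a helper class. The result variable is made up for
illustration, and the last lines show the string methods mentioned above.

```python
import re

t = 'blue socks and red shoes'

# Assignment expressions (Python 3.8+) flatten the if/elif chain.
if (m := re.search(r'(blue|white|red)', t)):
    result = 'Colour: ' + m.group(1)
elif (m := re.search(r'(socks|tights)', t)):
    result = 'Garment: ' + m.group(1)
elif (m := re.search(r'(boot|shoe|trainer)', t)):
    result = 'Footwear: ' + m.group(1)
else:
    result = 'nothing found'
print(result)                # Colour: blue

# Plain string methods often replace a regex entirely.
print(t.startswith('blue'))  # True
print(t.endswith('shoes'))   # True
print('red' in t)            # True
```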

-- 
Nick Craig-Wood <[EMAIL PROTECTED]> -- http://www.craig-wood.com/nick


Re: Python's regular expression?

2006-05-08 Thread Mirco Wahab
Hi John

>> But what would be an appropriate use
>> of search() vs. match()? When to use what?
> 
> ReadTheFantasticManual :-)

From the manual you mentioned, I don't get
the point of 'match'. So why should you use
an extra function entry match(),

re.match('whatever', t):

which is, according to the FM,
equivalent to (a special case of?)

re.search('^whatever', t):

For me, it looks like match() should
be used on simple string comparisons
like a 'ramped up C-strcmp()'.

Or isn't it? Maybe I don't get it ;-)
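
A minimal sketch of where the two differ, written for Python 3. The sample
strings are taken from or modelled on the ones used earlier in the thread.

```python
import re

pat = 'blue|white|red'

print(bool(re.match(pat, 'blue socks and red shoes')))    # True
print(bool(re.match(pat, 'green socks and red shoes')))   # False: 'green' is at position 0
print(bool(re.search(pat, 'green socks and red shoes')))  # True: finds 'red' later on

# match() and search('^...') part ways once re.MULTILINE is in play.
text = 'green socks\nred shoes'
print(bool(re.search('^red', text, re.MULTILINE)))  # True: '^' also matches after '\n'
print(bool(re.match('red', text, re.MULTILINE)))    # False: still anchored at position 0
```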

Thanks

Mirco


Re: Python's regular expression?

2006-05-08 Thread John Machin
On 8/05/2006 11:13 PM, Mirco Wahab wrote:
> Hi John
> 
>>>import re
>>>
>>>t = 'blue socks and red shoes'
>>>p = re.compile('(blue|white|red)')
>>>if p.match(t):
>> What do you expect when t == "green socks and red shoes"? Is it possible
>> that you mean to use search() rather than match()?
> 
> This is interesting.
> What's in this example the difference then between:

I suggest that you (a) read the description on the difference between 
search and match in the manual (b) try out search and match  on both 
your original string and the one I proposed.

> 
>import re
> 
>t = 'blue socks and red shoes'
>if re.compile('blue|white|red').match(t):
>   print t
> 
> and
> 
>t = 'blue socks and red shoes'
>if re.search('blue|white|red', t):
>   print t
[snip]
> 
> But what would be an appropriate use
> of search() vs. match()? When to use what?

ReadTheFantasticManual :-)



Re: Python's regular expression?

2006-05-08 Thread Mirco Wahab
Hi Duncan

> There is no need to compile the regular expression in advance in Python 
> either:
> ... 
> The only advantage to compiling in advance is a small speed up, and most of 
> the time that won't be significant.

I read 'some' introductions into Python Regexes
and got confused in the first place when to use
what and why.

After some minutes in this NG I start to get
the picture. So I narrowed the above regex-question
down to a nice equivalence between Perl and Python:

Python:

   import re

   t = 'blue socks and red shoes'
   if re.match('blue|white|red', t):
      print t

   t = 'blue socks and red shoes'
   if re.search('blue|white|red', t):
  print t

Perl:

   use Acme::Pythonic;

   $t = 'blue socks and red shoes'
   if $t =~ /blue|white|red/:
 print $t


And Python Regexes eventually lost (for me) some of
their (what I believed) 'clunky appearance' ;-)

Thanks

Mirco


Re: Python's regular expression?

2006-05-08 Thread Mirco Wahab
Hi John

>>import re
>>
>>t = 'blue socks and red shoes'
>>p = re.compile('(blue|white|red)')
>>if p.match(t):
> 
> What do you expect when t == "green socks and red shoes"? Is it possible
> that you mean to use search() rather than match()?

This is interesting.
What's in this example the difference then between:

   import re

   t = 'blue socks and red shoes'
   if re.compile('blue|white|red').match(t):
  print t

and

   t = 'blue socks and red shoes'
   if re.search('blue|white|red', t):
  print t

> There is no need to compile the regex in advance in Python, either.
> Please consider the module-level function search() ...
> if re.search(r"blue|white|red", t):
> # also, no need for () in the regex.

That's true. Thank you for pointing this out.
But what would be an appropriate use
of search() vs. match()? When to use what?

I answered the posting in the first place
because also I'm coming from a C/C++/Perl
background and trying to get along in Python.

Thanks,

Mirco



Re: Python's regular expression?

2006-05-08 Thread Duncan Booth
Mirco Wahab wrote:

> Lets see - a really simple find/match
> would look like this in Python:
> 
>import re
> 
>t = 'blue socks and red shoes'
>p = re.compile('(blue|white|red)')
>if p.match(t):
>   print t
> 
> which prints the text 't' because  of
> the positive pattern match.
> 
> In Perl, you write:
> 
>use Acme::Pythonic;
> 
>$t = 'blue socks and red shoes'
>if ($t =~ /(blue|white|red)/):
>  print $t
> 
> which is one line shorter (no need
> to compile the regular expression
> in advance).
> 

There is no need to compile the regular expression in advance in Python 
either:

   t = 'blue socks and red shoes'
   if re.match('(blue|white|red)', t):
  print t

The only advantage to compiling in advance is a small speed up, and most of 
the time that won't be significant.
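
A small sketch of both styles side by side, written for Python 3. The list
of sample strings and the COLOUR name are made up for illustration. The re
module keeps a cache of recently compiled patterns, which is part of why
the difference is usually small.

```python
import re

t = 'blue socks and red shoes'

# Module-level call: the pattern string is compiled (and cached) behind the scenes.
direct = re.search('(blue|white|red)', t)

# Precompiled: handy when the same pattern is reused, e.g. inside a loop.
COLOUR = re.compile('(blue|white|red)')
found = [COLOUR.search(s).group(1)
         for s in ('red boots', 'white tights', 'blue socks')]

print(direct.group(1))  # blue
print(found)            # ['red', 'white', 'blue']
```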


Re: Python's regular expression?

2006-05-08 Thread John Machin
On 8/05/2006 10:31 PM, Mirco Wahab wrote:
[snip]
> 
> Lets see - a really simple find/match
> would look like this in Python:
> 
>import re
> 
>t = 'blue socks and red shoes'
>p = re.compile('(blue|white|red)')
>if p.match(t):

What do you expect when t == "green socks and red shoes"? Is it possible 
that you mean to use search() rather than match()?

>   print t
> 
> which prints the text 't' because  of
> the positive pattern match.
> 
> In Perl, you write:
> 
>use Acme::Pythonic;
> 
>$t = 'blue socks and red shoes'
>if ($t =~ /(blue|white|red)/):
>  print $t
> 
> which is one line shorter (no need
> to compile the regular expression
> in advance).

There is no need to compile the regex in advance in Python, either. 
Please consider the module-level function search() ...
if re.search(r"blue|white|red", t):
# also, no need for () in the regex.



Re: Python's regular expression?

2006-05-08 Thread Mirco Wahab
Hi Davy

> > More similar than Perl ;-)

But C has { }'s everywhere, so has Perl ;-)

> > And what's 'integrated' mean (must include some library)?

Yes. In Python, regular expressions are just
another function library - you use them like
in Java or C.

In Perl, it's part of the core language, you
use the awk-style (eg: /.../) regular expressions
everywhere you want.

If you used regexp in C/C++ before, you can use them
in almost the same way in Python - which may give you
an easy start.

BTW. Python has some fine extensions to the
perl(5)-Regexes, e.g. 'named backreferences'.

But you won't see many regular expressions
in Python code posted to this group, maybe
because it looks clunky - which is unpythonic ;-)

Let's see - a really simple find/match
would look like this in Python:

   import re

   t = 'blue socks and red shoes'
   p = re.compile('(blue|white|red)')
   if p.match(t):
  print t

which prints the text 't' because  of
the positive pattern match.

In Perl, you write:

   use Acme::Pythonic;

   $t = 'blue socks and red shoes'
   if ($t =~ /(blue|white|red)/):
 print $t

which is one line shorter (no need
to compile the regular expression
in advance).

> > I like C++ file I/O, is it 'low' or 'high'?

C++ has, afaik, three levels of I/O:

(1) - (from C, very low) operating system level, which provides
direct access to operating system services (read(), write(),
lseek() etc.)

(2) - C standard library buffered I/O, included by <stdio.h>,
provides structured 'mid-level' access like block fread()/
fwrite(), line reads (fgets()) and formatted I/O (fprintf()/
fscanf())

(3) - C++ streams library (high level), which abstracts out the
I/O devices and provides the same set of functionality for any
abstract input or output.

Perl provides all three levels of I/O, the 'abstracting' is introduced
by modules which tie 'handle variables' to anything that may receive
or send data.

Python also does a good job on all three levels, but provides
the (low level) operating system I/O by external modules (afaik).
I didn't do much I/O in Python, so I can't say much here.
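
For what it's worth, the low-level calls are not really external: they live in the standard os module. Below is a rough sketch of the three levels in Python 3. The scratch file and the variable names are only there for illustration, and mapping Python's file objects onto the C/C++ levels is an approximation, not an exact equivalence.

```python
import io
import os
import tempfile

# Scratch file so the example is self-contained.
fd, path = tempfile.mkstemp()
os.close(fd)

# (1) OS level: os.open()/os.write()/os.lseek()/os.read() work on file descriptors.
fd = os.open(path, os.O_RDWR)
os.write(fd, b"low level\n")
os.lseek(fd, 0, os.SEEK_SET)      # rewind to the start of the file
first = os.read(fd, 100)
os.close(fd)

# (2) Buffered file objects, roughly comparable to C's stdio.
with open(path, "a") as f:
    f.write("mid level\n")
with open(path) as f:
    lines = f.readlines()

# (3) Stream abstraction: io.StringIO behaves like a file but lives in memory.
buf = io.StringIO()
buf.write("high level\n")
buf.seek(0)
from_memory = buf.readline()

os.remove(path)
print(first, lines, from_memory)
```

Code written against the file-object interface (write(), readline(), iteration) works unchanged with real files and with in-memory buffers, which is about as close as Python gets to the C++ streams idea.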

Regards

Mirco


Re: Python's regular expression?

2006-05-08 Thread Davy
By the way, is there any tutorial on how to use the Python
shell (IDE)? I wish it were as simple as VC++ :)

Regards,
Davy



Re: Python's regular expression?

2006-05-08 Thread Davy
Hi Mirco,

Thank you!

More similar than Perl ;-)

And what's 'integrated' mean (must include some library)?

I like C++ file I/O, is it 'low' or 'high'? 

Regards,
Davy



Re: Python's regular expression?

2006-05-08 Thread Mirco Wahab
Hi Davy, you wrote:

> I am a C/C++/Perl user and want to switch to Python 

OK

> (I found Python is more similar to C).

;-) More similar than what?

> Does Python support robust regular expression like Perl?

It supports them fairly well, but they're
not 'integrated' - at least they don't feel
integrated to me ;-) If you did a lot of
Perl, you know what 'integrated' means ...

> And Python and Perl's File content manipulation, which is better?

What is 'file content manipulation'?
Did you mean 'good xxx level file IO',
where xxx means either 'low' or 'high'?

> Any suggestions will be appreciated!

Just try to start a small project in Python -
from source that you already have in C or Perl
or something.


Regards

Mirco


Re: Python's regular expression?

2006-05-08 Thread Lawrence Oluyede
"Davy" <[EMAIL PROTECTED]> writes:
> Does Python support robust regular expression like Perl?

Yep, Python's regular expressions are robust. Have a look at the Regex Howto:
http://www.amk.ca/python/howto/regex/ and the re module:
http://docs.python.org/lib/module-re.html

-- 
Lawrence - http://www.oluyede.org/blog
"Nothing is more dangerous than an idea
if it's the only one you have" - E. A. Chartier


Python's regular expression?

2006-05-08 Thread Davy
Hi all,

I am a C/C++/Perl user and want to switch to Python (I found Python is
more similar to C).

Does Python support robust regular expressions like Perl?

And Python's and Perl's file content manipulation, which is better?

Any suggestions will be appreciated!
Best regards,
Davy



ANN: pyregex 0.5 - command line tools for Python's regular expression

2006-03-09 Thread [EMAIL PROTECTED]
pyregex is a command-line tool for constructing and testing Python
regular expressions. Features include text highlighting, a detailed
breakdown of match groups, substitution and a syntax quick reference. It
is released into the public domain.

Screenshot and download from
http://tungwaiyip.info/software/pyregex.html.

Wai Yip Tung


Usage: pyregex.py [options] "-"|filename regex [replacement [count]]

Test Python regular expressions. Specify test data's filename or use "-"
to enter test text from console. Optionally specify a replacement text.

Options:
-f      filter mode
-n nnn  limit to examine the first nnn lines. default no limit.
-m      show only matched line. default False


Regular Expression Syntax

Special Characters

.         matches any character except a newline
^         matches the start of the string
$         matches the end of the string or just before the newline at the
          end of the string
*         matches 0 or more repetitions of the preceding RE
+         matches 1 or more repetitions of the preceding RE
?         matches 0 or 1 repetitions of the preceding RE
{m}       exactly m copies of the previous RE should be matched
{m,n}     matches from m to n repetitions of the preceding RE
\         either escapes special characters or signals a special sequence
[]        indicates a set of characters. Characters can be listed
          individually, or a range of characters can be indicated by giving
          two characters and separating them by a "-". Special characters
          are not active inside sets. Including a "^" as the first
          character matches the complement of the set
|         A|B matches either A or B
(...)     indicates the start and end of a group
(?...)    this is an extension notation. See documentation for detail
(?iLmsux) I ignorecase; L locale; M multiline; S dotall; U unicode; X verbose

*, +, ? and {m,n} are greedy. Append the ? qualifier to match non-greedily.
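
A short illustration of the greedy versus non-greedy behaviour, as a sketch in Python 3. The sample string is invented and this is not part of pyregex itself.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.*>", html)    # .* grabs as much as possible
lazy = re.findall(r"<.*?>", html)     # .*? stops at the first possible '>'

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
```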


Special Sequences

\number   matches the contents of the group of the same number. Groups are
          numbered starting from 1
\A        matches only at the start of the string
\b        matches the empty string at the beginning or end of a word
\B        matches the empty string not at the beginning or end of a word
\d        matches any decimal digit
\D        matches any non-digit character
\g<name>  use the substring matched by the group named 'name' for sub()
\s        matches any whitespace character
\S        matches any non-whitespace character
\w        matches any alphanumeric character and the underscore
\W        matches any non-alphanumeric character
\Z        matches only at the end of the string
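
Two of these sequences are easier to see in action than to describe: \number as a backreference and \g<name> in a sub() replacement. A minimal sketch in Python 3, with made-up sample strings and variable names:

```python
import re

# \1 inside the pattern refers back to whatever group 1 matched,
# so this collapses an accidentally doubled word.
fixed = re.sub(r"\b(\w+) \1\b", r"\1", "it is is fine")
print(fixed)        # it is fine

# \g<name> in the replacement inserts the text of a named group.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d\d)-(?P<day>\d\d)")
reordered = date.sub(r"\g<day>.\g<month>.\g<year>", "released 2006-03-09")
print(reordered)    # released 09.03.2006
```

Raw strings (the r prefix) keep Python from interpreting the backslashes before the re module sees them.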


See the Python documentation on Regular Expression Syntax for more detail

http://docs.python.org/lib/re-syntax.html
