Re: Python's regular expression help
On 29/04/2010 20:00, goldtech wrote:
> Hi,
>
> Trying to start out with simple things but apparently there's some basics I need help with. This works OK:
>
>     >>> import re
>     >>> p = re.compile('(ab*)(sss)')
>     >>> m = p.match( 'absss' )
>     >>> m.group(0)
>     'absss'
>     >>> m.group(1)
>     'ab'
>     >>> m.group(2)
>     'sss'
>     ...
>
> But two questions:
>
> How can I operate a regex on a string variable? I'm doing something wrong here:
>
>     >>> f=r'abss'
>     >>> f
>     'abss'
>     >>> m = p.match( f )
>     >>> m.group(0)
>     Traceback (most recent call last):
>       File "<pyshell#15>", line 1, in <module>
>         m.group(0)
>     AttributeError: 'NoneType' object has no attribute 'group'
>
> How do I implement a regex on a multiline string? I thought this might work but there's problem:
>
>     >>> p = re.compile('(ab*)(sss)', re.S)
>     >>> m = p.match( 'ab\nsss' )
>     >>> m.group(0)
>     Traceback (most recent call last):
>       File "<pyshell#26>", line 1, in <module>
>         m.group(0)
>     AttributeError: 'NoneType' object has no attribute 'group'
>
> Thanks for the newbie regex help,
> Lee

For multiline, I use re.DOTALL.

I do not know match(); findall is pretty efficient:

    >>> my = "<a href=\"hello world.com\">LINK</a>"
    >>> res = re.findall(">(.*?)<", my)
    >>> res
    ['LINK']

Dorian
--
http://mail.python.org/mailman/listinfo/python-list
Re: Python's regular expression help
goldtech wrote:
> Hi,
>
> Trying to start out with simple things but apparently there's some basics I need help with. This works OK:
>
>     >>> import re
>     >>> p = re.compile('(ab*)(sss)')
>     >>> m = p.match( 'absss' )
>     >>> m.group(0)
>     'absss'
>     >>> m.group(1)
>     'ab'
>     >>> m.group(2)
>     'sss'
>     ...
>
> But two questions:
>
> How can I operate a regex on a string variable? I'm doing something wrong here:
>
>     >>> f=r'abss'
>     >>> f
>     'abss'
>     >>> m = p.match( f )
>     >>> m.group(0)
>     Traceback (most recent call last):
>       File "<pyshell#15>", line 1, in <module>
>         m.group(0)
>     AttributeError: 'NoneType' object has no attribute 'group'

Look closely: the regex contains three 's' characters, but the string referred to by f has only two.

> How do I implement a regex on a multiline string? I thought this might work but there's problem:
>
>     >>> p = re.compile('(ab*)(sss)', re.S)
>     >>> m = p.match( 'ab\nsss' )
>     >>> m.group(0)
>     Traceback (most recent call last):
>       File "<pyshell#26>", line 1, in <module>
>         m.group(0)
>     AttributeError: 'NoneType' object has no attribute 'group'
>
> Thanks for the newbie regex help,
> Lee

The string contains a newline between the 'b' and the 's', but the regex isn't expecting a newline (or any other character) between the 'b' and the 's', hence no match.
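To illustrate the two failures explained above, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). The helper name describe() is made up for this example. It checks for None before calling any method on the result, which avoids the AttributeError from the original session.

```python
import re

p = re.compile(r'(ab*)(sss)')

def describe(text):
    """Return the captured groups if the pattern matches at the start of text, else None."""
    m = p.match(text)
    if m is None:      # match() returns None when there is no match; it does not raise
        return None
    return m.groups()

print(describe('absss'))    # three 's' characters
print(describe('abss'))     # only two 's' characters
print(describe('ab\nsss'))  # newline between the 'b' and the 's'
```

Only the first call returns groups; the other two return None, for the reasons given in the reply.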
Re: Python's regular expression help
On 04/29/2010 01:00 PM, goldtech wrote:
> Trying to start out with simple things but apparently there's some basics I need help with. This works OK:
>
>     >>> import re
>     >>> p = re.compile('(ab*)(sss)')
>     >>> m = p.match( 'absss' )
>
>     >>> f=r'abss'
>     >>> f
>     'abss'
>     >>> m = p.match( f )
>     >>> m.group(0)
>     Traceback (most recent call last):
>       File "<pyshell#15>", line 1, in <module>
>         m.group(0)
>     AttributeError: 'NoneType' object has no attribute 'group'

'absss' != 'abss'

Your regexp looks for 3 "s", your "f" contains only 2. So the regexp object doesn't, well, match. Try

    f = 'absss'

and it will work. As an aside, using raw-strings for this text doesn't change anything, but if you want, you _can_ write it as

    f = r'absss'

if it will make you feel better :)

> How do I implement a regex on a multiline string? I thought this might work but there's problem:
>
>     >>> p = re.compile('(ab*)(sss)', re.S)
>     >>> m = p.match( 'ab\nsss' )
>     >>> m.group(0)
>     Traceback (most recent call last):
>       File "<pyshell#26>", line 1, in <module>
>         m.group(0)
>     AttributeError: 'NoneType' object has no attribute 'group'

Well, it depends on what you want to do -- regexps are fairly precise, so if you want to allow whitespace between the two, you can use

    r = re.compile(r'(ab*)\s*(sss)')

If you want to allow whitespace anywhere, it gets uglier, and your capture/group results will contain that whitespace:

    r'(a\s*b*)\s*(s\s*s\s*s)'

Alternatively, if you don't want to allow arbitrary whitespace but only newlines, you can use "\n*" instead of "\s*".

-tkc
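The three variants suggested above can be tried side by side. This is a small sketch in Python 3 syntax; the variable names between, anywhere and newline_only are invented for the example, while the patterns are the ones from the message.

```python
import re

# Whitespace allowed between the two groups only.
between = re.compile(r'(ab*)\s*(sss)')
# Whitespace allowed anywhere; the captured groups then include it.
anywhere = re.compile(r'(a\s*b*)\s*(s\s*s\s*s)')
# Only newlines allowed between the two groups.
newline_only = re.compile(r'(ab*)\n*(sss)')

for text in ('absss', 'ab\nsss', 'ab \t\nsss'):
    m = between.match(text)
    print(repr(text), m.groups() if m else None)

print(anywhere.match('a b\ns s s').groups())
print(newline_only.match('ab sss'))    # a space is not a newline, so no match
```

Note that the "anywhere" pattern returns the whitespace inside its groups, so a caller that wants 'ab' and 'sss' back would still have to strip it out afterwards.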
Re: Python's regular expression help
On Apr 29, 11:49 am, Tim Chase python.l...@tim.thechases.com wrote:
> On 04/29/2010 01:00 PM, goldtech wrote:
>> Trying to start out with simple things but apparently there's some basics I need help with. This works OK:
>>
>>     >>> import re
>>     >>> p = re.compile('(ab*)(sss)')
>>     >>> m = p.match( 'absss' )
>>
>>     >>> f=r'abss'
>>     >>> f
>>     'abss'
>>     >>> m = p.match( f )
>>     >>> m.group(0)
>>     Traceback (most recent call last):
>>       File "<pyshell#15>", line 1, in <module>
>>         m.group(0)
>>     AttributeError: 'NoneType' object has no attribute 'group'
>
> 'absss' != 'abss'
>
> Your regexp looks for 3 "s", your "f" contains only 2. So the regexp object doesn't, well, match. Try
>
>     f = 'absss'
>
> and it will work. As an aside, using raw-strings for this text doesn't change anything, but if you want, you _can_ write it as
>
>     f = r'absss'
>
> if it will make you feel better :)
>
>> How do I implement a regex on a multiline string? I thought this might work but there's problem:
>>
>>     >>> p = re.compile('(ab*)(sss)', re.S)
>>     >>> m = p.match( 'ab\nsss' )
>>     >>> m.group(0)
>>     Traceback (most recent call last):
>>       File "<pyshell#26>", line 1, in <module>
>>         m.group(0)
>>     AttributeError: 'NoneType' object has no attribute 'group'
>
> Well, it depends on what you want to do -- regexps are fairly precise, so if you want to allow whitespace between the two, you can use
>
>     r = re.compile(r'(ab*)\s*(sss)')
>
> If you want to allow whitespace anywhere, it gets uglier, and your capture/group results will contain that whitespace:
>
>     r'(a\s*b*)\s*(s\s*s\s*s)'
>
> Alternatively, if you don't want to allow arbitrary whitespace but only newlines, you can use "\n*" instead of "\s*".
>
> -tkc

Yes, most of my problem is with my patterns, not with any Python re syntax.

I thought re.S would take a multiline string with any spaces or newlines and make it appear as one line to the regex. Make \n be ignored, in a way... still playing with it.

Thanks for the help!
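For the record, re.S (an alias of re.DOTALL) does not make the regex skip newlines in the input. It only changes what the "." metacharacter matches: by default "." matches any character except a newline, and with re.S it matches a newline too. A short sketch in Python 3 syntax, using an illustrative pattern with a "." added between the groups (that pattern is not from the thread):

```python
import re

text = 'ab\nsss'

# The original pattern contains no '.', so re.S changes nothing here.
no_dot = re.match(r'(ab*)(sss)', text, re.S)

# With a '.' between the groups, the flag decides whether '\n' can be consumed.
without_flag = re.match(r'(ab*).(sss)', text)
with_flag = re.match(r'(ab*).(sss)', text, re.S)

print(no_dot)              # None
print(without_flag)        # None
print(with_flag.groups())  # ('ab', 'sss')
```

So a pattern that should tolerate a line break has to say so itself, either with "." plus re.S or with an explicit \s* or \n* as suggested earlier in the thread.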
Re: Python's regular expression?
bruno at modulix wrote:
> From a readability/maintenance POV, Perl is a perfect nightmare.

It's certainly true that perl lacks the eminently readable quality of python. But then so do C, C++, Java, and a lot of other languages.

And I'll grant you that perl is more susceptible to the 'executable line-noise' style than most other languages. This results from its heritage as a quick-and-dirty awk/sed type text processing language.

But perl doesn't *have* to look that way, and not every perl program is a 'perfect nightmare'. If you follow good practices like turning on strict checking, using readable variable names, avoiding $_, etc, you can produce pretty readable and maintainable code. It takes some discipline, but it's very doable. I've worked with some perl programs for over 5 years without any trouble. About the only thing you can't avoid are the sigils everywhere.

Would I recommend perl for readable, maintainable code? No, not when better options like Python are available. But it can be done with some effort.
Re: Python's regular expression?
On Wed, 10 May 2006 06:44:27 GMT in comp.lang.python, Edward Elliott [EMAIL PROTECTED] wrote:
> Would I recommend perl for readable, maintainable code? No, not when better options like Python are available. But it can be done with some effort.

I'm reminded of a comment made a few years ago by John Levine, moderator of comp.compilers. He said something like "It's clearly possible to write good code in C++. It's just that no one does."

Regards,
-=Dave

--
Change is inevitable, progress is not.
Re: Python's regular expression?
Dave Hansen wrote:
> On Wed, 10 May 2006 06:44:27 GMT in comp.lang.python, Edward Elliott [EMAIL PROTECTED] wrote:
>> Would I recommend perl for readable, maintainable code? No, not when better options like Python are available. But it can be done with some effort.
>
> I'm reminded of a comment made a few years ago by John Levine, moderator of comp.compilers. He said something like "It's clearly possible to write good code in C++. It's just that no one does."

Reminds me of the quote that used to appear on the front page of the ViewCVS project (seems to have gone now that they've moved and renamed themselves to ViewVC). Can't recall the attribution off the top of my head:

"[Perl] combines the power of C with the readability of PostScript"

Scathing ... but very funny :-)

Dave.
Re: Python's regular expression?
Hi Duncan

> Nick Craig-Wood wrote:
>> Which translates to
>>
>>     match = re.search('(blue|white|red)', t)
>>     if match:
>>     else:
>>         if match:
>>         else:
>>             if match:
>
> This of course gives priority to colours and only looks for garments or footwear if it hasn't matched on a prior pattern. If you actually wanted to match the first occurrence of any of these (or if the condition was re.match instead of re.search) then named groups can be a nice way of simplifying the code:

A good point. And a good example of when to use named capture group references. This is easily extended for 'spitting out' all other occurring categories (see below).

>     PATTERN = '''
>     (?P<c>blue|white|red)
>     ...

This is one nice thing in Python's regex syntax; you have to emulate the ?P-thing in other regex systems more or less 'awk'-wardly ;-)

> For something this simple the titles and group names could be the same, but I'm assuming real code might need a bit more.

No no, this is quite good because it involves some math-generated table-code lookup.

I managed somehow to extend your example in order to spit out all matches and their corresponding category:

    import re

    PATTERN = '''
        (?P<c>blue |white |red)
      | (?P<g>socks|tights)
      | (?P<f>boot |shoe |trainer)
    '''

    PATTERN = re.compile(PATTERN , re.VERBOSE)
    TITLES = { 'c': 'Colour', 'g': 'Garment', 'f': 'Footwear' }

    t = 'blue socks and red shoes'
    for match in PATTERN.finditer(t):
        grp = match.lastgroup
        print "%s: %s" %( TITLES[grp], match.group(grp) )

which writes out the expected:

    Colour: blue
    Garment: socks
    Colour: red
    Footwear: shoe

The corresponding Perl program would look like this:

    $PATTERN = qr/
        (blue |white |red)(?{'c'})
      | (socks|tights)(?{'g'})
      | (boot |shoe |trainer)(?{'f'})
    /x;

    %TITLES = (c =>'Colour', g =>'Garment', f =>'Footwear');
    $t = 'blue socks and red shoes';
    print "$TITLES{$^R}: $^N\n" while( $t=~/$PATTERN/g );

and prints the same:

    Colour: blue
    Garment: socks
    Colour: red
    Footwear: shoe

You don't have nice named match references (?P..) in Perl 5, so you have to emulate this by an ordinary code assertion (?{..}) and set some value ($^R) on the fly, which is not that bad in the end (imho).

(?{..}) means "zero-width code assertion"; this sets the Perl-predefined $^R to the evaluated value of the {...} block.

As you can see, the pattern matching related part reduces from 4 lines to one line.

If you didn't need the dictionary lookup and could get away with associated categories, all you'd have to do would be this:

    $PATTERN = qr/
        (blue |white |red)(?{'Colour'})
      | (socks|tights)(?{'Garment'})
      | (boot |shoe |trainer)(?{'Footwear'})
    /x;

    $t = 'blue socks and red shoes';
    print "$^R: $^N\n" while( $t=~/$PATTERN/g );

What's the point of all that? IMHO, Python's regex support is quite good and useful, but won't give you an edge over Perl's in the end.

Thanks

Regards

Mirco
Re: Python's regular expression?
Davy wrote:
> Hi all,
(snip)
> Does Python support robust regular expression like Perl?

Yes.

> And Python and Perl's File content manipulation, which is better?

From a raw perf and write-only POV, Perl clearly beats Python (regarding I/O, Perl is faster than C, or at least it was the last time I benched it on a Linux box).

From a readability/maintenance POV, Perl is a perfect nightmare.

> Any suggestions will be appreciated!

http://pythonology.org/successstory=esr

--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for p in '[EMAIL PROTECTED]'.split('@')])"
Re: Python's regular expression?
Mirco Wahab wrote:
> If you wouldn't need dictionary lookup and get away with associated categories, all you'd have to do would be this:
>
>     $PATTERN = qr/
>         (blue |white |red)(?{'Colour'})
>       | (socks|tights)(?{'Garment'})
>       | (boot |shoe |trainer)(?{'Footwear'})
>     /x;
>
>     $t = 'blue socks and red shoes';
>     print "$^R: $^N\n" while( $t=~/$PATTERN/g );
>
> What's the point of all that? IMHO, Python's Regex support is quite good and useful, but won't give you an edge over Perl's in the end.

If you are desperate to collapse the code down to a single print statement you can do that easily in Python as well:

    >>> PATTERN = '''
        (?P<Colour>blue |white |red)
      | (?P<Garment>socks|tights)
      | (?P<Footwear>boot |shoe |trainer)
    '''
    >>> t = 'blue socks and red shoes'
    >>> print '\n'.join("%s:%s" % (match.lastgroup, match.group(match.lastgroup))
            for match in re.finditer(PATTERN, t, re.VERBOSE))
    Colour:blue
    Garment:socks
    Colour:red
    Footwear:shoe
Re: Python's regular expression?
Davy [EMAIL PROTECTED] writes:
> Does Python support robust regular expression like Perl?

Yep, Python regular expression is robust. Have a look at the Regex Howto: http://www.amk.ca/python/howto/regex/ and the re module: http://docs.python.org/lib/module-re.html

--
Lawrence - http://www.oluyede.org/blog
"Nothing is more dangerous than an idea if it's the only one you have" - E. A. Chartier
Re: Python's regular expression?
Hi

Davy wrote:
> I am a C/C++/Perl user and want to switch to Python

OK

> (I found Python is more similar to C).

;-) More similar than what?

> Does Python support robust regular expression like Perl?

It supports them fairly well, but it's not 'integrated'; at least it doesn't feel integrated to me ;-) If you did a lot of Perl, you know what 'integrated' means ...

> And Python and Perl's File content manipulation, which is better?

What is a 'file content manipulation'? Did you mean 'good xxx level file IO', where xxx means either 'low' or 'high'?

> Any suggestions will be appreciated!

Just try to start a small project in Python, from source that you already have in C or Perl or something.

Regards

Mirco
Re: Python's regular expression?
Hi Mirco,

Thank you!

More similar than Perl ;-)

And what does 'integrated' mean (must include some library)?

I like C++ file I/O, is it 'low' or 'high'?

Regards,
Davy
Re: Python's regular expression?
By the way, is there any tutorial about how to use the Python Shell (IDE)? I wish it were as simple as VC++ :)

Regards,
Davy
Re: Python's regular expression?
Hi Davy

> More similar than Perl ;-)

But C has { }'s everywhere, so has Perl ;-)

> And what's 'integrated' mean (must include some library)?

Yes. In Python, regular expressions are just another function library; you use them like in Java or C.

In Perl, it's part of the core language; you use the awk-style (eg: /.../) regular expressions everywhere you want.

If you used regexps in C/C++ before, you can use them in almost the same way in Python, which may give you an easy start.

BTW, Python has some fine extensions to the perl(5) regexes, e.g. 'named backreferences'. But you won't see many regular expressions in Python code posted to this group, maybe because it looks clunky, which is 'unpythonic' ;-)

Let's see, a really simple find/match would look like this in Python:

    import re

    t = 'blue socks and red shoes'
    p = re.compile('(blue|white|red)')
    if p.match(t):
        print t

which prints the text 't' because of the positive pattern match.

In Perl, you write:

    use Acme::Pythonic;

    $t = 'blue socks and red shoes'
    if ($t =~ /(blue|white|red)/):
        print $t

which is one line shorter (no need to compile the regular expression in advance).

> I like C++ file I/O, is it 'low' or 'high'?

C++ has, afaik, actually three levels of I/O:

(1) (from C, very low) operating system level, included by <io.h>, which provides direct access to operating system services (read(), write(), lseek() etc.)

(2) C standard library buffered I/O, included by <stdio.h>, which provides structured 'mid-level' access like (block-) fread()/fwrite(), line read (fgets()) and formatted I/O (fprintf()/fscanf())

(3) the C++ streams library (high level, <fstream>, <iostream>, <sstream>), which abstracts out the I/O devices and provides the same set of functionality for any abstract input or output.

Perl provides all three levels of I/O; the 'abstracting' is introduced by modules which tie 'handle variables' to anything that may receive or send data.

Python also does a good job on all three levels, but provides the (low level) operating system I/O by external modules (afaik). I didn't do much I/O in Python, so I can't say much here.

Regards

Mirco
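As a rough illustration of the 'low' versus 'high' distinction in Python, here is a sketch in Python 3 syntax. It assumes the standard library's os module for the descriptor-level calls and the built-in open() for the buffered, line-oriented level; the file name demo.txt and its contents are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')

    # Low level: raw file descriptors, close to the OS read()/write() calls.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    os.write(fd, b'blue socks\nred shoes\n')
    os.close(fd)

    # High level: a buffered file object that can be iterated line by line.
    with open(path, encoding='ascii') as fh:
        lines = [line.rstrip('\n') for line in fh]

print(lines)
```

In everyday code the high-level form is usually all that is needed; the descriptor-level calls matter mostly when talking to pipes, devices or other OS facilities directly.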
Re: Python's regular expression?
On 8/05/2006 10:31 PM, Mirco Wahab wrote:
[snip]
> Lets see - a really simple find/match would look like this in Python:
>
>     import re
>
>     t = 'blue socks and red shoes'
>     p = re.compile('(blue|white|red)')
>     if p.match(t):

What do you expect when t == "green socks and red shoes"? Is it possible that you mean to use search() rather than match()?

>         print t
>
> which prints the text 't' because of the positive pattern match.
>
> In Perl, you write:
>
>     use Acme::Pythonic;
>
>     $t = 'blue socks and red shoes'
>     if ($t =~ /(blue|white|red)/):
>         print $t
>
> which is one line shorter (no need to compile the regular expression in advance).

There is no need to compile the regex in advance in Python, either. Please consider the module-level function search() ...

    if re.search(r"blue|white|red", t):
    # also, no need for () in the regex.
Re: Python's regular expression?
Mirco Wahab wrote:
> Lets see - a really simple find/match would look like this in Python:
>
>     import re
>
>     t = 'blue socks and red shoes'
>     p = re.compile('(blue|white|red)')
>     if p.match(t):
>         print t
>
> which prints the text 't' because of the positive pattern match.
>
> In Perl, you write:
>
>     use Acme::Pythonic;
>
>     $t = 'blue socks and red shoes'
>     if ($t =~ /(blue|white|red)/):
>         print $t
>
> which is one line shorter (no need to compile the regular expression in advance).

There is no need to compile the regular expression in advance in Python either:

    t = 'blue socks and red shoes'
    if re.match('(blue|white|red)', t):
        print t

The only advantage to compiling in advance is a small speed up, and most of the time that won't be significant.
Re: Python's regular expression?
Hi John

>>     import re
>>
>>     t = 'blue socks and red shoes'
>>     p = re.compile('(blue|white|red)')
>>     if p.match(t):
>
> What do you expect when t == "green socks and red shoes"? Is it possible that you mean to use search() rather than match()?

This is interesting. What's the difference in this example, then, between:

    import re

    t = 'blue socks and red shoes'
    if re.compile('blue|white|red').match(t):
        print t

and

    t = 'blue socks and red shoes'
    if re.search('blue|white|red', t):
        print t

> There is no need to compile the regex in advance in Python, either. Please consider the module-level function search() ...
>
>     if re.search(r"blue|white|red", t):
>     # also, no need for () in the regex.

That's true. Thank you for pointing this out. But what would be an appropriate use of search() vs. match()? When to use what?

I answered the posting in the first place because I'm also coming from a C/C++/Perl background and trying to get along in Python.

Thanks,

Mirco
Re: Python's regular expression?
Hi Duncan

> There is no need to compile the regular expression in advance in Python either:
> ...
> The only advantage to compiling in advance is a small speed up, and most of the time that won't be significant.

I read 'some' introductions to Python regexes and got confused at first about when to use what and why.

After some minutes in this NG I start to get the picture. So I narrowed the above regex question down to a nice equivalence between Perl and Python:

Python:

    import re

    t = 'blue socks and red shoes'
    if re.match('blue|white|red', t):
        print t

    t = 'blue socks and red shoes'
    if re.search('blue|white|red', t):
        print t

Perl:

    use Acme::Pythonic;

    $t = 'blue socks and red shoes'
    if $t =~ /blue|white|red/:
        print $t

And Python regexes eventually lost (for me) some of their (what I believed) 'clunky appearance' ;-)

Thanks

Mirco
Re: Python's regular expression?
On 8/05/2006 11:13 PM, Mirco Wahab wrote:
> Hi John
>
>>>     import re
>>>
>>>     t = 'blue socks and red shoes'
>>>     p = re.compile('(blue|white|red)')
>>>     if p.match(t):
>>
>> What do you expect when t == "green socks and red shoes"? Is it possible that you mean to use search() rather than match()?
>
> This is interesting. What's in this example the difference then between:

I suggest that you (a) read the description of the difference between search and match in the manual, and (b) try out search and match on both your original string and the one I proposed.

>     import re
>
>     t = 'blue socks and red shoes'
>     if re.compile('blue|white|red').match(t):
>         print t
>
> and
>
>     t = 'blue socks and red shoes'
>     if re.search('blue|white|red', t):
>         print t

[snip]

> But what would be an appropriate use of search() vs. match()? When to use what?

ReadTheFantasticManual :-)
Re: Python's regular expression?
Hi John

>> But what would be an appropriate use of search() vs. match()? When to use what?
>
> ReadTheFantasticManual :-)

From the manual you mentioned, I don't get the point of 'match'. So why should you use an extra function entry match(),

    re.match('whatever', t):

which is, according to the FM, equivalent to (a special case of?)

    re.search('^whatever', t):

For me, it looks like match() should be used for simple string comparisons, like a 'ramped up C-strcmp()'.

Or isn't it? Maybe I don't get it ;-)

Thanks

Mirco
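The experiment suggested earlier in the thread (try both functions on both strings) can be written out as follows. This is a sketch in Python 3 syntax; the helper name compare() is invented here. It also shows one case where match() is not the same as search() with a leading '^': under re.MULTILINE, '^' can match after any newline, while match() stays anchored at the very start of the string.

```python
import re

pattern = r'blue|white|red'

def compare(text):
    """Return (match_found, search_found) for the colour pattern."""
    return (re.match(pattern, text) is not None,
            re.search(pattern, text) is not None)

for t in ('blue socks and red shoes', 'green socks and red shoes'):
    print(repr(t), compare(t))

# match() only looks at the start of the string, even in MULTILINE mode.
two_lines = 'green socks\nred shoes'
print(re.match(r'red', two_lines, re.MULTILINE))           # None
print(re.search(r'^red', two_lines, re.MULTILINE).span())  # position of 'red' on line two
```

When emulating match() with search(), the alternation needs grouping, as in r'^(?:blue|white|red)', because a bare '^blue|white|red' anchors only the first alternative.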
Re: Python's regular expression?
Mirco Wahab [EMAIL PROTECTED] wrote:
> After some minutes in this NG I start to get the picture. So I narrowed the above regex-question down to a nice equivalence between Perl and Python:
>
> Python:
>
>     import re
>
>     t = 'blue socks and red shoes'
>     if re.match('blue|white|red', t):
>         print t
>
>     t = 'blue socks and red shoes'
>     if re.search('blue|white|red', t):
>         print t
>
> Perl:
>
>     use Acme::Pythonic;
>
>     $t = 'blue socks and red shoes'
>     if $t =~ /blue|white|red/:
>         print $t
>
> And Python Regexes eventually lost (for me) some of their (what I believed) 'clunky appearance' ;-)

If you are used to perl regexes there is one clunkiness of python regexps which you'll notice eventually...

Let's make the above example a bit more "real world", ie use the matched item in some way...

Perl:

    $t = 'blue socks and red shoes';
    if ( $t =~ /(blue|white|red)/ )
    {
        print "Colour: $1\n";
    }

Which prints

    Colour: blue

In python you have to express this like

    import re

    t = 'blue socks and red shoes'
    match = re.search('(blue|white|red)', t)
    if match:
        print "Colour:", match.group(1)

Note the extra variable "match". You can't do assignment in an expression in python, which makes for the extra verbosity, and you need a variable to store the result of the match in (since python doesn't have the magic $1..$9 variables).

This becomes particularly frustrating when you have to do a series of regexp matches, eg

    if ( $t =~ /(blue|white|red)/ )
    {
        print "Colour: $1\n";
    }
    elsif ( $t =~ /(socks|tights)/)
    {
        print "Garment: $1\n";
    }
    elsif ( $t =~ /(boot|shoe|trainer)/)
    {
        print "Footwear: $1\n";
    }

Which translates to

    match = re.search('(blue|white|red)', t)
    if match:
        print "Colour:", match.group(1)
    else:
        match = re.search('(socks|tights)', t)
        if match:
            print "Garment:", match.group(1)
        else:
            match = re.search('(boot|shoe|trainer)', t)
            if match:
                print "Footwear:", match.group(1)
                # indented ad infinitum!

You can use a helper class to get over this frustration like this

    import re

    class Matcher:
        def search(self, r,s):
            self.value = re.search(r,s)
            return self.value
        def __getitem__(self, i):
            return self.value.group(i)

    m = Matcher()
    t = 'blue socks and red shoes'

    if m.search(r'(blue|white|red)', t):
        print "Colour:", m[1]
    elif m.search(r'(socks|tights)', t):
        print "Garment:", m[1]
    elif m.search(r'(boot|shoe|trainer)', t):
        print "Footwear:", m[1]

Having made the transition from perl to python a couple of years ago, I find myself using regexps much less. In perl everything looks like it needs a regexp, but python has a much richer set of string methods, eg .startswith, .endswith, good subscripting and the nice "in" operator for strings.

--
Nick Craig-Wood [EMAIL PROTECTED] -- http://www.craig-wood.com/nick
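A note for readers on newer interpreters: Python 3.8 and later do allow assignment inside an expression via the := operator, which did not exist when this thread was written. Under that assumption the cascade can be sketched without nesting or a helper class. The function name classify() is made up for this example.

```python
import re

def classify(t):
    """Return (category, matched_text) for the first category found, else None."""
    if m := re.search(r'(blue|white|red)', t):
        return 'Colour', m.group(1)
    elif m := re.search(r'(socks|tights)', t):
        return 'Garment', m.group(1)
    elif m := re.search(r'(boot|shoe|trainer)', t):
        return 'Footwear', m.group(1)
    return None

print(classify('blue socks and red shoes'))
print(classify('grey tights'))

# The string methods mentioned above often replace a regex entirely.
t = 'blue socks and red shoes'
print(t.startswith('blue'), t.endswith('shoes'), 'red' in t)
```

As in the nested version, colours take priority here: a garment is only reported when no colour is found anywhere in the string.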
Re: Python's regular expression?
Nick Craig-Wood wrote:
> Which translates to
>
>     match = re.search('(blue|white|red)', t)
>     if match:
>         print "Colour:", match.group(1)
>     else:
>         match = re.search('(socks|tights)', t)
>         if match:
>             print "Garment:", match.group(1)
>         else:
>             match = re.search('(boot|shoe|trainer)', t)
>             if match:
>                 print "Footwear:", match.group(1)
>                 # indented ad infinitum!

This of course gives priority to colours and only looks for garments or footwear if it hasn't matched on a prior pattern. If you actually wanted to match the first occurrence of any of these (or if the condition was re.match instead of re.search) then named groups can be a nice way of simplifying the code:

    PATTERN = '''
        (?P<c>blue|white|red)
      | (?P<g>socks|tights)
      | (?P<f>boot|shoe|trainer)
    '''
    PATTERN = re.compile(PATTERN, re.VERBOSE)
    TITLES = { 'c': 'Colour', 'g': 'Garment', 'f': 'Footwear' }

    match = PATTERN.search(t)
    if match:
        grp = match.lastgroup
        print "%s: %s" % (TITLES[grp], match.group(grp))

For something this simple the titles and group names could be the same, but I'm assuming real code might need a bit more.
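For anyone trying the named-group approach today, here is the same idea as a self-contained sketch in Python 3 syntax. The function name categorise() is invented for the example, and it uses finditer() to report every match rather than only the first; the pattern and the TITLES table are the ones from the message above.

```python
import re

PATTERN = re.compile(r'''
    (?P<c>blue|white|red)
  | (?P<g>socks|tights)
  | (?P<f>boot|shoe|trainer)
''', re.VERBOSE)

TITLES = {'c': 'Colour', 'g': 'Garment', 'f': 'Footwear'}

def categorise(text):
    """Return a list of (title, matched_text) pairs in order of appearance."""
    # lastgroup is the name of the one alternative that actually matched.
    return [(TITLES[m.lastgroup], m.group()) for m in PATTERN.finditer(text)]

for title, word in categorise('blue socks and red shoes'):
    print('%s: %s' % (title, word))
```

Because each alternative is a single named group, exactly one group participates in any match, which is what makes lastgroup a reliable category key here.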