On 25/05/2006 7:58 PM, gisleyt wrote:
> I'm trying to compile a perfectly valid regex, but get the error
> message:
>
> r = re.compile(r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/lib/python2.3/sre.py", line 179, in compile
>     return _compile(pattern, flags)
>   File "/usr/lib/python2.3/sre.py", line 230, in _compile
>     raise error, v # invalid expression
> sre_constants.error: nothing to repeat
>
> What does this mean? I know that the regex
> ([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*
> is valid because i'm able to use it in Regex Coach.
Say what??? From the Regex Coach website:
(1) "can be used to experiment with (Perl-compatible) regular expressions"
(2) "PCRE (which is used by projects like Python" -- once upon a time,
way back in the dream-time, when the world was young, ...

The problem is this little snippet near the end of your regex:

>>> re.compile(r'(\d*)?')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\sre.py", line 180, in compile
    return _compile(pattern, flags)
  File "C:\Python24\lib\sre.py", line 227, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

The message is a little cryptic, should be something like "a repeat
operator has an operand which may match nothing". In other words, you
have said X? (optional occurrence of X) *BUT* X can already match a
zero-length string. X in this case is (\d*)

This is a theoretically valid regex, but it's equivalent to just plain
X, and leaves the reader (and the re implementors, obviously) wondering
whether you (a) have made a typo (b) are a member of the re
implementation quality assurance inspectorate or (c) just plain
confused :-)

BTW, reading your regex was making my eyes bleed, so I did this to find
out which piece was the problem:

import re
pat0 = r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*'
pat1 = r'([^\d]*)'
pat2 = r'(\d{1,3}\.\d{0,2})?'
pat3 = r'(\d*)'
pat4 = r'(\,\d{1,3}\.\d{0,2})?'
pat5 = r'(\d*)?.*'
for k, pat in enumerate([pat1, pat2, pat3, pat4, pat5]):
    print k+1
    re.compile(pat)

> But is Python's
> regex syntax different that an ordinary syntax?

Python aims to lift itself above the ordinary :-)

>
> By the way, i'm using it to normalise strings like:
>
> London|country/uk/region/europe/geocoord/32.3244,42,1221244
> to:
> London|country/uk/region/europe/geocoord/32.32,42,12
>
> By using \1\2\4 as replace. I'm open for other suggestions to achieve
> this!
>

Well, you are just about on the right track.
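The piece-by-piece check used earlier in this reply stops at the first traceback. A minimal sketch of the same idea that collects every failing piece instead might look like the following. It assumes Python 3 syntax (the thread itself uses Python 2.3/2.4), and `find_bad_pieces` is a hypothetical helper name, not anything from the `re` module. Newer versions of `re` may well accept `(\d*)?`, in which case the list for these five pieces simply comes back empty.

```python
import re

# The five pieces of the original pattern, split up as in the message.
PIECES = [
    r'([^\d]*)',
    r'(\d{1,3}\.\d{0,2})?',
    r'(\d*)',
    r'(\,\d{1,3}\.\d{0,2})?',
    r'(\d*)?.*',
]


def find_bad_pieces(pieces):
    """Return (number, piece, message) for each piece that fails to compile.

    Hypothetical helper: numbering starts at 1 to match pat1..pat5.
    """
    bad = []
    for k, piece in enumerate(pieces, 1):
        try:
            re.compile(piece)
        except re.error as exc:
            bad.append((k, piece, str(exc)))
    return bad


if __name__ == "__main__":
    # Prints nothing if this version of re accepts every piece.
    for k, piece, msg in find_bad_pieces(PIECES):
        print("piece %d %r: %s" % (k, piece, msg))
```

Collecting the failures rather than letting the first exception escape means one run shows every problem piece, which helps when a long pattern has more than one.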
You need to avoid the eye-bleed (by using VERBOSE patterns), use test
data that doesn't have typos in it, and use more test data. You may like
to roll your own test harness, in *Python*, for *Python* regexes, like
the following:

C:\junk>type re_demo.py
import re

tests = [
    ["AA222.22333,444.44555FF", "AA222.22,444.44"],
    ["foo/geocoord/32.3244,42.1221244", "foo/geocoord/32.32,42.12"], # what you meant
    ["foo/geocoord/32.3244,42,1221244", "foo/geocoord/32.32,42,12"], # what you posted
    ]

pat0 = r'([^\d]*)(\d{1,3}\.\d{0,2})?(\d*)(\,\d{1,3}\.\d{0,2})?(\d*)?.*'
patx = r"""
    ([^\d]*)                # Grp 1: zero/more non-digits
    (\d{1,3}\.\d{0,2})?     # Grp 2: 1-3 digits, a dot, 0-2 digits (optional)
    (\d*)                   # Grp 3: zero/more digits
    (\,\d{1,3}\.\d{0,2})?   # Grp 4: like grp 2 with comma in front (optional)
    (\d*)                   # Grp 5: zero/more digits
    (.*)                    # Grp 6: any old rubbish
    """

rx = re.compile(patx, re.VERBOSE)
for testin, expected in tests:
    print "\ntestin:", testin
    mobj = rx.match(testin)
    if not mobj:
        print "no match"
        continue
    for k, grp in enumerate(mobj.groups()):
        print "Group %d matched %r" % (k+1, grp)
    actual = rx.sub(r"\1\2\4", testin)
    print "expected: %r; actual: %r; same: %r" % (expected, actual, expected == actual)

C:\junk>re_demo.py

testin: AA222.22333,444.44555FF
Group 1 matched 'AA'
Group 2 matched '222.22'
Group 3 matched '333'
Group 4 matched ',444.44'
Group 5 matched '555'
Group 6 matched 'FF'
expected: 'AA222.22,444.44'; actual: 'AA222.22,444.44'; same: True

testin: foo/geocoord/32.3244,42.1221244
Group 1 matched 'foo/geocoord/'
Group 2 matched '32.32'
Group 3 matched '44'
Group 4 matched ',42.12'
Group 5 matched '21244'
Group 6 matched ''
expected: 'foo/geocoord/32.32,42.12'; actual: 'foo/geocoord/32.32,42.12'; same: True

testin: foo/geocoord/32.3244,42,1221244
Group 1 matched 'foo/geocoord/'
Group 2 matched '32.32'
Group 3 matched '44'
Group 4 matched None
Group 5 matched ''
Group 6 matched ',42,1221244'
Traceback (most recent call last):
  File "C:\junk\re_demo.py", line 28, in ?
    actual = rx.sub(r"\1\2\4", testin)
  File "C:\Python24\lib\sre.py", line 260, in filter
    return sre_parse.expand_template(template, match)
  File "C:\Python24\lib\sre_parse.py", line 782, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group

===

HTH,
John
-- 
http://mail.python.org/mailman/listinfo/python-list
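The "unmatched group" failure in the last test case happens because group 4 did not take part in the match, so `\4` has nothing to expand to. One possible way around it, sketched here under assumptions, is to build the result from the match object and treat an unmatched group as an empty string. The sketch uses Python 3 syntax rather than the Python 2 of the harness, and `normalise` is a hypothetical name. Python 3.5 and later make the same empty-string substitution inside `re.sub` itself, so there the explicit handling is mostly a matter of being clear about intent.

```python
import re

# Same verbose pattern as patx in the harness.
PATX = r"""
    ([^\d]*)                # Grp 1: zero/more non-digits
    (\d{1,3}\.\d{0,2})?     # Grp 2: 1-3 digits, a dot, 0-2 digits (optional)
    (\d*)                   # Grp 3: zero/more digits
    (\,\d{1,3}\.\d{0,2})?   # Grp 4: like grp 2 with comma in front (optional)
    (\d*)                   # Grp 5: zero/more digits
    (.*)                    # Grp 6: any old rubbish
"""
RX = re.compile(PATX, re.VERBOSE)


def normalise(text):
    """Keep groups 1, 2 and 4; an unmatched group counts as ''.

    Hypothetical helper, equivalent in intent to sub(r"\1\2\4", ...).
    """
    mobj = RX.match(text)
    if mobj is None:
        # Defensive only: every part of the pattern is optional.
        return text
    return "".join(mobj.group(n) or "" for n in (1, 2, 4))


if __name__ == "__main__":
    for sample in ("foo/geocoord/32.3244,42.1221244",
                   "foo/geocoord/32.3244,42,1221244"):
        print("%r -> %r" % (sample, normalise(sample)))
```

This only stops the exception. For the input with the comma typo the second coordinate is still dropped, because the pattern needs a dot there, so the test data still has to be fixed.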