Thanks for the answer. I will give the regex module from PyPI a try.
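
In the meantime, here is a minimal sketch of the check I plan to run with the builtin re module. It assumes the reduced pattern really is equivalent to a single [^y]* once the re.X whitespace is dropped, as suggested in the quoted reply. The helper name try_compile and the sample text are mine, for illustration only, and the full original Perl regex is not covered here.

```python
import re

TEXT = "a bcd e"  # sample input taken from the quoted examples

# Reduced pattern from the original report: a quantified group that is
# itself quantified. Whitespace is ignored because of re.X.
NESTED = r"""^(?: [^y]* )*"""

# Simplified form: the outer * adds nothing, since the group contains
# only [^y]*.
SIMPLE = r"^[^y]*"


def try_compile(pattern, flags=0):
    """Return a compiled pattern, or None if this engine rejects it.

    Hypothetical helper: some versions of the builtin engine raise
    "nothing to repeat" for the nested form, others accept it.
    """
    try:
        return re.compile(pattern, flags)
    except re.error:
        return None


nested = try_compile(NESTED, re.X)
simple = try_compile(SIMPLE)

simple_result = simple.findall(TEXT)
# None here means the nested form did not compile on this interpreter.
nested_result = nested.findall(TEXT) if nested is not None else None

print("simple:", simple_result)
print("nested:", nested_result)
```

If the nested form compiles on a given interpreter, both results should agree. If it does not, the simplified pattern is the one I would carry over to the port, or I will switch to "import regex as re" as proposed.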

On Fri, Sep 30, 2011 at 4:56 PM, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:
> 2011/9/30 Ovidiu Deac <ovidiud...@gmail.com>:
>> This is only part of a regex taken from an old perl application which
>> we are trying to understand/port to our new Python implementation.
>>
>> The original regex was considerably more complex and it didn't compile
>> in python so I removed all the parts I could in order to isolate the
>> problem such that I can ask help here.
>>
>> So the problem is that this regex doesn't compile. On the other hand
>> I'm not really sure it should. It's an anchor on which you apply *.
>> I'm not sure if this is legal.
>>
>> On the other hand if I remove one of the * it compiles.
>>
>> >>> re.compile(r"""^(?: [^y]* )*""", re.X)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/lib/python2.6/re.py", line 190, in compile
>>     return _compile(pattern, flags)
>>   File "/usr/lib/python2.6/re.py", line 245, in _compile
>>     raise error, v # invalid expression
>> sre_constants.error: nothing to repeat
>> >>> re.compile(r"""^(?: [^y] )*""", re.X)
>> <_sre.SRE_Pattern object at 0x7f4069cc36b0>
>> >>> re.compile(r"""^(?: [^y]* )""", re.X)
>> <_sre.SRE_Pattern object at 0x7f4069cc3730>
>>
>> Is this a bug in python regex engine? Or maybe some incompatibility with
>> Perl?
>>
>> On Fri, Sep 30, 2011 at 12:29 PM, Chris Angelico <ros...@gmail.com> wrote:
>>> On Fri, Sep 30, 2011 at 7:26 PM, Ovidiu Deac <ovidiud...@gmail.com> wrote:
>>>> $ python --version
>>>> Python 2.6.6
>>>
>>> Ah, I think I was misinterpreting the traceback. You do actually have
>>> a useful message there; it's the same error that my Py3.2 produced:
>>>
>>> sre_constants.error: nothing to repeat
>>>
>>> I'm not sure what your regex is trying to do, but the problem seems to
>>> be connected with the * at the end of the pattern.
>>>
>>> ChrisA
>>> --
>
> I believe, this is a limitation of the builtin re engine concerning
> nested infinite quantifiers - (...*)* - in your pattern.
> You can try a more powerful recent regex implementation, which appears
> to handle it:
>
> http://pypi.python.org/pypi/regex
>
> using the VERBOSE flag - re.X all (unescaped) whitespace outside of
> character classes is ignored,
> http://docs.python.org/library/re.html#re.VERBOSE
> the pattern should be equivalent to:
> r"^(?:[^y]*)*"
> ie. you are not actually gaining anything with double quantifier, as
> there isn't anything "real" in the pattern outside [^y]*
>
> It appears, that you have oversimplified the pattern (if it had worked
> in the original app),
> however, you may simply try with
> import regex as re
> and see, if it helps.
>
> Cf:
> >>>
> >>> regex.findall(r"""^(?: [^y]* )*""", "a bcd e", re.X)
> ['a bcd e']
> >>> re.findall(r"""^(?: [^y]* )*""", "a bcd e", re.X)
> Traceback (most recent call last):
>   File "<input>", line 1, in <module>
>   File "re.pyc", line 177, in findall
>   File "re.pyc", line 244, in _compile
> error: nothing to repeat
> >>>
> >>> re.findall(r"^(?:[^y]*)*", "a bcd e")
> Traceback (most recent call last):
>   File "<input>", line 1, in <module>
>   File "re.pyc", line 177, in findall
>   File "re.pyc", line 244, in _compile
> error: nothing to repeat
> >>> regex.findall(r"^(?:[^y]*)*", "a bcd e")
> ['a bcd e']
> >>> regex.findall(r"^[^y]*", "a bcd e")
> ['a bcd e']
> >>>
>
>
> hth,
>  vbr
> --
> http://mail.python.org/mailman/listinfo/python-list
>
--
http://mail.python.org/mailman/listinfo/python-list