MonkeeSage wrote:
> Licheng Fang wrote:
> > Basically, the problem is this:
> >
> > >>> p = re.compile("do|dolittle")
> > >>> p.match("dolittle").group()
> > 'do'
>
> From what I understand, this isn't python specific, it is the expected
> behavior of that pattern in any implementation. You are using
> alternation, which means "either, or", and you have the shorter
> subexpression first, so the condition is satisfied by just 'do' and the
> matching terminates.
>
> > There's another example:
> >
> > >>> p = re.compile("one(self)?(selfsufficient)?")
> > >>> p.match("oneselfsufficient").group()
> > 'oneself'
>
> Again, I don't think this has anything to do with python. Your pattern
> basically means "match 'one' whether it is followed by 'self' or not,
> and whether it is followed by 'selfsufficient' or not". For this
> particular example, you'd want something like
> "one(self)?(sufficient)?".
>
> I think you could construct a pattern that would do what you want in
> python without any problem. If you post a (short) example of your data,
> I'm sure someone could help you with it.
>
> Regards,
> Jordan

Hi, according to this discussion of regexp engines, that is NOT the
behavior of every implementation:

http://www.softec.st/en/OpenSource/DevelopersCorner/RegularExpressions/RegularExpressionEngines.html

Python's NFA engine reads along the input string, matching it against the
pattern and backtracking when needed. By contrast, a DFA engine, to my
understanding, constructs a DFA and uses it to munch as many characters
as possible. Maybe it's like this:

Pattern: one(self)?(selfsufficient)?

PYTHON'S NFA ENGINE:

         one          self, none       selfsufficient, none
(start)------->((1))------------>((2))----------------------->((3))

DFA ENGINE:

         one              self
(start)------->((123))------------>((23))
                  |
                  |
                  | selfsufficient
                  --------------->((3))

I want to know if there is some way to make Python RE behave like grep
does, or do I have to change to another engine?
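
For comparison, here is a rough sketch of how longest-match behavior might be approximated without leaving the re module. It is not a real DFA. It simply tries each alternative on its own and keeps the longest hit. The helper name longest_match is made up for this illustration, and spelling out the alternatives by hand is an assumption that only works when the pattern is a small, finite set of strings.

```python
import re

def longest_match(patterns, text):
    """Hypothetical helper: match each alternative separately at the start
    of text and return the match object that consumed the most characters."""
    best = None
    for pat in patterns:
        m = re.match(pat, text)
        if m is not None and (best is None or m.end() > best.end()):
            best = m
    return best

# Python's backtracking engine takes the first alternative that succeeds.
first = re.match("do|dolittle", "dolittle").group()

# Trying the alternatives one by one and keeping the longest result.
longest = longest_match(["do", "dolittle"], "dolittle").group()

# Simply listing the longer alternative first has the same effect here.
reordered = re.match("dolittle|do", "dolittle").group()

# The second example, with the strings the pattern can match written out.
greedy = re.match("one(self)?(selfsufficient)?", "oneselfsufficient").group()
expanded = longest_match(
    ["one", "oneself", "oneselfsufficient", "oneselfselfsufficient"],
    "oneselfsufficient",
).group()

print(first, longest, reordered, greedy, expanded)
```

Reordering the alternatives is the cheaper fix when the pattern is written by hand. The helper may be more convenient when the list of words is generated and cannot easily be sorted by length in advance.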
