How, then, do I specify a non-greedy regex <1st-pat><not-1st-pat>*?<follow-pat>,
that is, one in which the non-greedy part <not-1st-pat>*? excludes a match of <1st-pat>? In other words, how do I write regexes for my examples?

What book or books on regexes, or with a good section on regexes, would you recommend? Hopcroft and Ullman?

"André Malo" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> * "lothar" <[EMAIL PROTECTED]> wrote:
>
> > this response is nothing but a description of the behavior i reported.
>
> Then you have not read my response carefully enough.
>
> > as to whether this behaviour was intended, one would have to ask the module
> > writer about that.
>
> No, I've responded with a view on regexes, not on the module. That is the way
> _regexes_ work. Non-greedy regexes do not match the minimal-length at all, they
> are just ... non-greedy (technically the backtracking just stacks the longest
> instead of the shortest). They *may* match the shortest match, but it's a
> special case. Therefore I've stated that the documentation is incomplete.
>
> Actually your expectations go a bit beyond the documentation. From a certain
> point of view (matches always start most left) the matches you're seeing
> *are* the minimal-length matches.
>
> > because of the statement in the documentation, which places no qualification
>                                                               ^^^^^^^^^^^^^^^^
> that's the point.
>
> > on how the scan for the shortest possible match is to be done, my guess is
> > that this problem was overlooked.
>
> In the docs, yes. But buy yourself a regex book and learn for yourself ;-)
> The first thing you should learn about regexes is that the source of pain
> of most regex implementations is the documentation, which is very likely
> to be wrong.
>
> Finally let me ask a question:
>
> import re
> x = re.compile('<.*?>')
> print x.search('<title>...</title><body>...</body>').group(0)
>
> What would you expect to be printed out? <title> or <body>? Why?
>
> nd
--
http://mail.python.org/mailman/listinfo/python-list
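Here is a minimal Python 3 sketch of the two points above for anyone who wants to try them. The quoted snippet uses the Python 2 `print` statement. The sketch shows that `re.search` reports the leftmost match, so `<.*?>` yields `<title>`. It also shows that a lazy `.*?` starting at the leftmost `<1st-pat>` can still run across a later occurrence of that pattern. The last pattern uses a tempered token, `(?:(?!<1st-pat>).)*?`, which is one common way to keep the middle part from containing `<1st-pat>`. Nobody proposed it in the thread. `BEGIN`, `END` and the sample string are hypothetical stand-ins for lothar's `<1st-pat>`, `<follow-pat>` and his examples, which are not shown here.

```python
import re

# André's example: search() reports the leftmost match; the lazy
# quantifier only decides where that leftmost match ends.
html = '<title>...</title><body>...</body>'
title = re.search(r'<.*?>', html).group(0)
print(title)  # <title>

# Hypothetical stand-ins for <1st-pat> and <follow-pat> (literal strings).
FIRST = 'BEGIN'
FOLLOW = 'END'
text = 'BEGIN one BEGIN two END'

# Plain lazy middle: the match starts at the leftmost BEGIN and runs
# over the second BEGIN before it reaches the first END.
lazy = re.compile(re.escape(FIRST) + r'.*?' + re.escape(FOLLOW))
lazy_match = lazy.search(text).group(0)
print(lazy_match)  # BEGIN one BEGIN two END

# Tempered middle: a character is consumed only where FIRST does not
# start, so no occurrence of FIRST can begin inside the middle part.
# The attempt at index 0 therefore fails, and the engine moves on and
# finds a match starting at the second BEGIN.
tempered = re.compile(
    re.escape(FIRST)
    + r'(?:(?!' + re.escape(FIRST) + r').)*?'
    + re.escape(FOLLOW)
)
tempered_match = tempered.search(text).group(0)
print(tempered_match)  # BEGIN two END
```

By default `.` does not match a newline, so pass `re.DOTALL` if the middle part may span lines. If `<1st-pat>` is itself a regex rather than a literal string, drop the `re.escape` calls.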