Niu Danny wrote, on 04 Mar 2025:
>
> With the current resolution of Bug-1857
> <http://www.austingroupbugs.net/view.php?id=1857#c6881>, we have:
>
> > Consistent with the match for the entire
> > regular expression being the leftmost and
> > longest for which any minimal repetitions
> > used in the match have the shortest possible
> > match,
>
> Q1: does the "for which" clause imply that
> if there are any minimal quantifiers,
> the overall match may *Not Necessarily* be
> the longest?

It's the longest, subject to some other condition being satisfied.
So not necessarily the longest absent that condition.

> The examples from the previous paragraph seem to confirm this:
>
> > However, the ERE "(aaa??)*" matches only
> > the first four characters of the string "aaaaa",
> > not all five, because in order to match all five,
> > "a??" would match with length one instead of zero;
> > the ERE "(aaa??)*|(aaa?)*" matches all five because
> > the longest match is one which does not use
> > any minimal repetitions.
>
> In which case, I think the length of the overall match
> is ambiguous.

I disagree.

> ----
>
> > each BRE or ERE in a concatenated set,
> > from left to right, shall match the longest
> > possible string for which any minimal repetitions
> > used in the match for that BRE or ERE have
> > the shortest possible match.
>
> Q2: are the said BRE and ERE parenthesized?

They don't have to be, but involving parentheses is the only way to
tell which part of the overall BRE or ERE matched which part of the
input string.

> It is mentioned in a bug note (from @geoffclare):
> <http://www.austingroupbugs.net/view.php?id=1857#c6890>
>
> ----
>
> > There is certainly no intention to require
> > the '?' modifier to act recursively, and
> > I can't see any way to interpret my suggested
> > wording as implying it.
>
> Q3: How can it simultaneously:
>
> - not act recursively,
> - match the shortest subject string when
>   it's applied to a parenthesized subexpression
>   with a greedy quantifier in it?
>
> e.g. `([0-9]+)+?`

This is a pathological case because you are simultaneously asking
for both the longest and shortest match for the SAME part of the
string. Such cases ought not to occur in real-world use.

What I meant, when I said it is not recursive, is something like:

    ([0-9]+[a-z]*)+?

where the inner + and * are individually greedy; they don't inherit
the outer repetition's non-greediness.
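
For illustration, here is a minimal Python sketch of how a backtracking,
Perl-style engine treats the patterns mentioned so far. It uses Python's
re module, which is not a POSIX matcher. The subject strings "12ab34cd"
and "12345" and the helper name matched() are made up for the example.
The results show only that engine's behaviour, not what the proposed
wording requires.

```python
import re

def matched(pattern, subject):
    """Return the text matched at the start of subject, or None."""
    m = re.match(pattern, subject)
    return m.group(0) if m else None

# Greedy quantifiers nested inside a lazy outer repetition: the inner
# + and * still take as much as they can within a single iteration,
# while the outer +? stops after the first iteration.
nested = matched(r"([0-9]+[a-z]*)+?", "12ab34cd")

# The pathological case from Q3: the inner greedy + consumes every
# digit before the outer lazy +? gets a chance to stop.
pathological = matched(r"([0-9]+)+?", "12345")

# The two EREs quoted from the draft text, run through the same engine.
# Alternation in this engine is ordered (the first alternative that
# matches wins), so it does not search for the longest overall match.
lazy_only = matched(r"(aaa??)*", "aaaaa")
with_alt = matched(r"(aaa??)*|(aaa?)*", "aaaaa")

print(nested, pathological, lazy_only, with_alt)
```

With CPython's re this prints "12ab 12345 aaaa aaaa". The last result
is four characters, whereas the quoted text gives a five-character match
for the same alternation, which is one concrete way a Perl-style engine
differs from the leftmost-longest rule.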

> ----
>
> Observation 1:
>
> @geoffclare replying to @dannyniu
> <http://www.austingroupbugs.net/view.php?id=1857#c6883>
>
> >> if both greedy **AND** lazy quantifiers're nested ...
>
> > That was the reason for wording it as "longest
> > possible ... for which any minimal repetitions used ...
> > have the shortest possible match". A minimal
> > repetition nested inside a greedy one has precedence
> > (if used); otherwise, each just follows its normal rule.
>
> However, greedy ones nested inside minimal ones are
> not discussed, and I think this should be added.

I would not object to that, if we can find a way to do it that does
not make the (already quite convoluted) text harder to read.

> ----
>
> Observation 2:
>
> @steffen did experiment on PCRE and TRE, and
> the result seem to conflict with Geoff's interpretation
> of Danny's torture testing regular expression and
> subject string
>
> Steffen's note:
> https://www.austingroupbugs.net/view.php?id=1857#c6888
>
> Geoff's note:
> https://www.austingroupbugs.net/view.php?id=1857#c6898

We know PCRE is different, and TRE seems to be more buggy than macOS,
which is why we chose to standardise the macOS behaviour (except for
cases where we think it also is buggy).

-- 
Geoff Clare <[email protected]>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
