See below.

> On 5 March 2025, at 02:37, Steffen Nurpmeso via austin-group-l at The Open Group
> <[email protected]> wrote:
>
> [it seems to me Mantis should set Reply-To: and/or
> Mail-Followup-To: to [email protected], as it did
> before?]
>
> Austin Group Issue Tracker wrote in
> <l6pptcinzl3kuidmhbp0tlfuiskjkmdpkkmroi...@www.austingroupbugs.net>:
> ...
> |https://www.austingroupbugs.net/view.php?id=1857
> ...
> | (0007090) dannyniu (reporter) - 2025-03-04 14:56
> | https://www.austingroupbugs.net/view.php?id=1857#c7090
> |----------------------------------------------------------------------
> |For the sake of public record, I'm duplicating mailing list message \
> |to note here
> |that Geoff's step-by-step analysis of my torture testing case (at
> |https://www.austingroupbugs.net/view.php?id=1857#c6898 ) is inconsistent \
> |with
> |macOS `grep`. Here's my terminal output:
> ...
> |Whether this is indeed a bug in software with no change to the standard \
> |text
> |needed, or that the standard text itself is in error is arguable.
> ...
>
> I argue in favour of what is the resolution of this bug, and which
> reads (is parts):
>
> If the pattern permits a variable number of matching characters and thus
> there is more than one such sequence starting at that point, the matched
> sequence shall be the longest such sequence for which any minimal repetitions
> (see [xref to 9.4.6]) used in the match have the shortest possible match. For
> example, the BRE "bb*" matches the second to fourth characters of the string
> "abbbc", and the ERE "(wee|week)(knights|night)" matches all ten characters
> of the string "weeknights". However, the ERE "(aaa??)*" matches only the
> first four characters of the string "aaaaa", not all five, because in order
> to match all five, "a??" would match with length one instead of zero; the ERE
> "(aaa??)*|(aaa?)*" matches all five because the longest match is one which
> does not use any minimal repetitions.
>
> Consistent with the match for the entire regular expression being the
> leftmost and longest for which any minimal repetitions used in the match have
> the shortest possible match, each BRE or ERE in a concatenated set, from left
> to right, shall match the longest possible string for which any minimal
> repetitions used in the match for that BRE or ERE have the shortest possible
> match.
>
> and
>
> Note that the repetition modifier '?' (<question-mark>) is specified as
> changing the matching behavior for the modified repetition from the leftmost
> longest possible match to the leftmost shortest possible match. This does not
> necessarily give the same result as matching with the least repetitions. For
> example, the ERE "([ab]{6}|a)*?b" matches the first five characters of the
> string "aaaabbbb" as this is the shortest for the minimal repetition "*?".
> Matching with the least repetitions would match the first seven characters by
> using one repetition of "[ab]{6}" instead of four repetitions of "a". This
> distinction is only possible because the alternatives in an ERE alternation
> are chosen according to which gives the longest (or shortest) match. Other
> types of regular expression exist (notably in perl, php, and python) where
> the alternatives are tried in order; for those there is no difference between
> longest and most repetitions or between shortest and least repetitions.
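To make the last point of the quoted note concrete, here is a rough sketch
that runs the EREs from the quoted text through Python's `re` module. Python
is one of the engines the note names as trying alternatives in order, so it
is not a POSIX matcher, and the point is only to show where the two models
part ways. The helper name `ordered_match_length` is mine, and the "quoted"
lengths are simply the ones the resolution text states.

```python
import re

# (pattern, subject, match length stated in the quoted resolution text)
EXAMPLES = [
    ("(wee|week)(knights|night)", "weeknights", 10),
    ("(aaa??)*", "aaaaa", 4),
    ("(aaa??)*|(aaa?)*", "aaaaa", 5),
    ("([ab]{6}|a)*?b", "aaaabbbb", 5),
]


def ordered_match_length(pattern, subject):
    """Return the length of the match that Python's backtracking engine
    (alternatives tried in order) finds at offset 0, or None if no match."""
    m = re.match(pattern, subject)
    return None if m is None else m.end()


if __name__ == "__main__":
    for pattern, subject, quoted_len in EXAMPLES:
        got = ordered_match_length(pattern, subject)
        verdict = "same" if got == quoted_len else "differs"
        print(f"{pattern!r} on {subject!r}: "
              f"ordered={got}, quoted={quoted_len} ({verdict})")
```

With ordered alternation the first two examples agree with the quoted text,
while "(aaa??)*|(aaa?)*" stops after four characters (the first alternative
already succeeds) and "([ab]{6}|a)*?b" takes seven (one repetition of
"[ab]{6}" is tried before "a").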
I think an informative text that describes the matching behavior in
terms of the precedence of the constructs would make it immediately
easier to understand, and therefore clearer. I was thinking of
something like this:

The precedence of quantifiers is as follows (from highest to lowest):

1. The length of any subexpression modified by a minimal quantifier
   shall be such that it matches the shortest substring of the subject
   string, in descending priority from left to right.

2. Consistent with rule 1, the length of the overall match shall be
   the longest possible.

3. The length of any subexpression modified by a greedy quantifier
   shall be such that it matches the longest substring of the subject
   string, in descending priority from left to right.

What do you think? @steffen
