Steffen Nurpmeso wrote, on 26 Sep 2024:
>
> Also, compare for example this snippet of "man perlre"
>
>   Alternatives are tried from left to right, so the first alternative
>   found for which the entire expression matches, is the one that is
>   chosen. This means that alternatives are not necessarily greedy. For
>   example: when matching "foo|foot" against "barefoot", only the "foo"
>   part will match, as that is the first alternative tried, and it
>   successfully matches the target string. (This might not seem important,
>   but it is important when you are capturing matched text using
>   parentheses.)
>
> with 9.1 (p 179 bottom / 180 top),
>
>   If the pattern permits a variable number of matching characters
>   and thus there is more than one such sequence starting at that
>   point, the longest such sequence is matched. For example, the
>   BRE "bb*" matches the second to fourth characters of the string
>   "abbbc", and the ERE "(wee|week)(knights|night)" matches all ten
>   characters of the string "weeknights".

*LIGHT*BULB*MOMENT*

THIS is the real key to the shortest vs. least repetitions issue.

I'm fairly sure you can only get a different result from longest vs.
most repetitions (or shortest vs. least repetitions) if you can freely
choose between alternatives on each repetition. perl and python (and
presumably php) don't have the issue because, unlike POSIX, they try
the alternatives in order.

My original test case has the [ab]{6} first:

$ echo aaaabbbb | perl -e '$i=<STDIN>;if($i =~ "([ab]{6}|a)*?b"){print "i<$i>; 1<$1> 2<$2> 3<$3> &<$&>\n"}else{print "no match\n"}'
i<aaaabbbb
>; 1<aaaabb> 2<> 3<> &<aaaabbb>
$ echo aaaabbbb | perl -e '$i=<STDIN>;if($i =~ "([ab]{6}|a)*b"){print "i<$i>; 1<$1> 2<$2> 3<$3> &<$&>\n"}else{print "no match\n"}'
i<aaaabbbb
>; 1<aaaabb> 2<> 3<> &<aaaabbb>

$ python3
Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> print(re.search(r'([ab]{6}|a)*?b', "aaaabbbb").group(0))
aaaabbb
>>> print(re.search(r'([ab]{6}|a)*b', "aaaabbbb").group(0))
aaaabbb
>>>

Switching the alternatives round gives the shorter result for both
greedy and non-greedy:

$ echo aaaabbbb | perl -e '$i=<STDIN>;if($i =~ "(a|[ab]{6})*b"){print "i<$i>; 1<$1> 2<$2> 3<$3> &<$&>\n"}else{print "no match\n"}'
i<aaaabbbb
>; 1<a> 2<> 3<> &<aaaab>
$ echo aaaabbbb | perl -e '$i=<STDIN>;if($i =~ "(a|[ab]{6})*?b"){print "i<$i>; 1<$1> 2<$2> 3<$3> &<$&>\n"}else{print "no match\n"}'
i<aaaabbbb
>; 1<a> 2<> 3<> &<aaaab>

$ python3
Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> print(re.search(r'(a|[ab]{6})*?b', "aaaabbbb").group(0))
aaaab
>>> print(re.search(r'(a|[ab]{6})*b', "aaaabbbb").group(0))
aaaab
>>>

This means all of the comparisons between macOS and perl/python/php
regarding non-greedy matching (with alternatives) are invalid and
there is no problem specifying the macOS behaviour in POSIX.

-- 
Geoff Clare <g.cl...@opengroup.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
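
For anyone wanting to re-run the python checks in one go, they can be
collected into a small script. This is only a sketch: the names
whole_match and PATTERNS are made up for illustration, and it exercises
only Python's ordered (backtracking) alternation. It says nothing about
what a POSIX leftmost-longest matcher would return for the same input.

```python
import re

SUBJECT = "aaaabbbb"

# Patterns from the transcripts above, paired with the whole-match text
# that perl and python3 reported for them there.
PATTERNS = {
    r"([ab]{6}|a)*?b": "aaaabbb",  # [ab]{6} first, non-greedy
    r"([ab]{6}|a)*b": "aaaabbb",   # [ab]{6} first, greedy
    r"(a|[ab]{6})*?b": "aaaab",    # a first, non-greedy
    r"(a|[ab]{6})*b": "aaaab",     # a first, greedy
}


def whole_match(pattern, subject=SUBJECT):
    """Return the text of the first match (group 0), or None if no match."""
    m = re.search(pattern, subject)
    return m.group(0) if m else None


for pattern, reported in PATTERNS.items():
    got = whole_match(pattern)
    print(f"{pattern!r:22} -> {got!r} (reported above: {reported!r})")
```

Within each ordering of the alternatives the greedy and non-greedy
forms give the same whole match; only swapping the alternatives changes
it, which is the point made above.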