Steffen Nurpmeso wrote, on 26 Sep 2024:
>
> Also, compare for example this snippet of "man perlre"
> 
>  Alternatives are tried from left to right, so the first alternative
>  found for which the entire expression matches, is the one that is
>  chosen. This means that alternatives are not necessarily greedy. For
>  example: when matching "foo|foot" against "barefoot", only the "foo"
>  part will match, as that is the first alternative tried, and it
>  successfully matches the target string. (This might not seem important,
>  but it is important when you are capturing matched text using
>  parentheses.)
> 
> with 9.1 (p 179 bottom / 180 top),
> 
>   If the pattern permits a variable number of matching characters
>   and thus there is more than one such sequence starting at that
>   point, the longest such sequence is matched. For example, the
>   BRE "bb*" matches the second to fourth characters of the string
>   "abbbc", and the ERE "(wee|week)(knights|night)" matches all ten
>   characters of the string "weeknights".

*LIGHT*BULB*MOMENT*

THIS is the real key to the shortest vs. fewest repetitions issue.

I'm fairly sure you can only get a different result from longest
vs. most repetitions (or shortest vs. fewest repetitions) if the
matcher can freely choose between alternatives on each repetition.
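
To make that concrete, here is a rough brute-force sketch in Python (not
how any real regex engine works) that lists every way the pattern
([ab]{6}|a)*b from the test case in this message can match at the start
of "aaaabbbb" when the alternative may be chosen freely on each
repetition. The function name enumerate_matches is made up for
illustration.

```python
def enumerate_matches(s, start=0):
    """Return sorted (match_length, repetitions) pairs for every way
    ([ab]{6}|a)*b can match s beginning at index start."""
    results = []

    def walk(pos, reps):
        # Stop repeating here and match the trailing "b".
        if pos < len(s) and s[pos] == "b":
            results.append((pos + 1 - start, reps))
        # One more repetition using the alternative "a".
        if pos < len(s) and s[pos] == "a":
            walk(pos + 1, reps + 1)
        # One more repetition using the alternative [ab]{6}.
        chunk = s[pos:pos + 6]
        if len(chunk) == 6 and set(chunk) <= {"a", "b"}:
            walk(pos + 6, reps + 1)

    walk(start, 0)
    return sorted(results)


candidates = enumerate_matches("aaaabbbb")
print(candidates)
print("shortest:", min(candidates, key=lambda c: c[0]))
print("fewest repetitions:", min(candidates, key=lambda c: c[1]))
```

For this input the candidates are length 5 with 4 repetitions, length 7
with 1 repetition, and length 8 with 2 repetitions, so the shortest
match is not the one with the fewest repetitions, and the longest is not
the one with the most.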

perl and python (and presumably php) don't have the issue because,
unlike POSIX, they try the alternatives in order.
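
The ordered behaviour is easy to see with the "foo|foot" example from
the perlre text quoted above. A minimal check using Python's re module,
which tries alternatives left to right (first_match is just a helper
name chosen for this sketch):

```python
import re


def first_match(pattern, text):
    """Return the text of the leftmost match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# Same two alternatives, only their order differs.
print(first_match(r"foo|foot", "barefoot"))
print(first_match(r"foot|foo", "barefoot"))
```

The first call gives "foo" and the second gives "foot": the result
depends on the order of the alternatives, whereas a POSIX
leftmost-longest matcher would be expected to give "foot" either way.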

My original test case has the [ab]{6} first:

$ echo aaaabbbb | perl -e '$i=<STDIN>;if($i =~ "([ab]{6}|a)*?b"){print "i<$i>; 1<$1> 2<$2> 3<$3> &<$&>\n"}else{print "no match\n"}'
i<aaaabbbb
>; 1<aaaabb> 2<> 3<> &<aaaabbb>
$ echo aaaabbbb | perl -e '$i=<STDIN>;if($i =~ "([ab]{6}|a)*b"){print "i<$i>; 1<$1> 2<$2> 3<$3> &<$&>\n"}else{print "no match\n"}'
i<aaaabbbb
>; 1<aaaabb> 2<> 3<> &<aaaabbb>
$ python3
Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> print(re.search(r'([ab]{6}|a)*?b', "aaaabbbb").group(0))
aaaabbb
>>> print(re.search(r'([ab]{6}|a)*b', "aaaabbbb").group(0))
aaaabbb
>>> 

Switching the alternatives round gives the shorter result for both
greedy and non-greedy:

$ echo aaaabbbb | perl -e '$i=<STDIN>;if($i =~ "(a|[ab]{6})*b"){print "i<$i>; 1<$1> 2<$2> 3<$3> &<$&>\n"}else{print "no match\n"}'
i<aaaabbbb
>; 1<a> 2<> 3<> &<aaaab>
$ echo aaaabbbb | perl -e '$i=<STDIN>;if($i =~ "(a|[ab]{6})*?b"){print "i<$i>; 1<$1> 2<$2> 3<$3> &<$&>\n"}else{print "no match\n"}'
i<aaaabbbb
>; 1<a> 2<> 3<> &<aaaab>
$ python3
Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> print(re.search(r'(a|[ab]{6})*?b', "aaaabbbb").group(0))
aaaab
>>> print(re.search(r'(a|[ab]{6})*b', "aaaabbbb").group(0))
aaaab
>>> 

This means all of the comparisons between macOS and perl/python/php
regarding non-greedy matching (with alternatives) are invalid and
there is no problem specifying the macOS behaviour in POSIX.

-- 
Geoff Clare <g.cl...@opengroup.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
