Niu Danny wrote, on 23 Sep 2024:
>
> How do we then resolve the ambiguity matrix of greedy/lazy partial/entire 
> match? I can't derive concrete step/outline from the current wording.

We should clarify this.  What macOS does is give precedence to non-greedy
matches, as per the re_format(7) man page:

   In the current implementation, minimal repetitions have a high
   precedence, and can cause other standards requirements to be
   violated. For instance, on the string `aaaaa', the RE `(aaa??)*' will
   only match the first four characters, violating the rules that the
   longest possible match is made and the longest subexpressions are
   matched.

I checked that the man page is accurate about this case:

$ ./a.out '(aaa??)*' aaaaa
regexec() returned 0
rm_so 0, rm_eo 4
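
To make the man page example concrete, here is a rough Python sketch. It is not the macOS algorithm and not the a.out test program. It assumes that one iteration of `(aaa??)` can consume either `aa` or `aaa`, lists every offset at which `(aaa??)*` could end on `aaaaa`, and compares the longest candidate with a simplified "lazy first" model in which each iteration takes the shortest piece that fits. The helper names reachable_ends and lazy_first_end are hypothetical, introduced only for illustration.

```python
def reachable_ends(subject, pieces):
    """Offsets where zero or more pieces, concatenated from offset 0, can end."""
    ends = {0}          # zero iterations always match the empty string
    frontier = [0]
    while frontier:
        pos = frontier.pop()
        for piece in pieces:
            nxt = pos + len(piece)
            if subject.startswith(piece, pos) and nxt not in ends:
                ends.add(nxt)
                frontier.append(nxt)
    return sorted(ends)


def lazy_first_end(subject, pieces):
    """Simplified model (an assumption): every iteration takes the shortest
    piece that matches, and the outer '*' iterates as long as it can."""
    pos = 0
    while True:
        for piece in sorted(pieces, key=len):
            if subject.startswith(piece, pos):
                pos += len(piece)
                break
        else:
            return pos  # no piece fits any more


# One iteration of (aaa??) is assumed to match "aa" or "aaa".
pieces = ["aa", "aaa"]
ends = reachable_ends("aaaaa", pieces)
longest = max(ends)
lazy_end = lazy_first_end("aaaaa", pieces)

print(ends)      # [0, 2, 3, 4, 5]
print(longest)   # 5
print(lazy_end)  # 4
```

Under these assumptions the longest possible match ends at offset 5, while the lazy-first model stops at 4, which agrees with the rm_eo 4 shown above.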

Unless someone knows of an alternative implementation that does it
differently, I think we should specify this behaviour.

Regards,
Geoff.

> 
> Get Outlook for Android<https://aka.ms/AAb9ysg>
> ________________________________
> From: austin-group-l@opengroup.org <austin-group-l@opengroup.org> on behalf 
> of Geoff Clare via austin-group-l at The Open Group 
> <austin-group-l@opengroup.org>
> Sent: Monday, September 23, 2024 10:12:48 PM
> To: austin-group-l@opengroup.org <austin-group-l@opengroup.org>
> Subject: Re: [1003.1(2024)/Issue8 0001857]: Several problems with the new 
> "lazy" regex quantifier.
> 
> Niu Danny wrote, on 23 Sep 2024:
> >
> > Just for clarification,
> >
> > Do you agree that the behavior I wrote down matches that from the 
> > implementation you use?
> 
> What you wrote does not match macOS behaviour.
> 
> > Do you disagree that most/least- repetition should replace longest/shortest 
> > as terminology when used in the standard?
> >
> 
> Yes I disagree.  The standard should continue to say longest/shortest.
> 
> Regards,
> Geoff.
> 
> > ________________________________
> > From: austin-group-l@opengroup.org <austin-group-l@opengroup.org> on behalf 
> > of Austin Group Bug Tracker via austin-group-l at The Open Group 
> > <austin-group-l@opengroup.org>
> > Sent: Monday, September 23, 2024 4:56:40 PM
> > To: austin-group-l@opengroup.org <austin-group-l@opengroup.org>
> > Subject: [1003.1(2024)/Issue8 0001857]: Several problems with the new 
> > "lazy" regex quantifier.
> >
> >
> > A NOTE has been added to this issue.
> > ======================================================================
> > https://austingroupbugs.net/view.php?id=1857
> > ======================================================================
> > Reported By:                dannyniu
> > Assigned To:
> > ======================================================================
> > Project:                    1003.1(2024)/Issue8
> > Issue ID:                   1857
> > Category:                   Base Definitions and Headers
> > Type:                       Error
> > Severity:                   Objection
> > Priority:                   normal
> > Status:                     New
> > Name:                       DannyNiu/NJF
> > Organization:               Individual
> > User Reference:
> > Section:                    9.1 Regular Expression Definitions # and others.
> > Page Number:                179-180 and others
> > Line Number:                6366-6368 and others.
> > Interp Status:              ---
> > Final Accepted Text:
> > ======================================================================
> > Date Submitted:             2024-09-14 12:54 UTC
> > Last Modified:              2024-09-23 08:56 UTC
> > ======================================================================
> > Summary:                    Several problems with the new "lazy" regex
> > quantifier.
> > ======================================================================
> >
> > ----------------------------------------------------------------------
> >  (0006880) geoffclare (manager) - 2024-09-23 08:56
> >  
> > https://austingroupbugs.net/view.php?id=1857#c6880
> > ----------------------------------------------------------------------
> > > For quantifiers without the `?` lazy quantifier, the most number of
> > > possible repetition is the fittest in terms of length; likewise, for
> > > quantifiers with the `?` lazy quantifier, the least number of possible
> > > repetition is the fittest in terms of length.
> >
> > This would change the established-for-decades "longest" requirement to
> > "most repetitions", which is not the same thing. And it turns out that on
> > macOS the '?' modifier does not change to matching the least repetitions,
> > it is shortest match; the re_format(7) man page is wrong. Tested using the
> > program at the end of https://posix.rhansen.org/p/2020-11-09 with
> > REG_MINIMAL removed:
> > $ ./a.out '([ab]{6}|a)*?b' aaaabbbb
> > regexec() returned 0
> > rm_so 0, rm_eo 5
> > (Least repetitions would give rm_eo 7.)
> >
> > Same test with grep, using -o to see what matched:
> > $ echo aaaabbbb | grep -E -o '([ab]{6}|a)*?b'
> > aaaab
> > b
> > b
> > b
> > This behaviour makes sense as the whole point of REG_MINIMAL and the '?'
> > modifier is to change to the opposite greediness, and the opposite of
> > longest is shortest. Having the default as longest and REG_MINIMAL/'?' as
> > least repetitions would produce the same output in the above tests with and
> > without the '?', making them pointless in such cases.
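
The difference between "shortest match" and "least repetitions" in the quoted test can be checked by brute force. The Python sketch below is only an illustration, not how any regex engine works: it hard-codes the pattern `([ab]{6}|a)*b`, anchored at offset 0, and records every (repetitions, end offset) pair that matches `aaaabbbb`. The function name candidates is a hypothetical helper.

```python
def candidates(subject):
    """All (repetitions, end) pairs for ([ab]{6}|a)*b anchored at offset 0."""
    found = set()

    def walk(pos, reps):
        # Try to finish here with the trailing 'b'.
        if subject.startswith("b", pos):
            found.add((reps, pos + 1))
        # Alternative 1 of the group: [ab]{6}
        chunk = subject[pos:pos + 6]
        if len(chunk) == 6 and set(chunk) <= {"a", "b"}:
            walk(pos + 6, reps + 1)
        # Alternative 2 of the group: a
        if subject.startswith("a", pos):
            walk(pos + 1, reps + 1)

    walk(0, 0)
    return sorted(found)


cands = candidates("aaaabbbb")
shortest_end = min(end for _, end in cands)  # shortest overall match
least_reps_end = min(cands)[1]               # fewest iterations of the group

print(cands)           # [(1, 7), (2, 8), (4, 5)]
print(shortest_end)    # 5
print(least_reps_end)  # 7
```

The shortest candidate ends at offset 5 (four iterations of `a`, then `b`), matching the rm_eo 5 reported above, while the candidate with the fewest iterations (one `[ab]{6}`, then `b`) ends at offset 7.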

-- 
Geoff Clare <g.cl...@opengroup.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
