Niu Danny wrote, on 23 Sep 2024:
>
> How do we then resolve the ambiguity matrix of greedy/lazy partial/entire
> match? I can't derive concrete step/outline from the current wording.

We should clarify this. What macOS does is give precedence to non-greedy
matches, as per the re_format(7) man page:

    In the current implementation, minimal repetitions have a high
    precedence, and can cause other standards requirements to be
    violated. For instance, on the string `aaaaa', the RE `(aaa??)*'
    will only match the first four characters, violating the rules
    that the longest possible match is made and the longest
    subexpressions are matched.

I checked that the man page is accurate about this case:

    $ ./a.out '(aaa??)*' aaaaa
    regexec() returned 0
    rm_so 0, rm_eo 4

Unless someone knows of an alternative implementation that does it
differently, I think we should specify this behaviour.

Regards,
Geoff.

> ________________________________
> From: austin-group-l@opengroup.org <austin-group-l@opengroup.org> on behalf
> of Geoff Clare via austin-group-l at The Open Group
> <austin-group-l@opengroup.org>
> Sent: Monday, September 23, 2024 10:12:48 PM
> To: austin-group-l@opengroup.org <austin-group-l@opengroup.org>
> Subject: Re: [1003.1(2024)/Issue8 0001857]: Several problems with the new
> "lazy" regex quantifier.
>
> Niu Danny wrote, on 23 Sep 2024:
> >
> > Just for clarification,
> >
> > Do you agree that the behavior I wrote down matches that from the
> > implementation you use?
>
> What you wrote does not match macOS behaviour.
>
> > Do you disagree that most/least-repetition should replace longest/shortest
> > as terminology when used in the standard?
> >
>
> Yes I disagree. The standard should continue to say longest/shortest.
>
> Regards,
> Geoff.
>
> > ________________________________
> > From: austin-group-l@opengroup.org <austin-group-l@opengroup.org> on behalf
> > of Austin Group Bug Tracker via austin-group-l at The Open Group
> > <austin-group-l@opengroup.org>
> > Sent: Monday, September 23, 2024 4:56:40 PM
> > To: austin-group-l@opengroup.org <austin-group-l@opengroup.org>
> > Subject: [1003.1(2024)/Issue8 0001857]: Several problems with the new
> > "lazy" regex quantifier.
> >
> >
> > A NOTE has been added to this issue.
> > ======================================================================
> > https://austingroupbugs.net/view.php?id=1857
> > ======================================================================
> > Reported By:                dannyniu
> > Assigned To:
> > ======================================================================
> > Project:                    1003.1(2024)/Issue8
> > Issue ID:                   1857
> > Category:                   Base Definitions and Headers
> > Type:                       Error
> > Severity:                   Objection
> > Priority:                   normal
> > Status:                     New
> > Name:                       DannyNiu/NJF
> > Organization:               Individual
> > User Reference:
> > Section:                    9.1 Regular Expression Definitions # and others.
> > Page Number:                179-180 and others
> > Line Number:                6366-6368 and others.
> > Interp Status:              ---
> > Final Accepted Text:
> > ======================================================================
> > Date Submitted:             2024-09-14 12:54 UTC
> > Last Modified:              2024-09-23 08:56 UTC
> > ======================================================================
> > Summary:                    Several problems with the new "lazy" regex
> > quantifier.
> > ======================================================================
> >
> > ----------------------------------------------------------------------
> >  (0006880) geoffclare (manager) - 2024-09-23 08:56
> >  https://austingroupbugs.net/view.php?id=1857#c6880
> > ----------------------------------------------------------------------
> > > For quantifiers without the `?` lazy quantifier, the most number of
> > > possible repetition is the fittest in terms of length; likewise, for
> > > quantifiers with the `?` lazy quantifier, the least number of possible
> > > repetition is the fittest in terms of length.
> >
> > This would change the established-for-decades "longest" requirement to
> > "most repetitions", which is not the same thing. And it turns out that on
> > macOS the '?'
> > modifier does not change to matching the least repetitions,
> > it is shortest match; the re_format(7) man page is wrong. Tested using the
> > program at the end of https://posix.rhansen.org/p/2020-11-09 with
> > REG_MINIMAL removed:
> > <pre>$ ./a.out '([ab]{6}|a)*?b' aaaabbbb
> > regexec() returned 0
> > rm_so 0, rm_eo 5</pre>
> > (Least repetitions would give rm_eo 7.)
> >
> > Same test with grep, using -o to see what matched:
> > <pre>$ echo aaaabbbb | grep -E -o '([ab]{6}|a)*?b'
> > aaaab
> > b
> > b
> > b</pre>
> > This behaviour makes sense as the whole point of REG_MINIMAL and the '?'
> > modifier is to change to the opposite greediness, and the opposite of
> > longest is shortest. Having the default as longest and REG_MINIMAL/'?' as
> > least repetitions would produce the same output in the above tests with and
> > without the '?', making them pointless in such cases.

--
Geoff Clare <g.cl...@opengroup.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England