How, then, do we resolve the ambiguity in the matrix of greedy/lazy versus partial/entire match? I can't derive a concrete procedure or outline from the current wording.
________________________________
From: [email protected] <[email protected]> on behalf of Geoff Clare via austin-group-l at The Open Group <[email protected]>
Sent: Monday, September 23, 2024 10:12:48 PM
To: [email protected] <[email protected]>
Subject: Re: [1003.1(2024)/Issue8 0001857]: Several problems with the new "lazy" regex quantifier.

Niu Danny wrote, on 23 Sep 2024:
>
> Just for clarification,
>
> Do you agree that the behavior I wrote down matches that from the
> implementation you use?

What you wrote does not match macOS behaviour.

> Do you disagree that most/least- repetition should replace longest/shortest
> as terminology when used in the standard?
>

Yes I disagree. The standard should continue to say longest/shortest.

Regards,
Geoff.

> ________________________________
> From: [email protected] <[email protected]> on behalf
> of Austin Group Bug Tracker via austin-group-l at The Open Group
> <[email protected]>
> Sent: Monday, September 23, 2024 4:56:40 PM
> To: [email protected] <[email protected]>
> Subject: [1003.1(2024)/Issue8 0001857]: Several problems with the new "lazy"
> regex quantifier.
>
>
> A NOTE has been added to this issue.
> ======================================================================
> https://austingroupbugs.net/view.php?id=1857
> ======================================================================
> Reported By:                dannyniu
> Assigned To:
> ======================================================================
> Project:                    1003.1(2024)/Issue8
> Issue ID:                   1857
> Category:                   Base Definitions and Headers
> Type:                       Error
> Severity:                   Objection
> Priority:                   normal
> Status:                     New
> Name:                       DannyNiu/NJF
> Organization:               Individual
> User Reference:
> Section:                    9.1 Regular Expression Definitions # and others.
> Page Number:                179-180 and others
> Line Number:                6366-6368 and others.
> Interp Status:              ---
> Final Accepted Text:
> ======================================================================
> Date Submitted:             2024-09-14 12:54 UTC
> Last Modified:              2024-09-23 08:56 UTC
> ======================================================================
> Summary:                    Several problems with the new "lazy" regex
> quantifier.
> ======================================================================
>
> ----------------------------------------------------------------------
>  (0006880) geoffclare (manager) - 2024-09-23 08:56
>  https://austingroupbugs.net/view.php?id=1857#c6880
> ----------------------------------------------------------------------
>
> For quantifiers without the `?` lazy quantifier, the most number of
> possible repetition is the fittest in terms of length; likewise, for
> quantifiers with the `?` lazy quantifier, the least number of possible
> repetition is the fittest in terms of length.
>
> This would change the established-for-decades "longest" requirement to
> "most repetitions", which is not the same thing. And it turns out that on
> macOS the '?' modifier does not change to matching the least repetitions,
> it is shortest match; the re_format(7) man page is wrong. Tested using the
> program at the end of https://posix.rhansen.org/p/2020-11-09 with
> REG_MINIMAL removed:
> <pre>$ ./a.out '([ab]{6}|a)*?b' aaaabbbb
> regexec() returned 0
> rm_so 0, rm_eo 5</pre>
> (Least repetitions would give rm_eo 7.)
>
> Same test with grep, using -o to see what matched:
> <pre>$ echo aaaabbbb | grep -E -o '([ab]{6}|a)*?b'
> aaaab
> b
> b
> b</pre>
> This behaviour makes sense as the whole point of REG_MINIMAL and the '?'
> modifier is to change to the opposite greediness, and the opposite of
> longest is shortest. Having the default as longest and REG_MINIMAL/'?' as
> least repetitions would produce the same output in the above tests with and
> without the '?', making them pointless in such cases.

--
Geoff Clare <[email protected]>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
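
For reference, the distinction drawn in the quoted note between "longest/shortest" and "most/least repetitions" can be checked by hand for the one pattern used in the test. The following is only a rough sketch under stated assumptions: it is not the test program from the thread and it does not call regexec() at all. It hard-codes the pattern ([ab]{6}|a)*b, considers only matches starting at offset 0 (where rm_so was reported), and the helper names candidates and walk are made up for illustration.

```python
def candidates(subject, start=0):
    """Return (repetitions, end_offset) for every way the hard-coded
    pattern ([ab]{6}|a)*b can match `subject` beginning at `start`."""
    results = []

    def walk(pos, reps):
        # Option 1: stop iterating and match the trailing literal 'b'.
        if subject.startswith("b", pos):
            results.append((reps, pos + 1))
        # Option 2: one more iteration through the alternative [ab]{6}.
        chunk = subject[pos:pos + 6]
        if len(chunk) == 6 and set(chunk) <= {"a", "b"}:
            walk(pos + 6, reps + 1)
        # Option 3: one more iteration through the alternative 'a'.
        if subject.startswith("a", pos):
            walk(pos + 1, reps + 1)

    walk(start, 0)
    return results


cands = candidates("aaaabbbb")

# Four possible selection rules applied to the same candidate set.
longest = max(cands, key=lambda c: c[1])
shortest = min(cands, key=lambda c: c[1])
most_reps = max(cands, key=lambda c: c[0])
least_reps = min(cands, key=lambda c: c[0])

print("candidates (reps, end):", sorted(cands))  # [(1, 7), (2, 8), (4, 5)]
print("longest match:     reps=%d end=%d" % longest)     # reps=2 end=8
print("shortest match:    reps=%d end=%d" % shortest)    # reps=4 end=5
print("most repetitions:  reps=%d end=%d" % most_reps)   # reps=4 end=5
print("least repetitions: reps=%d end=%d" % least_reps)  # reps=1 end=7
```

For this subject string the shortest candidate ends at offset 5 and the least-repetitions candidate ends at offset 7, which agrees with the rm_eo values mentioned in the note. The longest candidate (end 8) uses two repetitions while the most-repetitions candidate (four repetitions) ends at 5, so the two criteria select different matches here.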
