Just to clarify:

Do you agree that the behavior I wrote down matches that of the
implementation you use?

Do you disagree that "most/least repetitions" should replace
"longest/shortest" as the terminology used in the standard?

Thanks, Geoff.

________________________________
From: austin-group-l@opengroup.org <austin-group-l@opengroup.org> on behalf of 
Austin Group Bug Tracker via austin-group-l at The Open Group 
<austin-group-l@opengroup.org>
Sent: Monday, September 23, 2024 4:56:40 PM
To: austin-group-l@opengroup.org <austin-group-l@opengroup.org>
Subject: [1003.1(2024)/Issue8 0001857]: Several problems with the new "lazy" 
regex quantifier.


A NOTE has been added to this issue.
======================================================================
https://austingroupbugs.net/view.php?id=1857
======================================================================
Reported By:                dannyniu
Assigned To:
======================================================================
Project:                    1003.1(2024)/Issue8
Issue ID:                   1857
Category:                   Base Definitions and Headers
Type:                       Error
Severity:                   Objection
Priority:                   normal
Status:                     New
Name:                       DannyNiu/NJF
Organization:               Individual
User Reference:
Section:                    9.1 Regular Expression Definitions # and others.
Page Number:                179-180 and others
Line Number:                6366-6368 and others.
Interp Status:              ---
Final Accepted Text:
======================================================================
Date Submitted:             2024-09-14 12:54 UTC
Last Modified:              2024-09-23 08:56 UTC
======================================================================
Summary:                    Several problems with the new "lazy" regex
quantifier.
======================================================================

----------------------------------------------------------------------
 (0006880) geoffclare (manager) - 2024-09-23 08:56
 
https://austingroupbugs.net/view.php?id=1857#c6880
----------------------------------------------------------------------
> For quantifiers without the `?` lazy quantifier, the most number of
> possible repetition is the fittest in terms of length; likewise, for
> quantifiers with the `?` lazy quantifier, the least number of possible
> repetition is the fittest in terms of length.

This would change the established-for-decades "longest" requirement to
"most repetitions", which is not the same thing. It also turns out that on
macOS the '?' modifier does not switch to matching the least repetitions;
it gives the shortest match, and the re_format(7) man page is wrong. Tested using the
program at the end of https://posix.rhansen.org/p/2020-11-09 with
REG_MINIMAL removed:
<pre>$ ./a.out '([ab]{6}|a)*?b' aaaabbbb
regexec() returned 0
rm_so 0, rm_eo 5</pre>
(Least repetitions would give rm_eo 7.)
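
The candidate matches in this test can also be checked by hand. The
following is a minimal Python sketch that makes no use of any regexec()
implementation: it hard-codes the pattern ([ab]{6}|a)*b and enumerates
every way it can match starting at offset 0 of aaaabbbb, recording the
end offset and the number of group repetitions. The function name
candidate_matches is hypothetical and exists only for this illustration.

```python
def candidate_matches(text, start=0):
    """Enumerate every way ([ab]{6}|a)*b can match text from `start`.

    Returns a sorted list of (end_offset, repetitions) pairs, where
    end_offset corresponds to rm_eo and repetitions counts how many
    times the parenthesized group was iterated.
    """
    found = set()

    def walk(pos, reps):
        # Try to finish the match with the trailing 'b'.
        if pos < len(text) and text[pos] == "b":
            found.add((pos + 1, reps))
        # Group alternative 1: [ab]{6}
        chunk = text[pos:pos + 6]
        if len(chunk) == 6 and set(chunk) <= {"a", "b"}:
            walk(pos + 6, reps + 1)
        # Group alternative 2: a single 'a'
        if pos < len(text) and text[pos] == "a":
            walk(pos + 1, reps + 1)

    walk(start, 0)
    return sorted(found)


matches = candidate_matches("aaaabbbb")
print(matches)  # [(5, 4), (7, 1), (8, 2)]

# Shortest match: smallest end offset.
shortest_end = min(end for end, _ in matches)
# Least repetitions: end offset of the candidate with the fewest iterations.
least_reps_end = min(matches, key=lambda m: m[1])[0]

print("shortest match rm_eo:", shortest_end)        # 5
print("least repetitions rm_eo:", least_reps_end)   # 7
```

Under this enumeration the shortest match ends at offset 5 (four
repetitions of the group) and the match with the fewest repetitions ends
at offset 7 (one repetition), which agrees with the figures given here.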

Same test with grep, using -o to see what matched:
<pre>$ echo aaaabbbb | grep -E -o '([ab]{6}|a)*?b'
aaaab
b
b
b</pre>
This behaviour makes sense as the whole point of REG_MINIMAL and the '?'
modifier is to change to the opposite greediness, and the opposite of
longest is shortest. Having the default as longest and REG_MINIMAL/'?' as
least repetitions would produce the same output in the above tests with and
without the '?', making them pointless in such cases.

Issue History
Date Modified    Username       Field                    Change
======================================================================
2024-09-14 12:54 dannyniu       New Issue
2024-09-14 12:54 dannyniu       Name                      => DannyNiu/NJF
2024-09-14 12:54 dannyniu       Organization              => Individual
2024-09-14 12:54 dannyniu       Section                   => 9.1 Regular
Expression Definitions # and others.
2024-09-14 12:54 dannyniu       Page Number               => 179-180 and others
2024-09-14 12:54 dannyniu       Line Number               => 6366-6368 and
others.
2024-09-20 08:05 dannyniu       Note Added: 0006879
2024-09-20 08:07 dannyniu       Note Edited: 0006879
2024-09-20 08:13 dannyniu       Note Edited: 0006879
2024-09-23 08:56 geoffclare     Note Added: 0006880
======================================================================

