The following issue has been SUBMITTED.
======================================================================
https://www.austingroupbugs.net/view.php?id=1857
======================================================================
Reported By:                dannyniu
Assigned To:
======================================================================
Project:                    1003.1(2024)/Issue8
Issue ID:                   1857
Category:                   Base Definitions and Headers
Type:                       Error
Severity:                   Objection
Priority:                   normal
Status:                     New
Name:                       DannyNiu/NJF
Organization:               Individual
User Reference:
Section:                    9.1 Regular Expression Definitions # and others.
Page Number:                179-180 and others
Line Number:                6366-6368 and others.
Interp Status:              ---
Final Accepted Text:
======================================================================
Date Submitted:             2024-09-14 12:54 UTC
Last Modified:              2024-09-14 12:54 UTC
======================================================================
Summary:                    Several problems with the new "lazy" regex quantifier.
Description:
1. Newly added text describes the "shortest" match with the word "longest":
====
Lines 6366-6368 on page 180:

> However, matching the ERE "(.*?).*" against "abcdef", the subpattern "(.*?)" matches the empty string, since that is the **longest** possible match for the ERE ".*?".

The qualifier "?" modifies a quantifier to make it "lazy", so the word "longest" makes the intent of the standard's wording unclear. Perhaps it should be "shortest"?

2. The supposed length of subpatterns.
====

Lines 6362-6363 on page 180:

> Consistent with the whole match being the longest of the leftmost matches, each subpattern, from left to right, shall match the longest possible string

This is fine for longest matches without the "lazy" qualifier. In a conceptual implementation of ERE, a backtracking recursive-descent matcher greedily matches each subpattern from left to right, so that each one is as long as possible before the final match. For each new match that is longer than the previous one, the right-most subpatterns are contracted first, before those to their left. Thus, after all iterations, the result naturally conforms to the requirement laid out for subpatterns.

However, when the "lazy" qualifier is applied to a component of a (parenthesized) subpattern, and no "greedy" quantifier is applied to any other component of that subpattern, the subpattern can only be "shortest". Therefore, the rules regarding the matched lengths of subpatterns with "lazy" qualifier(s) need to be updated.

Next, when `REG_MINIMAL` is applied to the whole regex, the quantifiers become "lazy" by default; therefore, absent any qualifier, the matched subpatterns can only be "shortest". This reiterates the need to update the rules for matching subpatterns when "lazy" qualifiers/specifiers are used.

Desired Action:
Various. The rules need to be worked out carefully, and I have no definitive desired action at this moment.

Also, this issue needs to be related to #793 and #1329, unless it is against procedure to relate new bugs to closed ones.
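
For illustration only, the behaviour discussed in both points can be observed with a small sketch. It uses Python's `re` module, which is an assumption of convenience: `re` is a backtracking engine and not a POSIX ERE implementation, and it is not what the standard text specifies. It does support the "?" qualifier, so it shows which subpattern ends up "shortest" and which ends up "longest" in these simple cases.

```python
import re

subject = "abcdef"

# Lazy first subpattern: "(.*?)" takes as little as possible,
# and the trailing ".*" consumes the rest of the subject.
lazy = re.match(r"(.*?).*", subject)
lazy_whole, lazy_group = lazy.group(0), lazy.group(1)

# Greedy first subpattern: "(.*)" takes as much as possible.
greedy = re.match(r"(.*).*", subject)
greedy_group = greedy.group(1)

# Two adjacent subpatterns: the left one is longest when greedy,
# and shortest when it carries the lazy qualifier.
greedy_pair = re.match(r"(a*)(a*)", "aaa").groups()
lazy_pair = re.match(r"(a*?)(a*)", "aaa").groups()

print(repr(lazy_whole), repr(lazy_group))
print(repr(greedy_group))
print(greedy_pair, lazy_pair)
```

In this sketch the whole match is the same in the lazy and greedy cases; only the split between subpatterns differs, which is the part that the wording about "longest possible string" for each subpattern does not cover.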
======================================================================

Issue History
Date Modified    Username       Field                    Change
======================================================================
2024-09-14 12:54 dannyniu       New Issue
2024-09-14 12:54 dannyniu       Name                      => DannyNiu/NJF
2024-09-14 12:54 dannyniu       Organization              => Individual
2024-09-14 12:54 dannyniu       Section                   => 9.1 Regular Expression Definitions # and others.
2024-09-14 12:54 dannyniu       Page Number               => 179-180 and others
2024-09-14 12:54 dannyniu       Line Number               => 6366-6368 and others.
======================================================================
