A NOTE has been added to this issue.
======================================================================
https://www.austingroupbugs.net/view.php?id=1857
======================================================================
Reported By:                dannyniu
Assigned To:
======================================================================
Project:                    1003.1(2024)/Issue8
Issue ID:                   1857
Category:                   Base Definitions and Headers
Tags:                       tc1-2024
Type:                       Error
Severity:                   Objection
Priority:                   normal
Status:                     Interpretation Required
Name:                       DannyNiu/NJF
Organization:               Individual
User Reference:
Section:                    9.1 Regular Expression Definitions # and others.
Page Number:                179-180 and others
Line Number:                6366-6368 and others.
Interp Status:              Approved
Final Accepted Text:        https://www.austingroupbugs.net/view.php?id=1857#c6919
Resolution:                 Accepted As Marked
Fixed in Version:
======================================================================
Date Submitted:             2024-09-14 12:54 UTC
Last Modified:              2025-03-20 15:57 UTC
======================================================================
Summary:                    Several problems with the new "lazy" regex quantifier.
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
related to          0001877 ISO editors Issue 8 comment 068
======================================================================

----------------------------------------------------------------------
 (0007130) geoffclare (manager) - 2025-03-20 15:57
 https://www.austingroupbugs.net/view.php?id=1857#c7130
----------------------------------------------------------------------
On page 179 line 6348 section 9.1, add a sentence:
<blockquote>The matching process is described in [xref to 9.2].</blockquote>
and move the remaining paragraphs of the "matching" definition to after page 181 line 6413 section 9.2.

On page 179 line 6357 section 9.1, change:
<blockquote>If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the longest such sequence is matched. For example, the BRE "bb*" matches the second to fourth characters of the string "abbbc", and the ERE "(wee|week)(knights|night)" matches all ten characters of the string "weeknights".

Consistent with the whole match being the longest of the leftmost matches, each subpattern, from left to right, shall match the longest possible string.</blockquote>
to:
<blockquote>If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the match shall be made according to the following rules:

1. For a BRE, or an ERE that does not use the repetition modifier '?', the match shall be the leftmost longest.

2. If an ERE contains repetitions with and without the repetition modifier '?', the precedence between the repetitions shall be:

   a. Each leftmost shortest match shall match the leftmost shortest sequence in the string, in descending priority from left to right.

   b. Consistent with rule 2a, the length matched by the entire regular expression shall be the leftmost longest.

   c. Consistent with rules 2a and 2b, each leftmost longest match shall match the leftmost longest sequence in the string, in descending priority from left to right.

   d. If an attempt is made to match the same sequence of the string using repetitions both with and without the repetition modifier '?', the behavior is unspecified. For example, the ERE ([0-9]+)+? has unspecified behavior.

According to these rules, the BRE "bb*" matches the second to fourth characters of the string "abbbc", and the ERE "(wee|week)(knights|night)" matches all ten characters of the string "weeknights". However, the ERE "(aaa??)*" matches only the first four characters of the string "aaaaa", not all five, because in order to match all five, "a??" would match with length one instead of zero; the ERE "(aaa??)*|(aaa?)*" matches all five because the longest match is one which does not use any minimal repetitions.

Consistent with the match for the entire regular expression being made according to the above rules, each BRE or ERE in a concatenated set, from left to right, shall match according to the above rules, applied to that BRE or ERE.</blockquote>

On page 180 line 6367 section 9.1, change:
<blockquote>the subpattern "(.*?)" matches the empty string, since that is the longest possible match for the ERE ".*?"</blockquote>
to:
<blockquote>the subexpression "(.*?)" matches the empty string, since the minimal repetition ".*?" has priority and the empty string is the shortest possible match (zero length) for that repetition.</blockquote>

On page 179 line 6370 section 9.1, change:
<blockquote>the longest sequence shall be measured</blockquote>
to:
<blockquote>the sequence length shall be measured</blockquote>

After page 191 line 6814 section 9.5, add a note:
<blockquote><small><b>Note:</b>The grammar defines syntax only and places no requirements on implementations as to how the parsed BRE or ERE is used for matching. The matching process is described in [xref to 9.2].</small></blockquote>

After XRAT page 3716 line 127617 section A.9.4.6, add a paragraph:
<blockquote>Note that the repetition modifier '?' (<question-mark>) is specified as changing the matching behavior for the modified repetition from the leftmost longest possible match to the leftmost shortest possible match. This does not necessarily give the same result as matching with the least repetitions. For example, the ERE "([ab]{6}|a)*?b" matches the first five characters of the string "aaaabbbb" as this is the shortest for the minimal repetition "*?". Matching with the least repetitions would match the first seven characters by using one repetition of "[ab]{6}" instead of four repetitions of "a". This distinction is only possible because the alternatives in an ERE alternation are chosen according to which gives the longest (or shortest) match. Other types of regular expression exist (notably in <i>perl</i>, <i>php</i>, and <i>python</i>) where the alternatives are tried in order; for those there is no difference between longest and most repetitions or between shortest and least repetitions.</blockquote>

Issue History
Date Modified    Username   Field                    Change
======================================================================
2024-09-14 12:54 dannyniu   New Issue
2024-09-14 12:54 dannyniu   Name                     => DannyNiu/NJF
2024-09-14 12:54 dannyniu   Organization             => Individual
2024-09-14 12:54 dannyniu   Section                  => 9.1 Regular Expression Definitions # and others.
2024-09-14 12:54 dannyniu   Page Number              => 179-180 and others
2024-09-14 12:54 dannyniu   Line Number              => 6366-6368 and others.
2024-09-20 08:05 dannyniu   Note Added: 0006879
2024-09-20 08:07 dannyniu   Note Edited: 0006879
2024-09-20 08:13 dannyniu   Note Edited: 0006879
2024-09-23 08:56 geoffclare Note Added: 0006880
2024-09-24 10:46 geoffclare Note Added: 0006881
2024-09-24 10:46 geoffclare Note Edited: 0006881
2024-09-24 11:54 dannyniu   Note Added: 0006882
2024-09-24 12:08 dannyniu   Note Edited: 0006882
2024-09-24 12:09 dannyniu   Note Edited: 0006882
2024-09-24 12:11 dannyniu   Note Edited: 0006882
2024-09-24 12:12 dannyniu   Note Edited: 0006882
2024-09-24 14:04 geoffclare Note Added: 0006883
2024-09-25 08:28 dannyniu   Note Added: 0006884
2024-09-25 08:30 dannyniu   Note Edited: 0006884
2024-09-25 08:33 dannyniu   Note Edited: 0006884
2024-09-25 08:42 dannyniu   Note Edited: 0006884
2024-09-25 08:43 dannyniu   Note Edited: 0006884
2024-09-25 11:36 dannyniu   Note Edited: 0006884
2024-09-25 13:17 geoffclare Note Added: 0006885
2024-09-25 15:08 dannyniu   Note Added: 0006886
2024-09-25 15:17 dannyniu   Note Edited: 0006886
2024-09-25 15:23 dannyniu   Note Edited: 0006886
2024-09-25 15:27 dannyniu   Note Edited: 0006886
2024-09-25 22:10 steffen    Note Added: 0006887
2024-09-25 22:33 steffen    Note Added: 0006888
2024-09-25 22:36 steffen    Note Added: 0006889
2024-09-26 04:02 dannyniu   Note Edited: 0006886
2024-09-26 06:50 dannyniu   Note Edited: 0006886
2024-09-26 08:41 geoffclare Note Added: 0006890
2024-09-26 11:43 dannyniu   Note Added: 0006891
2024-09-26 11:50 dannyniu   Note Edited: 0006891
2024-09-26 12:16 geoffclare Note Added: 0006892
2024-09-26 12:17 geoffclare Note Edited: 0006892
2024-09-26 13:27 geoffclare Note Edited: 0006881
2024-09-26 13:28 geoffclare Note Edited: 0006881
2024-09-26 13:30 geoffclare Note Edited: 0006892
2024-09-27 07:09 geoffclare Note Edited: 0006885
2024-09-27 11:34 dannyniu   Note Added: 0006896
2024-09-27 11:37 dannyniu   Note Edited: 0006896
2024-09-27 15:51 steffen    Note Added: 0006897
2024-09-30 09:26 geoffclare Note Added: 0006898
2024-10-02 02:07 dannyniu   Note Added: 0006899
2024-10-03 09:12 geoffclare Note Added: 0006900
2024-10-03 09:14 geoffclare Note Edited: 0006900
2024-10-03 09:15 geoffclare Note Edited: 0006900
2024-10-17 15:24 geoffclare Note Added: 0006919
2024-10-17 15:25 geoffclare Interp Status            => Pending
2024-10-17 15:25 geoffclare Final Accepted Text      => https://www.austingroupbugs.net/view.php?id=1857#c6919
2024-10-17 15:25 geoffclare Status                   New => Interpretation Required
2024-10-17 15:25 geoffclare Resolution               Open => Accepted As Marked
2024-10-17 15:26 geoffclare Tag Attached: tc1-2024
2024-10-17 16:22 agadmin    Interp Status            Pending => Proposed
2024-10-17 16:22 agadmin    Note Added: 0006920
2024-11-19 11:53 agadmin    Interp Status            Proposed => Approved
2024-11-19 11:53 agadmin    Note Added: 0006963
2024-11-19 12:11 geoffclare Relationship added       related to 0001877
2024-12-01 12:43 dannyniu   Note Added: 0006979
2024-12-03 15:11 geoffclare Note Added: 0006982
2024-12-25 14:40 dannyniu   Note Added: 0007033
2025-02-27 05:18 dannyniu   Note Added: 0007087
2025-03-04 14:56 dannyniu   Note Added: 0007090
2025-03-05 11:43 dannyniu   Note Added: 0007091
2025-03-06 14:26 geoffclare Note Added: 0007094
2025-03-20 15:57 geoffclare Note Added: 0007130
======================================================================
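
Illustration (not part of the accepted text): the XRAT paragraph in note 0007130 contrasts POSIX matching with engines such as python, where alternatives are tried in order. The sketch below runs the note's example patterns through Python's standard re module to show that contrast. Python's re is a backtracking engine, not a POSIX leftmost-longest matcher, so it does not implement rules 1 and 2; the helper name match_span and the CASES list are made up for this illustration.

```python
import re

def match_span(pattern, text):
    """Return the (start, end) span of the first match found by Python's re,
    or None if there is no match. Hypothetical helper for this sketch."""
    m = re.search(pattern, text)
    return m.span() if m else None

# Patterns and subject strings taken from note 0007130.
CASES = [
    (r"bb*", "abbbc"),
    (r"(wee|week)(knights|night)", "weeknights"),
    (r"(aaa??)*", "aaaaa"),
    (r"(aaa??)*|(aaa?)*", "aaaaa"),
    (r"([ab]{6}|a)*?b", "aaaabbbb"),
]

if __name__ == "__main__":
    for pattern, text in CASES:
        # Prints what an ordered-alternation engine matches, for comparison
        # with the lengths the note gives for POSIX matching.
        print(f"{pattern!r} on {text!r}: {match_span(pattern, text)}")
```

For the first three cases Python's result coincides with what the note states (characters two to four, all ten characters, and the first four characters). The last two differ: Python stops at four characters for "(aaa??)*|(aaa?)*" because the first alternative succeeds, where the note gives five, and it matches seven characters for "([ab]{6}|a)*?b" by taking one repetition of "[ab]{6}", where the note gives five.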
