A NOTE has been added to this issue.
======================================================================
https://austingroupbugs.net/view.php?id=1857
======================================================================
Reported By:                dannyniu
Assigned To:
======================================================================
Project:                    1003.1(2024)/Issue8
Issue ID:                   1857
Category:                   Base Definitions and Headers
Type:                       Error
Severity:                   Objection
Priority:                   normal
Status:                     New
Name:                       DannyNiu/NJF
Organization:               Individual
User Reference:
Section:                    9.1 Regular Expression Definitions # and others.
Page Number:                179-180 and others
Line Number:                6366-6368 and others.
Interp Status:              ---
Final Accepted Text:
======================================================================
Date Submitted:             2024-09-14 12:54 UTC
Last Modified:              2024-09-24 10:46 UTC
======================================================================
Summary:                    Several problems with the new "lazy" regex quantifier.
======================================================================
----------------------------------------------------------------------
 (0006881) geoffclare (manager) - 2024-09-24 10:46
 https://austingroupbugs.net/view.php?id=1857#c6881
----------------------------------------------------------------------
Suggested changes ...

On page 179 line 6348 section 9.1, add a sentence:<blockquote>The matching process is described in [xref to 9.2].</blockquote>
and move the remaining paragraphs of the "matching" definition to after page 181 line 6413 section 9.2.

On page 179 line 6357 section 9.1, change:<blockquote>If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the longest such sequence is matched. For example, the BRE "bb*" matches the second to fourth characters of the string "abbbc", and the ERE "(wee|week)(knights|night)" matches all ten characters of the string "weeknights". Consistent with the whole match being the longest of the leftmost matches, each subpattern, from left to right, shall match the longest possible string.</blockquote>to:<blockquote>If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the matched sequence shall be the longest such sequence for which any minimal repetitions (see [xref to 9.4.6]) used in the match have the shortest possible match. For example, the BRE "bb*" matches the second to fourth characters of the string "abbbc", and the ERE "(wee|week)(knights|night)" matches all ten characters of the string "weeknights". However, the ERE "(aaa??)*" matches only the first four characters of the string "aaaaa", not all five, because in order to match all five, "a??" would match with length one instead of zero; the ERE "(aaa??)*|(aaa?)*" matches all five because the longest match is one which does not use any minimal repetitions.
Consistent with the match for the entire regular expression being the leftmost and longest for which any minimal repetitions used in the match have the shortest possible match, each BRE or ERE in a concatenated set, from left to right, shall match the longest possible string for which any minimal repetitions used in the match for that BRE or ERE have the shortest possible match.</blockquote>

On page 180 line 6367 section 9.1, change:<blockquote>the subpattern "(.*?)" matches the empty string, since that is the longest possible match for the ERE ".*?"</blockquote>to:<blockquote>the subexpression "(.*?)" matches the empty string, since that is the longest possible match for which the minimal repetition ".*?" has the shortest possible match (zero length).</blockquote>

On page 179 line 6370 section 9.1, change:<blockquote>the longest sequence shall be measured</blockquote>to:<blockquote>the sequence length shall be measured</blockquote>

After page 191 line 6814 section 9.5, add a note:<blockquote><small><b>Note:</b> The grammar defines syntax only and places no requirements on implementations as to how the parsed BRE or ERE is used for matching. The matching process is described in [xref to 9.2].</small></blockquote>

After XRAT page 3716 line 127617 section A.9.4.6, add a paragraph:<blockquote>Note that the repetition modifier '?' (<question-mark>) is specified as changing the matching behavior for the modified repetition from the leftmost longest possible match to the leftmost shortest possible match. This does not necessarily give the same result as matching with the least repetitions. For example, the ERE "([ab]{6}|a)*?b" matches the first five characters of the string "aaaabbbb" as this is the shortest for the minimal repetition "*?".
Matching with the least repetitions would match the first seven characters by using one repetition of "[ab]{6}" instead of four repetitions of "a".</blockquote>

Issue History
Date Modified    Username       Field                    Change
======================================================================
2024-09-14 12:54 dannyniu       New Issue
2024-09-14 12:54 dannyniu       Name                      => DannyNiu/NJF
2024-09-14 12:54 dannyniu       Organization              => Individual
2024-09-14 12:54 dannyniu       Section                   => 9.1 Regular Expression Definitions # and others.
2024-09-14 12:54 dannyniu       Page Number               => 179-180 and others
2024-09-14 12:54 dannyniu       Line Number               => 6366-6368 and others.
2024-09-20 08:05 dannyniu       Note Added: 0006879
2024-09-20 08:07 dannyniu       Note Edited: 0006879
2024-09-20 08:13 dannyniu       Note Edited: 0006879
2024-09-23 08:56 geoffclare     Note Added: 0006880
2024-09-24 10:46 geoffclare     Note Added: 0006881
======================================================================
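
The proposed XRAT paragraph in note 0006881 distinguishes the shortest match from the match with the least repetitions. As a rough illustration only, the Python sketch below enumerates by brute force every way the ERE "([ab]{6}|a)*?b" could match at the start of "aaaabbbb". It is not a POSIX regex implementation and uses no regex library; the function name candidate_matches is hypothetical, and the pattern is hard-coded into it.

```python
def candidate_matches(text):
    """Enumerate every way "([ab]{6}|a)*b" can match at offset 0 of text.

    Hypothetical helper for illustration only: the pattern is hard-coded.
    Returns a sorted list of (match_length, repetitions) pairs.
    """
    results = set()

    def walk(pos, reps):
        # Option 1: stop repeating and match the trailing "b".
        if pos < len(text) and text[pos] == "b":
            results.add((pos + 1, reps))
        # Option 2: one more repetition using the "a" alternative.
        if pos < len(text) and text[pos] == "a":
            walk(pos + 1, reps + 1)
        # Option 3: one more repetition using the "[ab]{6}" alternative.
        chunk = text[pos:pos + 6]
        if len(chunk) == 6 and set(chunk) <= {"a", "b"}:
            walk(pos + 6, reps + 1)

    walk(0, 0)
    return sorted(results)


matches = candidate_matches("aaaabbbb")
shortest = min(matches)                     # smallest overall match length
fewest = min(matches, key=lambda m: m[1])   # smallest repetition count
print("all candidates (length, repetitions):", matches)
print("shortest match:", shortest)
print("fewest repetitions:", fewest)
```

Under these assumptions the candidate matches have lengths 5, 7, and 8, using 4, 1, and 2 repetitions respectively. Selecting the shortest match gives the five characters cited in the note, while selecting the fewest repetitions gives seven.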