On Mon, 14 Jul 2025 07:54:31 GMT, Xueming Shen <sher...@openjdk.org> wrote:
>> Regex class should conform to **_Level 1_** of [Unicode Technical Standard
>> #18: Unicode Regular Expressions](http://www.unicode.org/reports/tr18/),
>> plus RL2.1 Canonical Equivalents and RL2.2 Extended Grapheme Clusters.
>>
>> This PR primarily addresses conformance with RL1.5: Simple Loose Matches,
>> which requires that simple case folding be applied to literals and
>> (optionally) to character classes. When applied to character classes, each
>> class is expected to **be closed under simple case folding**. See the
>> standard for a detailed explanation and example of what it means for a
>> class to be “closed.”
>>
>> **RL1.5 states**:
>>
>> To meet this requirement, an implementation that supports case-insensitive
>> matching should
>>
>> 1. Provide at least the simple, default Unicode case-insensitive
>>    matching, and
>> 2. Specify which character properties or constructs are closed under the
>>    matching.
>>
>> **In the Pattern implementation**, 5 types of constructs may be affected by
>> case sensitivity:
>>
>> 1. back-refs,
>> 2. string slices (sequences),
>> 3. single characters,
>> 4. character families (Unicode Properties ...), and
>> 5. character class ranges
>>
>> **Note**: Single characters and families may appear independently or within
>> a character class.
>>
>> For case-insensitive (loose) matching, the implementation already applies
>> Character.toUpperCase() and Character.toLowerCase() to **both the pattern
>> and the input string** for back-refs, slices, and single characters. This
>> effectively makes these constructs closed under case folding.
>>
>> This has been verified in the newly added test case
>> **_test/jdk/java/util/regex/CaseFoldingTest.java_**.
>>
>> For example:
>>
>>     Pattern.compile("(?ui)\u017f").matcher("S").matches()   => true
>>     Pattern.compile("(?ui)[\u017f]").matcher("S").matches() => true
>>
>> The character properties (families) are not "closed" and should remain
>> unchanged. This is acceptable per RL1.5, if the behavior is clearly
>> specified (TBD: update javadoc to reflect this).
>>
>> **Current Non-Conformance: Character Class Ranges**, as reported in the
>> original bug report.
>>
>>     Pattern.compile("(?ui)[\u017f-\u...
>
> Xueming Shen has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   update to address the review comments

Looks good. Thanks for adding case folding support, which is long overdue 🙂
Since this adds new case-folding support for character class ranges, I think a CSR and a release note should be considered.

make/jdk/src/classes/build/tools/generatecharacter/CaseFolding.java line 73:

> 71:                 StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
> 72:     }
> 73: }

Needs a newline at the end of the file.

test/jdk/java/util/regex/CaseFoldingTest.java line 30:

> 28:  * @library /lib/testlibrary/java/lang
> 29:  * @author Xueming Shen
> 30:  * @run testng CaseFoldingTest

Since this is a new test, I think we prefer JUnit over TestNG.

test/jdk/java/util/regex/CaseFoldingTest.java line 61:

> 59:
> 60:         var results = Files.readAllLines(UCDFiles.CASEFOLDING)
> 61:             .stream()

`Files.lines()` may be more concise.

test/jdk/lib/testlibrary/java/lang/UCDFiles.java line 59:

> 57:         UCD_DIR.resolve("emoji").resolve("emoji-data.txt");
> 58:     public static Path CASEFOLDING =
> 59:         UCD_DIR.resolve("CaseFolding.txt");

Copyright year -> 2025

-------------

PR Review: https://git.openjdk.org/jdk/pull/26285#pullrequestreview-3017279774
PR Review Comment: https://git.openjdk.org/jdk/pull/26285#discussion_r2205510750
PR Review Comment: https://git.openjdk.org/jdk/pull/26285#discussion_r2205508784
PR Review Comment: https://git.openjdk.org/jdk/pull/26285#discussion_r2205517080
PR Review Comment: https://git.openjdk.org/jdk/pull/26285#discussion_r2205521609
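
To illustrate the `Files.lines()` comment on CaseFoldingTest.java, here is a minimal sketch of what streaming CaseFolding.txt directly could look like. It is not the PR's code: `CaseFoldingSketch`, `parse`, and `load` are hypothetical names, and the sketch assumes the published UCD line format `<code>; <status>; <mapping>; # <name>`, keeping only statuses C and S (the ones that make up simple case folding).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of the PR.
public class CaseFoldingSketch {

    // Parses CaseFolding.txt-style lines and returns code point -> simple folding.
    // Only statuses C (common) and S (simple) are kept; F (full) and T (Turkic) are skipped.
    public static Map<Integer, Integer> parse(Stream<String> lines) {
        Map<Integer, Integer> folds = new TreeMap<>();
        lines.map(String::trim)
             .filter(l -> !l.isEmpty() && !l.startsWith("#"))   // drop blanks and comments
             .map(l -> l.split(";"))
             .filter(f -> f.length >= 3)
             .filter(f -> {
                 String status = f[1].trim();
                 return status.equals("C") || status.equals("S");
             })
             .forEach(f -> folds.put(Integer.parseInt(f[0].trim(), 16),
                                     Integer.parseInt(f[2].trim(), 16)));
        return folds;
    }

    // Files.lines() streams the file lazily; try-with-resources closes the underlying file.
    public static Map<Integer, Integer> load(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return parse(lines);
        }
    }

    public static void main(String[] args) throws IOException {
        // Small stand-in for the real data file, written to a temp location.
        Path tmp = Files.createTempFile("CaseFolding", ".txt");
        Files.write(tmp, List.of(
                "# sample excerpt",
                "0041; C; 0061; # LATIN CAPITAL LETTER A",
                "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S",
                "017F; C; 0073; # LATIN SMALL LETTER LONG S",
                "1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S"));
        try {
            load(tmp).forEach((cp, fold) -> System.out.printf("%04X -> %04X%n", cp, fold));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Compared with `Files.readAllLines(...).stream()`, this avoids building the intermediate list, but the stream has to be closed explicitly, which is why `load` wraps it in try-with-resources.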