On Mon, 1 Jun 2026 21:34:11 GMT, Naoto Sato <[email protected]> wrote:
> Updated `ListFormat` custom pattern parsing to avoid regex matching. The > implementation now locates placeholders and delimiters with literal string > operations. > > Added coverage for invalid long custom patterns and for regex metacharacters > in custom patterns, which should be treated as literals. > > --------- > - [x] I confirm that I make this contribution in accordance with the [OpenJDK > Interim AI Policy](https://openjdk.org/legal/ai). src/java.base/share/classes/java/text/ListFormat.java line 114: > 112: private static final int THREE = 4; > 113: private static final int PATTERN_ARRAY_LENGTH = THREE + 1; > 114: private static final int PLACEHOLDER_LENGTH = 3; Nit: Would not be opposed to `PLACEHOLDER_TOKEN_LENGTH` here. To me, it makes it clearer that this is the length of the placeholder token text, not an arbitrary placeholder value. src/java.base/share/classes/java/text/ListFormat.java line 475: > 473: } > 474: > 475: if (startPattern[1] <= endPattern[0]) { For readability, keeping the `startEnd` and `endStart` vars might be preferred to calling `startPattern[1]` and `endPattern[0]` directly. src/java.base/share/classes/java/text/ListFormat.java line 581: > 579: var sb = new StringBuilder(256).append(patterns[START]); > 580: IntStream.range(2, count - 1).forEach(i -> > sb.append(middleBetween).append("{").append(i).append("}")); > 581: sb.append(patterns[END].replaceFirst("\\{0}", > "").replaceFirst("\\{1}", "\\{" + (count - 1) + "\\}")); Since we now compute and store `endBetween` and `endAfter` in `init()`, can we reuse them here instead of using regex to strip out the placeholders? i.e., sb.append(endBetween) .append("{").append(count - 1).append("}") .append(endAfter); src/java.base/share/classes/java/text/ListFormat.java line 668: > 666: * @param pattern pattern string to parse > 667: */ > 668: private static int[] findPlaceholders(String pattern) { Extra placeholder token behavior looks a little spotty (and was before as well, I believe). For example, given a `START` pattern, a duplicate "{1}" placeholder gets rejected because of the `placeholderPositions[1] + PLACEHOLDER_LENGTH == pattern.length()` check in `init()`. However, a duplicate "{0}" goes right through, for example "{0}, {0} {1}". I don't see any test cases or specification that hints at what to expect for that. Is this worth standardizing or documenting in a separate issue? test/jdk/java/text/Format/ListFormat/TestListFormat.java line 273: > 271: .getMessage(); > 272: > 273: var expected = "%s is incorrect:".formatted( I think it would be cleaner to parameterize the error messages with the corresponding `index` as an `Argument`, rather than calculating it in the test case. test/jdk/java/text/Format/ListFormat/TestListFormat.java line 282: > 280: default -> throw new RuntimeException("invalid index"); > 281: }); > 282: assertEquals(expected, msg.substring(0, Math.min(msg.length(), > expected.length()))); The `Math.min` tripped me up here. It looks to me that `expected` is always going to be shorter than `msg`, since `msg` includes the faulty pattern and `expected` is just the base error message. Should we just check that `msg` starts with `expected` using `startsWith` instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/31343#discussion_r3339087762 PR Review Comment: https://git.openjdk.org/jdk/pull/31343#discussion_r3339227040 PR Review Comment: https://git.openjdk.org/jdk/pull/31343#discussion_r3339061666 PR Review Comment: https://git.openjdk.org/jdk/pull/31343#discussion_r3339186316 PR Review Comment: https://git.openjdk.org/jdk/pull/31343#discussion_r3338897312 PR Review Comment: https://git.openjdk.org/jdk/pull/31343#discussion_r3338929985
