On Fri, 1 Aug 2025 22:19:22 GMT, Justin Lu <[email protected]> wrote:
>> Naoto Sato has updated the pull request with a new target base due to a
>> merge or a rebase. The pull request now contains 16 commits:
>>
>> - Merge branch 'master' into JDK-8363972-Loose-matching-dash
>> - Spec update
>> - Supplementary/CanonEq tests
>> - flipped again, which was correct
>> - flipped the size check
>> - Address review comments
>> - Merge branch 'master' into JDK-8363972-Loose-matching-dash
>> - tidying up
>> - test location
>> - spec update
>> - ... and 6 more: https://git.openjdk.org/jdk/compare/8e921aee...3682484d
>
> src/java.base/share/classes/java/text/DecimalFormat.java line 3543:
>
>> 3541: var lmsp = Pattern.compile("[" + lms + "]",
>> Pattern.CANON_EQ);
>> 3542: var a = lmsp.matcher(affix).replaceAll("-");
>> 3543: var t = lmsp.matcher(text.substring(position,
>> Math.min(tlen, position + alen)))
>
> I don't think this works. If you see my other comment regarding the test case
> and swap it such that `lenientMinusSign` is "-\u00E4" and the pattern (affix)
> includes "a\u0308" and we parse some text such as "\u00E41.5".
>
> Then we have the following,
>
>
> var lmsp = Pattern.compile("[" + lms + "]", Pattern.CANON_EQ); // Returns
> pattern [-ä]
> var a = lmsp.matcher(affix).replaceAll("-"); // Gets correctly normalized to
> "-"
> // Incorrectly matches against "ä1" and normalized to "-1" since the
> substring returned is indexed from 0 to 2.
> var t = lmsp.matcher(text.substring(position, Math.min(tlen, position +
> alen)))
>
>
> That is, I think we need to substring after we have performed the
> normalization. Something such as,
>
>
> var lmsp = Pattern.compile("[" + lms + "]", Pattern.CANON_EQ);
> var a = lmsp.matcher(affix).replaceAll("-");
> var t = lmsp.matcher(text).replaceAll("-");
> return t.startsWith(a, position);
>
>
> However, we will still run into issues later in DecimalFormat, as the
> position is incremented by the original prefix length.
>
>
> } else if (gotNegative) {
> position += negativePrefix.length();
>
>
> So for "ä1.5" we would start parsing at position = 2, erroneously returning
> "0.5". So further thought may be needed.
The current implementations assume the prefix/suffix lengths in a lot of places
(no wonder), should be re-examined.
> test/jdk/java/text/Format/NumberFormat/LenientMinusSignTest.java line 128:
>
>> 126: .set(dfs, "-\u00E5");
>> 127: var df = new DecimalFormat("#.#;a\u0308#.#", dfs);
>> 128: assertEquals(df.parse("a\u03081.5"), -1.5);
>
> This one confuses me, should it not be parsing "\u00E51.5" such that the test
> can check if canonical equivalence occurs when matching the text "\u00E5" to
> the affix "a\u0308"? Otherwise, we are just parsing the exact pattern we set?
> Also I think it should be "\u00E4", not "\u00E5".
That's correct. I think the simple case is not that simple. Need to rethink.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2249005477
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2249005036