Re: RFR: 8363972: Loose matching of dash/minusSign in number parsing [v7]

Justin Lu Tue, 05 Aug 2025 10:15:36 -0700

On Mon, 4 Aug 2025 23:57:28 GMT, Naoto Sato <[email protected]> wrote:


>> Enabling lenient minus sign matching when parsing numbers. In some locales, 
>> e.g. Finnish, the default minus sign is the Unicode "Minus Sign" (U+2212), 
>> which is not the "Hyphen Minus" (U+002D) that users type in from keyboard. 
>> Thus the parsing of user input numbers may fail. This change utilizes CLDR's 
>> `parseLenient` element for minus signs and loosely matches them with the 
>> hyphen-minus so that user input numbers can parse. As this is a behavioral 
>> change, a corresponding CSR has been drafted.
>
> Naoto Sato has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   refrects review comments

Latest changes look good to me. I think ignoring supplementary/normalization is 
fine and it would have been excessive otherwise. I left some more minor 
comments.

src/java.base/share/classes/java/text/CompactNumberFormat.java line 219:

> 217:  * </pre></blockquote>
> 218:  *
> 219:  * @implNote The implementation follows the LDML specification to enable 
> loose

Suggestion:

 * @implNote The JDK Reference Implementation follows the LDML specification to 
enable loose

src/java.base/share/classes/java/text/DecimalFormat.java line 421:

> 419:  * returns a numerically greater value.
> 420:  *
> 421:  * @implNote The implementation follows the LDML specification to enable 
> loose

Suggestion:

 * @implNote The JDK Reference Implementation follows the LDML specification to 
enable loose

src/java.base/share/classes/java/text/DecimalFormatSymbols.java line 719:

> 717: 
> 718:     /**
> 719:      * {@return the lenient minus signs} Multiple lenient minus signs

Do we have an idea of when a given locale would not have access to the lenient 
minus signs provided by parseLenient element? If the vast majority of times it 
will, then it is fine. Otherwise IMO, `getLenientMinusSigns()` should probably 
call out the fact that it may not always be a concatenation of multiple minus 
_signs_, and it may be a single minus _sign_ (as assigned by `minusSignText`). 
Since that detail is only apparent by digging around the code that assigns 
`lenientMinusSigns`.

src/java.base/share/classes/java/text/DecimalFormatSymbols.java line 720:

> 718:     /**
> 719:      * {@return the lenient minus signs} Multiple lenient minus signs
> 720:      * are concatenated to form the returned string.

The surrounding package private methods use since tags.

Suggestion:

     * are concatenated to form the returned string.
     * @since 26

src/java.base/share/classes/java/text/DecimalFormatSymbols.java line 852:

> 850: 
> 851:         // Lenient minus signs
> 852:         lenientMinusSigns = numberElements.length < 14 ? minusSignText : 
> numberElements[13];

BTW, if I remove this check and always assign to `numberElements[13]`, I do not 
observe any failures in the java_text/Format suite. It would be nice to have an 
idea of why this check is needed. (I understand it is following the same length 
checks of monetarySeparator and monetaryGroupingSeparator.)

-------------

PR Review: https://git.openjdk.org/jdk/pull/26580#pullrequestreview-3086137609
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2252827372
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254807654
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254847469
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254792967
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254846860

Re: RFR: 8363972: Loose matching of dash/minusSign in number parsing [v7]

Reply via email to