Re: [External] : Re: [Text] Java 21 failure with a double

2023-10-23 Thread David Delabassee
Fyi, some background on this https://inside.java/2022/09/23/quality-heads-up/

--David


From: Alex Herbert 
Date: Friday, 20 October 2023 at 10:11
To: Commons Developers List 
Subject: [External] : Re: [Text] Java 21 failure with a double
I've implemented a fix for this in PR 467 [1]. Builds now pass on JDK 21.

This does not change the DoubleFormat class. It adds javadoc to state
that the formatting is dependent on Double.toString.

The tests have been updated to use a different range for the random
doubles so the tests across formats should be testing roughly the same
precision. They should not test the full precision output (i.e. 17
digits).

Note that this change identified another mismatch between the
reference DecimalFormat and the DoubleFormat when a zero trails the
decimal point. The DecimalFormat class can omit this trailing zero and
the decimal point:

new DecimalFormat("##0.0##E0").format(1.129e-4) => 113E-6(not 113.0E-6)

The inelegant solution I used was to remove .0 from the DecimalFormat
output and recheck the string is a match. This allows the above case
to pass. There are no other cases in the 1000 random numbers that hit
this mismatch case.

A new test has been added for full precision output. The test requires
that the output format can be parsed back to the same number, i.e. no
loss of precision occurs when formatting. This seemed simpler than
writing custom code to check against the digits output from
Double.toString to ensure all digits are present.

Alex

[1] 
https://urldefense.com/v3/__https://github.com/apache/commons-text/pull/467__;!!ACWV5N9M2RV99hQ!Iixt54NF3Y7ux56z7-3jmEH77wI5CGBHrnQV3dofFXTFXSiVzlaxerd2RkguErsQGajGkiZG7L2W89mgsDXNDz4TSVMT6A$<https://urldefense.com/v3/__https:/github.com/apache/commons-text/pull/467__;!!ACWV5N9M2RV99hQ!Iixt54NF3Y7ux56z7-3jmEH77wI5CGBHrnQV3dofFXTFXSiVzlaxerd2RkguErsQGajGkiZG7L2W89mgsDXNDz4TSVMT6A$>

On Mon, 16 Oct 2023 at 18:37, Alex Herbert  wrote:
>
> TLDR; The Double.toString(double) value is different for
> -9.354004711977437E17 on JDK 21 and earlier JDKs:
>
> JDK 21: -9.354004711977437E17
> JDK 17: -9.3540047119774374E17
>
> The DoubleFormat class is built upon Double.toString. So the test
> fails due to this change.
>
> ---
>
> More details:
>
> On JDK 21 Double.toString is dropping the last digit (a 4) as it is
> not required to uniquely represent the double from the next double up
> or down. Note the javadoc for toString states: "This decimal is
> (almost always) the shortest one that rounds to m according to the
> round to nearest rounding policy of IEEE 754 floating-point
> arithmetic." So this is not the closest decimal representation of the
> double, just the closest required for rounding to the double.
>
> I do not think this is a bug. But we should document that the
> DoubleFormat class is built upon the Double.toString representation of
> a double. This string may not be the closest decimal representation of
> the double. Thus final digit errors can be observed when using the
> entire character output of Double.toString compared to other
> formatting classes such as java.lang.DecimalFormat or the numerical
> string representation provided by java.lang.BigDecimal (see examples
> below).
>
> Q. How to fix the test?
>
> The DoubleFormatTest is using 1000 random double values with a base-2
> exponent of -100 to 100. Then testing against a plain text format of
> 0.00##. The exponent of 9.35e17 is 59. It is so large there are no
> digits after the decimal point. So the test is checking if
> DoubleFormat is accurate to 17 decimal digits of precision, which it
> is not in this case. The other tests using the random numbers test the
> formats:
>
> #,##0.0##
> 0.0##E0
> ##0.0##E0
>
> So the scientific and engineering tests are only checking 4 and 6
> decimal digits. But the plain formats are checking up to 17 digits due
> to the large exponent of the random numbers. This does not seem very
> fair.
>
> Fix 1: A simple fix would be to reduce the exponent range for random
> numbers so that the plain numbers should always have at least 4 digits
> after the decimal point.
>
> However, changing the test like this would reduce its power. It would
> not have alerted us to this failure.
>
> Fix 2: An alternative would be to change the test assertion to perform
> the current check, and if it fails perform a test that all digits from
> Double.toString are present in the formatted string in the same order
> (i.e. the class has used the entire output from Double.toString and so
> is at its limit). This is non-trivial as the double value is tested in
> all the installed locales which may change the digits and other
> formatting characters. The assertion would have to create a reverse
> parsing method based on the loc

Re: [Text] Java 21 failure with a double

2023-10-20 Thread Gary Gregory
Thank you Alex!

The PR is now merged.

Gary

On Fri, Oct 20, 2023, 4:09 AM Alex Herbert  wrote:

> I've implemented a fix for this in PR 467 [1]. Builds now pass on JDK 21.
>
> This does not change the DoubleFormat class. It adds javadoc to state
> that the formatting is dependent on Double.toString.
>
> The tests have been updated to use a different range for the random
> doubles so the tests across formats should be testing roughly the same
> precision. They should not test the full precision output (i.e. 17
> digits).
>
> Note that this change identified another mismatch between the
> reference DecimalFormat and the DoubleFormat when a zero trails the
> decimal point. The DecimalFormat class can omit this trailing zero and
> the decimal point:
>
> new DecimalFormat("##0.0##E0").format(1.129e-4) => 113E-6(not
> 113.0E-6)
>
> The inelegant solution I used was to remove .0 from the DecimalFormat
> output and recheck the string is a match. This allows the above case
> to pass. There are no other cases in the 1000 random numbers that hit
> this mismatch case.
>
> A new test has been added for full precision output. The test requires
> that the output format can be parsed back to the same number, i.e. no
> loss of precision occurs when formatting. This seemed simpler than
> writing custom code to check against the digits output from
> Double.toString to ensure all digits are present.
>
> Alex
>
> [1] https://github.com/apache/commons-text/pull/467
>
> On Mon, 16 Oct 2023 at 18:37, Alex Herbert 
> wrote:
> >
> > TLDR; The Double.toString(double) value is different for
> > -9.354004711977437E17 on JDK 21 and earlier JDKs:
> >
> > JDK 21: -9.354004711977437E17
> > JDK 17: -9.3540047119774374E17
> >
> > The DoubleFormat class is built upon Double.toString. So the test
> > fails due to this change.
> >
> > ---
> >
> > More details:
> >
> > On JDK 21 Double.toString is dropping the last digit (a 4) as it is
> > not required to uniquely represent the double from the next double up
> > or down. Note the javadoc for toString states: "This decimal is
> > (almost always) the shortest one that rounds to m according to the
> > round to nearest rounding policy of IEEE 754 floating-point
> > arithmetic." So this is not the closest decimal representation of the
> > double, just the closest required for rounding to the double.
> >
> > I do not think this is a bug. But we should document that the
> > DoubleFormat class is built upon the Double.toString representation of
> > a double. This string may not be the closest decimal representation of
> > the double. Thus final digit errors can be observed when using the
> > entire character output of Double.toString compared to other
> > formatting classes such as java.lang.DecimalFormat or the numerical
> > string representation provided by java.lang.BigDecimal (see examples
> > below).
> >
> > Q. How to fix the test?
> >
> > The DoubleFormatTest is using 1000 random double values with a base-2
> > exponent of -100 to 100. Then testing against a plain text format of
> > 0.00##. The exponent of 9.35e17 is 59. It is so large there are no
> > digits after the decimal point. So the test is checking if
> > DoubleFormat is accurate to 17 decimal digits of precision, which it
> > is not in this case. The other tests using the random numbers test the
> > formats:
> >
> > #,##0.0##
> > 0.0##E0
> > ##0.0##E0
> >
> > So the scientific and engineering tests are only checking 4 and 6
> > decimal digits. But the plain formats are checking up to 17 digits due
> > to the large exponent of the random numbers. This does not seem very
> > fair.
> >
> > Fix 1: A simple fix would be to reduce the exponent range for random
> > numbers so that the plain numbers should always have at least 4 digits
> > after the decimal point.
> >
> > However, changing the test like this would reduce its power. It would
> > not have alerted us to this failure.
> >
> > Fix 2: An alternative would be to change the test assertion to perform
> > the current check, and if it fails perform a test that all digits from
> > Double.toString are present in the formatted string in the same order
> > (i.e. the class has used the entire output from Double.toString and so
> > is at its limit). This is non-trivial as the double value is tested in
> > all the installed locales which may change the digits and other
> > formatting characters. The assertion would have to create a reverse
> > parsing method based on the locale.
> >
> > Note: I did try using the DecimalFormat used to specify formatting the
> > expected string to parse the string with DecimalFormat.parse. It
> > worked on the original root locale but it failed on locale "he"; this
> > may be another bug in DoubleFormat since the locale can parse its own
> > output:
> >
> > jshell> var df = new java.text.DecimalFormat("0.0##",
> > java.text.DecimalFormatSymbols.getInstance(Locale.forLanguageTag("he")))
> > jshell> x
> > x ==> -9.3540047119774374E17
> > jshell> 

Re: [Text] Java 21 failure with a double

2023-10-20 Thread Alex Herbert
I've implemented a fix for this in PR 467 [1]. Builds now pass on JDK 21.

This does not change the DoubleFormat class. It adds javadoc to state
that the formatting is dependent on Double.toString.

The tests have been updated to use a different range for the random
doubles so the tests across formats should be testing roughly the same
precision. They should not test the full precision output (i.e. 17
digits).

Note that this change identified another mismatch between the
reference DecimalFormat and the DoubleFormat when a zero trails the
decimal point. The DecimalFormat class can omit this trailing zero and
the decimal point:

new DecimalFormat("##0.0##E0").format(1.129e-4) => 113E-6(not 113.0E-6)

The inelegant solution I used was to remove .0 from the DecimalFormat
output and recheck the string is a match. This allows the above case
to pass. There are no other cases in the 1000 random numbers that hit
this mismatch case.

A new test has been added for full precision output. The test requires
that the output format can be parsed back to the same number, i.e. no
loss of precision occurs when formatting. This seemed simpler than
writing custom code to check against the digits output from
Double.toString to ensure all digits are present.

Alex

[1] https://github.com/apache/commons-text/pull/467

On Mon, 16 Oct 2023 at 18:37, Alex Herbert  wrote:
>
> TLDR; The Double.toString(double) value is different for
> -9.354004711977437E17 on JDK 21 and earlier JDKs:
>
> JDK 21: -9.354004711977437E17
> JDK 17: -9.3540047119774374E17
>
> The DoubleFormat class is built upon Double.toString. So the test
> fails due to this change.
>
> ---
>
> More details:
>
> On JDK 21 Double.toString is dropping the last digit (a 4) as it is
> not required to uniquely represent the double from the next double up
> or down. Note the javadoc for toString states: "This decimal is
> (almost always) the shortest one that rounds to m according to the
> round to nearest rounding policy of IEEE 754 floating-point
> arithmetic." So this is not the closest decimal representation of the
> double, just the closest required for rounding to the double.
>
> I do not think this is a bug. But we should document that the
> DoubleFormat class is built upon the Double.toString representation of
> a double. This string may not be the closest decimal representation of
> the double. Thus final digit errors can be observed when using the
> entire character output of Double.toString compared to other
> formatting classes such as java.lang.DecimalFormat or the numerical
> string representation provided by java.lang.BigDecimal (see examples
> below).
>
> Q. How to fix the test?
>
> The DoubleFormatTest is using 1000 random double values with a base-2
> exponent of -100 to 100. Then testing against a plain text format of
> 0.00##. The exponent of 9.35e17 is 59. It is so large there are no
> digits after the decimal point. So the test is checking if
> DoubleFormat is accurate to 17 decimal digits of precision, which it
> is not in this case. The other tests using the random numbers test the
> formats:
>
> #,##0.0##
> 0.0##E0
> ##0.0##E0
>
> So the scientific and engineering tests are only checking 4 and 6
> decimal digits. But the plain formats are checking up to 17 digits due
> to the large exponent of the random numbers. This does not seem very
> fair.
>
> Fix 1: A simple fix would be to reduce the exponent range for random
> numbers so that the plain numbers should always have at least 4 digits
> after the decimal point.
>
> However, changing the test like this would reduce its power. It would
> not have alerted us to this failure.
>
> Fix 2: An alternative would be to change the test assertion to perform
> the current check, and if it fails perform a test that all digits from
> Double.toString are present in the formatted string in the same order
> (i.e. the class has used the entire output from Double.toString and so
> is at its limit). This is non-trivial as the double value is tested in
> all the installed locales which may change the digits and other
> formatting characters. The assertion would have to create a reverse
> parsing method based on the locale.
>
> Note: I did try using the DecimalFormat used to specify formatting the
> expected string to parse the string with DecimalFormat.parse. It
> worked on the original root locale but it failed on locale "he"; this
> may be another bug in DoubleFormat since the locale can parse its own
> output:
>
> jshell> var df = new java.text.DecimalFormat("0.0##",
> java.text.DecimalFormatSymbols.getInstance(Locale.forLanguageTag("he")))
> jshell> x
> x ==> -9.3540047119774374E17
> jshell> df.parse(df.format(x)).doubleValue()
> $32 ==> -9.3540047119774374E17
>
> Fix 3: Implement fix 1, plus add a test for full length precision
> using only the root locale. This can quickly test the output is at the
> limit by checking the string creates the original input double using
> Double.parseDouble, i.e. the 

Re: [Text] Java 21 failure with a double

2023-10-16 Thread Alex Herbert
TLDR; The Double.toString(double) value is different for
-9.354004711977437E17 on JDK 21 and earlier JDKs:

JDK 21: -9.354004711977437E17
JDK 17: -9.3540047119774374E17

The DoubleFormat class is built upon Double.toString. So the test
fails due to this change.

---

More details:

On JDK 21 Double.toString is dropping the last digit (a 4) as it is
not required to uniquely represent the double from the next double up
or down. Note the javadoc for toString states: "This decimal is
(almost always) the shortest one that rounds to m according to the
round to nearest rounding policy of IEEE 754 floating-point
arithmetic." So this is not the closest decimal representation of the
double, just the closest required for rounding to the double.

I do not think this is a bug. But we should document that the
DoubleFormat class is built upon the Double.toString representation of
a double. This string may not be the closest decimal representation of
the double. Thus final digit errors can be observed when using the
entire character output of Double.toString compared to other
formatting classes such as java.lang.DecimalFormat or the numerical
string representation provided by java.lang.BigDecimal (see examples
below).

Q. How to fix the test?

The DoubleFormatTest is using 1000 random double values with a base-2
exponent of -100 to 100. Then testing against a plain text format of
0.00##. The exponent of 9.35e17 is 59. It is so large there are no
digits after the decimal point. So the test is checking if
DoubleFormat is accurate to 17 decimal digits of precision, which it
is not in this case. The other tests using the random numbers test the
formats:

#,##0.0##
0.0##E0
##0.0##E0

So the scientific and engineering tests are only checking 4 and 6
decimal digits. But the plain formats are checking up to 17 digits due
to the large exponent of the random numbers. This does not seem very
fair.

Fix 1: A simple fix would be to reduce the exponent range for random
numbers so that the plain numbers should always have at least 4 digits
after the decimal point.

However, changing the test like this would reduce its power. It would
not have alerted us to this failure.

Fix 2: An alternative would be to change the test assertion to perform
the current check, and if it fails perform a test that all digits from
Double.toString are present in the formatted string in the same order
(i.e. the class has used the entire output from Double.toString and so
is at its limit). This is non-trivial as the double value is tested in
all the installed locales which may change the digits and other
formatting characters. The assertion would have to create a reverse
parsing method based on the locale.

Note: I did try using the DecimalFormat used to specify formatting the
expected string to parse the string with DecimalFormat.parse. It
worked on the original root locale but it failed on locale "he"; this
may be another bug in DoubleFormat since the locale can parse its own
output:

jshell> var df = new java.text.DecimalFormat("0.0##",
java.text.DecimalFormatSymbols.getInstance(Locale.forLanguageTag("he")))
jshell> x
x ==> -9.3540047119774374E17
jshell> df.parse(df.format(x)).doubleValue()
$32 ==> -9.3540047119774374E17

Fix 3: Implement fix 1, plus add a test for full length precision
using only the root locale. This can quickly test the output is at the
limit by checking the string creates the original input double using
Double.parseDouble, i.e. the DoubleFormat has captured the entire
information required to uniquely recreate the original double.

Alex

---

Here is the failure:

[ERROR] Failures:
[ERROR]   
DoubleFormatTest.testPlain_localeFormatComparison:491->checkLocalizedFormats:127->checkLocalizedFormat:121->assertLocalized
FormatsAreEqual:40 Unexpected output for locale [und] and double value
-9.354004711977437E17 ==> expected: <-935400471197743740.0> bu
t was: <-935400471197743700.0>

The Double.toString value is the minimum string required to uniquely
identify the double value; this is the string required to round trip
the number to a string and back via Double.parseDouble(String).
However, it may not be the closest decimal representation of the
value.

In the failed test the double value is -9.354004711977437E17 which can
be interpreted differently. I obtained the value that failed using
Double.doubleToLongBits. This has different representations depending
on how it is output:

jshell
|  Welcome to JShell -- Version 17.0.6
|  For an introduction type: /help intro

jshell> double x = Double.longBitsToDouble(-4347673023486037259L)
x ==> -9.3540047119774374E17

jshell> new BigDecimal(x)
$2 ==> -935400471197743744

// Using the DecimalFormat from the failing test:
jshell> new java.text.DecimalFormat("0.0##",
java.text.DecimalFormatSymbols.getInstance(Locale.ROOT)).format(x)
$3 ==> "-935400471197743740.0"

jshell> Math.nextUp(x)
$4 ==> -9.3540047119774362E17

jshell> Math.nextDown(x)
$5 ==> -9.3540047119774387E17


Note the difference for JDK 21:


Re: [Text] Java 21 failure with a double

2023-10-15 Thread Romain Manni-Bucau
You can always test the java version and adjust the asserts but, without
being a [text] expert, I assume the toString usage will create more
troubles than helping so can need to be dropped or totally embraced in the
main impl.
Another option is to request if the JDK dev want to fix it since it as a
lot of side effects - thinking to API - so it can be something they come
back on.

Romain Manni-Bucau
@rmannibucau  |  Blog
 | Old Blog
 | Github  |
LinkedIn  | Book



Le dim. 15 oct. 2023 à 21:42, Gary Gregory  a
écrit :

> Good find Roman, that suspiciously looks like the culprit and causes a
> mismatch with the tests use of DecimalFormatter.
>
> Do you see a way to make the test take this into account?
>
> Ty,
> Gary
>
>
> On Sun, Oct 15, 2023, 2:13 PM Romain Manni-Bucau 
> wrote:
>
> > Hi Gary,
> >
> > Can't it be
> >
> >
> https://github.com/openjdk/jdk/commit/fb605944b5b734c8b47a9122e7ab3d3dcf55f71e
> > ?
> >
> > Long story short, commons-text uses Double.toString which changed due to
> > double (resp float) precision so it impacts tests.
> >
> > Romain Manni-Bucau
> > @rmannibucau  |  Blog
> >  | Old Blog
> >  | Github <
> > https://github.com/rmannibucau> |
> > LinkedIn  | Book
> > <
> >
> https://www.packtpub.com/application-development/java-ee-8-high-performance
> > >
> >
> >
> > Le dim. 15 oct. 2023 à 19:57, Gary Gregory  a
> > écrit :
> >
> > > Hi all,
> > >
> > > Can someone explain the 1 test failure on Java 21 in DoubleFornatTest?
> A
> > > one digit difference?
> > >
> > > Gary
> > >
> >
>


Re: [Text] Java 21 failure with a double

2023-10-15 Thread Gary Gregory
Good find Roman, that suspiciously looks like the culprit and causes a
mismatch with the tests use of DecimalFormatter.

Do you see a way to make the test take this into account?

Ty,
Gary


On Sun, Oct 15, 2023, 2:13 PM Romain Manni-Bucau 
wrote:

> Hi Gary,
>
> Can't it be
>
> https://github.com/openjdk/jdk/commit/fb605944b5b734c8b47a9122e7ab3d3dcf55f71e
> ?
>
> Long story short, commons-text uses Double.toString which changed due to
> double (resp float) precision so it impacts tests.
>
> Romain Manni-Bucau
> @rmannibucau  |  Blog
>  | Old Blog
>  | Github <
> https://github.com/rmannibucau> |
> LinkedIn  | Book
> <
> https://www.packtpub.com/application-development/java-ee-8-high-performance
> >
>
>
> Le dim. 15 oct. 2023 à 19:57, Gary Gregory  a
> écrit :
>
> > Hi all,
> >
> > Can someone explain the 1 test failure on Java 21 in DoubleFornatTest? A
> > one digit difference?
> >
> > Gary
> >
>


Re: [Text] Java 21 failure with a double

2023-10-15 Thread Romain Manni-Bucau
Hi Gary,

Can't it be
https://github.com/openjdk/jdk/commit/fb605944b5b734c8b47a9122e7ab3d3dcf55f71e
?

Long story short, commons-text uses Double.toString which changed due to
double (resp float) precision so it impacts tests.

Romain Manni-Bucau
@rmannibucau  |  Blog
 | Old Blog
 | Github  |
LinkedIn  | Book



Le dim. 15 oct. 2023 à 19:57, Gary Gregory  a
écrit :

> Hi all,
>
> Can someone explain the 1 test failure on Java 21 in DoubleFornatTest? A
> one digit difference?
>
> Gary
>


[Text] Java 21 failure with a double

2023-10-15 Thread Gary Gregory
Hi all,

Can someone explain the 1 test failure on Java 21 in DoubleFornatTest? A
one digit difference?

Gary