Re: Comparing Floating Point numbers

Dave Fisher Mon, 25 Sep 2023 07:44:02 -0700

As someone who did Fortran for over 30 using code libraries some written 60 
years ago. The epsilon approach is baked in.


I think there are methods in some Java test frameworks where you can do an 
Assert that will automatically handle epsilon internally.

Best,
Dave

Sent from my iPhone

> On Sep 25, 2023, at 7:10 AM, Sood, Harinder <hs...@owlcyberdefense.com> wrote:
> 
> Floating point comparison is typically done with +- delta as Mike suggested.
> 
> 
> Sincerely,
> Harinder Sood
> 
> 
> Senior Program Mnager
>  hs...@owlcyberdefense.com
>  240 805 4219
>  owlcyberdefense.com
> 
> The information contained in this transmission is for the personal and 
> confidential use of the individual or entity to which it is addressed. If the 
> reader is not the intended recipient, you are hereby notified that any 
> review, dissemination, or copying of this communication is strictly 
> prohibited. If you have received this transmission in error, please notify 
> the sender immediately.
> 
> -----Original Message-----
> From: Steve Lawrence <slawre...@apache.org> 
> Sent: Monday, September 25, 2023 10:01 AM
> To: dev@daffodil.apache.org
> Subject: Re: Comparing Floating Point numbers
> 
> Based on what I've learned today from reading that bug, I think in Java at 
> least, float -> string -> float will give you back the exact same float. Note 
> that string -> float -> string could give a different result, since multiple 
> strings can map to the same float, but a float maps to only a single string.
> 
> Java doesn't actually round when converting a float to a string. Instead it 
> finds the shortest string representation that can be uniquely mapped back to 
> the original float. The bug I linked below is about how Java didn't find the 
> shortest. For example, "1.0E23" and "9.999999999999999E22" both map to the 
> exact same float. Older versions of Java output the latter, newer version of 
> Java output the former.
> 
> But even though it didn't output the shortest, and are different decimal 
> representations, both of those strings uniquely map back to the exact same 
> float.
> 
> So although users must be aware that float values in the infoset might not be 
> accurate to the original value, we can be guaranteed that if an infoset 
> string is converted to a Java float, it will at least be the exact same float 
> value that Daffodil wrote to the infoset.
> 
> And so for the purposes of TDML comparisons, it should all be exactly the 
> same.
> 
> For these reasons, I don't actually think creating a special raw 
> representation actually fixes anything. The decimal string already maps back 
> to the original float exactly. The real issues with floats is when you start 
> doing math, or the original data was a string. That's when you start losing 
> precision and get funky results. I don't there are any actual issues with the 
> infoset conversion itself.
> 
>> On 2023-09-25 08:57 AM, McGann, Mike wrote:
>> One thing to note is that by putting a value in a TDML document such as 
>> "12.34e56" it is actually a string. Comparing that to a floating-point value 
>> is going to require a conversion from string and that could invoke a 
>> rounding step if it cannot be accurately represented by a float. If you 
>> really want to compare two floating points exactly, using a binary 
>> representation is probably the best such as putting in something like 
>> 0x1234p56. At least that is how I think I understand it. Floating point math 
>> is a deep rabbit hole that can be followed. That is probably overkill for 
>> TDML.
>> 
>> // Mike
>> 
>> -----Original Message-----
>> From: Steve Lawrence <slawre...@apache.org>
>> Sent: Monday, September 25, 2023 08:07
>> To: dev@daffodil.apache.org
>> Subject: Re: Comparing Floating Point numbers
>> 
>> +1 for type aware comparisons. It should be a very small change to 
>> +this
>> function:
>> 
>> https://github.com/apache/daffodil/blob/main/daffodil-lib/src/main/sca
>> la/org/apache/daffodil/lib/xml/XMLUtils.scala#L1098
>> 
>> And just need to add xsi:type to a few expected infosets that are 
>> sensitive to the issue.
>> 
>> Note that I *think* this might be the bug that caused the change:
>> 
>> https://bugs.openjdk.org/browse/JDK-4511638
>> 
>> Based on that, it sounds like the issue is that Java wasn't creating 
>> the shortest possible decimal representation, but the representation 
>> it did create still parses back to the same floating point 
>> representation. So we *probably* don't even really need epsilon 
>> comparison, we just need type aware comparison, and can still expect 
>> the floating point values to be exactly the same.
>> 
>> Although epsilon comparison is the right way to compare floats, my 
>> concern is that we might add some bug in Daffodil where we do math 
>> wrong and end up with a very very very slightly wrong answer and it 
>> would be hidden. But if our epsilon is small enough, maybe that amount 
>> precision error is fine?
>> 
>> Note that according to that JDK issue, the change was made in Java 19, 
>> so if we add any conditional logic on java version, we should check if 
>> it's at least 19. I guess if we do need epsilon comparisons we could 
>> only do it for java 19 and newer. Older versions would expect exact 
>> values and so would catch any off by very very small amount bugs. That 
>> might be adding unnecessary complication though.
>> 
>> 
>>> On 2023-09-24 12:09 PM, Mike Beckerle wrote:
>>> So Java 21 produces different floating point values in a few cases. 
>>> Some of our tests (4) are sensitive to this.
>>> 
>>> The "right way" to compare floating point numbers is like this:
>>> 
>>> If(Math.abs(A - B) < epsilon)
>>> 
>>> The TDML runner has outstanding bug
>>> https://issues.apache.org/jira/browse/DAFFODIL-2402 which is to add 
>>> the ability to put xsi:type="double" for example on the expected 
>>> infoset, and this instructructs the (schema unaware) TDML runner to 
>>> do comparison using some sort of epsilon comparison like the above
>>> 
>>> Does that seem like the right fix for this?
>>> 
>>> The only alternative I can think of is some sort of conditional 
>>> infoset construction, so that the expected values can vary for different 
>>> JVMs.
>>> 
>>>> On Sat, Sep 23, 2023 at 2:13 PM Mike Beckerle <mbecke...@apache.org> wrote:
>>> 
>>>> 
>>>> JVM 21 LTS is now out.
>>>> 
>>>> So I decided to try to building Daffodil using it. My WIP PR is
>>>> https://github.com/apache/daffodil/pull/1090
>>>> 
>>>> It looks pretty close.
>>>> 
>>>> The --release 8 option for javac is now deprecated. So I 
>>>> conditionalized that.
>>>> Fixed some deprecated calls.
>>>> 
>>>> Remaining issues:
>>>> 
>>>> 2 more deprecated calls (hence fatal warnings turned off for now)
>>>> 
>>>> 5 tests fail. One each in these 3 test classes
>>>> 
>>>> org.apache.daffodil.TresysTests.test_BG000
>>>> 
>>>> org.apache.daffodil.section13.text_number_props.TestTextNumberProps.
>>>> test_textNumberPattern_exponent01
>>>> 
>>>> 
>>>> org.apache.daffodil.section05.simple_types.TestSimpleTypes.test_doub
>>>> le_binary_06
>>>> 
>>>> All 3 of those failures are floating point related like this: 
>>>> (highlighted digit isn't output any more)
>>>> 
>>>> [error] Expected (attributes stripped)
>>>> [error]    <d_02>9.8765432109876544E16</d_02>
>>>> [error] Actual (attributes ignored for diff) [error] 
>>>> <ex:d_02>9.876543210987654E16</ex:d_02>
>>>> 
>>>> The Expected has one more digit 4 at the end.
>>>> 
>>>> 1 other test failure is for reasons unknown. Possible change in 
>>>> regex behavior?
>>>> 
>>>> 
>>>> org.apache.daffodil.io.layers.TestJavaIOStreams.testBase64ScanningFo
>>>> rDelimiter1
>>>> 
>>>> One CLI test failure:
>>>> 
>>>> 
>>>> org.apache.daffodil.cli.cliTest.TestEXIEncodeDecode.test_CLI_Encode_
>>>> Decode_EXI
>>>> 
>>>> 
>>>> 
>>> 
>> 
>

Re: Comparing Floating Point numbers

Reply via email to