Re: Comparing Floating Point numbers

Mike Beckerle Mon, 25 Sep 2023 06:35:23 -0700

The base 10 problem is very real. We face this in cybersecurity systems
where people want a full data rip and rebuild according to spec, yet want
exactly the same bits out as went in. That isn't always possible with
base 10 floating point representations: information can be lost converting
from base 2 to base 10 and back if too few digits are kept, and a decimal
value as simple as 0.1 has no exact binary representation.
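The round-trip point can be checked directly in Java. This is a minimal sketch that uses only standard JDK calls. Finite doubles are expected to survive `Double.toString` followed by `Double.parseDouble` bit-for-bit, because the Javadoc requires the decimal string to uniquely identify the double. However, the decimal text `0.1` names a value that binary cannot hold exactly, and a NaN payload does not survive a trip through the text `NaN`. The class name `DecimalRoundTrip` is illustrative only.

```java
import java.math.BigDecimal;

final class DecimalRoundTrip {

    // Format a double as decimal text, parse it back, and report whether every bit survived.
    static boolean survivesDecimalRoundTrip(double d) {
        double back = Double.parseDouble(Double.toString(d));
        return Double.doubleToRawLongBits(back) == Double.doubleToRawLongBits(d);
    }

    // Exact decimal expansion of the binary value that the literal 0.1 actually denotes.
    static String exactValueOfPointOne() {
        return new BigDecimal(0.1).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(survivesDecimalRoundTrip(0.1));   // finite value: bits preserved
        System.out.println(survivesDecimalRoundTrip(-0.0));  // sign of zero preserved too
        System.out.println(exactValueOfPointOne());           // not exactly 0.1

        // A quiet NaN carrying a non-zero payload. Its text form is just "NaN",
        // so parsing it back yields the canonical NaN and the payload bits are lost.
        double payloadNaN = Double.longBitsToDouble(0x7ff8000000000001L);
        System.out.println(Double.toString(payloadNaN));
        System.out.println(survivesDecimalRoundTrip(payloadNaN));
    }
}
```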


We are probably going to need to introduce our own floating point
representation which represents the mantissa of floating point numbers as
an exact integer. E.g., 123456h78, where the "h" denotes that 123456 is
NOT a decimal value to be scaled by a power of ten, but the mantissa of the
floating point number viewed as a 52-bit (for double) integer. The
drawback is that you can't compute with such a number directly, so we would
need to keep this as the raw representation, but also expose, in the
infoset, a regular base 10 double.
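Here is a minimal sketch of what such a raw representation could look like. It assumes IEEE 754 binary64 and a hypothetical text form `<sign><fraction>h<biasedExponent>`, in which the 52-bit fraction field and the 11-bit biased exponent field are written as plain decimal integers. The notation, the class name `RawDoubleText`, and its methods are illustrative only. The proposal does not pin down a format (for example, how the sign is written). Because the fields are copied bit-for-bit, decoding restores the original bits exactly, including -0.0.

```java
final class RawDoubleText {

    private static final long FRACTION_MASK = (1L << 52) - 1; // low 52 bits
    private static final long EXPONENT_MASK = 0x7FFL;          // 11 bits

    // Hypothetical raw form: optional "-", then the 52-bit fraction field,
    // then "h", then the biased 11-bit exponent field, both as decimal integers.
    static String encode(double d) {
        long bits = Double.doubleToRawLongBits(d);
        boolean negative = bits < 0; // the sign bit is the top bit of the long
        long exponent = (bits >>> 52) & EXPONENT_MASK;
        long fraction = bits & FRACTION_MASK;
        return (negative ? "-" : "") + fraction + "h" + exponent;
    }

    static double decode(String text) {
        boolean negative = text.startsWith("-");
        String body = negative ? text.substring(1) : text;
        int h = body.indexOf('h');
        if (h < 0) {
            throw new IllegalArgumentException("missing 'h' separator: " + text);
        }
        long fraction = Long.parseLong(body.substring(0, h));
        long exponent = Long.parseLong(body.substring(h + 1));
        if (fraction < 0 || fraction > FRACTION_MASK || exponent < 0 || exponent > EXPONENT_MASK) {
            throw new IllegalArgumentException("field out of range: " + text);
        }
        long bits = (negative ? Long.MIN_VALUE : 0L) | (exponent << 52) | fraction;
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        String raw = encode(0.1);
        double back = decode(raw);
        // Raw form for an exact rebuild, plus an ordinary double for the infoset.
        System.out.println(raw + " -> " + Double.toString(back));
    }
}
```

The infoset can still show `Double.toString(decode(raw))`. The raw string is what a rebuild would use to recover the exact bits.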

On Mon, Sep 25, 2023 at 8:58 AM McGann, Mike <mmcg...@owlcyberdefense.com>
wrote:

> One thing to note is that a value in a TDML document such as
> "12.34e56" is actually a string. Comparing that to a floating-point
> value requires a conversion from string, and that conversion may round
> if the decimal value cannot be represented exactly as a float. If you
> really want to compare two floating point values exactly, a binary
> representation is probably best, such as a hexadecimal literal like
> 0x1234p56. At least that is how I understand it. Floating point
> math is a deep rabbit hole, and following it all the way down is
> probably overkill for TDML.
>
> // Mike
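Java already supports the hexadecimal floating-point notation suggested in that message. `Double.toHexString` produces it and `Double.parseDouble` accepts it. Each hex digit maps to four bits of the significand, so the text is an exact representation with no decimal rounding step. This is a minimal sketch that assumes expected values are written in that form. The class and method names are illustrative only.

```java
final class HexFloatCompare {

    // Exact comparison: parse both strings and compare bit patterns.
    // Double.doubleToLongBits collapses all NaNs to one canonical NaN,
    // so NaN matches NaN, while 0.0 and -0.0 are treated as different.
    static boolean sameDouble(String expected, String actual) {
        return Double.doubleToLongBits(Double.parseDouble(expected))
            == Double.doubleToLongBits(Double.parseDouble(actual));
    }

    public static void main(String[] args) {
        double d = 0.1;
        String hex = Double.toHexString(d);                  // exact binary form of d
        System.out.println(hex);
        System.out.println(Double.parseDouble(hex) == d);    // exact round trip
        System.out.println(Double.parseDouble("0x1234p56")); // hex significand, binary exponent
        System.out.println(sameDouble("0x1.999999999999ap-4", "0.1"));
    }
}
```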
>
> -----Original Message-----
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Monday, September 25, 2023 08:07
> To: dev@daffodil.apache.org
> Subject: Re: Comparing Floating Point numbers
>
> +1 for type-aware comparisons. It should be a very small change to this
> function:
>
>
> https://github.com/apache/daffodil/blob/main/daffodil-lib/src/main/scala/org/apache/daffodil/lib/xml/XMLUtils.scala#L1098
>
> And we just need to add xsi:type to the few expected infosets that are
> sensitive to the issue.
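Here is a minimal sketch of a type-aware comparison. It assumes the runner can see the expected element's `xsi:type` value as a string. The method name `valuesMatch` and the set of recognized type names are hypothetical. This is not the existing function in XMLUtils.scala. The idea is only that text compares as text unless the expected element declares a floating point type. In that case, both sides are parsed and compared as numbers.

```java
final class TypeAwareCompare {

    // Hypothetical: xsiType is the expected element's xsi:type value, or null if absent.
    static boolean valuesMatch(String expected, String actual, String xsiType) {
        if (xsiType == null) {
            return expected.equals(actual); // no type given: exact text comparison
        }
        try {
            switch (localName(xsiType)) {
                case "double":
                    // Double.compare: NaN equals NaN, and 0.0 differs from -0.0.
                    return Double.compare(Double.parseDouble(expected),
                                          Double.parseDouble(actual)) == 0;
                case "float":
                    return Float.compare(Float.parseFloat(expected),
                                         Float.parseFloat(actual)) == 0;
                default:
                    return expected.equals(actual);
            }
        } catch (NumberFormatException e) {
            return false; // e.g. XSD "INF", which Double.parseDouble does not accept
        }
    }

    // Strip a namespace prefix such as "xs:" or "xsd:".
    private static String localName(String qname) {
        int colon = qname.indexOf(':');
        return colon < 0 ? qname : qname.substring(colon + 1);
    }

    public static void main(String[] args) {
        System.out.println(valuesMatch("9.8765432109876544E16", "9.876543210987654E16", "xs:double"));
        System.out.println(valuesMatch("9.8765432109876544E16", "9.876543210987654E16", null));
    }
}
```

A real implementation would also need to map the XSD spellings `INF` and `-INF` to `Infinity` and `-Infinity` before parsing.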
>
> Note that I *think* this might be the bug that caused the change:
>
> https://bugs.openjdk.org/browse/JDK-4511638
>
> Based on that, it sounds like the issue is that Java wasn't always creating
> the shortest possible decimal representation, but the representation it did
> create still parses back to the same floating point value. So
> we *probably* don't even really need epsilon comparison; we just need
> type-aware comparison, and can still expect the floating point values to
> be exactly the same.
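This reasoning can be checked with the values from the failing test quoted further down. The old 17-digit string and the new 16-digit string parse to the same double, and 16 digits is the fewest that still identify it. The class name is illustrative. The printed `Double.toString` result depends on the JDK version, which is the whole issue.

```java
final class ShortestRepr {
    public static void main(String[] args) {
        double oldText = Double.parseDouble("9.8765432109876544E16"); // expected value in the test
        double newText = Double.parseDouble("9.876543210987654E16");  // what JDK 21 prints
        System.out.println(oldText == newText);                       // same double either way

        // Near 9.9e16, adjacent doubles are 16 apart, so a 15-digit string is too coarse.
        System.out.println(Math.ulp(newText));
        System.out.println(Double.parseDouble("9.87654321098765E16") == newText);

        // Version dependent: the test's expected string came from an older JDK.
        System.out.println(Double.toString(newText));
    }
}
```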
>
> Although epsilon comparison is the right way to compare floats, my
> concern is that we might introduce a bug in Daffodil where we do math wrong
> and end up with a very, very slightly wrong answer, and it would be
> hidden. But if our epsilon is small enough, maybe that amount of precision
> error is fine?
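One way to keep the tolerance small enough that real math bugs still show up is to measure distance in units in the last place (ULPs) rather than with an absolute epsilon. This is a minimal sketch that assumes non-NaN inputs. `ulpDistance` and `withinUlps` are hypothetical helpers, not existing Daffodil code. With `maxUlps` set to 0, this is exact comparison (except that 0.0 and -0.0 count as equal).

```java
final class UlpCompare {

    // Map a double's bits onto a long so that adjacent doubles map to adjacent longs
    // and ordering is preserved (both zeros map to 0). Assumes no NaN.
    private static long ordered(double d) {
        long bits = Double.doubleToLongBits(d);
        return bits < 0 ? Long.MIN_VALUE - bits : bits;
    }

    // Number of representable-double steps between a and b (hypothetical helper).
    static long ulpDistance(double a, double b) {
        if (Double.isNaN(a) || Double.isNaN(b)) {
            throw new IllegalArgumentException("NaN has no ULP distance");
        }
        try {
            return Math.abs(Math.subtractExact(ordered(a), ordered(b)));
        } catch (ArithmeticException overflow) {
            return Long.MAX_VALUE; // opposite signs, astronomically far apart
        }
    }

    static boolean withinUlps(double a, double b, long maxUlps) {
        return ulpDistance(a, b) <= maxUlps;
    }

    public static void main(String[] args) {
        double x = 9.876543210987654E16;
        System.out.println(ulpDistance(x, Math.nextUp(x)));  // adjacent doubles
        System.out.println(withinUlps(0.1 + 0.2, 0.3, 0));   // classic one-ULP rounding difference
        System.out.println(withinUlps(0.1 + 0.2, 0.3, 1));
    }
}
```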
>
> Note that according to that JDK issue, the change was made in Java 19,
> so if we add any conditional logic on the Java version, we should check
> whether it's at least 19. I guess if we do need epsilon comparisons we could
> do them only for Java 19 and newer. Older versions would expect exact
> values and so would catch any bugs that are off by a very small amount.
> That might be adding unnecessary complication though.
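If conditional logic on the Java version is needed, the running version can be read from the standard `java.specification.version` system property. This works on Java 8, where it reads "1.8", as well as on newer releases such as "21". `Runtime.version().feature()` is simpler, but it does not exist on Java 8. The helper names below are hypothetical.

```java
final class JavaVersionCheck {

    // Feature version of the running JVM: 8 for "1.8", 19 for "19", 21 for "21".
    static int javaFeatureVersion() {
        return parseFeatureVersion(System.getProperty("java.specification.version"));
    }

    static int parseFeatureVersion(String spec) {
        String v = spec.startsWith("1.") ? spec.substring(2) : spec;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    // Hypothetical policy from the thread: Double.toString output changed in JDK 19.
    static boolean hasShortestDoubleToString() {
        return javaFeatureVersion() >= 19;
    }

    public static void main(String[] args) {
        System.out.println(javaFeatureVersion() + " " + hasShortestDoubleToString());
    }
}
```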
>
>
> On 2023-09-24 12:09 PM, Mike Beckerle wrote:
> > So Java 21 produces different decimal output for some floating point
> > values. Some of our tests (4) are sensitive to this.
> >
> > The "right way" to compare floating point numbers is like this:
> >
> > if (Math.abs(a - b) < epsilon)
> >
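Here is that rule as runnable Java, next to a relative-tolerance variant. This is a minimal sketch, and the method names and tolerance values are assumptions for illustration. An absolute epsilon does not scale. Adjacent doubles near 9.9e16 are 16 apart, so any small fixed epsilon there is equivalent to exact equality. Near 1e-20, the same epsilon would accept almost anything. A relative tolerance scales with the magnitude of the operands.

```java
final class EpsilonCompare {

    // The rule as stated: |a - b| < epsilon, with a fixed absolute epsilon.
    static boolean absoluteClose(double a, double b, double epsilon) {
        return Math.abs(a - b) < epsilon;
    }

    // Relative tolerance: the allowed difference grows with the larger magnitude.
    static boolean relativeClose(double a, double b, double relTol) {
        if (a == b) {
            return true; // exact matches, including equal infinities
        }
        if (Double.isInfinite(a) || Double.isInfinite(b)) {
            return false; // an infinity is never "close" to anything else
        }
        double diff = Math.abs(a - b);
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return diff <= relTol * scale; // false for NaN, since NaN comparisons fail
    }

    public static void main(String[] args) {
        double big = 9.876543210987654E16;
        double bigNext = Math.nextUp(big);                       // 16 larger
        System.out.println(absoluteClose(big, bigNext, 1e-9));   // too strict at this scale
        System.out.println(relativeClose(big, bigNext, 1e-12));
        System.out.println(absoluteClose(1e-20, 2e-20, 1e-9));   // too loose at this scale
        System.out.println(relativeClose(1e-20, 2e-20, 1e-12));
    }
}
```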
> > The TDML runner has an outstanding bug,
> > https://issues.apache.org/jira/browse/DAFFODIL-2402, which is to add the
> > ability to put, for example, xsi:type="double" on the expected infoset;
> > this instructs the (schema-unaware) TDML runner to compare using
> > some sort of epsilon comparison like the above.
> >
> > Does that seem like the right fix for this?
> >
> > The only alternative I can think of is some sort of conditional infoset
> > construction, so that the expected values can vary for different JVMs.
> >
> > On Sat, Sep 23, 2023 at 2:13 PM Mike Beckerle <mbecke...@apache.org>
> > wrote:
> >
> >>
> >> JVM 21 LTS is now out.
> >>
> >> So I decided to try building Daffodil using it. My WIP PR is
> >> https://github.com/apache/daffodil/pull/1090
> >>
> >> It looks pretty close.
> >>
> >> The --release 8 option for javac is now deprecated, so I made that
> >> option conditional.
> >> Fixed some deprecated calls.
> >>
> >> Remaining issues:
> >>
> >> 2 more deprecated calls (hence fatal warnings turned off for now)
> >>
> >> 5 tests fail, including one in each of these 3 test classes:
> >>
> >> org.apache.daffodil.TresysTests.test_BG000
> >> org.apache.daffodil.section13.text_number_props.TestTextNumberProps.test_textNumberPattern_exponent01
> >> org.apache.daffodil.section05.simple_types.TestSimpleTypes.test_double_binary_06
> >>
> >> All 3 of those failures are floating point related, like this one (the
> >> last digit is no longer output):
> >>
> >> [error] Expected (attributes stripped)
> >> [error]    <d_02>9.8765432109876544E16</d_02>
> >> [error] Actual (attributes ignored for diff)
> >> [error] <ex:d_02>9.876543210987654E16</ex:d_02>
> >>
> >> The Expected has one more digit 4 at the end.
> >>
> >> 1 other test failure is for reasons unknown. Possible change in regex
> >> behavior?
> >>
> >> org.apache.daffodil.io.layers.TestJavaIOStreams.testBase64ScanningForDelimiter1
> >>
> >> One CLI test failure:
> >>
> >> org.apache.daffodil.cli.cliTest.TestEXIEncodeDecode.test_CLI_Encode_Decode_EXI
> >
>
>
