Let me summarize.

(1) Based on SL's observations, if you start from data containing binary
floating point numbers, round trip will be fine. We do not need an epsilon
approach, as converting the base 10 back to binary for comparison will
generate equal results. So all we need is xsi:type="double" so that we know
to do a floating point, not a string comparison. (This corresponds to
defaulting epsilon to 0.0d0 for such a test.)

(2) But if you start doing math, then the epsilon approach starts to
matter.  An example: Lat/lon are typically stored as fixed-point integers
in many formats. Converting those to floating point degrees in the infoset
involves math, and that math may work differently on different JVMs so the
epsilon approach makes sense. I would claim we not only need
xsi:type="double" (or xsi:type="float"), but also a test needs to be able
to specify an optional epsilon value.

(3) Lastly, some data is textual, and data will start out as base 10
textual floating point numbers. In that case, they aren't guaranteed to
round trip, might be different for different JVMs, so here the epsilon
approach is also needed.

I would suggest we not burden test writers with having to invent these
epsilon numbers. Rather there should be a few pre-canned settings such as
"zero" intended for use with binary data (Case (1)), "epsilon1" means a
very small epsilon used for each of double and float comparison (properly
chosen for each of those) intended to be as small as possible, while
tolerating JVM variations.  and  if we find it needed (but not at first) we
can add "epsilonCustom" which allows you to specify an epsilon value.

On Mon, Sep 25, 2023 at 10:45 AM McGann, Mike <mmcg...@owlcyberdefense.com>
wrote:

> I think two different topics came into this thread. One is "doing the
> math", and the other is "round-tripping".
>
> When doing the math, 1.1 + 2.2 = 3.300000003 or something similar in
> floating point. There it is fine to round or epsilon difference to check
> that it is 3.3. Trying to check more than that is where things start to get
> difficult. If for some reason the next JDK version or some obscure
> floating-point library makes that 3.3000000033 that should be fine, and I
> don't think there should be separate tests to check for those differences.
>
> For round tripping where a real value might be stored as BCD digits or
> something like that, converting into and out of an IEEE float might produce
> different values. In that case, avoiding the IEEE float conversion might be
> desirable, or using decimal types, or using integers. If you have to
> convert to IEEE float in this case, you may want to be able to check that
> is converts into the float as you expect and then the binary representation
> may make sense.
>
> // Mike
>
> -----Original Message-----
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Monday, September 25, 2023 10:01
> To: dev@daffodil.apache.org
> Subject: Re: Comparing Floating Point numbers
>
> Based on what I've learned today from reading that bug, I think in Java at
> least, float -> string -> float will give you back the exact same float.
> Note that string -> float -> string could give a different result, since
> multiple strings can map to the same float, but a float maps to only a
> single string.
>
> Java doesn't actually round when converting a float to a string. Instead
> it finds the shortest string representation that can be uniquely mapped
> back to the original float. The bug I linked below is about how Java didn't
> find the shortest. For example, "1.0E23" and "9.999999999999999E22" both
> map to the exact same float. Older versions of Java output the latter,
> newer version of Java output the former.
>
> But even though it didn't output the shortest, and are different decimal
> representations, both of those strings uniquely map back to the exact same
> float.
>
> So although users must be aware that float values in the infoset might not
> be accurate to the original value, we can be guaranteed that if an infoset
> string is converted to a Java float, it will at least be the exact same
> float value that Daffodil wrote to the infoset.
>
> And so for the purposes of TDML comparisons, it should all be exactly the
> same.
>
> For these reasons, I don't actually think creating a special raw
> representation actually fixes anything. The decimal string already maps
> back to the original float exactly. The real issues with floats is when you
> start doing math, or the original data was a string. That's when you start
> losing precision and get funky results. I don't there are any actual issues
> with the infoset conversion itself.
>
> On 2023-09-25 08:57 AM, McGann, Mike wrote:
> > One thing to note is that by putting a value in a TDML document such as
> "12.34e56" it is actually a string. Comparing that to a floating-point
> value is going to require a conversion from string and that could invoke a
> rounding step if it cannot be accurately represented by a float. If you
> really want to compare two floating points exactly, using a binary
> representation is probably the best such as putting in something like
> 0x1234p56. At least that is how I think I understand it. Floating point
> math is a deep rabbit hole that can be followed. That is probably overkill
> for TDML.
> >
> > // Mike
> >
> > -----Original Message-----
> > From: Steve Lawrence <slawre...@apache.org>
> > Sent: Monday, September 25, 2023 08:07
> > To: dev@daffodil.apache.org
> > Subject: Re: Comparing Floating Point numbers
> >
> > +1 for type aware comparisons. It should be a very small change to
> > +this
> > function:
> >
> > https://github.com/apache/daffodil/blob/main/daffodil-lib/src/main/sca
> > la/org/apache/daffodil/lib/xml/XMLUtils.scala#L1098
> >
> > And just need to add xsi:type to a few expected infosets that are
> > sensitive to the issue.
> >
> > Note that I *think* this might be the bug that caused the change:
> >
> > https://bugs.openjdk.org/browse/JDK-4511638
> >
> > Based on that, it sounds like the issue is that Java wasn't creating
> > the shortest possible decimal representation, but the representation
> > it did create still parses back to the same floating point
> > representation. So we *probably* don't even really need epsilon
> > comparison, we just need type aware comparison, and can still expect
> > the floating point values to be exactly the same.
> >
> > Although epsilon comparison is the right way to compare floats, my
> > concern is that we might add some bug in Daffodil where we do math
> > wrong and end up with a very very very slightly wrong answer and it
> > would be hidden. But if our epsilon is small enough, maybe that amount
> > precision error is fine?
> >
> > Note that according to that JDK issue, the change was made in Java 19,
> > so if we add any conditional logic on java version, we should check if
> > it's at least 19. I guess if we do need epsilon comparisons we could
> > only do it for java 19 and newer. Older versions would expect exact
> > values and so would catch any off by very very small amount bugs. That
> > might be adding unnecessary complication though.
> >
> >
> > On 2023-09-24 12:09 PM, Mike Beckerle wrote:
> >> So Java 21 produces different floating point values in a few cases.
> >> Some of our tests (4) are sensitive to this.
> >>
> >> The "right way" to compare floating point numbers is like this:
> >>
> >> If(Math.abs(A - B) < epsilon)
> >>
> >> The TDML runner has outstanding bug
> >> https://issues.apache.org/jira/browse/DAFFODIL-2402 which is to add
> >> the ability to put xsi:type="double" for example on the expected
> >> infoset, and this instructructs the (schema unaware) TDML runner to
> >> do comparison using some sort of epsilon comparison like the above
> >>
> >> Does that seem like the right fix for this?
> >>
> >> The only alternative I can think of is some sort of conditional
> >> infoset construction, so that the expected values can vary for
> different JVMs.
> >>
> >> On Sat, Sep 23, 2023 at 2:13 PM Mike Beckerle <mbecke...@apache.org>
> wrote:
> >>
> >>>
> >>> JVM 21 LTS is now out.
> >>>
> >>> So I decided to try to building Daffodil using it. My WIP PR is
> >>> https://github.com/apache/daffodil/pull/1090
> >>>
> >>> It looks pretty close.
> >>>
> >>> The --release 8 option for javac is now deprecated. So I
> >>> conditionalized that.
> >>> Fixed some deprecated calls.
> >>>
> >>> Remaining issues:
> >>>
> >>> 2 more deprecated calls (hence fatal warnings turned off for now)
> >>>
> >>> 5 tests fail. One each in these 3 test classes
> >>>
> >>> org.apache.daffodil.TresysTests.test_BG000
> >>>
> >>> org.apache.daffodil.section13.text_number_props.TestTextNumberProps.
> >>> test_textNumberPattern_exponent01
> >>>
> >>>
> >>> org.apache.daffodil.section05.simple_types.TestSimpleTypes.test_doub
> >>> le_binary_06
> >>>
> >>> All 3 of those failures are floating point related like this:
> >>> (highlighted digit isn't output any more)
> >>>
> >>> [error] Expected (attributes stripped)
> >>> [error]    <d_02>9.8765432109876544E16</d_02>
> >>> [error] Actual (attributes ignored for diff) [error]
> >>> <ex:d_02>9.876543210987654E16</ex:d_02>
> >>>
> >>> The Expected has one more digit 4 at the end.
> >>>
> >>> 1 other test failure is for reasons unknown. Possible change in
> >>> regex behavior?
> >>>
> >>>
> >>> org.apache.daffodil.io.layers.TestJavaIOStreams.testBase64ScanningFo
> >>> rDelimiter1
> >>>
> >>> One CLI test failure:
> >>>
> >>>
> >>> org.apache.daffodil.cli.cliTest.TestEXIEncodeDecode.test_CLI_Encode_
> >>> Decode_EXI
> >>>
> >>>
> >>>
> >>
> >
>
>

Reply via email to