jackye1995 commented on code in PR #9717:
URL: https://github.com/apache/iceberg/pull/9717#discussion_r1496277335
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3324,6 +3348,184 @@ components:
type: integer
format: int64
+ BooleanTypeValue:
+ type: boolean
+
+ IntegerTypeValue:
+ type: integer
+
+ LongTypeValue:
+ type: integer
+ format: int64
+
+ FloatTypeValue:
Review Comment:
I ran some experiments, and I think the current SingleValueParser
implementation has some problems.
The Jackson writeNumber(float/double) method just writes the string form of
the number by calling the
[Float/Double.toString()](https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#toString-double-)
method internally. The linked Javadoc explains how the string conversion is
done.
The conversion fundamentally has two problems. (1) Values that are too large
or too small (outside the 10^-3 to 10^7 range) are written in scientific
notation, and the JSON representation will be a string rather than a number.
For example, `10^20` is written as `"1.0E20"`. (2) The result is lossy, because
it is the nearest approximation to the true value. This does not serialize the
float/double to its exact decimal representation, as we discussed above.
For example, there is a very small chance that the value `3.0` is serialized
as `2.99999999999999`. When deserialized, it is probably still the 3.0 double
value, but sometimes it will be just 2.99999999999999. This becomes a
correctness issue for use cases like row-level filtering, where a user can
define a filter against a double like `a < 3` and get unexpected results. We
actually saw this exact issue in the past in LakeFormation row-level filtering
with Athena, so I suggest we be very cautious here.
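To make the first point concrete, here is a minimal sketch of where `Double.toString` switches notation. The class and the `render` helper are hypothetical names used only for illustration. It shows the string form that `Double.toString` produces, not how Jackson emits the resulting token.

```java
final class ToStringNotationDemo {
    // Hypothetical helper for illustration; it simply delegates to Double.toString.
    static String render(double value) {
        return Double.toString(value);
    }

    public static void main(String[] args) {
        // Values at and around the 10^-3 and 10^7 thresholds, plus 10^20.
        double[] samples = {0.001, 1.0E-4, 9999999.0, 1.0E7, 1.0E20};
        for (double sample : samples) {
            System.out.println(render(sample));
        }
        // Prints:
        // 0.001
        // 1.0E-4
        // 9999999.0
        // 1.0E7
        // 1.0E20
    }
}
```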
In general, I think producing the true decimal representation will actually
be more space- and compute-intensive than just storing the binary
representation. We can easily store the binary form of a double by calling
[Double.doubleToRawLongBits](https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#doubleToRawLongBits-double-)
and writing the resulting long value in the serialized form, then deserialize
it with the reverse `longBitsToDouble` method. I think we should consider
using this approach.
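A minimal sketch of that round trip, assuming the serialized form carries the raw bits as a 64-bit integer. `DoubleBitsCodec`, `encode`, and `decode` are hypothetical names for illustration and are not part of SingleValueParser or the REST spec.

```java
final class DoubleBitsCodec {
    // Hypothetical helper: turn a double into the long that would be serialized.
    static long encode(double value) {
        return Double.doubleToRawLongBits(value);
    }

    // Hypothetical helper: rebuild the double from the serialized long.
    static double decode(long bits) {
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        double[] samples = {3.0, 0.1 + 0.2, 1.0E20, Double.MIN_VALUE, -0.0};
        for (double sample : samples) {
            long bits = encode(sample);
            double restored = decode(bits);
            // Compare bit patterns rather than using ==, so that -0.0 and 0.0
            // are distinguished.
            boolean exact = encode(restored) == bits;
            System.out.println(sample + " -> 0x" + Long.toHexString(bits)
                + " -> " + restored + " (bit-exact: " + exact + ")");
        }
    }
}
```

The trade-off is readability: `3.0` would appear in the payload as the integer `4613937818241073152` (hex `0x4008000000000000`) rather than as a human-readable decimal.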
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]