rdblue commented on a change in pull request #3332:
URL: https://github.com/apache/iceberg/pull/3332#discussion_r737653511
##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcMetrics.java
##########
@@ -196,11 +195,18 @@ private static Metrics buildOrcMetrics(final long numOfRows, final TypeDescripti
min = Math.toIntExact((long) min);
}
} else if (columnStats instanceof DoubleColumnStatistics) {
- // since Orc includes NaN for upper/lower bounds of floating point columns, and we don't want this behavior,
- // we have tracked metrics for such columns ourselves and thus do not need to rely on Orc's column statistics.
- Preconditions.checkNotNull(fieldMetrics,
- "[BUG] Float or double type columns should have metrics being tracked by Iceberg Orc writers");
- min = fieldMetrics.lowerBound();
+ if (fieldMetrics != null) {
+ // since Orc includes NaN for upper/lower bounds of floating point columns, and we don't want this behavior,
+ // we have tracked metrics for such columns ourselves and thus do not need to rely on Orc's column statistics.
+ min = fieldMetrics.lowerBound();
+ } else {
+ // imported files will not have metrics that were tracked by Iceberg, so fall back to the file's metrics.
+ min = ((DoubleColumnStatistics) columnStats).getMinimum();
+ if (type.typeId() == Type.TypeID.FLOAT) {
+ float orcMin = ((Double) min).floatValue();
+ min = Float.isFinite(orcMin) ? orcMin : Float.NEGATIVE_INFINITY;
Review comment:
I don't think that `isFinite` is right. The min could be +Infinity if
all values are +Infinity. If that were the case, then this would make the stats
span all of [-Infinity, +Infinity] rather than just [+Infinity, +Infinity].
Instead, this should use `Float.isNaN`:
```java
min = Float.isNaN(orcMin) ? Float.NEGATIVE_INFINITY : orcMin;
```
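To make the difference concrete, here is a minimal, self-contained sketch (not the PR's code). The class `OrcFloatLowerBound` and the helpers `lowerBoundWithIsFinite` and `lowerBoundWithIsNaN` are hypothetical names; each takes the `double` minimum that ORC would report and narrows it to a `float` lower bound the way the corresponding check would.

```java
// Hypothetical illustration only; these names do not exist in Iceberg or ORC.
class OrcFloatLowerBound {

  // Mirrors the check in the diff: any non-finite minimum (NaN or +/-Infinity)
  // is replaced with -Infinity.
  static float lowerBoundWithIsFinite(double orcMin) {
    float min = (float) orcMin;
    return Float.isFinite(min) ? min : Float.NEGATIVE_INFINITY;
  }

  // Mirrors the suggested check: only NaN is replaced, so an infinite
  // minimum is kept as-is.
  static float lowerBoundWithIsNaN(double orcMin) {
    float min = (float) orcMin;
    return Float.isNaN(min) ? Float.NEGATIVE_INFINITY : min;
  }

  public static void main(String[] args) {
    double[] samples = {
        1.5, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN
    };
    for (double sample : samples) {
      System.out.println(sample
          + " -> isFinite variant: " + lowerBoundWithIsFinite(sample)
          + ", isNaN variant: " + lowerBoundWithIsNaN(sample));
    }
  }
}
```

For a column whose values are all +Infinity, the `isFinite` variant widens the lower bound to -Infinity, while the `isNaN` variant keeps it at +Infinity. Both variants map a NaN minimum to -Infinity and leave finite minimums unchanged.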