jackye1995 commented on a change in pull request #1747:
URL: https://github.com/apache/iceberg/pull/1747#discussion_r528437555
##########
File path: api/src/main/java/org/apache/iceberg/expressions/NaNUtils.java
##########
@@ -0,0 +1,40 @@
+/*
Review comment:
nit: looks like iceberg util classes use `XxxUtil` instead of utils. And
there is a `org.apache.iceberg.util` package path for utils in the api module.
##########
File path:
api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java
##########
@@ -144,6 +146,35 @@ public Boolean or(Boolean leftResult, Boolean rightResult)
{
return ROWS_MIGHT_NOT_MATCH;
}
+ @Override
+ @SuppressWarnings("checkstyle:CyclomaticComplexity")
+ public <T> Boolean isNaN(BoundReference<T> ref) {
+ int id = ref.fieldId();
+
+ if (nanCounts != null && nanCounts.containsKey(id) &&
Review comment:
can refactor with `containsNaNOnly` as discussed above, and the
checkstyle issue should not show up anymore.
##########
File path:
api/src/main/java/org/apache/iceberg/expressions/ManifestEvaluator.java
##########
@@ -142,6 +142,25 @@ public Boolean or(Boolean leftResult, Boolean rightResult)
{
return ROWS_MIGHT_MATCH;
}
+ @Override
+ public <T> Boolean isNaN(BoundReference<T> ref) {
+ int pos = Accessors.toPosition(ref.accessor());
+ // containsNull encodes whether at least one partition value is null,
lowerBound is null if
+ // all partition values are null.
+ ByteBuffer lowerBound = stats.get(pos).lowerBound();
+ if (lowerBound == null) {
Review comment:
Seems like #1803 is missing `PartitionFieldSummary.containsNaN()`, or is
it in some other PR?
##########
File path:
api/src/main/java/org/apache/iceberg/expressions/ExpressionVisitors.java
##########
@@ -75,6 +75,14 @@ public R or(R leftResult, R rightResult) {
return null;
}
+ public <T> R isNaN(BoundReference<T> ref) {
+ throw new UnsupportedOperationException(this.getClass().getName() + "
does not implement isNaN");
Review comment:
why not be consistent with `isNull` and `notNull` and return null?
##########
File path:
api/src/main/java/org/apache/iceberg/expressions/InclusiveMetricsEvaluator.java
##########
@@ -150,6 +152,53 @@ public Boolean or(Boolean leftResult, Boolean rightResult)
{
return ROWS_MIGHT_MATCH;
}
+ @Override
+ public <T> Boolean isNaN(BoundReference<T> ref) {
+ Integer id = ref.fieldId();
+
+ if (nanCounts != null && nanCounts.containsKey(id) && nanCounts.get(id)
== 0) {
+ return ROWS_CANNOT_MATCH;
+ }
+
+ // when there's no nanCounts information, but we already know the column
only contains null,
+ // it's guaranteed that there's no NaN value
+ if (containsNullsOnly(id)) {
Review comment:
I agree that the `containsNaNsOnly` logic will not be very useful as Yan
said, but I think it is also valuable to have that private method just for
readability.
Then the question reduces to: do we need to consider the case that null
value metrics do not exist but NaN metrics do. For now I think the answer is
no, because in all metrics modes NaN and null counters either both exist or
both not exist.
##########
File path: api/src/main/java/org/apache/iceberg/expressions/Expressions.java
##########
@@ -123,6 +123,22 @@ public static Expression not(Expression child) {
return new UnboundPredicate<>(Expression.Operation.NOT_NULL, expr);
}
+ public static <T> UnboundPredicate<T> isNaN(String name) {
+ return new UnboundPredicate<>(Expression.Operation.IS_NAN, ref(name));
+ }
+
+ public static <T> UnboundPredicate<T> isNaN(UnboundTerm<T> expr) {
+ return new UnboundPredicate<>(Expression.Operation.IS_NAN, expr);
+ }
+
+ public static <T> UnboundPredicate<T> notNaN(String name) {
+ return new UnboundPredicate<>(Expression.Operation.NOT_NAN, ref(name));
+ }
+
+ public static <T> UnboundPredicate<T> notNaN(UnboundTerm<T> expr) {
+ return new UnboundPredicate<>(Expression.Operation.NOT_NAN, expr);
+ }
Review comment:
I do not fully understand what you mean by "rewrite logic of `or(isNaN,
in)`/`and(notNaN, notIn)`" when you talk about rewriting `in`. Can you give
some examples of what predicate are you trying to support?
##########
File path:
api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java
##########
@@ -118,7 +120,7 @@ public Boolean or(Boolean leftResult, Boolean rightResult) {
public <T> Boolean isNull(BoundReference<T> ref) {
// no need to check whether the field is required because binding
evaluates that case
// if the column has any non-null values, the expression does not match
- Integer id = ref.fieldId();
+ int id = ref.fieldId();
Review comment:
good catch
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]