kbendick commented on a change in pull request #3273:
URL: https://github.com/apache/iceberg/pull/3273#discussion_r731156079
##########
File path: core/src/main/java/org/apache/iceberg/DataFiles.java
##########
@@ -285,7 +285,12 @@ public DataFile build() {
}
Preconditions.checkArgument(format != null, "File format is required");
Preconditions.checkArgument(fileSizeInBytes >= 0, "File size is
required");
- Preconditions.checkArgument(recordCount >= 0, "Record count is
required");
+ Preconditions.checkArgument(recordCount != null, "Record count is
required");
+ // MetricsEvaluator skips using other metrics, if record count is -1
+ Preconditions.checkArgument(recordCount >= 0 ||
+ (recordCount == -1 && valueCounts == null && columnSizes == null
&& nanValueCounts == null &&
+ lowerBounds == null && upperBounds == null),
+ "Metrics cannot be set if record count is -1.");
Review comment:
If we leave this (even temporarily), should we change the exception
message to mention something more in the range of `Metrics shouldn't be set
when the record count is -1. -1 is only valid for files which haven't been read
yet` or something?
I don't mind the -1 for now as it's consistent with some current behavior,
but I feel like users (or at least framework developers) should be made aware
this is not a relatively normal circumstance and that this means the file
hasn't been read.
Ideally, we remove the -1 as soon as possible, but I can't think of any
valid scenarios presently where we have metrics but not record count.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]