szehon-ho commented on a change in pull request #3273:
URL: https://github.com/apache/iceberg/pull/3273#discussion_r732122829
##########
File path: core/src/main/java/org/apache/iceberg/DataFiles.java
##########
@@ -285,7 +285,12 @@ public DataFile build() {
}
Preconditions.checkArgument(format != null, "File format is required");
Preconditions.checkArgument(fileSizeInBytes >= 0, "File size is required");
- Preconditions.checkArgument(recordCount >= 0, "Record count is required");
+ Preconditions.checkArgument(recordCount != null, "Record count is required");
+ // MetricsEvaluator skips using other metrics, if record count is -1
+ Preconditions.checkArgument(recordCount >= 0 ||
+     (recordCount == -1 && valueCounts == null && columnSizes == null && nanValueCounts == null &&
+     lowerBounds == null && upperBounds == null),
+     "Metrics cannot be set if record count is -1.");
Review comment:
I took @rdblue's suggestion and made an attempt to use the AvroIO method
to get the row count, which internally just visits each block once. A potential
follow-up would be to make this (and even the Parquet/ORC footer reading) into
distributed Spark jobs. Added a test.
Need to rebase following the Spark directory refactor.
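For reviewers skimming the diff, the new validation rule can be sketched as a standalone check. This is a minimal sketch, not the actual patch: the class and method names are hypothetical, and it throws a plain `IllegalArgumentException` where the real code uses Guava's `Preconditions`. The rule being illustrated: a record count of -1 means "unknown", and `MetricsEvaluator` then skips the other metrics, so none of them may be set in that case.

```java
// Hypothetical standalone sketch of the validation rule in the diff above.
final class RecordCountCheck {
    static void validate(Long recordCount, Object valueCounts, Object columnSizes,
                         Object nanValueCounts, Object lowerBounds, Object upperBounds) {
        if (recordCount == null) {
            throw new IllegalArgumentException("Record count is required");
        }
        // -1 signals "row count unknown"; MetricsEvaluator then ignores the other
        // metrics, so accepting them here would silently hide them from pruning.
        boolean noMetrics = valueCounts == null && columnSizes == null
                && nanValueCounts == null && lowerBounds == null && upperBounds == null;
        if (!(recordCount >= 0 || (recordCount == -1 && noMetrics))) {
            throw new IllegalArgumentException("Metrics cannot be set if record count is -1.");
        }
    }
}
```

A non-negative count passes with or without metrics; -1 passes only when every metric field is null.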
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]