[GitHub] [iceberg] rdblue commented on a change in pull request #1221: ISSUE-1220: add option to disable manifest reading during estimateSta…

GitBox Mon, 20 Jul 2020 14:50:04 -0700


rdblue commented on a change in pull request #1221:
URL: https://github.com/apache/iceberg/pull/1221#discussion_r457711539




##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -276,6 +280,9 @@ public void pruneColumns(StructType newRequestedSchema) {
 
   @Override
   public Statistics estimateStatistics() {
+    if(disableEstimateStatistics) {
+      return new Stats(Long.MAX_VALUE, Long.MAX_VALUE);
+    }

Review comment:
   I just had an idea for an alternative solution to this. What about 
detecting that there are no filters and instead returning a value based on the 
[`total-records`](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/SnapshotSummary.java#L37) 
value in snapshot metadata?
   
   Usually, estimating stats based on the number of rows and a guess for the 
size of a row is much better than using the actual size anyway. So if you can 
get the number of rows and come up with an estimate for the size of each row 
based on the table schema, then you wouldn't need to disable stats at all.
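
   A rough sketch of what I mean, in plain Java so it stands alone. The class 
name, the per-type width guesses, and the use of type-name strings in place of 
the real table schema are all placeholders for illustration, not existing 
Iceberg code. The only thing taken from Iceberg is the `total-records` summary 
key, read here from a plain `Map<String, String>` like the one a snapshot 
summary exposes.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper for illustration only; not part of Iceberg or of this PR.
final class RowCountStatsSketch {

  // The snapshot summary key linked above.
  static final String TOTAL_RECORDS = "total-records";

  private RowCountStatsSketch() {
  }

  // Reads the row count from the summary map; empty if missing or malformed,
  // in which case the caller should fall back to the existing estimate.
  static OptionalLong totalRecords(Map<String, String> summary) {
    String value = summary.get(TOTAL_RECORDS);
    if (value == null) {
      return OptionalLong.empty();
    }
    try {
      return OptionalLong.of(Long.parseLong(value));
    } catch (NumberFormatException e) {
      return OptionalLong.empty();
    }
  }

  // Assumed width guesses in bytes per column type; these numbers are made up
  // and would need tuning against real data.
  static long guessWidth(String typeName) {
    switch (typeName) {
      case "boolean":
        return 1L;
      case "int":
      case "float":
      case "date":
        return 4L;
      case "long":
      case "double":
      case "timestamp":
        return 8L;
      case "string":
      case "binary":
        return 20L;
      default:
        return 16L;
    }
  }

  // Estimated size = row count * guessed row width, saturating at
  // Long.MAX_VALUE instead of overflowing.
  static long estimateSizeInBytes(long numRows, List<String> columnTypes) {
    long rowWidth = 0L;
    for (String typeName : columnTypes) {
      rowWidth += guessWidth(typeName);
    }
    try {
      return Math.multiplyExact(numRows, rowWidth);
    } catch (ArithmeticException e) {
      return Long.MAX_VALUE;
    }
  }
}
```

   `estimateStatistics()` would only take this path when the scan has no 
filters and `total-records` is present. Otherwise it would keep the current 
behavior, since the snapshot-level count says nothing about how many rows 
survive a filter.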




