rdblue commented on a change in pull request #1221:
URL: https://github.com/apache/iceberg/pull/1221#discussion_r457711539
##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -276,6 +280,9 @@ public void pruneColumns(StructType newRequestedSchema) {
@Override
public Statistics estimateStatistics() {
+ if(disableEstimateStatistics) {
+ return new Stats(Long.MAX_VALUE, Long.MAX_VALUE);
+ }
Review comment:
I just had an idea for an alternative solution to this. What about
detecting that there are no filters and instead returning a value based on the
[`total-records`](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/SnapshotSummary.java#L37)
value in snapshot metadata?
Usually, estimating stats based on the number of rows and a guess for the
size of a row is much better than using the actual on-disk size anyway. So if
you can get the number of rows and come up with an estimate for the size of
each row based on the table schema, then you wouldn't need to disable stats at all.
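
A minimal sketch of that idea, assuming the snapshot summary is available as a plain string map and the schema as a list of simple type names. The class name, method names, and per-type width guesses below are hypothetical and are not part of Iceberg or Spark; only the `total-records` key comes from the comment above. The sketch applies only when no filters are pushed down, since `total-records` counts every row in the table.

```java
import java.util.List;
import java.util.Map;

// Illustration only: names and width guesses here are assumptions, not Iceberg APIs.
final class RowCountStatsSketch {
  // Summary key referenced in the review comment.
  static final String TOTAL_RECORDS = "total-records";

  // Assumed width in bytes for variable-length or unrecognized types.
  static final int DEFAULT_WIDTH = 20;

  private RowCountStatsSketch() {
  }

  // Guess the width in bytes of a single value of the given type name.
  static int guessWidth(String typeName) {
    switch (typeName) {
      case "boolean":
        return 1;
      case "int":
      case "float":
      case "date":
        return 4;
      case "long":
      case "double":
      case "timestamp":
        return 8;
      default:
        return DEFAULT_WIDTH;
    }
  }

  // Sum the guessed widths of all columns to get an estimated row size.
  static long estimateRowSize(List<String> columnTypes) {
    long size = 0;
    for (String type : columnTypes) {
      size += guessWidth(type);
    }
    return size;
  }

  // Estimate the unfiltered table size as rows * estimated row size.
  // Falls back to Long.MAX_VALUE (the value used in the diff above)
  // when the row count is missing, malformed, negative, or overflows.
  static long estimateSizeInBytes(Map<String, String> summary, List<String> columnTypes) {
    String totalRecords = summary.get(TOTAL_RECORDS);
    if (totalRecords == null) {
      return Long.MAX_VALUE;
    }
    try {
      long rows = Long.parseLong(totalRecords);
      if (rows < 0) {
        return Long.MAX_VALUE;
      }
      return Math.multiplyExact(rows, estimateRowSize(columnTypes));
    } catch (NumberFormatException | ArithmeticException e) {
      return Long.MAX_VALUE;
    }
  }
}
```

With a summary containing `total-records=1000` and columns `long`, `int`, `string`, the guessed row size is 8 + 4 + 20 = 32 bytes, so the estimate is 32000 bytes. A real implementation would read the summary from the current snapshot and derive widths from the table schema's types.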