jeesou commented on code in PR #11040:
URL: https://github.com/apache/iceberg/pull/11040#discussion_r1861546270
##########
api/src/main/java/org/apache/iceberg/Table.java:
##########
@@ -373,4 +374,14 @@ default Snapshot snapshot(String name) {
return null;
}
+
+ /**
+ * Returns the statistics file for the given snapshot id, if available.
+ *
+ * @return the {@link StatisticsFile} for the given snapshot id, if available.
+ */
+ default Optional<StatisticsFile> statistics(long snapshotId) {
Review Comment:
Yes @amogh-jahagirdar, your suggestion is perfect if we want a generic
solution that supports multiple blob types. The current implementation
assumes that we will only support "apache-datasketches-theta-v1".
We ran into this recently while working with Presto: both engines were using
a common catalog, and the Puffin file created by Presto was not usable
because it had a different blob type, "presto-sum-data-size-bytes-v1".
This would be more of a forward-looking change, which we may take up later.
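To make the generic approach concrete, here is a rough sketch of selecting a blob by type instead of assuming a single one. `BlobTypeSelection`, `BlobInfo` and `firstSupported` are hypothetical names used only for illustration; they are stand-ins and not the Iceberg `StatisticsFile` or `BlobMetadata` API.

```java
import java.util.List;
import java.util.Optional;

public class BlobTypeSelection {

  /** Hypothetical stand-in for one blob entry listed in a statistics file. */
  static final class BlobInfo {
    final String type;
    final String path;

    BlobInfo(String type, String path) {
      this.type = type;
      this.path = path;
    }
  }

  /**
   * Returns the first blob whose type the engine can read, trying the
   * supported types in preference order. Empty if none of them match.
   */
  static Optional<BlobInfo> firstSupported(List<BlobInfo> blobs, List<String> supportedTypes) {
    for (String supported : supportedTypes) {
      for (BlobInfo blob : blobs) {
        if (supported.equals(blob.type)) {
          return Optional.of(blob);
        }
      }
    }
    return Optional.empty();
  }
}
```

With this shape, an engine that only understands "apache-datasketches-theta-v1" gets an empty result for a Presto-written file instead of trying to read a blob it cannot interpret.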
Regarding the best-effort search for stats, @amogh-jahagirdar, I think we need
to reconsider whether we always want to return some statistics, because their
usefulness depends on how much data was added or deleted since the last time
we ran Analyze. Stale statistics could lead to wrong query plans. What if
we let the user configure how much deviation or change they are fine with
before we stop using the older statistics? I have made some changes along
these lines so that the user can decide the acceptable amount of change:
https://github.com/karuppayya/iceberg/compare/fix_snapshot...jeesou:fix_snapshot_modifications?expand=1
Kindly have a look at it, @amogh-jahagirdar and @karuppayya, and share your
suggestions.
I have not handled the delete scenario: if I find that any deletion has
happened, I do not use the old stats. That is up for discussion, as deletes
are a tricky subject in this case.
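As a rough illustration of the idea (not the code in the linked branch), the check could look something like the sketch below. The class name, the method name, the use of row counts as the change metric, and the `maxChangeRatio` parameter are all assumptions made for this example.

```java
public class StaleStatsPolicy {

  /**
   * Decides whether statistics computed at an older snapshot may still be used.
   *
   * @param rowsAtStats      row count when the statistics were computed
   * @param rowsAddedSince   rows appended after that snapshot
   * @param rowsDeletedSince rows deleted after that snapshot
   * @param maxChangeRatio   user-configured tolerance, e.g. 0.1 for 10%
   */
  static boolean canReuse(
      long rowsAtStats, long rowsAddedSince, long rowsDeletedSince, double maxChangeRatio) {
    // Conservative handling of deletes: any deletion invalidates the old stats.
    if (rowsDeletedSince > 0) {
      return false;
    }
    // Avoid dividing by zero: stats over an empty table are reusable only if nothing changed.
    if (rowsAtStats <= 0) {
      return rowsAddedSince == 0;
    }
    double changeRatio = (double) rowsAddedSince / rowsAtStats;
    return changeRatio <= maxChangeRatio;
  }
}
```

For example, with a tolerance of 0.1, stats computed over 1000 rows would still be used after 50 appended rows but not after 200, and never after a delete.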
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]