ajantha-bhat commented on PR #12629:
URL: https://github.com/apache/iceberg/pull/12629#issuecomment-2858894944
@pvary, @gaborkaszab, @deniskuzZ: Today I spent some time thinking about how
incremental stats should be used by end users.
**By default, the compute should be incremental (but the incremental compute
should do full compute if the table has no partition stats available
previously). In other failure cases, the incremental compute can throw an error.
Users will also have an option to force a full compute.**
`CALL catalog_name.system.compute_partition_stats('db.sample');` -- does
incremental compute if previous stats exist, else full compute.
`CALL catalog_name.system.compute_partition_stats(table => 'db.sample',
full_compute => true);` -- full compute (doesn't care about previous stats)
Does anyone have a different opinion on this? I haven't jumped into the code
yet, to avoid back and forth. The thing we should keep in mind is that users
should not have to change their scripts between the first-time full compute and
the incremental computes after that. That is why "the incremental compute should do full
compute if the table has no partition stats available previously".
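
To make the proposed behavior concrete, here is a minimal sketch of the decision logic, written in Python for brevity. The names `ComputeMode`, `choose_compute_mode`, and `has_previous_stats` are hypothetical and not part of the Iceberg API; only the `full_compute` flag comes from the procedure calls above. It covers the mode selection only, not the other failure cases where the incremental compute would throw.

```python
from enum import Enum


class ComputeMode(Enum):
    FULL = "full"
    INCREMENTAL = "incremental"


def choose_compute_mode(has_previous_stats: bool, full_compute: bool = False) -> ComputeMode:
    """Pick the mode for one compute_partition_stats call (hypothetical helper)."""
    if full_compute:
        # The user forced a full compute, so previous stats are ignored.
        return ComputeMode.FULL
    if not has_previous_stats:
        # First run: there is nothing to build on, so fall back to a full compute.
        return ComputeMode.FULL
    # Previous stats exist, so only the changes since then need processing.
    return ComputeMode.INCREMENTAL
```

With this rule the same script works on every run: the first call (no previous stats) resolves to a full compute, and later calls resolve to incremental without any change to the arguments.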
Also, I am waiting for the parent PR (Internal data #12946) to be merged so that
I can rebase this PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]