Re: [PR] Core: Support incremental compute for partition stats [iceberg]

via GitHub Wed, 07 May 2025 07:49:49 -0700


ajantha-bhat commented on PR #12629:
URL: https://github.com/apache/iceberg/pull/12629#issuecomment-2858894944


   @pvary, @gaborkaszab, @deniskuzZ: Today I spent some time thinking about how 
incremental stats should be used by end users. 
   
   **By default, the procedure should do an incremental compute, falling back 
to a full compute if the table has no previous partition stats. In other 
failure cases, the incremental compute can throw an error. 
   Users will also have an option to force a full compute.** 
   
   `CALL catalog_name.system.compute_partition_stats('db.sample');` -- does 
incremental compute if previous stats exist, else full compute. 
   
   `CALL catalog_name.system.compute_partition_stats(table => 'db.sample', 
full_compute => true);` -- full compute (ignores any previous stats)
   
   Does anyone have a different opinion on this? I haven't jumped into the 
code yet, to avoid back and forth. The thing we should keep in mind is that 
users should not have to change their scripts between the first full compute 
and the incremental computes after that. That is why "the incremental compute 
should do a full compute if the table has no partition stats available 
previously".
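
   To make the proposed behavior concrete, here is a minimal sketch of the 
selection logic in Python. It is only an illustration of the rule described 
above, not the PR's implementation: `ComputeMode`, `choose_compute_mode`, and 
`previous_stats_snapshot_id` are hypothetical names and are not part of the 
Iceberg API. Only `full_compute` mirrors the proposed procedure argument.

```python
from enum import Enum
from typing import Optional


class ComputeMode(Enum):
    FULL = "full"
    INCREMENTAL = "incremental"


def choose_compute_mode(
    previous_stats_snapshot_id: Optional[int],
    full_compute: bool = False,
) -> ComputeMode:
    """Pick the compute mode for one procedure call.

    previous_stats_snapshot_id is a hypothetical stand-in for "the table
    already has partition stats"; None means no previous stats exist.
    """
    # The user explicitly forced a full compute: previous stats are ignored.
    if full_compute:
        return ComputeMode.FULL
    # No previous stats: fall back to a full compute so that the same
    # script works for the very first run.
    if previous_stats_snapshot_id is None:
        return ComputeMode.FULL
    # Previous stats exist: compute incrementally on top of them.
    return ComputeMode.INCREMENTAL
```

   With this rule, the same call covers both the first run and every later 
run, and `full_compute` is needed only when a user wants to discard the 
existing stats.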
   
   Also, I am waiting for the parent PR (Internal data #12946) to be merged 
so that I can rebase this PR. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


