Hi all,

I have raised a PR [1] to populate the total_record_count field in
partition statistics when it is computable from metadata (no equality
deletes, no V2 position delete files). This follows the discussion in
#12098 about using DV cardinalities for this.
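
To make the condition above concrete, here is a minimal sketch of the rule
in Python. It is not the code in the PR: the PartitionStats class and its
field names are hypothetical stand-ins chosen for illustration, and it
assumes the live row count is the data record count minus the sum of DV
cardinalities.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionStats:
    """Hypothetical stand-in for one partition's entry in the stats file."""
    data_record_count: int
    dv_deleted_record_count: int = 0        # sum of DV cardinalities (assumed name)
    v2_position_delete_file_count: int = 0  # V2 position delete files, not DVs
    equality_delete_file_count: int = 0


def total_record_count(stats: PartitionStats) -> Optional[int]:
    """Return the live row count, or None when metadata alone cannot give it."""
    if stats.equality_delete_file_count > 0:
        return None  # equality deletes: affected row count is unknown from metadata
    if stats.v2_position_delete_file_count > 0:
        return None  # V2 position delete files: excluded, per the condition above
    return stats.data_record_count - stats.dv_deleted_record_count
```

Under these assumptions, a partition with 1000 data records and DVs
covering 40 rows would get 960, while a partition with any equality delete
file would keep the field null.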

During review, a question came up: since total_record_count is derivable
from existing fields, should the Iceberg core library compute and persist
it, or should this be left to engines?

For computing in core: the spec encourages it, it avoids duplicating logic
across engines, and the value is immediately available from the stats file.

For leaving it to engines: it is a derived value, the implementation adds
complexity around null handling in incremental computation, and it can only
be populated for partitions without equality deletes.

I would appreciate the community's input on the preferred approach.

[1] https://github.com/apache/iceberg/pull/15979

Thanks
Hemanth Boyina
