Hi Hemanth,

My take is that keeping `total_record_count` in the spec and populating it automatically in core when it’s reliably computable is still valuable, even though equality deletes (which remain in v3 as well) limit when the value is strictly derivable. When a partition has no equality deletes (and meets the position delete constraints you listed), core can compute it cheaply and consistently, which:
- avoids re-implementing/forking the same logic across engines,
- makes the value immediately available from the persisted stats file (without requiring an engine-side pass),
- and matches the spirit of the spec, which encourages richer partition stats when possible.

At the same time, because equality deletes can invalidate the derivation, it also seems reasonable that engines retain the option to recompute/override `total_record_count` with a “more correct” value when they have additional context or are already scanning delete metadata.

So I’d lean toward: core computes and persists it when it can do so unambiguously; otherwise it leaves the value null, and engines are free to fill/override it in their own pipelines if they want. This gives us a best-effort baseline in the common cases, without forcing complexity or correctness guarantees in the hard cases.

Thanks!
Peter

hemanth boyina <[email protected]> wrote (on Wed, Apr 22, 2026, 7:43):

> Hi all,
>
> I have raised a PR [1] to populate the total_record_count field in
> partition statistics when computable from metadata (no equality deletes, no
> V2 position delete files). This follows the discussion in #12098 about
> using DV cardinalities for this.
>
> During review, a question came up: since total_record_count is derivable
> from existing fields, should the Iceberg core library compute and persist
> it, or should this be left to engines?
>
> For computing in core: the spec encourages it, it avoids duplicating logic
> across engines, and it’s immediately available from the stats file.
>
> For leaving to engines: it’s a derived value, the implementation adds
> complexity around null handling in incremental computation, and it can only
> be populated for partitions without eq deletes.
>
> Would appreciate community input on the preferred approach.
>
> [1] https://github.com/apache/iceberg/pull/15979
>
> Thanks
> Hemanth Boyina
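
P.S. To make the "compute when unambiguous, otherwise null" rule concrete, here is a minimal Python sketch. It is not the code in the PR and not an Iceberg API: `PartitionStats`, `derive_total_record_count`, and `effective_total_record_count` are hypothetical names, and I am assuming that the position delete record count already includes DV cardinalities while the position delete file count covers only V2 position delete files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartitionStats:
    """Hypothetical, simplified stand-in for one partition stats row."""
    data_record_count: int
    # Assumption: includes rows removed by deletion vectors (DV cardinalities).
    position_delete_record_count: int = 0
    # Assumption: counts V2 position delete files only, not DVs.
    position_delete_file_count: int = 0
    equality_delete_file_count: int = 0


def derive_total_record_count(stats: PartitionStats) -> Optional[int]:
    """Return the live row count when derivable from metadata, else None."""
    # Equality deletes match by value, so metadata alone cannot tell
    # how many rows they remove.
    if stats.equality_delete_file_count > 0:
        return None
    # Per the conditions in the thread, V2 position delete files also
    # disqualify the partition.
    if stats.position_delete_file_count > 0:
        return None
    return stats.data_record_count - stats.position_delete_record_count


def effective_total_record_count(
    core_value: Optional[int], engine_value: Optional[int] = None
) -> Optional[int]:
    """An engine-supplied value, when present, overrides the core value."""
    return engine_value if engine_value is not None else core_value
```

With this shape, a partition with 100 data records and 10 DV-deleted rows yields 90 from core, a partition with any equality delete file yields null, and an engine that has scanned the deletes can still supply its own number.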
