Thanks, Peter, for the feedback. This aligns with the current implementation in the PR: core computes total_record_count when it can be done unambiguously (no equality deletes, no V2 position delete files) and leaves it null otherwise. Engines remain free to override it with a more precise value when they have additional context.
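To make the rule concrete, here is a minimal Python sketch of the decision described above. It is not the code in the PR (Iceberg core is Java); the class, the field names, and both functions are hypothetical and exist only for illustration. It also assumes, following the earlier discussion about DV cardinalities, that the deleted-row count in the computable case comes from summing deletion vector cardinalities.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionCounts:
    """Hypothetical per-partition counters; names are illustrative only."""
    data_record_count: int
    dv_deleted_record_count: int = 0        # assumed: sum of DV cardinalities
    v2_position_delete_file_count: int = 0  # V2-style position delete files
    equality_delete_file_count: int = 0


def total_record_count(p: PartitionCounts) -> Optional[int]:
    """Return the live row count, or None when it is not unambiguous."""
    if p.equality_delete_file_count > 0:
        return None  # equality deletes: not derivable from metadata alone
    if p.v2_position_delete_file_count > 0:
        return None  # V2 position delete files: leave the value null
    return p.data_record_count - p.dv_deleted_record_count


def resolve_total(core_value: Optional[int],
                  engine_value: Optional[int]) -> Optional[int]:
    """An engine-supplied value, when present, overrides the core value."""
    return engine_value if engine_value is not None else core_value
```

With these assumptions, a partition with 100 data records and 7 rows removed by DVs yields 93, while any partition with equality deletes or V2 position delete files yields None until an engine fills it in.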
Thanks,
Hemanth Boyina

On Thu, 23 Apr 2026 at 3:59 PM, Péter Váry <[email protected]> wrote:

> Hi Hemanth,
>
> My take is that keeping `total_record_count` in the spec and populating it
> automatically in core when it’s reliably computable is still valuable, even
> if the presence of equality deletes (which remain in v3 as well) limits
> when the value is strictly derivable.
> In cases where there are no equality deletes (and the relevant position
> delete constraints you listed), core can compute it cheaply and
> consistently, which:
>
>    - avoids re-implementing/forking the same logic across engines,
>    - makes the value immediately available from the persisted stats file
>    (without requiring an engine-side pass),
>    - and matches the spirit of the spec encouraging richer partition
>    stats when possible.
>
> At the same time, because equality deletes can invalidate the derivation,
> it also seems reasonable that engines retain the option to
> recompute/override total_record_count with a “more correct” value when they
> have additional context or are already scanning delete metadata.
> So I’d lean toward: core computes and persists it when it can do so
> unambiguously; otherwise leave it null, and engines are free to
> fill/override in their own pipelines if they want.
> This gives us a best-effort baseline in the common cases, without forcing
> complexity or correctness guarantees in the hard cases.
>
> Thanks!
> Peter
>
> hemanth boyina <[email protected]> wrote (on Wed, 22 Apr 2026, 7:43):
>
>> Hi all,
>>
>> I have raised a PR [1] to populate the total_record_count field in
>> partition statistics when computable from metadata (no equality deletes, no
>> V2 position delete files). This follows the discussion in #12098 about
>> using DV cardinalities for this.
>>
>> During review, a question came up: since total_record_count is derivable
>> from existing fields, should the Iceberg core library compute and persist
>> it, or should this be left to engines?
>>
>> For computing in core: the spec encourages it, it avoids duplicating
>> logic across engines, and it’s immediately available from the stats file.
>> For leaving to engines: it’s a derived value, the implementation adds
>> complexity around null handling in incremental computation, and it can only
>> be populated for partitions without equality deletes.
>>
>> Would appreciate community input on the preferred approach.
>>
>> [1] https://github.com/apache/iceberg/pull/15979
>>
>> Thanks,
>> Hemanth Boyina
