Hi everyone,

As part of v4, we are adding aggregate column stats to the root manifests. I wrote a short discussion doc <https://docs.google.com/document/d/1glCxPNWHWmlxc5ULBcpxmsgOKR6i4Y4RErDRZD7vuJc/edit?tab=t.0> on a couple of topics in this area:
- Define aggregation rules for computing these aggregate stats.
- Column stats at the file level are optional, so a naive aggregation can lead to false pruning. We need a mechanism to avoid it.

The doc covers these with examples and has some options for the v4 spec. I'm looking for feedback on the approach, especially around using `null_count` as a sentinel vs. alternatives.

Please feel free to comment directly on the doc. I've also added this as an agenda item for the next v4 metadata tree community sync.

Link: https://docs.google.com/document/d/1glCxPNWHWmlxc5ULBcpxmsgOKR6i4Y4RErDRZD7vuJc/edit?tab=t.0

Best,
Anoop
