nastra opened a new issue, #13153:
URL: https://github.com/apache/iceberg/issues/13153
### Proposed Change
## Motivation
Column statistics are currently stored as maps from field id to value, one map
per statistic (lower/upper bounds, value/NaN/null counts, column sizes).
This storage model has critical limitations as the number of columns increases
and as new types are added to Iceberg:
* Inefficient storage due to the map-based structure:
  * Large memory overhead during planning/processing
  * Inability to project specific stats (e.g., only null_value_counts for
    column X)
* Type erasure: original logical/physical types are lost when values are
  stored as binary blobs, causing:
  * Lossy type inference during reads
  * Schema evolution challenges (e.g., widening types)
* Rigid schema: stats are tied to the data_file entry record, limiting
  extensibility for new stats.
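The type-erasure problem can be illustrated with a small sketch. It models the map-based layout with hypothetical names (`data_file_stats`, `decode_bound`). The byte encoding loosely follows Iceberg's single-value serialization (4-byte little-endian int, 8-byte little-endian long, UTF-8 string), but this is an illustration, not Iceberg's reader code. The point is that the bytes do not record which type wrote them, so the reader must supply a type. If the schema has since widened the column, that type may no longer match.

```python
import struct

# Hypothetical model of the map-based layout: one map per statistic,
# keyed by field id, with bounds kept as opaque byte strings.
data_file_stats = {
    "value_counts": {1: 100, 2: 100},
    "null_value_counts": {1: 0, 2: 5},
    "lower_bounds": {1: struct.pack("<i", 7), 2: "abc".encode("utf-8")},
    "upper_bounds": {1: struct.pack("<i", 42), 2: "xyz".encode("utf-8")},
}


def decode_bound(raw: bytes, type_name: str):
    """Decode a bound using a type supplied by the reader, not by the stats."""
    if type_name == "int":
        return struct.unpack("<i", raw)[0]  # 4-byte little-endian
    if type_name == "long":
        return struct.unpack("<q", raw)[0]  # 8-byte little-endian
    if type_name == "string":
        return raw.decode("utf-8")
    raise ValueError(f"unsupported type: {type_name}")


raw = data_file_stats["lower_bounds"][1]
print(decode_bound(raw, "int"))  # prints 7

# If field 1 was later widened from int to long, a naive reader that uses
# the current schema type fails: only 4 bytes were written.
try:
    decode_bound(raw, "long")
except struct.error as exc:
    print("cannot decode with current type:", exc)
```

Even a lookup of a single statistic for one column goes through the per-file maps. The sketch does not model the memory cost of materializing them.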
## Goals
Improve the column stats representation to allow for the following:
* Projectability: Enable independent access to specific stats (e.g.,
lower_bounds without loading upper_bounds).
* Type Preservation: Store original data types to support accurate reads and
schema evolution.
* Flexible/Extensible Representation: Allow per-field stats structures
(e.g., complex types like Geo/Variant).
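For contrast, a per-field, typed representation could look roughly like the sketch below. This is a hypothetical illustration of the goals above, not the design in the proposal document. `FieldStats`, `stats_by_field`, and `project` are invented names. In a columnar manifest, projection would mean skipping unread columns on disk. The in-memory dictionary here only mimics that access pattern.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class FieldStats:
    # Hypothetical per-field stats record; bounds keep the column's own type
    # instead of being serialized to bytes.
    value_count: Optional[int] = None
    null_value_count: Optional[int] = None
    nan_value_count: Optional[int] = None
    lower_bound: Any = None
    upper_bound: Any = None


# Hypothetical layout: one typed record per field id rather than one map per stat.
stats_by_field: Dict[int, FieldStats] = {
    1: FieldStats(value_count=100, null_value_count=0, lower_bound=7, upper_bound=42),
    2: FieldStats(value_count=100, null_value_count=5, lower_bound="abc", upper_bound="xyz"),
}


def project(stats: Dict[int, FieldStats], field_id: int, *names: str) -> Dict[str, Any]:
    """Return only the requested statistics for one field."""
    record = stats[field_id]
    return {name: getattr(record, name) for name in names}


print(project(stats_by_field, 2, "null_value_count"))  # prints {'null_value_count': 5}
```

Because each bound keeps a native type, a reader does not need to guess an encoding width. Adding a new statistic becomes a new field on the per-field record instead of another top-level map on the data_file entry.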
## Non-Goals
The following issues are out of scope or impractical to address:
* Supporting unlimited stats for tables with extreme column counts
* Addressing Parquet column amplification in manifest files
### Proposal document
https://s.apache.org/iceberg-column-stats
### Specifications
- [x] Table
- [ ] View
- [x] REST
- [ ] Puffin
- [ ] Encryption
- [ ] Other