[DISCUSS] Proposal: Delta-Encoded Schemas in v4, to Address Metadata Bloat

Talat Uyarer via dev Thu, 12 Feb 2026 07:56:31 -0800

Hi All,

I am sharing a new proposal for Iceberg Spec v4: *Delta-Encoded Schemas*. We
propose moving away from monolithic schema storage to address a growing
scalability bottleneck in high-velocity and ultra-wide table environments.

The current Iceberg spec re-serializes the entire schema object and appends
it to metadata.json on every schema change, so the same schema data is
replicated over and over. For a wide table (5,000+ columns) with frequent
schema updates, this can push metadata files into the gigabytes, causing
significant query-planning latency and out-of-memory (OOM) errors on the
driver.
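To make the scale concrete, here is a back-of-the-envelope sketch. The
inputs (1,000 schema versions, 5,000 columns, about 200 bytes of JSON per
field) are hypothetical and not measurements from a real table, and the
function name is illustrative. It also simplifies by assuming every version
has the same column count.

```python
def full_schema_history_bytes(schema_versions: int, columns: int, bytes_per_column: int) -> int:
    """Approximate bytes spent on schemas when every version is stored in full.

    Simplification: every version is assumed to have the same column count.
    """
    return schema_versions * columns * bytes_per_column


# Hypothetical inputs: 1,000 schema changes on a 5,000-column table,
# at roughly 200 bytes of JSON per field (id, name, type, required, doc).
total = full_schema_history_bytes(1_000, 5_000, 200)
print(f"{total / 1e9:.1f} GB")  # prints "1.0 GB"
```

Because every version repeats all columns, the schema history grows with
(number of versions) x (number of columns), even when each change touches a
single field.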

*Proposal Summary:*

We propose implementing *Delta-Encoded Schema Evolution for Spec v4* using
a *"Merge-on-Read" (MoR) approach for metadata*. This changes the schemas
field from a list of full schema snapshots to a sequence of *Base Schemas*
(type full) and *Schema Deltas* (type delta), where each delta stores only
the differential mutations relative to a base schema ID.

*Key Goals:*

   - Achieve a *99.4% reduction in the size of schema-related metadata*.
   - Drastically lower the storage and IO requirements for metadata.json.
   - Accelerate query planning by reducing the JSON payload size.
   - Preserve self-containment by keeping the schema in the metadata file,
   avoiding external sidecar files.

The full proposal, including the flat resolution model (no delta chaining),
the defined set of atomic delta operations (add, update, delete), and the
lifecycle/compaction mechanics, is available for review:
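The flat resolution model can be sketched as follows, using the same kind of
hypothetical JSON layout (field and function names are assumptions for
illustration). Because a delta may only reference a full base (no chaining),
resolving any schema ID takes one base lookup plus one pass over that
delta's add/update/delete changes.

```python
from copy import deepcopy


def resolve_schema(schemas, schema_id):
    """Materialize the top-level field list for schema_id (hypothetical layout)."""
    by_id = {s["schema-id"]: s for s in schemas}
    entry = by_id[schema_id]
    if entry["type"] == "full":
        return deepcopy(entry["fields"])

    base = by_id[entry["base-schema-id"]]
    # Flat model: a delta must point directly at a full base, never at another delta.
    if base["type"] != "full":
        raise ValueError(f"schema {schema_id} chains onto delta {base['schema-id']}")

    # dicts keep insertion order: updated fields keep their position, adds go last.
    fields = {f["id"]: deepcopy(f) for f in base["fields"]}
    for change in entry["changes"]:
        action = change["action"]
        if action == "add":
            field_id = change["field"]["id"]
            if field_id in fields:
                raise ValueError(f"add: field {field_id} already exists")
            fields[field_id] = deepcopy(change["field"])
        elif action == "update":
            field_id = change["field"]["id"]
            if field_id not in fields:
                raise ValueError(f"update: field {field_id} does not exist")
            fields[field_id] = deepcopy(change["field"])
        elif action == "delete":
            field_id = change["field-id"]
            if field_id not in fields:
                raise ValueError(f"delete: field {field_id} does not exist")
            del fields[field_id]
        else:
            raise ValueError(f"unknown action {action!r}")
    return list(fields.values())


schemas = [
    {"schema-id": 0, "type": "full", "fields": [
        {"id": 1, "name": "id", "type": "long", "required": True},
        {"id": 2, "name": "data", "type": "string", "required": False},
    ]},
    # Each delta carries all changes since base 0, not just the latest one.
    {"schema-id": 1, "type": "delta", "base-schema-id": 0, "changes": [
        {"action": "add", "field": {"id": 3, "name": "ts", "type": "timestamptz", "required": False}},
    ]},
    {"schema-id": 2, "type": "delta", "base-schema-id": 0, "changes": [
        {"action": "add", "field": {"id": 3, "name": "ts", "type": "timestamptz", "required": False}},
        {"action": "update", "field": {"id": 2, "name": "payload", "type": "string", "required": False}},
    ]},
]
print([f["name"] for f in resolve_schema(schemas, 2)])  # ['id', 'payload', 'ts']
```

Under this model each delta carries every change made since its base, so its
change list grows until a new full base is written; the proposal's
lifecycle/compaction section covers that side.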

https://s.apache.org/iceberg-delta-schemas

I look forward to your feedback and discussion on the dev list.

Thanks
Talat
