Agree with Ryan that we should at least consider other options before
jumping to changing the format. Maybe the other options don't reasonably
solve the problem and we do have to do 4, but it's better to systematically
rule them out. That said, I do think this is a legitimate issue even for
reasonably limited retention policies.

I can see this being primarily a problem in memory AND in sending the
metadata back to clients. On disk it should compress pretty well, but maybe
we can get some numbers there.
Catalogs have to decompress or load the metadata JSON into memory, and to
me it's clear that with a wide schema, and not even a particularly long
retention, catalog servers end up holding quite a lot of repeated schema
representations in memory (the example with math in the doc seems pretty
reasonable to me). So I can see the representation in its current form
being a bottleneck for catalog implementations just loading the metadata
from the file in the first place.
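
As a rough back-of-envelope sketch (all of these numbers are my own
assumptions for illustration, not figures from the proposal), the repeated
full-schema JSON a catalog has to hold grows multiplicatively with table
width and the number of retained schema versions:

    # Python sketch with assumed, illustrative numbers
    fields_per_schema = 5_000       # wide table, per the proposal's example
    bytes_per_field = 120           # assumed average serialized field entry
    schema_versions_retained = 500  # e.g. a few weeks of hourly column adds

    total_bytes = fields_per_schema * bytes_per_field * schema_versions_retained
    print(f"~{total_bytes / 1024**2:.0f} MiB of schema JSON")  # ~286 MiB

Even if that compresses well on disk, the decompressed/parsed form is what
the catalog actually holds.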

The other part is serving the representation to clients. Even if the
client requests refs mode, in the examples of wide schemas that were
provided, the server has to send back at least the schemas for the
corresponding refs (let alone "all") for a client to resolve everything it
needs. There's the overhead of that going over the wire (though tbh I don't
know if that's a real problem) AND, more importantly, the question of
whether the client (e.g. a Spark driver or Trino coordinator) can hold it
all in memory.
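
Again as a hypothetical sketch (these numbers are my assumptions, not from
the proposal): even in refs mode the response still has to include a full
schema for each distinct schema id referenced by the returned snapshots, so
every planning client pays something like:

    # Python sketch with assumed, illustrative numbers
    referenced_schema_ids = 50  # assumed distinct schemas behind the refs
    fields_per_schema = 5_000
    bytes_per_field = 120

    payload_bytes = referenced_schema_ids * fields_per_schema * bytes_per_field
    print(f"~{payload_bytes / 1024**2:.0f} MiB of schemas per response")  # ~29 MiB

and that cost is paid per table load, by every driver or coordinator that
plans against the table.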

I do think there's some overlap here with the other discussion about the
metadata.json requirement; maybe others disagree and think it's
independent. But if catalogs were able to make their own choice on how to
store and represent all of this, then how to represent these wide schemas
in the persistent format could be deferred (e.g. to the time of exporting
the metadata to a file). Admittedly, I used to be really against
eliminating the metadata.json requirement, but it's becoming clear that it
may be limiting for these kinds of legitimate cases.

Thanks,
Amogh Jahagirdar

On Thu, Feb 12, 2026 at 11:59 AM Ryan Blue <[email protected]> wrote:

> Is this a problem in memory or on disk? I would expect schemas like this
> to compress fairly well. Or maybe the issue is sending them to clients? I
> just always prefer solutions that are simpler, so in approximate order: 1)
> don't keep so many, 2) use generic compression, 3) don't send them if you
> don't need to, and 4) change the representation. I just want to make sure
> we aren't jumping to 4 when a simpler solution would work.
>
> On Thu, Feb 12, 2026 at 10:45 AM Russell Spitzer <
> [email protected]> wrote:
>
>> For very wide tables, I think this becomes a problem with single-digit
>> numbers of schema changes. My theoretical example here is a table with
>> 1000 columns that we add new columns to every hour or so. Unless I want
>> to keep my history locked to 24 hours (or less), schema bloat is going
>> to be a pretty big issue.
>>
>> On Thu, Feb 12, 2026 at 10:37 AM Ryan Blue <[email protected]> wrote:
>>
>>> For tables where this is a problem, how are you currently managing older
>>> schemas? Older schemas do not need to be kept if there aren't any snapshots
>>> that reference them.
>>>
>>> On Thu, Feb 12, 2026 at 10:24 AM Russell Spitzer <
>>> [email protected]> wrote:
>>>
>>>> My gut instinct on this is that it's a great idea. I think we probably
>>>> need to think a bit more about how to decide on "base" schema promotion but
>>>> theoretically this seems like it should be a huge benefit for wide tables.
>>>>
>>>> On Thu, Feb 12, 2026 at 7:55 AM Talat Uyarer via dev <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am sharing a new proposal for Iceberg Spec v4: *Delta-Encoded
>>>>> Schemas*. We propose moving away from monolithic schema storage to
>>>>> address a growing scalability bottleneck in high-velocity and ultra-wide
>>>>> table environments.
>>>>>
>>>>> The current Iceberg Spec re-serializes and appends the entire schema
>>>>> object to metadata.json for every schema operation, which leads to
>>>>> massive duplication of schema data. For a large table with 5,000+
>>>>> columns and frequent schema updates, this can result in metadata files
>>>>> in the multi-GB range, causing significant query planning latency and
>>>>> OOMs on the driver side.
>>>>>
>>>>> *Proposal Summary:*
>>>>>
>>>>> We propose implementing *Delta-Encoded Schema Evolution for Spec v4* using
>>>>> a *"Merge-on-Read" (MoR) approach for metadata*. This approach
>>>>> involves transitioning the schemas field from "Full Snapshots" to a
>>>>> sequence of *Base Schemas* (type full) and *Schema Deltas* (type delta)
>>>>> that store differential mutations relative to a base ID.
>>>>>
>>>>> *Key Goals:*
>>>>>
>>>>>    - Achieve a *99.4% reduction in the size of schema-related
>>>>>    metadata*.
>>>>>    - Drastically lower the storage and IO requirements for
>>>>>    metadata.json.
>>>>>    - Accelerate query planning by reducing the JSON payload size.
>>>>>    - Preserve self-containment by keeping the schema in the metadata
>>>>>    file, avoiding external sidecar files.
>>>>>
>>>>> The full proposal, including the flat resolution model (no delta
>>>>> chaining), the defined set of atomic delta operations (add, update,
>>>>> delete), and the lifecycle/compaction mechanics, is available for
>>>>> review:
>>>>>
>>>>> https://s.apache.org/iceberg-delta-schemas
>>>>>
>>>>> I look forward to your feedback and discussion on the dev list.
>>>>>
>>>>> Thanks
>>>>> Talat
>>>>>
>>>>
