+1, selective snapshot compaction would be a good addition for
streaming/low-latency commit workloads. A tradeoff is that it requires users
to opt in to more Iceberg maintenance, which isn't always feasible, as you
mentioned above.
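
For illustration, here's a rough sketch of the rewrite you describe (merging
S2 and S3 into a single S2'). Iceberg has no public API for this today, so
the types here are hypothetical placeholders rather than real Iceberg
classes:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical model: a snapshot is just an id plus its manifest paths.
    record SimpleSnapshot(long id, List<String> manifestPaths) {}

    class SnapshotCompactor {
      // Merge two adjacent snapshots whose changes are purely additive
      // (no equality deletes) by unioning their manifest lists.
      static SimpleSnapshot merge(SimpleSnapshot older, SimpleSnapshot newer) {
        List<String> manifests = new ArrayList<>(older.manifestPaths());
        manifests.addAll(newer.manifestPaths());
        return new SimpleSnapshot(newer.id(), manifests);
      }

      // Rewrites [S1, S2, S3, S4] into [S1, S2', S4] for start=1, end=2.
      static List<SimpleSnapshot> compactRange(
          List<SimpleSnapshot> history, int start, int end) {
        SimpleSnapshot merged = history.get(start);
        for (int i = start + 1; i <= end; i++) {
          merged = merge(merged, history.get(i));
        }
        List<SimpleSnapshot> rewritten = new ArrayList<>(history.subList(0, start));
        rewritten.add(merged);
        rewritten.addAll(history.subList(end + 1, history.size()));
        return rewritten;
      }
    }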

I think both options would work in tandem:
Short term: optimize the read path (e.g., lazy load snapshotLog; see the
sketch below)
Longer term: explore options such as selective snapshot compaction, storing
snapshots in separate storage from metadata.json, and improvements to the
REST catalog
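
As a rough sketch of what lazy loading of the snapshot log could look like
(this is not the actual change in the prototype PR; LazySnapshotLog and its
loader are hypothetical, and only HistoryEntry is a real Iceberg interface):

    import java.util.List;
    import java.util.function.Supplier;
    import org.apache.iceberg.HistoryEntry;

    // Hypothetical holder that defers parsing the snapshot-log section of
    // metadata.json until a caller (e.g. time travel or rollback) asks for it.
    class LazySnapshotLog {
      private final Supplier<List<HistoryEntry>> loader;
      private volatile List<HistoryEntry> entries; // null until first access

      LazySnapshotLog(Supplier<List<HistoryEntry>> loader) {
        this.loader = loader;
      }

      List<HistoryEntry> entries() {
        if (entries == null) {
          synchronized (this) {
            if (entries == null) {
              entries = loader.get(); // parse and cache on first use only
            }
          }
        }
        return entries;
      }
    }

The idea being that coordinators which never time travel or roll back would
keep only the loader, not the materialized list, in their metadata cache.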

On Wed, May 6, 2026 at 1:05 AM Péter Váry <[email protected]>
wrote:

> Another question we should consider:
> - Do we really need to keep all these snapshots?
>
> Let's consider a table with the following history: S1, S2, S3, S4. If we
> don't have equality deletes, could we create an S2' with only metadata
> changes which would contain everything from S2 and S3? If we rewrite the
> table history to S1, S2', S4, then we can reduce the number of snapshots we
> need to keep.
>
> Selective snapshot compaction is something which could be useful for many
> cases.
>
> On Tue, May 5, 2026, 17:38 Amogh Jahagirdar <[email protected]> wrote:
>
>> Thanks Grant,
>>
>> The use case where there are commits every 30 seconds and simultaneously
>> there's also a 30-day retention does seem unique to me, but overall I do
>> support simple implementation changes to improve that situation, so I will
>> take a deeper look at the PR.
>>
>> In particular, I'd need to check time-travel queries (and rollbacks) in
>> this model, since those cases would need to load the snapshot log anyway.
>> Rollbacks should be less frequent, but if time-travel queries are also
>> common in this situation, the history will need to be loaded anyway,
>> limiting the benefit of optimizing the history load.
>>
>> I think there's also a tradeoff here worth considering: for tables that
>> change fast enough, the utility of caching table metadata is reduced. So
>> for high-frequency write tables where reads are not as frequent, it may be
>> worth considering simply not caching the table metadata and hitting the
>> catalog directly, rather than optimizing the memory footprint of metadata
>> that has a lower cache hit rate.
>>
>> Also, I don't think there's anything V4-specific about this; rather, it's
>> just calling out a potential implementation improvement independent of the
>> table format or catalog spec.
>>
>> Thanks,
>> Amogh Jahagirdar
>>
>> On Tue, May 5, 2026 at 7:57 AM Grant Nicholas <
>> [email protected]> wrote:
>>
>>> Reviving this thread.
>>>
>>> The discussion focused mostly on optimizing
>>> the write path of metadata.json, but we’ve been seeing significant memory
>>> pressure on the read path as well.
>>>
>>> In Trino, most queries are reads, and many TableMetadata instances can be
>>> cached in coordinator memory. With large numbers of snapshots (e.g.,
>>> streaming workloads with 30-day retention), both `snapshots` and
>>> `snapshotLog` scale linearly and become large contributors to heap usage.
>>>
>>> Iceberg already supports lazy loading for `snapshots`, so I explored
>>> applying a similar approach to `snapshotLog`. Conceptually, these two
>>> fields have similar scaling characteristics, so it seemed reasonable
>>> to treat them consistently.
>>>
>>> I put together a prototype here:
>>> https://github.com/apache/iceberg/pull/16207
>>>
>>> Curious if others have seen similar memory pressure issues, especially
>>> in singleton coordinators where metadata is cached across
>>> many tables.
>>>
>>> Grant
>>>
>>
