Hi Amogh,
Thanks for the response. You're right that ExpireSnapshots correctly
identifies reachable snapshots at computation time. Java and Go
implementations both do this. The race window I'm concerned about is a
change of state between when ExpireSnapshots computes the candidate list
and when Commit executes.
Specifically:
1. ExpireSnapshots loads metadata M1, computes candidates [snap-1]
(unreferenced in M1)
2. Client concurrently commits SetSnapshotRef("feature-branch", snap-1)
(metadata is now M2)
3. Maintenance job's Commit loads fresh metadata M2 and applies
RemoveSnapshots([snap-1]) against it
4. CAS succeeds because maintenance has the latest pointer
The result is that both snap-1 and the newly added feature-branch ref are
removed, since RemoveSnapshots also cleans up refs pointing to deleted
snapshots.
Having said that, I do think this is probably best handled at the catalog
level rather than changing client library semantics or the spec. We'll
handle it in our catalog's commit path by re-validating the candidate list
against the freshly loaded metadata before applying the updates.
Thanks,
Krutika
On Tue, Feb 10, 2026 at 12:03 AM Amogh Jahagirdar <[email protected]> wrote:
> Hey Krutika,
>
> >1. A maintenance job identifies snap-1 as eligible for expiration.
> > 2. A client commits SetSnapshotRef("feature-branch", snap-1).
> > If the client commits first, the maintenance job loads the fresh
> metadata (which now contains a ref for feature-branch -> snap-1), builds
> its RemoveSnapshots([snap-1]) update, and commits. The CAS > succeeds
> because the maintenance job has the latest metadata pointer. But the
> resulting metadata is corrupt: feature-branch points to a deleted snapshot.
>
> This situation shouldn't happen. In the scenario described, after the
> client commits, it has committed an addition of a reference to a snapshot.
> The maintenance job performing snapshot removal, should not be removing
> "snap-1" because it's still referenced (assuming retention is set correctly
> etc). The Java reference implementation of expire snapshots
> implementation does a reachability of what
> <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L299>
> snapshots are still referenced, and those will not be eligible for cleanup.
> Here's the overall algorithm
> <https://iceberg.apache.org/spec/#snapshot-retention-policy>.
>
> Let me know if I misunderstood the scenario or even better if there's a
> test that can repro the issue, but I don't think we should need another
> update type, the cleanup logic in snapshot expiration already should not be
> considering valid referenced snapshots for removal.
>
> Thanks,
> Amogh Jahagirdar
>
> On Sun, Feb 8, 2026 at 10:14 PM Krutika Dhananjay <[email protected]>
> wrote:
>
>> Adding Matt Topol to the discussion.
>>
>> Thanks,
>> Krutika
>>
>> On Fri, Feb 6, 2026 at 3:55 PM Krutika Dhananjay <[email protected]>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I'd like to bring up a race condition between snapshot expiration and
>>> concurrent client commits that add refs to snapshots.
>>>
>>> *The Problem*
>>> Consider a table with snap-1 (unreferenced) and snap-2 (main). Two
>>> operations happen concurrently:
>>>
>>> 1. A maintenance job identifies snap-1 as eligible for expiration.
>>> 2. A client commits SetSnapshotRef("feature-branch", snap-1).
>>>
>>> If the client commits first, the maintenance job loads the fresh
>>> metadata (which now contains a ref for feature-branch -> snap-1), builds
>>> its RemoveSnapshots([snap-1]) update, and commits. The CAS succeeds because
>>> the maintenance job has the latest metadata pointer. But the resulting
>>> metadata is corrupt: feature-branch points to a deleted snapshot.
>>>
>>> CAS-based optimistic concurrency protects against concurrent pointer
>>> updates, but it does not validate the semantic correctness of the operation
>>> against the current state. The maintenance job's commit is structurally
>>> valid (latest pointer) but semantically invalid (removes a referenced
>>> snapshot).
>>>
>>> *Proposed Fix in iceberg-go*
>>> We addressed <https://github.com/apache/iceberg-go/pull/716> this in
>>> iceberg-go by modifying MetadataBuilder.RemoveSnapshots() to skip removing
>>> any snapshot that is currently referenced by a branch or tag. This works
>>> correctly -- the validation happens against the same metadata state that
>>> the CAS will verify hasn't changed.
>>> However, this changes the existing behavior of RemoveSnapshots, which
>>> currently removes the snapshot unconditionally and cleans up any refs that
>>> pointed to it. This would be a deviation from the behaviour in iceberg-java
>>> and iceberg-rs.
>>>
>>> *Proposed Alternative: AssertSnapshotNotReferenced*
>>> A less invasive approach that preserves the existing RemoveSnapshots
>>> behavior would be to introduce a new assertion update --
>>> AssertSnapshotNotReferenced. This would be analogous to existing
>>> assertions, but for validating that a snapshot has no refs pointing to it.
>>>
>>> During commit, a catalog-side maintenance job would include this
>>> assertion alongside its RemoveSnapshots update:
>>>
>>> updates: [
>>> AssertSnapshotNotReferenced(snap-1),
>>> RemoveSnapshots([snap-1])
>>> ]
>>>
>>> The assertion would be validated against the freshly loaded metadata
>>> during commit. If a concurrent client had linked a ref to snap-1 in the
>>> meantime, the assertion fails, the CAS-based commit is never attempted, and
>>> the maintenance job can retry with updated knowledge of the table state.
>>>
>>> This approach:
>>> - Does not change existing RemoveSnapshots semantics
>>> - Can be implemented uniformly across Java, Rust, and Go
>>>
>>> Would love to hear thoughts on the preferred way to handle this.
>>>
>>> Thanks,
>>> Krutika
>>>
>>