Hey Krutika,
>1. A maintenance job identifies snap-1 as eligible for expiration.
> 2. A client commits SetSnapshotRef("feature-branch", snap-1).
> If the client commits first, the maintenance job loads the fresh metadata
(which now contains a ref for feature-branch -> snap-1), builds its
RemoveSnapshots([snap-1]) update, and commits. The CAS > succeeds because
the maintenance job has the latest metadata pointer. But the resulting
metadata is corrupt: feature-branch points to a deleted snapshot.
This situation shouldn't happen. In the scenario described, after the
client commits, it has committed an addition of a reference to a snapshot.
The maintenance job performing snapshot removal, should not be removing
"snap-1" because it's still referenced (assuming retention is set correctly
etc). The Java reference implementation of expire snapshots implementation
does a reachability of what
<https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L299>
snapshots are still referenced, and those will not be eligible for cleanup.
Here's the overall algorithm
<https://iceberg.apache.org/spec/#snapshot-retention-policy>.
Let me know if I misunderstood the scenario or even better if there's a
test that can repro the issue, but I don't think we should need another
update type, the cleanup logic in snapshot expiration already should not be
considering valid referenced snapshots for removal.
Thanks,
Amogh Jahagirdar
On Sun, Feb 8, 2026 at 10:14 PM Krutika Dhananjay <[email protected]> wrote:
> Adding Matt Topol to the discussion.
>
> Thanks,
> Krutika
>
> On Fri, Feb 6, 2026 at 3:55 PM Krutika Dhananjay <[email protected]> wrote:
>
>> Hi everyone,
>>
>> I'd like to bring up a race condition between snapshot expiration and
>> concurrent client commits that add refs to snapshots.
>>
>> *The Problem*
>> Consider a table with snap-1 (unreferenced) and snap-2 (main). Two
>> operations happen concurrently:
>>
>> 1. A maintenance job identifies snap-1 as eligible for expiration.
>> 2. A client commits SetSnapshotRef("feature-branch", snap-1).
>>
>> If the client commits first, the maintenance job loads the fresh metadata
>> (which now contains a ref for feature-branch -> snap-1), builds its
>> RemoveSnapshots([snap-1]) update, and commits. The CAS succeeds because the
>> maintenance job has the latest metadata pointer. But the resulting metadata
>> is corrupt: feature-branch points to a deleted snapshot.
>>
>> CAS-based optimistic concurrency protects against concurrent pointer
>> updates, but it does not validate the semantic correctness of the operation
>> against the current state. The maintenance job's commit is structurally
>> valid (latest pointer) but semantically invalid (removes a referenced
>> snapshot).
>>
>> *Proposed Fix in iceberg-go*
>> We addressed <https://github.com/apache/iceberg-go/pull/716> this in
>> iceberg-go by modifying MetadataBuilder.RemoveSnapshots() to skip removing
>> any snapshot that is currently referenced by a branch or tag. This works
>> correctly -- the validation happens against the same metadata state that
>> the CAS will verify hasn't changed.
>> However, this changes the existing behavior of RemoveSnapshots, which
>> currently removes the snapshot unconditionally and cleans up any refs that
>> pointed to it. This would be a deviation from the behaviour in iceberg-java
>> and iceberg-rs.
>>
>> *Proposed Alternative: AssertSnapshotNotReferenced*
>> A less invasive approach that preserves the existing RemoveSnapshots
>> behavior would be to introduce a new assertion update --
>> AssertSnapshotNotReferenced. This would be analogous to existing
>> assertions, but for validating that a snapshot has no refs pointing to it.
>>
>> During commit, a catalog-side maintenance job would include this
>> assertion alongside its RemoveSnapshots update:
>>
>> updates: [
>> AssertSnapshotNotReferenced(snap-1),
>> RemoveSnapshots([snap-1])
>> ]
>>
>> The assertion would be validated against the freshly loaded metadata
>> during commit. If a concurrent client had linked a ref to snap-1 in the
>> meantime, the assertion fails, the CAS-based commit is never attempted, and
>> the maintenance job can retry with updated knowledge of the table state.
>>
>> This approach:
>> - Does not change existing RemoveSnapshots semantics
>> - Can be implemented uniformly across Java, Rust, and Go
>>
>> Would love to hear thoughts on the preferred way to handle this.
>>
>> Thanks,
>> Krutika
>>
>