On Wed, Feb 11, 2026 at 6:50 PM Tobias <[email protected]> wrote:
> Hi Krutika and Amogh, > > I'm not sure if handling this on the implementation level of a REST > catalog is the best solution. The remove-snapshot update implementation of > iceberg-go, iceberg-rust and iceberg-java all silently remove references to > snapshots. A catalog does not know if the received update was produced by > running ExpireSnapshots or if it is an explicit call, changing the behavior > to fail instead of cleaning up referenced branches means the REST catalog > diverges in behavior from other catalogs. > > Hi Tobias, That is a valid point -- that a catalog cannot tell if a remove-snapshot update came from a client or a snapshot expiration maintenance job. If it is the former, we do want to remove both the snapshot and any refs it is associated with, but the solution I was proposing would silently retain the snapshot a client intended to delete. > Assuming that ExpireSnapshots always produces RemoveSnapshotRef alongside > RemoveSnapshots updates, a boolean field could be added to RemoveSnapshots > which would fail instead of doing the silent cleanup of snapshot references. > > This solution seems to address both the concerns, while still preserving the existing public signature for RemoveSnapshots() in iceberg-go. So existing callers will be unaffected. Thanks, Krutika > Kind regards, > Tobias > > Am Mi., 11. Feb. 2026 um 11:54 Uhr schrieb Krutika Dhananjay < > [email protected]>: > >> Hi Amogh, >> >> Thanks for the response. You're right that ExpireSnapshots correctly >> identifies reachable snapshots at computation time. Java and Go >> implementations both do this. The race window I'm concerned about is a >> change of state between when ExpireSnapshots computes the candidate list >> and when Commit executes. >> >> Specifically: >> 1. ExpireSnapshots loads metadata M1, computes candidates [snap-1] >> (unreferenced in M1) >> 2. Client concurrently commits SetSnapshotRef("feature-branch", snap-1) >> (metadata is now M2) >> 3. Maintenance job's Commit loads fresh metadata M2 and applies >> RemoveSnapshots([snap-1]) against it >> 4. CAS succeeds because maintenance has the latest pointer >> >> The result is that both snap-1 and the newly added feature-branch ref are >> removed, since RemoveSnapshots also cleans up refs pointing to deleted >> snapshots. >> >> Having said that, I do think this is probably best handled at the catalog >> level rather than changing client library semantics or the spec. We'll >> handle it in our catalog's commit path by re-validating the candidate list >> against the freshly loaded metadata before applying the updates. >> >> Thanks, >> Krutika >> >> On Tue, Feb 10, 2026 at 12:03 AM Amogh Jahagirdar <[email protected]> >> wrote: >> >>> Hey Krutika, >>> >>> >1. A maintenance job identifies snap-1 as eligible for expiration. >>> > 2. A client commits SetSnapshotRef("feature-branch", snap-1). >>> > If the client commits first, the maintenance job loads the fresh >>> metadata (which now contains a ref for feature-branch -> snap-1), builds >>> its RemoveSnapshots([snap-1]) update, and commits. The CAS > succeeds >>> because the maintenance job has the latest metadata pointer. But the >>> resulting metadata is corrupt: feature-branch points to a deleted snapshot. >>> >>> This situation shouldn't happen. In the scenario described, after the >>> client commits, it has committed an addition of a reference to a snapshot. >>> The maintenance job performing snapshot removal, should not be removing >>> "snap-1" because it's still referenced (assuming retention is set correctly >>> etc). The Java reference implementation of expire snapshots >>> implementation does a reachability of what >>> <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L299> >>> snapshots are still referenced, and those will not be eligible for cleanup. >>> Here's the overall algorithm >>> <https://iceberg.apache.org/spec/#snapshot-retention-policy>. >>> >>> Let me know if I misunderstood the scenario or even better if there's a >>> test that can repro the issue, but I don't think we should need another >>> update type, the cleanup logic in snapshot expiration already should not be >>> considering valid referenced snapshots for removal. >>> >>> Thanks, >>> Amogh Jahagirdar >>> >>> On Sun, Feb 8, 2026 at 10:14 PM Krutika Dhananjay <[email protected]> >>> wrote: >>> >>>> Adding Matt Topol to the discussion. >>>> >>>> Thanks, >>>> Krutika >>>> >>>> On Fri, Feb 6, 2026 at 3:55 PM Krutika Dhananjay <[email protected]> >>>> wrote: >>>> >>>>> Hi everyone, >>>>> >>>>> I'd like to bring up a race condition between snapshot expiration and >>>>> concurrent client commits that add refs to snapshots. >>>>> >>>>> *The Problem* >>>>> Consider a table with snap-1 (unreferenced) and snap-2 (main). Two >>>>> operations happen concurrently: >>>>> >>>>> 1. A maintenance job identifies snap-1 as eligible for expiration. >>>>> 2. A client commits SetSnapshotRef("feature-branch", snap-1). >>>>> >>>>> If the client commits first, the maintenance job loads the fresh >>>>> metadata (which now contains a ref for feature-branch -> snap-1), builds >>>>> its RemoveSnapshots([snap-1]) update, and commits. The CAS succeeds >>>>> because >>>>> the maintenance job has the latest metadata pointer. But the resulting >>>>> metadata is corrupt: feature-branch points to a deleted snapshot. >>>>> >>>>> CAS-based optimistic concurrency protects against concurrent pointer >>>>> updates, but it does not validate the semantic correctness of the >>>>> operation >>>>> against the current state. The maintenance job's commit is structurally >>>>> valid (latest pointer) but semantically invalid (removes a referenced >>>>> snapshot). >>>>> >>>>> *Proposed Fix in iceberg-go* >>>>> We addressed <https://github.com/apache/iceberg-go/pull/716> this in >>>>> iceberg-go by modifying MetadataBuilder.RemoveSnapshots() to skip removing >>>>> any snapshot that is currently referenced by a branch or tag. This works >>>>> correctly -- the validation happens against the same metadata state that >>>>> the CAS will verify hasn't changed. >>>>> However, this changes the existing behavior of RemoveSnapshots, which >>>>> currently removes the snapshot unconditionally and cleans up any refs that >>>>> pointed to it. This would be a deviation from the behaviour in >>>>> iceberg-java >>>>> and iceberg-rs. >>>>> >>>>> *Proposed Alternative: AssertSnapshotNotReferenced* >>>>> A less invasive approach that preserves the existing RemoveSnapshots >>>>> behavior would be to introduce a new assertion update -- >>>>> AssertSnapshotNotReferenced. This would be analogous to existing >>>>> assertions, but for validating that a snapshot has no refs pointing to it. >>>>> >>>>> During commit, a catalog-side maintenance job would include this >>>>> assertion alongside its RemoveSnapshots update: >>>>> >>>>> updates: [ >>>>> AssertSnapshotNotReferenced(snap-1), >>>>> RemoveSnapshots([snap-1]) >>>>> ] >>>>> >>>>> The assertion would be validated against the freshly loaded metadata >>>>> during commit. If a concurrent client had linked a ref to snap-1 in the >>>>> meantime, the assertion fails, the CAS-based commit is never attempted, >>>>> and >>>>> the maintenance job can retry with updated knowledge of the table state. >>>>> >>>>> This approach: >>>>> - Does not change existing RemoveSnapshots semantics >>>>> - Can be implemented uniformly across Java, Rust, and Go >>>>> >>>>> Would love to hear thoughts on the preferred way to handle this. >>>>> >>>>> Thanks, >>>>> Krutika >>>>> >>>>
