Hi everyone,
I'd like to bring up a race condition between snapshot expiration and
concurrent client commits that add refs to snapshots.
*The Problem*
Consider a table with snap-1 (unreferenced) and snap-2 (main). Two
operations happen concurrently:
1. A maintenance job identifies snap-1 as eligible for expiration.
2. A client commits SetSnapshotRef("feature-branch", snap-1).
If the client commits first, the maintenance job loads the fresh metadata
(which now contains a ref for feature-branch -> snap-1), builds its
RemoveSnapshots([snap-1]) update, and commits. The CAS succeeds because the
maintenance job has the latest metadata pointer. But the resulting metadata
is corrupt: feature-branch points to a deleted snapshot.
CAS-based optimistic concurrency protects against concurrent pointer
updates, but it does not validate the semantic correctness of the operation
against the current state. The maintenance job's commit is structurally
valid (latest pointer) but semantically invalid (removes a referenced
snapshot).
*Proposed Fix in iceberg-go*
We addressed <https://github.com/apache/iceberg-go/pull/716> this in
iceberg-go by modifying MetadataBuilder.RemoveSnapshots() to skip removing
any snapshot that is currently referenced by a branch or tag. This works
correctly -- the validation happens against the same metadata state that
the CAS will verify hasn't changed.
However, this changes the existing behavior of RemoveSnapshots, which
currently removes the snapshot unconditionally and cleans up any refs that
pointed to it. This would be a deviation from the behaviour in iceberg-java
and iceberg-rs.
*Proposed Alternative: AssertSnapshotNotReferenced*
A less invasive approach that preserves the existing RemoveSnapshots
behavior would be to introduce a new assertion update --
AssertSnapshotNotReferenced. This would be analogous to existing
assertions, but for validating that a snapshot has no refs pointing to it.
During commit, a catalog-side maintenance job would include this assertion
alongside its RemoveSnapshots update:
updates: [
AssertSnapshotNotReferenced(snap-1),
RemoveSnapshots([snap-1])
]
The assertion would be validated against the freshly loaded metadata during
commit. If a concurrent client had linked a ref to snap-1 in the meantime,
the assertion fails, the CAS-based commit is never attempted, and the
maintenance job can retry with updated knowledge of the table state.
This approach:
- Does not change existing RemoveSnapshots semantics
- Can be implemented uniformly across Java, Rust, and Go
Would love to hear thoughts on the preferred way to handle this.
Thanks,
Krutika