Re: Subject: [DISCUSS] Idempotency-Key design for Iceberg REST: converging on Model B

Russell Spitzer Fri, 29 May 2026 17:25:47 -0700

The problem with a client attempting to determine whether its operations
succeeded via loadTable, and the reason all this work has proceeded, is
that there is no guaranteed path for a client to actually determine
whether a commit occurred. There are too many legitimate mechanisms for
erasing history from an Iceberg table for a client to confirm after the
fact that an operation occurred.

For example, you could check whether your snapshot exists in the snapshot
history, but it could have been removed by expire snapshots.

Or you could check whether the schema was modified according to your
update, but this too could have been undone by another operation. Client A
adds a column and gets a timeout even though the commit landed, Client B
removes the column, and Client A, seeing no column, retries and adds it
again.
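
To make the schema example concrete, here is a toy Python simulation (not
Iceberg API; the schema is just a set of column names and
`reconcile_by_state` is a hypothetical helper). It shows that a
state-based check cannot tell "my commit never landed" apart from "my
commit landed and was later undone":

```python
def reconcile_by_state(schema: set[str], column: str) -> bool:
    """Hypothetical check: infer success purely from current table state."""
    return column in schema

# Scenario 1: Client A's add-column commit really failed.
schema_failed = {"id", "ts"}

# Scenario 2: Client A's commit landed, A timed out, then Client B dropped it.
schema_landed = {"id", "ts"}
schema_landed.add("region")      # A's commit actually succeeded
schema_landed.discard("region")  # B removes the column before A reconciles

# Both scenarios look identical to A, so A retries in both and re-adds it.
print(reconcile_by_state(schema_failed, "region"))  # False
print(reconcile_by_state(schema_landed, "region"))  # False
print(schema_failed == schema_landed)               # True
```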

Because of this, the Iceberg client usually just bails out to the user
with an exception if it doesn’t get an actual confirmation from the server
that the commit succeeded. This leaves “can I retry or not?” as an
exercise for the end user.

In practice, Iceberg users work around this sort of thing by adding all
sorts of custom metadata, hoping to persist history in the table itself in
a way that expire snapshots can’t touch, but this is usually very fragile
and also relies on all clients behaving well. I’ve seen folks use custom
table properties, for example “batch-5: committed”, and then have their
own retry logic check whether this property is set. Then, of course, they
also have to add a bunch of custom logic to make sure they clean up this
state as well.
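
A rough sketch of that marker-property workaround, again as a toy model
(`FakeTable`, `commit_batch`, `retry_with_marker` and `cleanup_markers`
are hypothetical names, not Iceberg APIs). It assumes the marker is
written atomically with the data, and shows how the protection disappears
as soon as anyone removes the marker:

```python
from dataclasses import dataclass, field

@dataclass
class FakeTable:
    """Hypothetical stand-in for a table: properties plus an append log."""
    properties: dict[str, str] = field(default_factory=dict)
    appended_batches: list[str] = field(default_factory=list)

def commit_batch(table: FakeTable, batch_id: str) -> None:
    # Assumed atomic: data and marker land in the same commit.
    table.appended_batches.append(batch_id)
    table.properties[batch_id] = "committed"

def retry_with_marker(table: FakeTable, batch_id: str) -> None:
    # Custom retry logic: only re-commit if the marker is absent.
    if table.properties.get(batch_id) != "committed":
        commit_batch(table, batch_id)

def cleanup_markers(table: FakeTable) -> None:
    # The extra housekeeping users must write; once it runs (or another
    # client drops the property), retries are no longer protected.
    table.properties.clear()

t = FakeTable()
commit_batch(t, "batch-5")       # succeeded, but suppose the client saw a timeout
retry_with_marker(t, "batch-5")  # marker present: no duplicate
cleanup_markers(t)               # marker removed
retry_with_marker(t, "batch-5")  # marker gone: batch appended twice
print(t.appended_batches)        # ['batch-5', 'batch-5']
```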

This is why Iceberg added the idempotency path in the first place: it
gives clients a guaranteed way to retry after a network or catalog issue
without doing duplicate work by retrying. With this in place, the client
can cleanly retry the same operation over and over (within the idempotency
window) without throwing an exception to the end user. Only when the
catalog cannot respond for a very long time will the user actually have to
do some sort of reconciliation. You can look at the history of the Iceberg
client’s retry behavior on ambiguous server-side or network errors to see
how this has been a problem in the past.
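
A minimal client-side sketch of that retry loop, assuming a catalog that
honors an `Idempotency-Key` header by replaying the recorded result for a
key it has already seen. `FakeCatalog`, `AmbiguousError` and
`commit_with_retries` are hypothetical; the 60-second window is an
arbitrary placeholder, not a spec value; and the fake catalog handles one
request at a time, so it does not model concurrent in-flight retries:

```python
import time
import uuid

class AmbiguousError(Exception):
    """Timeout or 5xx: the commit may or may not have been applied."""

class FakeCatalog:
    """Hypothetical catalog that replays recorded results per Idempotency-Key."""
    def __init__(self, drop_first_responses: int = 1):
        self.records: dict[str, dict] = {}
        self.commits = 0
        self._drops = drop_first_responses

    def commit(self, headers: dict[str, str], change: str) -> dict:
        key = headers["Idempotency-Key"]
        if key in self.records:
            result = self.records[key]   # replay: no new side effect
        else:
            self.commits += 1            # side effect happens once per key
            result = {"status": 200, "change": change}
            self.records[key] = result
        if self._drops > 0:              # simulate the response being lost
            self._drops -= 1
            raise AmbiguousError("response lost after commit")
        return result

def commit_with_retries(catalog, change, window_s=60.0, backoff_s=0.01):
    # One key per logical operation, reused on every retry.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    deadline = time.monotonic() + window_s  # assumed to match the server window
    while True:
        try:
            return catalog.commit(headers, change)
        except AmbiguousError:
            if time.monotonic() + backoff_s > deadline:
                raise                       # only now must the user reconcile
            time.sleep(backoff_s)
            backoff_s *= 2

cat = FakeCatalog(drop_first_responses=2)
print(commit_with_retries(cat, "add column region"))
print(cat.commits)  # 1
```

The important choice is that the key is generated once per logical
operation and reused on every attempt; a fresh key per attempt would
defeat the server-side replay.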

On Fri, May 29, 2026 at 1:24 PM huaxin gao <[email protected]> wrote:

> Hi Robert,
>
> Thanks for your reply!
>
> You're right that Model B does not prevent duplicate execution. The
> record is written only after success. So if a client times out while the
> first request is still running, a retry can run the handler again. There
> is no record yet to stop it. So Model B is "remember and replay a
> successful result," not "run exactly once."
>
> On the trade-off: Model A gives a stronger guarantee, but it needs
> reserve/heartbeat/purge state, which adds complexity and overhead. Model
> B is simpler and cheaper. The window it leaves open is small, and a
> client only retries after a timeout, so racing first requests should be
> rare in practice. Every design is a trade-off, and my view is that Model
> B is the right one here.
>
> It also helps to be clear about where duplicate-work protection really
> comes from. It comes from the catalog itself, not from idempotency. The
> catalog uses optimistic concurrency. If two first attempts race, at most
> one commit wins and the other gets a 409. Idempotency sits on top of that.
> It does not replace it.
>
> So what does Model B add over "the client just calls loadTable and
> reconciles"? Two things that I think are real:
>
>   1. The 422 check. loadTable can tell a client that a table exists. It
>      cannot tell the client that the table THEY created with THIS key is
>      the one that succeeded. The record binds the key to (principal,
>      operation, resource). If the same key is reused for a different
>      request, the server returns 422. The client cannot detect this on
>      its own.
>
>   2. One server-side behavior for all mutating ops. create-table happens
>      to reconcile cleanly with loadTable. But the point of the
>      Idempotency-Key header is that the client should not have to write
>      reconciliation logic for every operation. For a known key, the
>      server turns what would be a 409 into an equivalent 2xx replay. The
>      client gets a clean success instead of an error it has to special-
>      case.
>
> There is a third, weaker benefit: once a record exists, retries stop
> seeing flip-flopping results. But that only helps after a record exists,
> which is exactly the window you pointed out is unprotected.
>
> So I'll correct my earlier wording. This is not convergence on exactly-
> once idempotency. It is a narrower guarantee: replay a recorded result,
> plus detect key misuse. It sits on top of the catalog's existing
> concurrency control. The real question for the list is simple: is that
> narrower guarantee worth shipping on its own? Or do we need Model A's
> in-flight protection to have a strong idempotency guarantee?
>
> My view is that the narrow version is worth it for now: it's the
> behavior the spec asks for, the 422 check can't be done client-side, and
> it's a small change we can strengthen toward Model A later if a real use
> case needs it. Happy to hear what others think.
>
> Best,
> Huaxin
>
> On Fri, May 29, 2026 at 7:36 AM Robert Stupp <[email protected]> wrote:
>
> > Hi Huaxin,
> >
> > Thanks for writing this up and moving the design discussion back to dev@.
> >
> > Since you’re asking before locking in the implementation, I think we
> > should clarify one point.
> >
> > Model B is certainly simpler than the lease-based approach, but I’m not
> > sure I fully understand what problem it still solves.
> >
> > As I read it, if a client times out while the original request is still
> > running, a retry with the same key may not see an idempotency record yet
> > and could run the handler again.
> > So this feels less like preventing duplicate execution and more like
> > remembering a successful result after the fact.
> >
> > For the create-table case, couldn’t a client achieve roughly the same
> > recovery by calling loadTable after an ambiguous timeout and reconciling
> > from there?
> > Since Model B also rebuilds the response from current catalog state, I’m
> > trying to understand what it gives us beyond that.
> >
> > I’m not against simplifying the design, but I think we should be clear
> > about the narrower guarantee before calling this convergence.
> >
> > Best,
> > Robert
> >
> >
> > On Fri, May 29, 2026 at 12:29 AM huaxin gao <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > I've simplified the proposed design for Idempotency-Key support in
> > > Polaris (Iceberg REST spec: retries with the same key must not produce
> > > additional side effects), and I'd like a wider review before updating
> > > the implementation PR (#4269 <https://github.com/apache/polaris/pull/4269>).
> > >
> > > What changed
> > >
> > >   - Before (Model A, lease-based): reserve an idempotency row before
> > >     doing work → IN_PROGRESS / heartbeat → finalize after.
> > >   - After (Model B, optimistic commit): run the handler first → record
> > >     only after a successful (2xx) outcome. The record stores binding +
> > >     status, not the HTTP response body. Retries with the same key
> > >     re-derive an equivalent response from current catalog state
> > >     instead of replaying a stored payload.
> > >
> > > The design doc still compares Model A and Model B side by side so the
> > > trade-offs are explicit. So far the discussion has been leaning toward
> > > Model B: mutating REST operations only, 2xx-only persistence, no
> > > response-body storage, and acceptance of the known trade-offs (e.g.
> > > concurrent first-request races; see the NOTES section in the doc).
> > >
> > > Does this direction look right before we lock in the implementation?
> > >
> > > Comments on the doc
> > > <https://docs.google.com/document/d/1hqTejVyYXDpL5MJcVc7NyhCslKaGH82QoqMEcUYPvkE/edit?tab=t.0>
> > > or replies on this thread both work.
> > >
> > > Thanks,
> > > Huaxin
> > >
> >
>
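
For reference, a minimal Python sketch of the Model B flow as described in
this thread: the handler runs first, only a 2xx outcome is recorded as
(binding, status) where the binding is (principal, operation, resource), a
retry with a matching binding re-derives the response from current
catalog state, and a mismatched binding gets 422. `ModelBServer`,
`Binding` and the in-memory dicts are hypothetical stand-ins, not Polaris
code, and only create-table is modeled:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Binding:
    principal: str
    operation: str
    resource: str

class ModelBServer:
    """Hypothetical Model B: run first, record binding+status only after 2xx."""
    def __init__(self):
        self.tables: dict[str, dict] = {}  # stand-in for catalog state
        self.records: dict[str, tuple[Binding, int]] = {}

    def _create_table(self, name: str) -> tuple[int, dict]:
        # Catalog-level concurrency control: a second create conflicts.
        if name in self.tables:
            return 409, {"error": "AlreadyExists"}
        self.tables[name] = {"name": name, "schema": []}
        return 200, self.tables[name]

    def create_table(self, principal: str, key: str, name: str) -> tuple[int, dict]:
        binding = Binding(principal, "create-table", name)
        record = self.records.get(key)
        if record is not None:
            recorded_binding, status = record
            if recorded_binding != binding:
                return 422, {"error": "Idempotency-Key reused for a different request"}
            # Replay: re-derive from current state, not a stored body.
            # (None if the table was dropped since; not modeled here.)
            return status, self.tables.get(name)
        status, body = self._create_table(name)  # no record yet: handler runs
        if 200 <= status < 300:
            self.records[key] = (binding, status)  # persist only after success
        return status, body

s = ModelBServer()
print(s.create_table("alice", "k1", "db.t")[0])      # 200: handler ran, recorded
print(s.create_table("alice", "k1", "db.t")[0])      # 200: replay instead of 409
print(s.create_table("alice", "k1", "db.other")[0])  # 422: same key, other request
print(s.create_table("bob", "k2", "db.t")[0])        # 409: new key, real conflict
```

Note what the sketch does not protect: if two first attempts with the same
key run concurrently, neither finds a record, both reach `_create_table`,
and only the catalog's own conflict check turns the loser into a 409,
which is the window Robert pointed out.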
