Hi Russell,

Thanks for the information! It clarifies the use case a lot (at least for
me :)

In short, I'd say the main benefit is allowing clients to avoid conflicts
(409) on re-submitting changes that got committed by the server without the
client receiving confirmation of the success.

I believe the Iceberg REST Catalog spec [1] is formally stricter than Model
B when it states "the server ensures no additional effects for requests
that carry the same Idempotency-Key". Since Model B permits request
re-execution, the possibility of additional side effects cannot be ruled
out completely based on the proposed server-side algorithm alone. The
server must assume that the client forms the (change) request in such a way
that only one execution attempt can succeed (e.g. by using "update
requirements"). This is also mentioned in comments on the doc [2].

This is probably worth mentioning in the Polaris docs related to
our Idempotency-Key implementation.

Assuming this kind of cooperation on the client side, I believe Model B can
be considered compliant with the spec [1].
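
To make the client-side cooperation concrete, here is a minimal sketch in
Python. It is a toy in-memory model, not Polaris or Iceberg code: the Table
class, the Conflict exception, and the expected_snapshot_id parameter are
hypothetical names chosen for illustration. It shows why a request that
carries an "update requirement" can be re-executed without additional
effects: the second execution fails the requirement instead of applying
the change again.

```python
class Conflict(Exception):
    """Stands in for an HTTP 409 response (hypothetical name)."""


class Table:
    """Toy table: a current snapshot id plus a log of applied snapshots."""

    def __init__(self):
        self.snapshot_id = None
        self.log = []

    def commit(self, expected_snapshot_id, new_snapshot_id):
        # The "update requirement": apply the change only if the table is
        # still at the snapshot the client based its change on.
        if self.snapshot_id != expected_snapshot_id:
            raise Conflict(
                f"expected {expected_snapshot_id}, found {self.snapshot_id}"
            )
        self.snapshot_id = new_snapshot_id
        self.log.append(new_snapshot_id)


table = Table()
request = {"expected_snapshot_id": None, "new_snapshot_id": 1}

table.commit(**request)  # first execution commits

try:
    table.commit(**request)  # re-execution of the very same request
    second_attempt = "applied"
except Conflict:
    second_attempt = "conflict"
```

Without the requirement check, the second call would append to the log
again, which is the kind of additional side effect the spec rules out.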

In anticipation of fresh implementation PRs for this feature, I'd like to
re-emphasize (IIRC I mentioned this before) that, I think, we should avoid
coupling Idempotency persistence with MetaStore persistence (both code-wise
and transaction-wise). Model B processes Idempotency-related data outside
the original change request's execution scope. Idempotency decisions are
made either before the request starts executing or after it is committed to
the MetaStore.
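
As a rough illustration of that separation, the sketch below keeps
idempotency records in their own store and touches it only at the two
decision points. Everything here is an assumption made for the example
(MetaStore, IdempotencyStore, handle_create, and the exception names are
hypothetical, not the Polaris API); it only mirrors the Model B flow as
described in this thread.

```python
class Conflict(Exception):
    """Stands in for HTTP 409 (hypothetical name)."""


class KeyMisuse(Exception):
    """Stands in for HTTP 422 (hypothetical name)."""


class MetaStore:
    """Toy catalog state: table name -> metadata."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name):
        if name in self.tables:
            raise Conflict(name)
        self.tables[name] = {"name": name}
        return self.tables[name]


class IdempotencyStore:
    """Separate store: key -> binding. Shares no transaction with MetaStore."""

    def __init__(self):
        self.records = {}


def handle_create(meta, idem, key, principal, name):
    binding = (principal, "create-table", name)
    record = idem.records.get(key)
    # Decision point 1: before the change request starts executing.
    if record is not None:
        if record != binding:
            raise KeyMisuse(key)  # same key reused for a different request
        return meta.tables[name]  # re-derive the response from current state
    result = meta.create_table(name)  # commits in the MetaStore only
    # Decision point 2: after the commit, outside its transaction.
    idem.records[key] = binding
    return result


meta, idem = MetaStore(), IdempotencyStore()
first = handle_create(meta, idem, "k1", "alice", "t1")
retry = handle_create(meta, idem, "k1", "alice", "t1")  # replayed, no 409
```

If the process stops between the commit and the record write, a retry
sees no record and gets the ordinary conflict, which is the unprotected
window discussed further down in this thread.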

[1]
https://github.com/apache/polaris/blob/4e4eaf840bf71d431b13034b0dd6f338261d8e8b/spec/iceberg-rest-catalog-open-api.yaml#L2098

[2]
https://docs.google.com/document/d/1hqTejVyYXDpL5MJcVc7NyhCslKaGH82QoqMEcUYPvkE/edit?tab=t.0

Cheers,
Dmitri.

On Fri, May 29, 2026 at 8:26 PM Russell Spitzer <[email protected]>
wrote:

> The problem with a client attempting to determine whether its operations
> succeeded via load table, and the reason all this work has proceeded, is
> that there is no guaranteed way for a client to actually determine
> if a commit occurred. There are too many legitimate mechanisms to erase
> history from an Iceberg table to guarantee an operation occurred.
>
> For example, you could check if your snapshot exists in snapshot history
> but this could have been erased by expire snapshots.
>
> Or you could check if the schema was modified according to your update, but
> this too could have been undone by another operation. Client A adds a column
> but gets a timeout, Client B removes the column, Client A retries and adds
> the column again.
>
> Because of this, the Iceberg client usually just bails out to the user with
> an exception if it doesn’t get an actual confirmation that the commit
> succeeded from the server. This leaves “can I retry or not” as an
> exercise to the end user.
>
> In practice, actual Iceberg users work around this sort of thing by adding
> all sorts of custom metadata to hopefully persist history in the table
> itself in some way that can’t be touched by expire snapshots, but this is
> usually very fragile and also relies on all clients behaving well. I’ve
> seen folks use custom table properties, for example “batch-5: committed”,
> and then have their own retry logic manually check whether this property is
> set. Then, of course, they also have to add a bunch of custom logic to make
> sure they clean up this state as well.
>
> This is why Iceberg added the Idempotency path in the first place: it gives
> us a guaranteed way for clients to retry in case of a network issue or
> catalog issue with a guarantee they will not do duplicate work by retrying.
> With this in place the client can now cleanly retry (within the idempotency
> window) the same operation over and over without throwing an exception to
> the end user. Only in a situation where the catalog cannot respond over a
> very long time will the user actually have to do some sort of
> reconciliation. You can look at the history of the Iceberg client’s retry
> behavior with ambiguous server side or network errors to see how this has
> been a problem in the past.
>
> On Fri, May 29, 2026 at 1:24 PM huaxin gao <[email protected]> wrote:
>
> > Hi Robert,
> >
> > Thanks for your reply!
> >
> > You're right that Model B does not prevent duplicate execution. The
> > record is written only after success. So if a client times out while the
> > first request is still running, a retry can run the handler again. There
> > is no record yet to stop it. So Model B is "remember and replay a
> > successful result," not "run exactly once."
> >
> > On the trade-off: Model A gives a stronger guarantee, but it needs
> > reserve/heartbeat/purge state, which adds complexity and overhead. Model
> > B is simpler and cheaper. The window it leaves open is small, and a
> > client only retries after a timeout, so racing first requests should be
> > rare in practice. Every design is a trade-off, and my view is that Model
> > B is the right one here.
> >
> > It also helps to be clear about where duplicate-work protection really
> > comes from. It comes from the catalog itself, not from idempotency. The
> > catalog uses optimistic concurrency. If two first attempts race, at most
> > one commit wins and the other gets a 409. Idempotency sits on top of
> > that. It does not replace it.
> >
> > So what does Model B add over "the client just calls loadTable and
> > reconciles"? Two things that I think are real:
> >
> >   1. The 422 check. loadTable can tell a client that a table exists. It
> >      cannot tell the client that the table THEY created with THIS key is
> >      the one that succeeded. The record binds the key to (principal,
> >      operation, resource). If the same key is reused for a different
> >      request, the server returns 422. The client cannot detect this on
> >      its own.
> >
> >   2. One server-side behavior for all mutating ops. create-table happens
> >      to reconcile cleanly with loadTable. But the point of the
> >      Idempotency-Key header is that the client should not have to write
> >      reconciliation logic for every operation. For a known key, the
> >      server turns what would be a 409 into an equivalent 2xx replay. The
> >      client gets a clean success instead of an error it has to special-
> >      case.
> >
> > There is a third, weaker benefit: once a record exists, retries stop
> > seeing flip-flopping results. But that only helps after a record exists,
> > which is exactly the window you pointed out is unprotected.
> >
> > So I'll correct my earlier wording. This is not convergence on exactly-
> > once idempotency. It is a narrower guarantee: replay a recorded result,
> > plus detect key misuse. It sits on top of the catalog's existing
> > concurrency control. The real question for the list is simple: is that
> > narrower guarantee worth shipping on its own? Or do we need Model A's
> > in-flight protection to have a strong idempotency guarantee?
> >
> > My view is that the narrow version is worth it for now: it's the
> > behavior the spec asks for, the 422 check can't be done client-side, and
> > it's a small change we can strengthen toward Model A later if a real use
> > case needs it. Happy to hear what others think.
> >
> > Best,
> > Huaxin
> >
> > On Fri, May 29, 2026 at 7:36 AM Robert Stupp <[email protected]> wrote:
> >
> > > Hi Huaxin,
> > >
> > > Thanks for writing this up and moving the design discussion back to
> > > dev@.
> > >
> > > Since you’re asking before locking in the implementation, I think we
> > > should clarify one point.
> > >
> > > Model B is certainly simpler than the lease-based approach, but I’m not
> > > sure I fully understand what problem it still solves.
> > >
> > > As I read it, if a client times out while the original request is still
> > > running, a retry with the same key may not see an idempotency record
> > > yet and could run the handler again.
> > > So this feels less like preventing duplicate execution and more like
> > > remembering a successful result after the fact.
> > >
> > > For the create-table case, couldn’t a client achieve roughly the same
> > > recovery by calling loadTable after an ambiguous timeout and
> > > reconciling from there?
> > > Since Model B also rebuilds the response from current catalog state,
> > > I’m trying to understand what it gives us beyond that.
> > >
> > > I’m not against simplifying the design, but I think we should be clear
> > > about the narrower guarantee before calling this convergence.
> > >
> > > Best,
> > > Robert
> > >
> > >
> > > On Fri, May 29, 2026 at 12:29 AM huaxin gao <[email protected]>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I've simplified the proposed design for Idempotency-Key support in
> > > > Polaris (Iceberg REST spec: retries with the same key must not produce
> > > > additional side effects), and I'd like a wider review before updating
> > > > the implementation PR (#4269
> > > > <https://github.com/apache/polaris/pull/4269>).
> > > >
> > > > What changed
> > > >
> > > >   - Before (Model A, lease-based): reserve an idempotency row before
> > > >     doing work → IN_PROGRESS / heartbeat → finalize after.
> > > >   - After (Model B, optimistic commit): run the handler first → record
> > > >     only after a successful (2xx) outcome. The record stores binding +
> > > >     status, not the HTTP response body. Retries with the same key
> > > >     re-derive an equivalent response from current catalog state
> > > >     instead of replaying a stored payload.
> > > >
> > > > The design doc still compares Model A and Model B side-by-side so the
> > > > trade-offs are explicit. So far the discussion has been leaning toward
> > > > Model B: mutating REST operations only, 2xx-only persistence, no
> > > > response-body storage, and the known trade-offs (e.g. concurrent
> > > > first-request races; see the NOTES section in the doc).
> > > >
> > > > Does this direction look right before we lock in the implementation?
> > > >
> > > > Comments on the doc
> > > > <https://docs.google.com/document/d/1hqTejVyYXDpL5MJcVc7NyhCslKaGH82QoqMEcUYPvkE/edit?tab=t.0>
> > > > or replies on this thread both work.
> > > >
> > > > Thanks,
> > > > Huaxin
> > > >
> > >
> >
>
