Re: Subject: [DISCUSS] Idempotency-Key design for Iceberg REST: converging on Model B

huaxin gao Fri, 29 May 2026 11:25:40 -0700

Hi Robert,

Thanks for your reply!

You're right that Model B does not prevent duplicate execution. The
record is written only after success. So if a client times out while the
first request is still running, a retry can run the handler again. There
is no record yet to stop it. So Model B is "remember and replay a
successful result," not "run exactly once."
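
To make that window concrete, here is a minimal Python sketch of the
ordering. It is a toy model under my own assumptions, not Polaris code:
ModelBServer, lookup, run_handler and record are hypothetical names used
only for illustration.

```python
class ModelBServer:
    """Toy model of Model B: the key is recorded only after success."""

    def __init__(self):
        self.records = {}      # idempotency key -> binding (hypothetical store)
        self.handler_runs = 0  # how many times the handler actually executed

    def lookup(self, key):
        return self.records.get(key)

    def run_handler(self):
        self.handler_runs += 1

    def record(self, key, binding):
        self.records[key] = binding


server = ModelBServer()
binding = ("alice", "createTable", "ns.t")

# First request: no record yet, so the handler starts running.
if server.lookup("k1") is None:
    server.run_handler()

# The client times out and retries while the first request is still in
# flight. Nothing has been recorded, so the handler runs a second time.
if server.lookup("k1") is None:
    server.run_handler()

# The first request finishes and only now writes the record.
server.record("k1", binding)

# A later retry finds the record and is replayed without running the handler.
if server.lookup("k1") is None:
    server.run_handler()
```

In this sketch the handler runs twice: the overlapping retry is not
stopped, and only the retry that arrives after the record is written is
replayed.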

On the trade-off: Model A gives a stronger guarantee, but it needs
reserve/heartbeat/purge state, which adds complexity and overhead. Model
B is simpler and cheaper. The window it leaves open lasts only while the
first request is still in flight, and a client only retries after a
timeout, so overlapping attempts should be rare in practice. Every
design is a trade-off, and my view is that Model B is the right one
here.

It also helps to be clear about where duplicate-work protection really
comes from. It comes from the catalog itself, not from idempotency. The
catalog uses optimistic concurrency. If two first attempts race, at most
one commit wins and the other gets a 409. Idempotency sits on top of that.
It does not replace it.
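
As a rough illustration of that layer, the sketch below models
optimistic concurrency as a compare-and-swap on a version number. This
is an assumption made for the example, not the actual catalog commit
path; ToyCatalog and CommitConflict are hypothetical names.

```python
class CommitConflict(Exception):
    """Stands in for an HTTP 409 response (hypothetical name)."""
    status = 409


class ToyCatalog:
    def __init__(self):
        self.version = 0

    def commit(self, expected_version):
        # Compare-and-swap: succeed only if nobody committed since we read.
        if expected_version != self.version:
            raise CommitConflict("version changed since it was read")
        self.version += 1
        return self.version


catalog = ToyCatalog()
base_a = catalog.version  # attempt A reads the current version
base_b = catalog.version  # attempt B reads the same version

outcomes = []
for base in (base_a, base_b):
    try:
        catalog.commit(base)
        outcomes.append(200)
    except CommitConflict as conflict:
        outcomes.append(conflict.status)
```

Both attempts start from the same version, so the first commit wins and
the second is rejected with a 409, with no idempotency record involved.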

So what does Model B add over "the client just calls loadTable and
reconciles"? Two things that I think are real:

  1. The 422 check. loadTable can tell a client that a table exists. It
     cannot tell the client that the table THEY created with THIS key is
     the one that succeeded. The record binds the key to (principal,
     operation, resource). If the same key is reused for a different
     request, the server returns 422. The client cannot detect this on
     its own.

  2. One server-side behavior for all mutating ops. create-table happens
     to reconcile cleanly with loadTable. But the point of the
     Idempotency-Key header is that the client should not have to write
     reconciliation logic for every operation. For a known key, the
     server turns what would be a 409 into an equivalent 2xx replay. The
     client gets a clean success instead of an error it has to special-
     case.
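
Both points can be sketched in a few lines of Python. This is a toy
model under my own assumptions, not the PR's implementation: handle,
records and tables are hypothetical names, and the replay simply returns
200 where a real server would re-derive the response from current
catalog state.

```python
def handle(records, tables, key, principal, operation, resource):
    """Return the HTTP status a Model B style server would send."""
    binding = (principal, operation, resource)
    recorded = records.get(key)
    if recorded is not None:
        if recorded != binding:
            return 422  # same key reused for a different request
        return 200      # known key: replay an equivalent success
    if resource in tables:
        return 409      # no record for this key: ordinary conflict
    tables.add(resource)
    records[key] = binding  # recorded only after the operation succeeded
    return 200


records, tables = {}, set()

first = handle(records, tables, "k1", "alice", "createTable", "ns.t")
retry = handle(records, tables, "k1", "alice", "createTable", "ns.t")
misuse = handle(records, tables, "k1", "alice", "createTable", "ns.other")
no_key_match = handle(records, tables, "k2", "bob", "createTable", "ns.t")
```

The retry with the same key and binding gets a 2xx where it would
otherwise see a 409, the reused key with a different resource gets a
422, and a request with an unrelated key still gets the plain 409.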

There is a third, weaker benefit: once a record exists, retries stop
seeing flip-flopping results. But that only helps after the record is
written, so it does nothing for the in-flight window you pointed out.

So I'll correct my earlier wording. This is not convergence on exactly-
once idempotency. It is a narrower guarantee: replay a recorded result,
plus detect key misuse. It sits on top of the catalog's existing
concurrency control. The real question for the list is simple: is that
narrower guarantee worth shipping on its own? Or do we need Model A's
in-flight protection to have a strong idempotency guarantee?

My view is that the narrow version is worth it for now: it's the
behavior the spec asks for, the 422 check can't be done client-side, and
it's a small change we can strengthen toward Model A later if a real use
case needs it. Happy to hear what others think.

Best,
Huaxin

On Fri, May 29, 2026 at 7:36 AM Robert Stupp <[email protected]> wrote:

> Hi Huaxin,
>
> Thanks for writing this up and moving the design discussion back to dev@.
>
> Since you’re asking before locking in the implementation, I think we should
> clarify one point.
>
> Model B is certainly simpler than the lease-based approach, but I’m not
> sure I fully understand what problem it still solves.
>
> As I read it, if a client times out while the original request is still
> running, a retry with the same key may not see an idempotency record yet
> and could run the handler again.
> So this feels less like preventing duplicate execution and more like
> remembering a successful result after the fact.
>
> For the create-table case, couldn’t a client achieve roughly the same
> recovery by calling loadTable after an ambiguous timeout and reconciling
> from there?
> Since Model B also rebuilds the response from current catalog state, I’m
> trying to understand what it gives us beyond that.
>
> I’m not against simplifying the design, but I think we should be clear
> about the narrower guarantee before calling this convergence.
>
> Best,
> Robert
>
>
> On Fri, May 29, 2026 at 12:29 AM huaxin gao <[email protected]>
> wrote:
>
> > Hi all,
> >
> > I've simplified the proposed design for Idempotency-Key support in
> > Polaris (Iceberg REST spec: retries with the same key must not produce
> > additional side effects), and I'd like a wider review before updating
> > the implementation PR (#4269
> > <https://github.com/apache/polaris/pull/4269>).
> >
> > What changed
> >
> >   - Before (Model A, lease-based): reserve an idempotency row before
> >     doing work → IN_PROGRESS / heartbeat → finalize after.
> >   - After (Model B, optimistic commit): run the handler first → record
> >     only after a successful (2xx) outcome. The record stores binding +
> >     status, not the HTTP response body. Retries with the same key
> >     re-derive an equivalent response from current catalog state instead
> >     of replaying a stored payload.
> >
> > The design doc still compares Model A and Model B side-by-side so the
> > trade-offs are explicit. So far the discussion has been leaning toward
> > Model B — mutating REST operations only, 2xx-only persistence, no
> > response-body storage, and the known
> > trade-offs (e.g. concurrent first-request races; see the NOTES section in
> > the doc).
> >
> > Does this direction look right before we lock in the implementation?
> >
> > Comments on the doc
> > <https://docs.google.com/document/d/1hqTejVyYXDpL5MJcVc7NyhCslKaGH82QoqMEcUYPvkE/edit?tab=t.0>
> > or replies on this thread both work.
> >
> > Thanks,
> > Huaxin
> >
>
