Re: Subject: [DISCUSS] Idempotency-Key design for Iceberg REST: converging on Model B

Dmitri Bourlatchkov Tue, 02 Jun 2026 11:13:03 -0700

Hi Huaxin,

Re: Question 1 (422 responses).


I do not actually see any mention of 422 responses WRT Idempotency-Key in
the IRC spec. Did I miss it?

So, if that behaviour is not strictly specified, Polaris can choose to
produce those responses in any way that makes sense for the
Polaris implementation, I think.

Re: Performance.

Yes, it is a valid concern. My previous suggestion for this approach was
not a strict requirement, but a point for discussion. Let's see if other
reviewers comment on this aspect too.

Thinking about the quick-changing table use case, I believe the more
frequent the updates to a table, the shorter the retention period the
client needs. If a client takes too long to retry and other clients make
many updates in between, that client is unlikely to commit successfully
anyway, because the table metadata will have evolved beyond its expectations.

All in all, in practice it should be possible to cap the size of
idempotency data, I think. This may be a bit at odds with the current IRC
spec language regarding idempotency key retention, but it might still work
effectively.
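
To make the capping idea concrete, here is a minimal sketch, assuming a
per-entity key log that is bounded by both entry count and age. The class
and its parameters are hypothetical and are not part of Polaris or the IRC
spec; the point is only that old keys get evicted, so the stored data
cannot grow without bound.

```python
from collections import OrderedDict


class CappedKeyLog:
    """Hypothetical bounded log of idempotency keys for a single entity."""

    def __init__(self, max_entries: int, max_age_s: float) -> None:
        self.max_entries = max_entries
        self.max_age_s = max_age_s
        # key -> time recorded (seconds); oldest entries come first.
        self._entries: "OrderedDict[str, float]" = OrderedDict()

    def record(self, key: str, now_s: float) -> None:
        # Assumes now_s never decreases, so insertion order is time order.
        self._entries[key] = now_s
        self._entries.move_to_end(key)
        self._evict(now_s)

    def seen(self, key: str, now_s: float) -> bool:
        self._evict(now_s)
        return key in self._entries

    def _evict(self, now_s: float) -> None:
        # Drop the oldest entries while the log exceeds either cap.
        while self._entries:
            oldest_ts = next(iter(self._entries.values()))
            too_many = len(self._entries) > self.max_entries
            too_old = now_s - oldest_ts > self.max_age_s
            if not (too_many or too_old):
                break
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)
```

With a cap like this, a retry that arrives after its key was evicted is
simply treated as a new request and falls back to the catalog's normal
conflict detection.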

Also, UUID v7 has a time component. The server should be able to recognize
keys created outside the range of "recent" entries and flag those cases
(e.g. in the log) so that the Polaris Admin user can notice them and take
corrective action.
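
As a rough illustration of that check, here is a minimal sketch that reads
the timestamp out of a UUID v7 key and compares it with a retention
window. The function names and the window parameter are assumptions made
for this example only; the bit layout (Unix milliseconds in the top 48
bits) is from RFC 9562. The embedded time comes from the client's clock,
so a real check would probably need some tolerance for skew.

```python
import uuid
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uuid7_created_at(key: str) -> datetime:
    """Return the creation time embedded in a UUID v7 string."""
    parsed = uuid.UUID(key)
    if parsed.version != 7:
        raise ValueError(f"not a UUID v7: {key}")
    # The top 48 of the 128 bits hold Unix time in milliseconds (RFC 9562).
    millis = parsed.int >> 80
    return EPOCH + timedelta(milliseconds=millis)


def is_outside_recent_window(key: str, now: datetime, window: timedelta) -> bool:
    """True if the key's embedded time is more than `window` before `now`.

    Hypothetical helper: a server could log such keys for the admin
    instead of silently treating them as fresh requests.
    """
    return uuid7_created_at(key) < now - window
```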

As for me, I'm fine with a specialized persistence impl. for idempotency
keys too.

Cheers,
Dmitri.

On Tue, Jun 2, 2026 at 1:14 PM huaxin gao <[email protected]> wrote:

> Hi Dmitri,
>
> You're right: DELETE is a weak case for the idempotency key, so I'll
> drop that objection.
>
> Two reasons I agree. First, a client can handle DELETE failures on its
> own: reload the table, see that it's gone, and stop. Second, your
> identity point is the deeper one. The DELETE API uses the table name, not
> a physical table id. If a name is dropped and recreated, it points to a
> different table over time, so the key can't give a clean guarantee for
> DELETE anyway.
>
> So I'm fine scoping idempotency to operations that leave a surviving
> entity: create, commit/update, register, rename. For those, storing the
> key in the entity properties works, and the delete-storage problem goes
> away.
>
> That leaves two things I'd want to settle before we pick entity-property
> storage over a separate store:
>
> 1. Key reuse (422). With keys in entity properties there's no global
>    (realm, key) index. If a client reuses one key for a different
>    resource, we can't detect it and return 422. Are we okay treating that
>    as best-effort?
>
> 2. Performance on the hot path. I want to second Yufei's concern. The
>    number of keys per entity is roughly (write rate × retention window),
>    and the cost concentrates on the busiest tables. Since entity
>    properties are serialized as one blob and rewritten on each update,
>    every commit rewrites all the stored keys, not just the new one, so a
>    hot table pays growing write amplification on its commit path, plus
>    larger loads and a heavier cache footprint, even for plain table
>    loads. We can bound this with a tight retention cap, but that directly
>    shrinks the idempotency window, which is the part clients actually
>    rely on. A separate store keeps this off the hot path and lets
>    retention be tuned independently of entity size.
>
> So my main questions are: are we okay with best-effort 422, and how do we
> want to handle the hot-path cost? If both have good answers, I agree
> entity-property storage is the simpler choice. WDYT?
>
> Thanks,
> Huaxin
>
> On Tue, Jun 2, 2026 at 6:59 AM Dmitri Bourlatchkov <[email protected]>
> wrote:
>
> > Hi Huaxin,
> >
> > Good point about handling DELETE idempotently!
> >
> > However, I wonder whether it is a critical use case?
> >
> > Do you expect DELETE to benefit a lot from the Idempotency Key?
> >
> > I'd think it should be fairly straightforward for the client to reload the
> > table to be deleted in case of failures, discover that it is gone, and not
> > retry. WDYT?
> >
> > There's still the question of whether the client is deleting the table it
> > actually intends to delete. Another client could delete the current table
> > and create a new table under the same name while the first client is
> > "deliberating". The IRC API does not provide for unique table
> > identification in DELETE operations, as far as I know. The operation is
> > invoked simply on the name, which can map to different physical tables at
> > different times. Adding Idempotency Keys does not help in this context, I
> > think.
> >
> > Thanks,
> > Dmitri.
> >
> > On Mon, Jun 1, 2026 at 9:59 PM huaxin gao <[email protected]> wrote:
> >
> > > Hi Dmitri,
> > >
> > > I like the idea — the atomic key write closes the in-flight gap, and it
> > > avoids the Iceberg metadata and spec issues. Agreed too that losing keys
> > > on already-deleted entities is harmless.
> > >
> > > But I think the harder case is delete operations themselves. For drop
> > > table/view/namespace, the operation removes the entity, so there is no
> > > surviving entity to hold the key. A retry of a successful drop should
> > > return an equivalent success, but with entity-property storage the key
> > > has nowhere to live, so the retry would just see "not found" and behave
> > > differently. Where would a drop's key live in this model?
> > >
> > > Thanks,
> > > Huaxin
> > >
> > > On Mon, Jun 1, 2026 at 6:13 PM Yufei Gu <[email protected]> wrote:
> > >
> > > > One concern I have with storing idempotency records as entity properties is
> > > > the potential performance impact. Over time, an entity could have a large
> > > > number of idempotency key/value pairs. That would increase the entity's
> > > > size, which may affect load, update, serialization, and caching costs for
> > > > normal catalog operations, even when idempotency is not involved. Use cases
> > > > such as table loading and entity in-memory caching could be affected.
> > > > Before moving in that direction, I think it would be useful to better
> > > > understand and measure the performance implications. If the entity size
> > > > growth turns out to be negligible in practice, the approach may still be
> > > > attractive because of its transactional simplicity.
> > > >
> > > > Yufei
> > > >
> > > >
> > > > On Mon, Jun 1, 2026 at 2:17 PM Dmitri Bourlatchkov <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi Huaxin,
> > > > >
> > > > > How about storing idempotency keys in the Polaris Entity properties (not
> > > > > Iceberg metadata)?
> > > > >
> > > > > I understand that entities can be deleted, thus discarding previously
> > > > > recorded keys, but based on the use cases discussed so far, it does not
> > > > > look like deleted entities should be a functional concern.
> > > > >
> > > > > Storing idempotency keys inside the entity will ensure that their updates
> > > > > are processed in the same logical change set as the entity changes from the
> > > > > IRC request payload.
> > > > >
> > > > > This will ensure uniform operations across all Persistence implementations
> > > > > and will not require any Idempotency-specific Persistence changes.
> > > > >
> > > > > WDYT?
> > > > >
> > > > > Thanks,
> > > > > Dmitri.
> > > > >
> > > > > On Sun, May 31, 2026 at 2:35 PM huaxin gao <[email protected]> wrote:
> > > > >
> > > > > > Hi Dmitri, Robert,
> > > > > >
> > > > > > Thanks both.
> > > > > >
> > > > > > Dmitri — I agree with both of your points.
> > > > > >
> > > > > >   - Idempotency storage will stay separate from the metastore. It will
> > > > > >     be separate in code and in transactions. We make the idempotency
> > > > > >     decision before the handler runs, or after it commits, never inside
> > > > > >     the metastore transaction.
> > > > > >   - I'll document the assumption you raised. Model B is only as strict as
> > > > > >     the spec wants if the client builds the request so that at most one
> > > > > >     try can commit (for example, update requirements). The catalog's
> > > > > >     optimistic concurrency makes sure of this. Model B just records the
> > > > > >     result on top of it. I'll say this clearly in the Polaris docs.
> > > > > >
> > > > > > Robert, I see why the operation-id-in-metadata idea is appealing. If we
> > > > > > write the id inside the commit, it is atomic with the change. That would
> > > > > > close the in-flight gap for table and view operations. That is a real
> > > > > > plus.
> > > > > >
> > > > > > But I don't think we should put the idempotency key in table metadata.
> > > > > > Here is why:
> > > > > >
> > > > > > 1. It only works for table and view operations. It can't help namespace
> > > > > >    operations, grants, or other writes. A separate store handles all of
> > > > > >    them with one mechanism.
> > > > > >
> > > > > > 2. It mixes two concerns. Idempotency is a REST/catalog concern. Table
> > > > > >    metadata should describe the table: schema, snapshots, partitioning,
> > > > > >    sort order. A per-request id is not table state. I'd rather not mix
> > > > > >    the two.
> > > > > >
> > > > > > 3. It bloats the metadata. To support retries we'd have to keep
> > > > > >    operation-ids with some retention/TTL. metadata.json is rewritten on
> > > > > >    every commit and read on every table load. For tables with many
> > > > > >    writes, this adds real cost. And every client and engine that reads
> > > > > >    the table pays it, not just the idempotency path.
> > > > > >
> > > > > > 4. It doesn't match the spec. The Iceberg REST spec defines idempotency
> > > > > >    at the protocol layer: an Idempotency-Key header with a server-side
> > > > > >    contract. It does not store idempotency in table metadata. Putting an
> > > > > >    operation-id there would be a new mechanism that isn't in the spec
> > > > > >    today. So it's a change to how the spec works, and a cross-project
> > > > > >    change too.
> > > > > >
> > > > > > So I'd prefer to keep the record in a separate idempotency store. We
> > > > > > accept the in-flight gap, but it is bounded. The catalog's optimistic
> > > > > > concurrency stops a duplicate commit from landing. And once a record
> > > > > > exists, retries replay cleanly.
> > > > > >
> > > > > > Thanks,
> > > > > > Huaxin
> > > > > >
> > > > > > On Sat, May 30, 2026 at 3:15 AM Robert Stupp <[email protected]> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Thanks for the clarifications. Russell's explanation is especially useful.
> > > > > > > I agree, ambiguous request outcomes, for example, timeouts or network
> > > > > > > connections being reset, are hard to reason about.
> > > > > > >
> > > > > > > Clients often cannot reliably reconcile from the current state alone for
> > > > > > > table/view state mutating operations.
> > > > > > >
> > > > > > > I wonder whether the idempotency key should be recorded in the table/view
> > > > > > > metadata as an "operation-id", with an explicit retention guarantee, maybe
> > > > > > > tied to a server-provided minimum TTL.
> > > > > > > The approach could reduce or change the role of a separate
> > > > > > > idempotency-record table and handling of it.
> > > > > > >
> > > > > > > Request handling could roughly look like this:
> > > > > > >   if the current history/metadata already contains that "operation-id",
> > > > > > >     return equivalent-enough response without re-running the operation.
> > > > > > >
> > > > > > >   try the committing operation:
> > > > > > >   if the commit succeeds:
> > > > > > >     record the "operation-id" in the table/view metadata, and
> > > > > > >     return the successful response.
> > > > > > >   if the commit runs into a conflict:
> > > > > > >     re-check whether the current metadata/history contains that "operation-id"
> > > > > > >     if so:
> > > > > > >       return equivalent-enough response.
> > > > > > >     otherwise:
> > > > > > >       return the conflict response.
> > > > > > >
> > > > > > > This is not perfect either and needs spec work, retention rules, and may
> > > > > > > only work for table and view operations.
> > > > > > >
> > > > > > > I mostly want to separate the questions:
> > > > > > > 1. What guarantees do clients actually need after an ambiguous outcome?
> > > > > > > 2. Where should the durable evidence for the guarantee live?
> > > > > > >
> > > > > > > Robert
> > > > > > >
> > > > > > > On Sat, May 30, 2026 at 4:30 AM Dmitri Bourlatchkov <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Russell,
> > > > > > > >
> > > > > > > > Thanks for the information! It clarifies the use case a lot (at least for
> > > > > > > > me :)
> > > > > > > >
> > > > > > > > In short, I'd say the main benefit is allowing clients to avoid conflicts
> > > > > > > > (409) on re-submitting changes that got committed by the server without the
> > > > > > > > client receiving confirmation of the success.
> > > > > > > >
> > > > > > > > I believe the Iceberg REST Catalog spec [1] is formally stricter than Model
> > > > > > > > B when it states "the server ensures no additional effects for requests
> > > > > > > > that carry the same Idempotency-Key". Since Model B permits request
> > > > > > > > re-execution, the possibility of additional side effects cannot be ruled
> > > > > > > > out completely based on the proposed server-side algorithm alone. The
> > > > > > > > server must assume that the client forms the (change) request in such a way
> > > > > > > > that only one execution attempt can succeed (e.g. by using "update
> > > > > > > > requirements"). This is also mentioned in comments on the doc [2].
> > > > > > > >
> > > > > > > > This is probably worth mentioning in the Polaris docs related to
> > > > > > > > our Idempotency-Key implementation.
> > > > > > > >
> > > > > > > > Assuming this kind of cooperation on the client side, I believe Model B can
> > > > > > > > be considered compliant with the spec [1].
> > > > > > > >
> > > > > > > > In anticipation of fresh implementation PRs for this feature, I'd like to
> > > > > > > > re-emphasize (IIRC I mentioned this before) that, I think, we should avoid
> > > > > > > > coupling Idempotency persistence with MetaStore persistence (both code-wise
> > > > > > > > and transaction-wise). Model B processes Idempotency-related data outside
> > > > > > > > the original change request's execution scope. Idempotency decisions are
> > > > > > > > made either before the request starts executing or after it is committed to
> > > > > > > > the MetaStore.
> > > > > > > >
> > > > > > > > [1] https://github.com/apache/polaris/blob/4e4eaf840bf71d431b13034b0dd6f338261d8e8b/spec/iceberg-rest-catalog-open-api.yaml#L2098
> > > > > > > >
> > > > > > > > [2] https://docs.google.com/document/d/1hqTejVyYXDpL5MJcVc7NyhCslKaGH82QoqMEcUYPvkE/edit?tab=t.0
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Dmitri.
> > > > > > > >
> > > > > > > > On Fri, May 29, 2026 at 8:26 PM Russell Spitzer <[email protected]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > The problem with a client attempting to determine if its operations
> > > > > > > > > succeeded via load table, and the reason all this work has proceeded, is
> > > > > > > > > that there is no guaranteed path for a client to actually determine
> > > > > > > > > if a commit occurred. There are too many legitimate mechanisms to erase
> > > > > > > > > history from an Iceberg table to guarantee an operation occurred.
> > > > > > > > >
> > > > > > > > > For example, you could check if your snapshot exists in snapshot history
> > > > > > > > > but this could have been erased by expire snapshots.
> > > > > > > > >
> > > > > > > > > Or you could check if the schema was modified according to your update, but
> > > > > > > > > this too could have been undone by another operation. Client A adds a column
> > > > > > > > > but gets a timeout, Client B removes the column, Client A retries and adds
> > > > > > > > > the column again.
> > > > > > > > >
> > > > > > > > > Because of this the Iceberg client usually just bails out to the user with
> > > > > > > > > an exception if it doesn’t get an actual confirmation that the commit
> > > > > > > > > succeeded from the server. This leaves the “can I retry or not” as an
> > > > > > > > > exercise to the end user.
> > > > > > > > >
> > > > > > > > > In practice, actual Iceberg users work around this sort of thing by adding
> > > > > > > > > all sorts of custom metadata to hopefully persist history in the table
> > > > > > > > > itself in some way that can’t be touched by expire snapshots, but this is
> > > > > > > > > usually very fragile and also relies on all clients behaving well. I’ve
> > > > > > > > > seen folks use custom table properties, for example “batch-5: committed”,
> > > > > > > > > then manually have their own retry logic check whether this property is
> > > > > > > > > set. Then, of course, they also have to add a bunch of custom logic to make
> > > > > > > > > sure they clean up this state as well.
> > > > > > > > >
> > > > > > > > > This is why Iceberg added the Idempotency path in the first place: it gives
> > > > > > > > > us a guaranteed way for clients to retry in case of a network issue or
> > > > > > > > > catalog issue with a guarantee they will not do duplicate work by retrying.
> > > > > > > > > With this in place the client can now cleanly retry (within the idempotency
> > > > > > > > > window) the same operation over and over without throwing an exception to
> > > > > > > > > the end user. Only in a situation where the catalog cannot respond over a
> > > > > > > > > very long time will the user actually have to do some sort of
> > > > > > > > > reconciliation. You can look at the history of the Iceberg client’s retry
> > > > > > > > > behavior with ambiguous server side or network errors to see how this has
> > > > > > > > > been a problem in the past.
> > > > > > > > >
> > > > > > > > > On Fri, May 29, 2026 at 1:24 PM huaxin gao <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Robert,
> > > > > > > > > >
> > > > > > > > > > Thanks for your reply!
> > > > > > > > > >
> > > > > > > > > > You're right that Model B does not prevent duplicate execution. The
> > > > > > > > > > record is written only after success. So if a client times out while the
> > > > > > > > > > first request is still running, a retry can run the handler again. There
> > > > > > > > > > is no record yet to stop it. So Model B is "remember and replay a
> > > > > > > > > > successful result," not "run exactly once."
> > > > > > > > > >
> > > > > > > > > > On the trade-off: Model A gives a stronger guarantee, but it needs
> > > > > > > > > > reserve/heartbeat/purge state, which adds complexity and overhead. Model
> > > > > > > > > > B is simpler and cheaper. The window it leaves open is small, and a
> > > > > > > > > > client only retries after a timeout, so racing first requests should be
> > > > > > > > > > rare in practice. Every design is a trade-off, and my view is that Model
> > > > > > > > > > B is the right one here.
> > > > > > > > > >
> > > > > > > > > > It also helps to be clear about where duplicate-work protection really
> > > > > > > > > > comes from. It comes from the catalog itself, not from idempotency. The
> > > > > > > > > > catalog uses optimistic concurrency. If two first attempts race, at most
> > > > > > > > > > one commit wins and the other gets a 409. Idempotency sits on top of that.
> > > > > > > > > > It does not replace it.
> > > > > > > > > >
> > > > > > > > > > So what does Model B add over "the client just calls loadTable and
> > > > > > > > > > reconciles"? Two things that I think are real:
> > > > > > > > > >
> > > > > > > > > >   1. The 422 check. loadTable can tell a client that a table exists. It
> > > > > > > > > >      cannot tell the client that the table THEY created with THIS key is
> > > > > > > > > >      the one that succeeded. The record binds the key to (principal,
> > > > > > > > > >      operation, resource). If the same key is reused for a different
> > > > > > > > > >      request, the server returns 422. The client cannot detect this on
> > > > > > > > > >      its own.
> > > > > > > > > >
> > > > > > > > > >   2. One server-side behavior for all mutating ops. create-table happens
> > > > > > > > > >      to reconcile cleanly with loadTable. But the point of the
> > > > > > > > > >      Idempotency-Key header is that the client should not have to write
> > > > > > > > > >      reconciliation logic for every operation. For a known key, the
> > > > > > > > > >      server turns what would be a 409 into an equivalent 2xx replay. The
> > > > > > > > > >      client gets a clean success instead of an error it has to
> > > > > > > > > >      special-case.
> > > > > > > > > >
> > > > > > > > > > There is a third, weaker benefit: once a record exists, retries stop
> > > > > > > > > > seeing flip-flopping results. But that only helps after a record exists,
> > > > > > > > > > which is exactly the window you pointed out is unprotected.
> > > > > > > > > >
> > > > > > > > > > So I'll correct my earlier wording. This is not convergence on
> > > > > > > > > > exactly-once idempotency. It is a narrower guarantee: replay a recorded
> > > > > > > > > > result, plus detect key misuse. It sits on top of the catalog's existing
> > > > > > > > > > concurrency control. The real question for the list is simple: is that
> > > > > > > > > > narrower guarantee worth shipping on its own? Or do we need Model A's
> > > > > > > > > > in-flight protection to have a strong idempotency guarantee?
> > > > > > > > > >
> > > > > > > > > > My view is that the narrow version is worth it for now: it's the
> > > > > > > > > > behavior the spec asks for, the 422 check can't be done client-side, and
> > > > > > > > > > it's a small change we can strengthen toward Model A later if a real use
> > > > > > > > > > case needs it. Happy to hear what others think.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Huaxin
> > > > > > > > > >
> > > > > > > > > > On Fri, May 29, 2026 at 7:36 AM Robert Stupp <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Huaxin,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for writing this up and moving the design discussion back to dev@.
> > > > > > > > > > >
> > > > > > > > > > > Since you’re asking before locking in the implementation, I think we should
> > > > > > > > > > > clarify one point.
> > > > > > > > > > >
> > > > > > > > > > > Model B is certainly simpler than the lease-based approach, but I’m not
> > > > > > > > > > > sure I fully understand what problem it still solves.
> > > > > > > > > > >
> > > > > > > > > > > As I read it, if a client times out while the original request is still
> > > > > > > > > > > running, a retry with the same key may not see an idempotency record yet
> > > > > > > > > > > and could run the handler again.
> > > > > > > > > > > So this feels less like preventing duplicate execution and more like
> > > > > > > > > > > remembering a successful result after the fact.
> > > > > > > > > > >
> > > > > > > > > > > For the create-table case, couldn’t a client achieve roughly the same
> > > > > > > > > > > recovery by calling loadTable after an ambiguous timeout and reconciling
> > > > > > > > > > > from there?
> > > > > > > > > > > Since Model B also rebuilds the response from current catalog state, I’m
> > > > > > > > > > > trying to understand what it gives us beyond that.
> > > > > > > > > > >
> > > > > > > > > > > I’m not against simplifying the design, but I think we should be clear
> > > > > > > > > > > about the narrower guarantee before calling this convergence.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Robert
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, May 29, 2026 at 12:29 AM huaxin gao <[email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi all,
> > > > > > > > > > > >
> > > > > > > > > > > > I've simplified the proposed design for Idempotency-Key support in Polaris
> > > > > > > > > > > > (Iceberg REST spec: retries with the same key must not produce additional
> > > > > > > > > > > > side effects), and I'd like a wider review before updating the
> > > > > > > > > > > > implementation PR (#4269 <https://github.com/apache/polaris/pull/4269>).
> > > > > > > > > > > >
> > > > > > > > > > > > What changed
> > > > > > > > > > > >
> > > > > > > > > > > >   - Before (Model A, lease-based): reserve an idempotency row before doing
> > > > > > > > > > > >     work → IN_PROGRESS / heartbeat → finalize after.
> > > > > > > > > > > >   - After (Model B, optimistic commit): run the handler first → record only
> > > > > > > > > > > >     after a successful (2xx) outcome. The record stores binding + status, not
> > > > > > > > > > > >     the HTTP response body. Retries with the same key re-derive an equivalent
> > > > > > > > > > > >     response from current catalog state instead of replaying a stored payload.
> > > > > > > > > > > >
> > > > > > > > > > > > The design doc still compares Model A and Model B side-by-side so the
> > > > > > > > > > > > trade-offs are explicit. So far the discussion has been leaning toward
> > > > > > > > > > > > Model B: mutating REST operations only, 2xx-only persistence, no
> > > > > > > > > > > > response-body storage, and the known trade-offs (e.g. concurrent
> > > > > > > > > > > > first-request races; see the NOTES section in the doc).
> > > > > > > > > > > >
> > > > > > > > > > > > Does this direction look right before we lock in the implementation?
> > > > > > > > > > > >
> > > > > > > > > > > > Comments on the doc
> > > > > > > > > > > > <https://docs.google.com/document/d/1hqTejVyYXDpL5MJcVc7NyhCslKaGH82QoqMEcUYPvkE/edit?tab=t.0>
> > > > > > > > > > > > or replies on this thread both work.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Huaxin
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
