Hi Russell,

Thanks for the information! It clarifies the use case a lot (at least for me :)

In short, I'd say the main benefit is allowing clients to avoid conflicts (409) on re-submitting changes that got committed by the server without the client receiving confirmation of the success.

I believe the Iceberg REST Catalog spec [1] is formally stricter than Model B when it states "the server ensures no additional effects for requests that carry the same Idempotency-Key". Since Model B permits request re-execution, the possibility of additional side effects cannot be ruled out completely based on the proposed server-side algorithm alone. The server must assume that the client forms the (change) request in such a way that only one execution attempt can succeed (e.g. by using "update requirements"). This is also mentioned in comments on the doc [2]. This is probably worth mentioning in the Polaris docs related to our Idempotency-Key implementation.

Assuming this kind of cooperation on the client side, I believe Model B can be considered compliant with the spec [1].

In anticipation of fresh implementation PRs for this feature, I'd like to re-emphasize (IIRC I mentioned this before) that, I think, we should avoid coupling Idempotency persistence with MetaStore persistence (both code-wise and transaction-wise). Model B processes Idempotency-related data outside the original change request's execution scope. Idempotency decisions are made either before the request starts executing or after it is committed to the MetaStore.

[1] https://github.com/apache/polaris/blob/4e4eaf840bf71d431b13034b0dd6f338261d8e8b/spec/iceberg-rest-catalog-open-api.yaml#L2098
[2] https://docs.google.com/document/d/1hqTejVyYXDpL5MJcVc7NyhCslKaGH82QoqMEcUYPvkE/edit?tab=t.0

Cheers,
Dmitri.
On Fri, May 29, 2026 at 8:26 PM Russell Spitzer <[email protected]> wrote:

> The problem with a client attempting to determine whether its operations
> succeeded via load table, and the reason all this work has proceeded, is
> that there is no guaranteed path for a client to actually determine
> whether a commit occurred. There are too many legitimate mechanisms to
> erase history from an Iceberg table to guarantee an operation occurred.
>
> For example, you could check if your snapshot exists in snapshot history,
> but this could have been erased by expire snapshots.
>
> Or you could check if the schema was modified according to your update,
> but this too could have been undone by another operation: Client A adds a
> column but gets a timeout, Client B removes the column, Client A retries
> and adds the column again.
>
> Because of this, the Iceberg client usually just bails out to the user
> with an exception if it doesn’t get an actual confirmation from the server
> that the commit succeeded. This leaves the “can I retry or not” question
> as an exercise for the end user.
>
> In practice, actual Iceberg users work around this sort of thing by adding
> all sorts of custom metadata to hopefully persist history in the table
> itself in some way that can’t be touched by expire snapshots, but this is
> usually very fragile and also relies on all clients behaving well. I’ve
> seen folks use custom table properties, for example “batch-5: committed”,
> then manually have their own retry logic check whether this property is
> set. Then, of course, they also have to add a bunch of custom logic to
> make sure they clean up this state as well.
>
> This is why Iceberg added the Idempotency path in the first place: it
> gives us a guaranteed way for clients to retry in case of a network issue
> or catalog issue, with a guarantee they will not do duplicate work by
> retrying.
> With this in place the client can now cleanly retry (within the
> idempotency window) the same operation over and over without throwing an
> exception to the end user. Only in a situation where the catalog cannot
> respond over a very long time will the user actually have to do some sort
> of reconciliation. You can look at the history of the Iceberg client’s
> retry behavior with ambiguous server-side or network errors to see how
> this has been a problem in the past.
>
> On Fri, May 29, 2026 at 1:24 PM huaxin gao <[email protected]> wrote:
>
> > Hi Robert,
> >
> > Thanks for your reply!
> >
> > You're right that Model B does not prevent duplicate execution. The
> > record is written only after success. So if a client times out while the
> > first request is still running, a retry can run the handler again. There
> > is no record yet to stop it. So Model B is "remember and replay a
> > successful result," not "run exactly once."
> >
> > On the trade-off: Model A gives a stronger guarantee, but it needs
> > reserve/heartbeat/purge state, which adds complexity and overhead. Model
> > B is simpler and cheaper. The window it leaves open is small, and a
> > client only retries after a timeout, so racing first requests should be
> > rare in practice. Every design is a trade-off, and my view is that Model
> > B is the right one here.
> >
> > It also helps to be clear about where duplicate-work protection really
> > comes from. It comes from the catalog itself, not from idempotency. The
> > catalog uses optimistic concurrency. If two first attempts race, at most
> > one commit wins and the other gets a 409. Idempotency sits on top of
> > that. It does not replace it.
> >
> > So what does Model B add over "the client just calls loadTable and
> > reconciles"? Two things that I think are real:
> >
> > 1. The 422 check. loadTable can tell a client that a table exists. It
> >    cannot tell the client that the table THEY created with THIS key is
> >    the one that succeeded.
> >    The record binds the key to (principal, operation, resource). If the
> >    same key is reused for a different request, the server returns 422.
> >    The client cannot detect this on its own.
> >
> > 2. One server-side behavior for all mutating ops. create-table happens
> >    to reconcile cleanly with loadTable. But the point of the
> >    Idempotency-Key header is that the client should not have to write
> >    reconciliation logic for every operation. For a known key, the
> >    server turns what would be a 409 into an equivalent 2xx replay. The
> >    client gets a clean success instead of an error it has to
> >    special-case.
> >
> > There is a third, weaker benefit: once a record exists, retries stop
> > seeing flip-flopping results. But that only helps after a record exists,
> > which is exactly the window you pointed out is unprotected.
> >
> > So I'll correct my earlier wording. This is not convergence on
> > exactly-once idempotency. It is a narrower guarantee: replay a recorded
> > result, plus detect key misuse. It sits on top of the catalog's existing
> > concurrency control. The real question for the list is simple: is that
> > narrower guarantee worth shipping on its own? Or do we need Model A's
> > in-flight protection to have a strong idempotency guarantee?
> >
> > My view is that the narrow version is worth it for now: it's the
> > behavior the spec asks for, the 422 check can't be done client-side, and
> > it's a small change we can strengthen toward Model A later if a real use
> > case needs it. Happy to hear what others think.
> >
> > Best,
> > Huaxin
> >
> > On Fri, May 29, 2026 at 7:36 AM Robert Stupp <[email protected]> wrote:
> >
> > > Hi Huaxin,
> > >
> > > Thanks for writing this up and moving the design discussion back to
> > > dev@.
> > >
> > > Since you’re asking before locking in the implementation, I think we
> > > should clarify one point.
> > >
> > > Model B is certainly simpler than the lease-based approach, but I’m not
> > > sure I fully understand what problem it still solves.
> > >
> > > As I read it, if a client times out while the original request is still
> > > running, a retry with the same key may not see an idempotency record
> > > yet and could run the handler again.
> > > So this feels less like preventing duplicate execution and more like
> > > remembering a successful result after the fact.
> > >
> > > For the create-table case, couldn’t a client achieve roughly the same
> > > recovery by calling loadTable after an ambiguous timeout and
> > > reconciling from there?
> > > Since Model B also rebuilds the response from current catalog state,
> > > I’m trying to understand what it gives us beyond that.
> > >
> > > I’m not against simplifying the design, but I think we should be clear
> > > about the narrower guarantee before calling this convergence.
> > >
> > > Best,
> > > Robert
> > >
> > >
> > > On Fri, May 29, 2026 at 12:29 AM huaxin gao <[email protected]>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I've simplified the proposed design for Idempotency-Key support in
> > > > Polaris (Iceberg REST spec: retries with the same key must not
> > > > produce additional side effects), and I'd like a wider review before
> > > > updating the implementation PR
> > > > (#4269 <https://github.com/apache/polaris/pull/4269>).
> > > >
> > > > What changed
> > > >
> > > > - Before (Model A, lease-based): reserve an idempotency row before
> > > >   doing work → IN_PROGRESS / heartbeat → finalize after.
> > > > - After (Model B, optimistic commit): run the handler first → record
> > > >   only after a successful (2xx) outcome. The record stores binding +
> > > >   status, not the HTTP response body. Retries with the same key
> > > >   re-derive an equivalent response from current catalog state
> > > >   instead of replaying a stored payload.
> > > >
> > > > The design doc still compares Model A and Model B side-by-side so the
> > > > trade-offs are explicit. So far the discussion has been leaning
> > > > toward Model B: mutating REST operations only, 2xx-only persistence,
> > > > no response-body storage, and the known trade-offs (e.g. concurrent
> > > > first-request races; see the NOTES section in the doc).
> > > >
> > > > Does this direction look right before we lock in the implementation?
> > > >
> > > > Comments on the doc
> > > > <https://docs.google.com/document/d/1hqTejVyYXDpL5MJcVc7NyhCslKaGH82QoqMEcUYPvkE/edit?tab=t.0>
> > > > or replies on this thread both work.
> > > >
> > > > Thanks,
> > > > Huaxin
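
For reference, the Model B flow discussed in this thread (run the handler first, record only after a 2xx outcome, return 422 when a key is reused with a different binding, and re-derive the response from current catalog state on a retry) could be sketched roughly as below. This is only an illustration under stated assumptions, not the Polaris implementation: `Binding`, `IdempotencyStore`, `handle`, the in-memory `tables` dict and `create_table` are all hypothetical names invented for the sketch, and the real persistence, status codes and response bodies may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Binding:
    """What a key is bound to: (principal, operation, resource)."""
    principal: str
    operation: str
    resource: str


class IdempotencyStore:
    """Hypothetical in-memory store, kept separate from the metastore."""

    def __init__(self) -> None:
        self._records: Dict[str, Binding] = {}

    def get(self, key: str) -> Optional[Binding]:
        return self._records.get(key)

    def put(self, key: str, binding: Binding) -> None:
        # Keep the first binding if two first attempts race to record.
        self._records.setdefault(key, binding)


def handle(
    store: IdempotencyStore,
    key: str,
    binding: Binding,
    handler: Callable[[], Tuple[int, object]],
    rederive: Callable[[], object],
) -> Tuple[int, object]:
    record = store.get(key)
    if record is not None:
        if record != binding:
            # Same key, different request: key misuse.
            return 422, "Idempotency-Key reused for a different request"
        # Known key: rebuild an equivalent response from current state.
        return 200, rederive()
    # No record yet: run the handler. Nothing here stops a concurrent
    # duplicate; that protection comes from the catalog's own checks.
    status, body = handler()
    if 200 <= status < 300:
        store.put(key, binding)  # record only after success
    return status, body


# Toy catalog standing in for the metastore (assumption for the sketch).
tables: Dict[str, dict] = {}


def create_table(name: str) -> Tuple[int, object]:
    if name in tables:
        return 409, "already exists"
    tables[name] = {"name": name}
    return 200, tables[name]
```

With this sketch, a retry of a committed create-table under the same key gets a 2xx built from current state instead of a 409, while a failed attempt leaves no record behind. The in-flight window Robert pointed out is visible in `handle`: between `handler()` starting and `store.put`, a second request with the same key finds no record and runs the handler again.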
