Hi Huaxin,

Thanks for writing this up and moving the design discussion back to dev@.

Since you’re asking before locking in the implementation, I think we should clarify one point. Model B is certainly simpler than the lease-based approach, but I’m not sure I fully understand what problem it still solves. As I read it, if a client times out while the original request is still running, a retry with the same key may not see an idempotency record yet and could run the handler again. So this feels less like preventing duplicate execution and more like remembering a successful result after the fact.

For the create-table case, couldn’t a client achieve roughly the same recovery by calling loadTable after an ambiguous timeout and reconciling from there? Since Model B also rebuilds the response from current catalog state, I’m trying to understand what it gives us beyond that.

I’m not against simplifying the design, but I think we should be clear about the narrower guarantee before calling this convergence.

Best,
Robert

On Fri, May 29, 2026 at 12:29 AM huaxin gao <[email protected]> wrote:

> Hi all,
>
> I've simplified the proposed design for Idempotency-Key support in Polaris
> (Iceberg REST spec — retries with the same key must not produce additional
> side effects), and I'd like a wider review before updating the
> implementation PR (#4269 <https://github.com/apache/polaris/pull/4269>).
>
> What changed
>
> - Before (Model A, lease-based): reserve an idempotency row before doing
>   work → IN_PROGRESS / heartbeat → finalize after.
> - After (Model B, optimistic commit): run the handler first → record only
>   after a successful (2xx) outcome. The record stores binding + status, not
>   the HTTP response body. Retries with the same key re-derive an equivalent
>   response from current catalog state instead of replaying a stored payload.
>
> The design doc still compares Model A and Model B side-by-side so the
> trade-offs are explicit. So far the discussion has been leaning toward
> Model B — mutating REST operations only, 2xx-only persistence, no
> response-body storage, and the known trade-offs (e.g. concurrent
> first-request races; see the NOTES section in the doc).
>
> Does this direction look right before we lock in the implementation?
>
> Comments on the doc
> <https://docs.google.com/document/d/1hqTejVyYXDpL5MJcVc7NyhCslKaGH82QoqMEcUYPvkE/edit?tab=t.0>
> or replies on this thread both work.
>
> Thanks,
> Huaxin
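
To make the timing concern in the reply concrete, here is a rough Python sketch of the interleaving as I understand Model B from the summary above. It is a toy model under stated assumptions, not Polaris code: `ToyCatalog`, `check`, `run_handler`, `record`, and `load_table` are hypothetical names invented for illustration, and the 200/409 status codes are only a simplification of a create-table outcome.

```python
class ToyCatalog:
    """Hypothetical in-memory stand-in for a catalog using record-after-success."""

    def __init__(self):
        self.tables = {}        # table name -> metadata
        self.records = {}       # idempotency key -> recorded status
        self.handler_runs = 0   # how many times the create handler executed

    def check(self, key):
        """Return True if an idempotency record already exists for this key."""
        return key in self.records

    def run_handler(self, name):
        """Run the create-table handler; 409 if the table already exists."""
        self.handler_runs += 1
        if name in self.tables:
            return 409
        self.tables[name] = {"name": name}
        return 200

    def record(self, key, status):
        """Persist the idempotency record only for 2xx outcomes (assumed rule)."""
        if 200 <= status < 300:
            self.records[key] = status

    def load_table(self, name):
        """Plain read of current catalog state, with no idempotency key involved."""
        return self.tables.get(name)


catalog = ToyCatalog()
key = "key-1"

# Original request: no record yet, so the handler runs and creates the table.
assert not catalog.check(key)
original_status = catalog.run_handler("t1")
# The client times out here, before the idempotency record is written.

# Retry with the same key arrives in that window: still no record,
# so the handler runs a second time.
retry_saw_record = catalog.check(key)
retry_status = catalog.run_handler("t1")

# The original request now finishes and records its 2xx outcome.
catalog.record(key, original_status)
catalog.record(key, retry_status)  # non-2xx, so nothing is stored

# A later retry finds the record; a client could also just read current state.
late_retry_saw_record = catalog.check(key)
recovered = catalog.load_table("t1")
```

In this toy interleaving the handler executes twice and the in-window retry gets a conflict, while a plain `load_table` call after the timeout recovers the same state that a later keyed retry would re-derive. Whether the real implementation behaves this way depends on details in the design doc that this sketch does not capture.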
