Hi Huaxin,

Thanks for writing this up and moving the design discussion back to dev@.

Since you’re asking before locking in the implementation, I think we should
clarify one point.

Model B is certainly simpler than the lease-based approach, but I’m not
sure I fully understand what problem it still solves.

As I read it, if a client times out while the original request is still
running, a retry with the same key may not see an idempotency record yet
and could run the handler again.
So this feels less like preventing duplicate execution and more like
remembering a successful result after the fact.
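
To check that I'm reading the race correctly, here is a toy sketch in
Python. It is not Polaris code: the class and method names are made up, and
it assumes only what the proposal states, namely that the key is recorded
after a successful outcome and not before the handler runs.

```python
class ModelBCatalog:
    """Toy model of 'run the handler first, record the key only after a 2xx'.

    Hypothetical names throughout; this is not the Polaris implementation.
    """

    def __init__(self):
        self.records = set()    # idempotency keys recorded after success
        self.handler_runs = 0   # times the side-effecting handler was started

    def start(self, key):
        """Begin a request. Returns True if the handler is started."""
        if key in self.records:
            return False        # retry is recognized, handler is skipped
        self.handler_runs += 1
        return True

    def finish(self, key):
        """The handler completed with a 2xx; only now is the key recorded."""
        self.records.add(key)


catalog = ModelBCatalog()
key = "example-key"

first_started = catalog.start(key)   # original request, still running
retry_started = catalog.start(key)   # client timed out and retried early
catalog.finish(key)                  # original request completes
late_retry_started = catalog.start(key)  # retry after the record exists
```

In this toy model the early retry starts the handler a second time, and only
the late retry is recognized. What the second handler run actually does in
Polaris would depend on the operation, which the sketch does not model.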

For the create-table case, couldn’t a client achieve roughly the same
recovery by calling loadTable after an ambiguous timeout and reconciling
from there?
Since Model B also rebuilds the response from current catalog state, I’m
trying to understand what it gives us beyond that.
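
Roughly what I have in mind on the client side, as a sketch only. The
catalog class below is an in-memory stand-in, and the exception names and
the create_or_reconcile helper are hypothetical, not an existing client API.

```python
class AmbiguousTimeout(Exception):
    """The client gave up waiting; the server may or may not have committed."""


class NoSuchTable(Exception):
    """The table does not exist in the catalog."""


class FakeCatalog:
    """In-memory stand-in for a catalog client (illustration only)."""

    def __init__(self, timeout_after_commit=False):
        self.tables = {}
        self.timeout_after_commit = timeout_after_commit

    def create_table(self, name, schema):
        self.tables[name] = schema
        if self.timeout_after_commit:
            # The commit happened, but the client never sees the response.
            raise AmbiguousTimeout(name)
        return schema

    def load_table(self, name):
        if name not in self.tables:
            raise NoSuchTable(name)
        return self.tables[name]


def create_or_reconcile(catalog, name, schema):
    """Create a table; after an ambiguous timeout, reconcile via load_table."""
    try:
        return catalog.create_table(name, schema)
    except AmbiguousTimeout:
        pass
    try:
        existing = catalog.load_table(name)
    except NoSuchTable:
        # The original request did not commit, so creating again is safe here.
        return catalog.create_table(name, schema)
    if existing != schema:
        raise RuntimeError("table exists but does not match the request")
    return existing


catalog = FakeCatalog(timeout_after_commit=True)
result = create_or_reconcile(catalog, "db.events", {"id": "long"})
```

One limitation of this sketch is that load_table alone cannot tell whether
the table was created by this client's request or by someone else's.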

I’m not against simplifying the design, but I think we should be clear
about the narrower guarantee before calling this convergence.

Best,
Robert


On Fri, May 29, 2026 at 12:29 AM huaxin gao <[email protected]> wrote:

> Hi all,
>
> I've simplified the proposed design for Idempotency-Key support in Polaris
> (Iceberg REST spec — retries with the same key must not produce additional
> side effects), and I'd like a wider review before updating the
> implementation PR (#4269 <https://github.com/apache/polaris/pull/4269>).
>
> What changed
>
>   - Before (Model A, lease-based): reserve an idempotency row before doing
> work → IN_PROGRESS / heartbeat → finalize after.
>   - After (Model B, optimistic commit): run the handler first → record only
> after a successful (2xx) outcome. The record stores binding + status, not
> the HTTP response body. Retries with the same key re-derive an equivalent
> response from current catalog state instead of replaying a stored payload.
>
> The design doc still compares Model A and Model B side-by-side so the
> trade-offs are explicit. So far the discussion has been leaning toward
> Model B — mutating REST operations only, 2xx-only persistence, no
> response-body storage, and the known trade-offs (e.g. concurrent
> first-request races; see the NOTES section in the doc).
>
> Does this direction look right before we lock in the implementation?
>
> Comments on the doc
> <https://docs.google.com/document/d/1hqTejVyYXDpL5MJcVc7NyhCslKaGH82QoqMEcUYPvkE/edit?tab=t.0>
> or replies on this thread both work.
>
> Thanks,
> Huaxin
>
