Re: [DISCUSS] Iceberg REST Catalog Idempotency

Steven Wu Wed, 29 Oct 2025 20:05:17 -0700

> When the server decides to expire a key, will it rely on the
server-receive timestamp, the client request timestamp, or the key creation
timestamp?


Yun has a good point here. In the current spec PR, "idempotency-key-lifetime"
should be measured from the "server-receive timestamp".

> Now, let's say a client is non-conformant and is reusing the same key or
is not correctly applying the lifetime directive, but sends the same request
to the server which has expired it (so it's a new request for the server),
it could have the potential to cause some kind of corruption then, while at
the same time adding a time component to the key would have prevented the
issue?

If we really want to protect against this scenario, should a separate
timestamp field be used? This question has been raised by Dennis and Ryan
in the community sync, and Yun above in this thread.
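An illustrative sketch, not code from the spec PR, of a lifetime measured from the server-receive timestamp. `IdempotencyKeyStore`, `handle`, and the injectable `clock` are hypothetical names chosen for this example. The store only ever consults the server's own clock, and its last branch shows the risk in the quoted question: once a key has expired, a request reusing it is simply executed again.

```python
import time
from typing import Callable, Dict, Tuple

class IdempotencyKeyStore:
    """Hypothetical in-memory store; a key's lifetime starts at the server-receive time."""

    def __init__(self, lifetime_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lifetime_seconds = lifetime_seconds
        self.clock = clock  # the server's clock; no client-supplied timestamp is used
        self._entries: Dict[str, Tuple[float, object]] = {}

    def handle(self, key: str, execute: Callable[[], object]) -> object:
        received_at = self.clock()  # server-receive timestamp of this request
        entry = self._entries.get(key)
        if entry is not None:
            first_received_at, result = entry
            if received_at - first_received_at < self.lifetime_seconds:
                return result  # still within the lifetime: replay, do not re-execute
        # Unknown or expired key: treated as a brand-new request and executed.
        result = execute()
        self._entries[key] = (received_at, result)
        return result
```

Because expiry depends only on `self.clock()`, client clock skew and keys generated long before use (the two concerns Yun lists below) do not change when a key expires.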



On Wed, Oct 29, 2025 at 7:19 PM yun zou <[email protected]> wrote:

> Hi All,
>
> It sounds like the idea is to use the timestamp component to help
> determine whether a key should expire. However, I’m not clear on how
> exactly the timestamp would be used for this purpose. When the server
> decides to expire a key, will it rely on the server-receive timestamp,
> the client request timestamp, or the key creation timestamp?
>
> If the timestamp component within the key is intended to represent the
> key creation time, that raises a couple of concerns:
> 1. A key could be created well before it’s actually used.
> 2. Clock skew between the client and server could lead to inconsistent
> expiration behavior.
>
> If the server is responsible for managing the key lifecycle, it’s
> generally more robust and consistent to rely on the server clock for
> expiration decisions rather than client-provided timestamps.
>
> Additionally, if the timestamp is an important piece of information,
> it might be cleaner to make it an explicit field instead of
> overloading the key itself with multiple purposes. Having a separate,
> well-defined field would make the specification clearer and easier to
> maintain.
>
> From the client’s perspective, requiring the use of UUIDv7 introduces
> unnecessary constraints on implementation. That said, clients are free
> to adopt UUIDv7 if they prefer. Since the server ultimately manages
> expiration, it’s generally better to keep the client logic simple and
> decoupled from server-side decisions.
>
> Best Regards,
> Yun
>
> On Wed, Oct 29, 2025 at 12:50 PM Dmitri Bourlatchkov <[email protected]>
> wrote:
> >
> > Hi All,
> >
> > From my POV (and I may be repeating what I put in GH comments), the main
> point in using UUID v7 is specifying that a timestamp should be part of the
> idempotency key. As previously discussed, having this timestamp is
> beneficial to server implementations.
> >
> > The IETF Idempotency Key draft v7 [1] allows servers to require specific
> ID generation algorithms.
> >
> > We could have a custom ID format, but UUID v7 is already defined and
> fits this use case.
> >
> > If for some reason UUID v7 becomes "weak" in the future, such an event
> will have a much greater impact than the REST Catalog API. In any case, if
> that happens, nothing prevents revising the REST API spec to allow for
> stronger ID generators.
> >
> > [1]
> https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-07#name-client
> >
> > Cheers,
> > Dmitri.
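For reference, a minimal sketch of the timestamp a server could read out of a UUIDv7 key: per RFC 9562, the most significant 48 bits hold Unix time in milliseconds. The helper name `uuid7_timestamp_ms` is hypothetical; it relies only on Python's standard `uuid` module, and the sample value is the UUIDv7 example given in RFC 9562.

```python
import uuid

def uuid7_timestamp_ms(key: str) -> int:
    """Return the Unix-epoch milliseconds embedded in a UUIDv7 key."""
    u = uuid.UUID(key)
    # The version field is only meaningful for the RFC 4122 / RFC 9562 variant.
    if u.variant != uuid.RFC_4122 or u.version != 7:
        raise ValueError(f"not a UUIDv7: {key!r}")
    # unix_ts_ms occupies the top 48 of the 128 bits.
    return u.int >> 80

# RFC 9562 example value; its embedded timestamp is 0x017F22E279B0.
print(uuid7_timestamp_ms("017F22E2-79B0-7CC3-98C4-DC0C0C07398F"))  # 1645557742000
```

As Yun notes above, this value reflects the client's clock when the key was generated, not when the server received the request.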
> >
> > On Mon, Oct 27, 2025 at 2:33 PM Yufei Gu <[email protected]> wrote:
> >>
> >> +1 on option 2: don’t mandate a specific key format.
> >>
> >> Concerns with option 1 (UUIDv7-mandatory):
> >> 1. Overspecification risk. If UUIDv7 shows weaknesses later, we’re
> stuck with a brittle contract.
> >> 2. Unnecessary constraints. It binds both client and server
> implementations. One of IRC’s goals is to simplify client work; forcing
> UUIDv7 limits client choices for marginal gain (the embedded timestamp).
> >>
> >> Here are existing implementations for reference:
> >>
> >> Stripe[1]: recommends UUIDv4 but does not enforce a format for
> idempotency keys.
> >> AWS EC2[2]: accepts any unique, case-sensitive string up to 64 ASCII
> characters for the client token.
> >>
> >> I'd propose to treat the idempotency key as an opaque string with basic
> requirements and guidance (e.g., “unique string values; UUIDv4 or v7 are
> fine”) but avoid making the format mandatory. This keeps the API
> future-proof and client-friendly while preserving server-side flexibility.
> >>
> >> 1. https://docs.stripe.com/api/expanding_objects
> >> 2.
> https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html
> >>
> >> Yufei
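A minimal sketch of treating the key as an opaque string with basic requirements. The 64-character limit mirrors the EC2 client-token rule cited above; restricting it to printable ASCII without spaces, and the names `MAX_KEY_LENGTH`, `is_acceptable_opaque_key`, and `new_key`, are assumptions for illustration.

```python
import uuid

MAX_KEY_LENGTH = 64  # assumption, mirroring EC2's client-token limit

def is_acceptable_opaque_key(key: str) -> bool:
    """Basic checks only; the key's internal format is never interpreted."""
    return (0 < len(key) <= MAX_KEY_LENGTH
            and all("!" <= ch <= "~" for ch in key))  # printable ASCII, no spaces

def new_key() -> str:
    # Clients pick their own generator; UUIDv4 (or v7) is one common choice.
    return str(uuid.uuid4())
```

The server never parses the key, so a client can switch generators without any spec change.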
> >>
> >>
> >> On Mon, Oct 27, 2025 at 9:53 AM huaxin gao <[email protected]>
> wrote:
> >>>
> >>> Hi Yun,
> >>> Thanks for the thoughtful feedback!
> >>>
> >>> Yes, the key itself is expected to be globally unique. You’re also
> right that we don’t need to mandate UUIDs to achieve that; other schemes
> can provide global uniqueness.
> >>>
> >>> I have chosen UUID because several folks in the community prefer it as
> a common, interoperable choice. That said, I agree that mandating UUIDv7
> adds constraints on clients without clear spec-level benefit.
> >>>
> >>> I also agree we should separate spec from implementation; details like
> the key generation method can live in implementation guidance.
> >>>
> >>> From your note, it sounds like you support Option 2
> (version-agnostic)—i.e., require a “globally unique idempotency key” and
> accept any RFC 9562 UUID (with v7 as a non-normative recommendation), while
> leaving timestamp/expiry mechanics to the server-side doc. I’ll count this
> as a +1 for Option 2.
> >>>
> >>> Thanks,
> >>>
> >>> Huaxin
> >>>
> >>>
> >>> On Fri, Oct 24, 2025 at 7:00 PM yun zou <[email protected]>
> wrote:
> >>>>
> >>>> Sorry, I accidentally sent the email before it was complete; please ignore my
> >>>> previous email. Sorry for the noise and inconvenience.
> >>>>
> >>>> Hi Huaxin,
> >>>>
> >>>> This is a really interesting and valuable proposal — it provides a
> >>>> great way to address the issue of duplicate client requests. Thank you
> >>>> for proposing and driving this forward!
> >>>>
> >>>> One point that isn’t entirely clear to me is how the server uniquely
> >>>> identifies each request.  Are we relying solely on the idempotency-key
> >>>> being globally unique, or is there an additional identifier such as
> >>>> clientId + idempotency-key? Based on the current discussion, it sounds
> >>>> like the proposal expects the key itself to be globally unique, likely
> >>>> through the use of a UUID, but I’d like to double-check my
> >>>> understanding.
> >>>>
> >>>> If we are indeed relying on the client to generate a globally unique
> >>>> ID, that approach makes sense. However, it doesn’t seem necessary to
> >>>> mandate the use of UUIDs, as there are other valid methods for
> >>>> achieving global uniqueness. Imposing a further restriction to UUIDv7
> >>>> would place additional constraints on the client implementation.
> >>>>
> >>>> From a specification perspective, I think it would be better to
> >>>> separate the spec from the implementation. In other words, we should
> >>>> make it clear that the key must be globally unique, but we don’t need
> >>>> to specify that it must be a UUID or UUIDv7.
> >>>>
> >>>> Best Regards,
> >>>> Yun
> >>>>
> >>>> On Fri, Oct 24, 2025 at 4:41 PM huaxin gao <[email protected]>
> wrote:
> >>>> >
> >>>> > Hi all,
> >>>> >
> >>>> > Thank you for taking the time to review my proposal and PR—I really
> appreciate the input.
> >>>> >
> >>>> > There’s one remaining issue I’d like to settle. In the Iceberg
> Catalog Community sync, many preferred mandating UUIDv7 for the idempotency
> key. At the same time, there are some concerns:
> >>>> >
> >>>> > If we need a timestamp, it should be a separate field; we shouldn’t
> use the UUIDv7 timestamp.
> >>>> >
> >>>> > If we use the UUID timestamp for expiry, we’d have to require keys
> to be generated at request time, which feels over-engineered.
> >>>> >
> >>>> > If we want to use the UUIDv7 timestamp, it should be for debugging
> only.
> >>>> >
> >>>> > Based on that, here’s a draft update to the spec:
> >>>> >
> >>>> > Key Requirements:
> >>>> > - Key format: UUIDv7 in string format as defined in RFC 9562.
> >>>> >   See
> https://datatracker.ietf.org/doc/html/rfc9562#name-example-of-a-uuidv7-value
> .
> >>>> > - The idempotency key must be globally unique (no reuse across
> different operations).
> >>>> > - Catalogs SHOULD NOT expire keys before the end of the advertised
> token lifetime.
> >>>> > - If Idempotency-Key is used, clients MUST reuse the same key when
> retrying the same
> >>>> >   logical operation and MUST generate a new key for a different
> operation.
> >>>> > - Server behavior: Servers MUST validate that the key is a syntactically
valid UUIDv7 (per RFC 9562).
> >>>> >   Servers MUST NOT make behavioral decisions based on the UUID’s
> internal timestamp fields.
> >>>> >   The idempotency key is an opaque, unique identifier used only for
> lookup/deduplication.
> >>>> >
> >>>> > This reads a bit awkward to me: we mandate UUIDv7 but prohibit
> using its timestamp, which seems to undercut the reason to require v7 in
> the first place.
> >>>> >
> >>>> > I’d appreciate feedback on whether we should:
> >>>> >
> >>>> > Option 1 — Require v7.
> >>>> > Keep UUIDv7 required, with the server restrictions above (syntactic
> v7 validation only; no behavioral decisions based on the embedded
> timestamp).
> >>>> >
> >>>> > Option 2 — Version-agnostic.
> >>>> > Make the client spec version-agnostic (require RFC 9562 UUID
> textual form; allow v7 as a recommendation). Leave any timestamp/lifetime
> mechanics to a server-side (Polaris idempotency) document.
> >>>> >
> >>>> > Thanks again for the thoughtful discussion.
> >>>> >
> >>>> > Best,
> >>>> >
> >>>> > Huaxin
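To make the two options concrete, a minimal sketch of the server-side check each would imply: Option 1 passes `required_version=7`, Option 2 passes `None`. The function name and the strict 8-4-4-4-12 pattern are assumptions for illustration. The regular expression is needed because Python's `uuid.UUID` on its own is more lenient (it also accepts braces, a `urn:uuid:` prefix, or no hyphens). Neither option reads the embedded timestamp.

```python
import re
import uuid
from typing import Optional

# Canonical RFC 9562 textual form: 8-4-4-4-12 hex digits (either case).
_UUID_TEXT = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

def is_valid_uuid_key(key: str, required_version: Optional[int] = None) -> bool:
    """Syntactic check only; the UUID's timestamp bits are never inspected."""
    if not _UUID_TEXT.fullmatch(key):
        return False  # rejects braces, "urn:uuid:" prefixes, missing hyphens, etc.
    u = uuid.UUID(key)
    if u.variant != uuid.RFC_4122:  # variant bits 10, the layout versions are defined for
        return False
    return required_version is None or u.version == required_version
```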
> >>>> >
> >>>> >
> >>>> > On Mon, Sep 29, 2025 at 5:47 PM Dmitri Bourlatchkov <
> [email protected]> wrote:
> >>>> >>
> >>>> >> Hi Huaxin,
> >>>> >>
> >>>> >> Sorry about the delay. I posted some comments on
> https://github.com/apache/iceberg/pull/14196 Some of them I might have
> mentioned on the doc too, so apologies if they got answered in the doc and
> I missed it.
> >>>> >>
> >>>> >> Cheers,
> >>>> >> Dmitri.
> >>>> >>
> >>>> >> On Thu, Sep 25, 2025 at 12:27 PM huaxin gao <
> [email protected]> wrote:
> >>>> >>>
> >>>> >>> Thank you all for taking the time to review and discuss! I’ve
> responded to all questions and updated the proposal. If there are no
> additional concerns, I’ll proceed to start a VOTE thread.
> >>>> >>>
> >>>> >>> Thanks,
> >>>> >>> Huaxin
> >>>> >>>
> >>>> >>> On Mon, Sep 22, 2025 at 1:30 AM Maninder Parmar <
> [email protected]> wrote:
> >>>> >>>>
> >>>> >>>> +1 for low-level retries, which ensure that a request with a given
idempotency key is never committed twice. I also agree that canonicalizing the request body
> where the client can change it due to conflict resolution and retry would
> be hard to get right.
> >>>> >>>>
> >>>> >>>> On Sat, Sep 20, 2025 at 5:58 AM Dennis Huo <[email protected]>
> wrote:
> >>>> >>>>>
> >>>> >>>>> +1 to this being mostly targeting a "low-level" retry semantic.
> Expanding on that though I'd say even "client-side retries" really have two
> distinct flavors:
> >>>> >>>>>
> >>>> >>>>> A. Business-logic-agnostic retries, e.g. in a common low-level
> HTTP client library - behaviorally, these should behave largely the same as
> "network infra retries". The key distinction is that in this case any
> content hashing would be *post* serialization and even agnostic to
> request-body content-type (i.e. not JSON-specific).
> >>>> >>>>> B. Application-specific retries, such as when the Iceberg client
> may rebase on a new snapshot.
> >>>> >>>>>
> >>>> >>>>> I think this aligns with what Peter and others mentioned
> earlier where trying to canonicalize the *semantic* content of a request is
> probably brittle/risky. And as Yufei mentions, case 2.B (real client-side
> application-layer retries) should use a new idempotency-key whenever it
> retries at the layer that requires re-serializing JSON.
> >>>> >>>>>
> >>>> >>>>> Overall though I agree making the content-hash checking
> optional is a good idea.
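A minimal sketch of flavor A: a business-logic-agnostic retry wrapper that generates one `Idempotency-Key` per logical operation and resends the identical serialized bytes with it. `send_with_retries` and the injected `send` callable are hypothetical stand-ins for an HTTP client, used so the sketch runs without a network.

```python
import uuid
from typing import Callable, Dict

Send = Callable[[bytes, Dict[str, str]], int]  # hypothetical: returns an HTTP status code

def send_with_retries(body: bytes, send: Send, max_attempts: int = 3) -> int:
    """Flavor A: identical bytes and the same key on every attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    headers = {"Idempotency-Key": str(uuid.uuid4())}  # one key per logical operation
    last_error: Exception = RuntimeError("no attempt made")
    for _ in range(max_attempts):
        try:
            return send(body, headers)
        except (ConnectionError, TimeoutError) as err:  # response lost in transit
            last_error = err
    raise last_error
```

In flavor B, the application would rebase, serialize a new `body`, and call `send_with_retries` again, which yields a fresh key, in line with the point that application-layer retries should use a new key.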
> >>>> >>>>>
> >>>> >>>>> On Fri, Sep 19, 2025 at 4:33 PM huaxin gao <
> [email protected]> wrote:
> >>>> >>>>>>
> >>>> >>>>>> Thanks, Peter and Yufei. I agree the main use case is
> network‑infrastructure retries. To keep the specification simple and move
> the proposal forward, let’s make the baseline key‑only idempotency. If
> there’s demand, we can add an optional payload‑binding mode (canonical JSON
> + SHA‑256), advertised via /v1/config.
> >>>> >>>>>>
> >>>> >>>>>> Thanks,
> >>>> >>>>>>
> >>>> >>>>>> Huaxin
> >>>> >>>>>>
> >>>> >>>>>>
> >>>> >>>>>> On Fri, Sep 19, 2025 at 1:31 PM Yufei Gu <[email protected]>
> wrote:
> >>>> >>>>>>>
> >>>> >>>>>>> "Network infrastructure retries" would be the dominant use
> case. I'd NOT recommend that clients retry with the same idempotency key if
> they regenerated the request; instead, clients should reload before retrying
> in that case.
> >>>> >>>>>>>
> >>>> >>>>>>> Yufei
> >>>> >>>>>>>
> >>>> >>>>>>>
> >>>> >>>>>>> On Fri, Sep 19, 2025 at 2:05 AM Péter Váry <
> [email protected]> wrote:
> >>>> >>>>>>>>
> >>>> >>>>>>>> Hi Huaxin,
> >>>> >>>>>>>>
> >>>> >>>>>>>> Could you clarify the specific use cases we intend to
> support regarding retry checking? Here are a couple of possibilities I had
> in mind:
> >>>> >>>>>>>>
> >>>> >>>>>>>> Network infrastructure retries – where the exact same
> request is retried.
> >>>> >>>>>>>> Client-side retries – where the client regenerates the
> request using the same program logic, resulting in identical content.
> >>>> >>>>>>>>
> >>>> >>>>>>>> If there are no security or other concerns, I’d suggest
> keeping the specification simple and avoiding mechanisms that surface
> client-side implementation errors. The cleanest approach might be to ignore
> the request content and rely solely on a user-provided key.
> >>>> >>>>>>>>
> >>>> >>>>>>>> Alternatively, we could include an optional error code in
> the response, which implementations may use to signal conflicts. The actual
> conflict detection logic can be left to the implementations—we don’t need
> to define it in the specification. If we go this route, we should also
> offer a way to disable these checks, since there will inevitably be cases
> where semantically identical requests are incorrectly flagged as
> conflicting.
> >>>> >>>>>>>>
> >>>> >>>>>>>> Thanks,
> >>>> >>>>>>>> Peter
> >>>> >>>>>>>>
> >>>> >>>>>>>> On Fri, Sep 19, 2025 at 1:38 AM huaxin gao <[email protected]>
wrote:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Thanks Steven for the +1 and for raising the fingerprint
> question! Great points!
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> What we need to protect against:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Same logical request, different bytes across retries
> (pretty vs compact JSON, map key order, ...).
> >>>> >>>>>>>>> Accidental key reuse with a changed payload.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Options and tradeoffs:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Exact byte checksum (e.g., SHA‑256 over raw body)
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Pro: trivial, fast
> >>>> >>>>>>>>> Con: too strict; benign diffs cause false mismatches
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Canonical JSON over full request, then hash (proposed)
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Pro: stable across whitespace/key order; simple to
> implement for typed payloads
> >>>> >>>>>>>>> Con: slightly more work than a raw checksum.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Checksum of selected fields / field-by-field match
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Pro: can be faster for huge payloads; can ignore noisy
> fields
> >>>> >>>>>>>>> Con: could miss legitimate differences
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Request digest/signature
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Pro: very strong
> >>>> >>>>>>>>> Con: heavyweight
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Maybe we could make this configurable:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> canonical-json-sha256 (default)
> >>>> >>>>>>>>> raw-bytes-sha256 (strict)
> >>>> >>>>>>>>> trust-client-key (no fingerprint check)
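A minimal sketch contrasting the first two modes. This `canonical_json_sha256` only sorts object keys and drops insignificant whitespace with the standard `json` module; it is an approximation, not RFC 8785 (JCS), and for example still distinguishes `1` from `1.0`. Under `trust-client-key`, neither function would be called.

```python
import hashlib
import json

def raw_bytes_sha256(body: bytes) -> str:
    # "raw-bytes-sha256": any byte difference changes the fingerprint.
    return hashlib.sha256(body).hexdigest()

def canonical_json_sha256(body: bytes) -> str:
    # Approximates "canonical-json-sha256": sorted keys, compact separators.
    canonical = json.dumps(json.loads(body), sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Pretty-printed and compact encodings of the same object then share a canonical fingerprint while their raw-bytes fingerprints differ, which is exactly the "benign diffs" trade-off listed above.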
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> On the IETF draft status:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> I have also noted the draft’s expiry. We will align with
> its semantics for now and can adjust if a new version lands.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Thanks,
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Huaxin
> >>>> >>>>>>>>>
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> On Thu, Sep 18, 2025 at 4:01 PM Steven Wu <
> [email protected]> wrote:
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> +1 for the feature that can make retry safe for 500s and
> improve the client fault-tolerance of transient server failures.
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> Peter and Dmitri raised a good question on the
> fingerprint. The IETF draft doesn't actually define a specific fingerprint algorithm.
> We could also go with a simple checksum of the entire request payload, which
> would be cheap to compute. Do we anticipate any scenarios where
> clients may rewrite the payload in different forms of serialized bytes
> during retries?
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>>    *  Checksum of the entire request payload.
> >>>> >>>>>>>>>>    *  Checksum of selected element(s) in the request
> payload.
> >>>> >>>>>>>>>>    *  Field value match for each field in the request
> payload.
> >>>> >>>>>>>>>>    *  Field value match for selected element(s) in the
> request payload.
> >>>> >>>>>>>>>>    *  Request digest/signature
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> BTW, the IETF draft seems to have expired without approval
> >>>> >>>>>>>>>>
> https://datatracker.ietf.org/doc/draft-ietf-httpapi-idempotency-key-header/
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> On Thu, Sep 18, 2025 at 3:46 PM huaxin gao <
> [email protected]> wrote:
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> Thanks Peter and Dmitri for the thoughtful feedback! I
> really appreciate you taking a close look at my proposal. I agree that
> "semantic equality" is tricky, that's why the scope here is intentionally
> narrow.
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> Just to clarify scope: I’m not trying to solve general
> semantic equivalence. For these specific, typed request payloads, I
> serialize to a deterministic JSON and hash it. That normalizes benign diffs
> (map order, whitespace) without trying to infer meaning. The goal is a
> stable fingerprint so that if a key is accidentally reused with a changed
> payload, we surface that instead of silently diverging.
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> To make this feel less brittle, I’ll add tests for the
> practical cases (ordering/whitespace, nested maps, a clear null‑vs‑missing
> rule, numeric formatting), plus end‑to‑end tests in the in‑memory REST
> fixture with failure injection (in‑flight dup, finalize failure ->
> reconcile, etc.). Happy to walk through these if helpful.
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> I’m also open to adding a config switch for
> “trust‑client‑key only” if that’s preferred in some environments. My intent
> is to stay aligned with the IETF Idempotency‑Key guidance (first request
> wins; conflicting reuse is rejected, and reusing a key with a different
> request payload is rejected via an idempotency fingerprint) while keeping
> things as simple as possible and protecting us from accidental key misuse.
> Would love to align on the lightest approach that meets those goals.
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> Thanks,
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> Huaxin
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> On Thu, Sep 18, 2025 at 6:17 AM Dmitri Bourlatchkov <
> [email protected]> wrote:
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>> Hi All,
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>> I agree that checking request contents is almost
> redundant in this case.
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>> If the randomness quality of Idempotency-Key value is
> good, collisions are very unlikely on the server side. Given that, any
> content checks the server performs are essentially validating that clients
> correctly reuse the generated Idempotency-Key value. (this is mostly the
> same as my comment on the related Polaris discussion).
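To put "collisions are very unlikely" in numbers, assuming keys are UUIDv4 values with 122 uniformly random bits, the union (birthday) bound on any collision among $n$ keys is:

$$
P(\text{collision}) \;\le\; \binom{n}{2}\, 2^{-122} \;=\; \frac{n(n-1)}{2^{123}},
\qquad n = 10^{9} \;\Rightarrow\; P < 9.5 \times 10^{-20}.
$$

UUIDv7 keys carry fewer random bits (at most 74), but two of them can only collide if they share the same millisecond timestamp.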
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>> I'd like to propose making the content check optional so
> that servers may or may not implement it according to their design
> principles and constraints and emphasizing that clients should use unique
> keys (e.g. UUIDs)... basically going with option 2 from Peter's email.
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>> I believe this is in line with the SHOULD keyword used for
> this case in the IETF draft [1] (section 2.7).
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>> [1]
> https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-06
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>> Thanks,
> >>>> >>>>>>>>>>>> Dmitri.
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>> On Thu, Sep 18, 2025 at 7:56 AM Péter Váry <
> [email protected]> wrote:
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Thanks Huaxin for the proposal, and sorry for the late
> review - I had a bit of a busy week.
> >>>> >>>>>>>>>>>>> I have one main question, which I have also added as a
> comment to the doc:
> >>>> >>>>>>>>>>>>> - Why do we try to compare the request contents when
> the Idempotency-Key is the same for the requests? The comparison algorithm
> is a bit complicated, and seems brittle to me. Requests that differ in field
> ordering, map ordering, or maybe even upper/lower case letters might
> technically still be the same request.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> In my previous roles (admittedly more than 10 years
> ago) I was extensively working on APIs like this, and we have never really
> succeeded in creating a good enough "are these 2 requests really the
> same semantically" check.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> I would simplify these requirements, unless there are
> serious arguments for the existence of these checks:
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Either check for exact matches - without any magic -
> this could be used for detecting issues where the duplication happens on
> the network side, or
> >>>> >>>>>>>>>>>>> Rely entirely on the clients to provide the correct
> Idempotency-Key.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> I would prefer the 2nd.
> >>>> >>>>>>>>>>>>> Otherwise I agree with the contents of the proposal. It
> is nicely done!
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> On Thu, Sep 18, 2025 at 2:54 AM Yufei Gu <[email protected]>
> wrote:
> >>>> >>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>> Thanks for the proposal. It's a nice feature to make
> retry more reliable and efficient. Left some comments.
> >>>> >>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>> Yufei
> >>>> >>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>> On Mon, Sep 15, 2025 at 3:53 PM Kevin Liu <
> [email protected]> wrote:
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> Thanks for writing up the proposal! Makes sense to
> add idempotency to mutation requests.
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> It would be helpful to add this feature to both the
> catalog test framework and the iceberg-rest-fixture. The latter is used by
> the subprojects for testing and would come in handy when we want to test
> out the client implementation.
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> For other reviewers, the Stripe documentation on
> idempotency was a helpful read,
> https://docs.stripe.com/api/idempotent_requests.
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> Best,
> >>>> >>>>>>>>>>>>>>> Kevin Liu
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> On Mon, Sep 15, 2025 at 11:38 AM Szehon Ho <
> [email protected]> wrote:
> >>>> >>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>> Hi,
> >>>> >>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>> Sounds like fairly standard practice and makes sense
> to me on first read.
> >>>> >>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>> Thanks,
> >>>> >>>>>>>>>>>>>>>> Szehon
> >>>> >>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>> On Mon, Sep 15, 2025 at 10:09 AM Russell Spitzer <
> [email protected]> wrote:
> >>>> >>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>> I think based on the feedback on the proposal and
> in recent syncs we should probably move forward with the actual Spec Change
> PR so we can see what this looks like and move on to a discussion of how
> the Catalog test framework should test this.
> >>>> >>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>> On 2025/08/22 18:26:23 huaxin gao wrote:
> >>>> >>>>>>>>>>>>>>>>> > Hi all,
> >>>> >>>>>>>>>>>>>>>>> >
> >>>> >>>>>>>>>>>>>>>>> > I’d like to propose a change to Iceberg’s REST
> API to make mutation
> >>>> >>>>>>>>>>>>>>>>> > requests safely retryable.
> >>>> >>>>>>>>>>>>>>>>> >
> >>>> >>>>>>>>>>>>>>>>> > *The Problem*
> >>>> >>>>>>>>>>>>>>>>> > If a POST mutation (e.g., updateTable) succeeds
> in the catalog but the
> >>>> >>>>>>>>>>>>>>>>> > client doesn’t receive the response (timeout,
> connection closed, etc.), a
> >>>> >>>>>>>>>>>>>>>>> > second attempt can hit 409 Conflict. The client
> interprets the 409 as a
> >>>> >>>>>>>>>>>>>>>>> > failed commit and deletes the associated metadata
> files, causing
> >>>> >>>>>>>>>>>>>>>>> > catalog/storage inconsistency.
> >>>> >>>>>>>>>>>>>>>>> >
> >>>> >>>>>>>>>>>>>>>>> > *The Proposed Solution*
> >>>> >>>>>>>>>>>>>>>>> > Introduces an optional Idempotency-Key HTTP
> header on REST mutation
> >>>> >>>>>>>>>>>>>>>>> > endpoints and has the Iceberg client pass it
> through.
> >>>> >>>>>>>>>>>>>>>>> >
> >>>> >>>>>>>>>>>>>>>>> > *Semantics *(first processed request wins):
> >>>> >>>>>>>>>>>>>>>>> >
> >>>> >>>>>>>>>>>>>>>>> >    - Same key + same canonical payload -> return the original result
> >>>> >>>>>>>>>>>>>>>>> >      (no re-execution).
> >>>> >>>>>>>>>>>>>>>>> >    - Same key + different payload -> 422 (Unprocessable Content).
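A minimal sketch of these two rules on the server side, assuming the canonical payload is approximated by sorting JSON keys and hashing with SHA-256. `FirstRequestWins`, `Response`, and `fingerprint` are hypothetical names; in-flight duplicates, expiry, and persistence are deliberately left out.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass(frozen=True)
class Response:
    status: int
    body: str

def fingerprint(payload: bytes) -> str:
    # Approximate canonical form: sorted keys, no insignificant whitespace.
    canonical = json.dumps(json.loads(payload), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class FirstRequestWins:
    def __init__(self) -> None:
        self._seen: Dict[str, Tuple[str, Response]] = {}

    def handle(self, key: str, payload: bytes,
               execute: Callable[[bytes], Response]) -> Response:
        fp = fingerprint(payload)
        if key in self._seen:
            stored_fp, stored = self._seen[key]
            if stored_fp != fp:
                return Response(422, "Unprocessable Content")  # same key, different payload
            return stored  # same key, same canonical payload: replay, no re-execution
        response = execute(payload)  # first processed request wins
        self._seen[key] = (fp, response)
        return response
```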
> >>>> >>>>>>>>>>>>>>>>> >
> >>>> >>>>>>>>>>>>>>>>> > *Capability discovery:* catalogs can advertise
> support and retention so
> >>>> >>>>>>>>>>>>>>>>> > clients know when a retry is safe, e.g.
> >>>> >>>>>>>>>>>>>>>>> >
> >>>> >>>>>>>>>>>>>>>>> > {
> >>>> >>>>>>>>>>>>>>>>> >   "idempotency-tokens-respected": true,
> >>>> >>>>>>>>>>>>>>>>> >   "idempotency-token-lifetime": "30m"
> >>>> >>>>>>>>>>>>>>>>> > }
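A minimal sketch of how a client might use such a config response to decide whether a retry with the same key is still safe. The duration syntax (an integer followed by `s`, `m`, or `h`) is an assumption, since the proposal only shows `"30m"`; `advertised_lifetime` and `retry_is_safe` are hypothetical names.

```python
import re
from datetime import timedelta
from typing import Mapping, Optional

# Assumed duration syntax, e.g. "30m"; the proposal does not define the format.
_DURATION = re.compile(r"(\d+)([smh])")
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}

def advertised_lifetime(config: Mapping[str, object]) -> Optional[timedelta]:
    """Key lifetime advertised by the catalog, or None if idempotency is not offered."""
    if config.get("idempotency-tokens-respected") is not True:
        return None
    match = _DURATION.fullmatch(str(config.get("idempotency-token-lifetime", "")))
    if match is None:
        return None  # unrecognized format: do not assume retries are safe
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})

def retry_is_safe(config: Mapping[str, object], elapsed: timedelta) -> bool:
    """True while the key from the first attempt should still be remembered."""
    lifetime = advertised_lifetime(config)
    return lifetime is not None and elapsed < lifetime
```

Here `elapsed` is measured on the client, while the server counts from its own receive time, so a real client would likely keep a safety margin below the advertised lifetime.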
> >>>> >>>>>>>>>>>>>>>>> >
> >>>> >>>>>>>>>>>>>>>>> > *Scope in Iceberg:* update the OpenAPI to include
> the header, and add
> >>>> >>>>>>>>>>>>>>>>> > client pass-through + honoring capability
> discovery. No server
> >>>> >>>>>>>>>>>>>>>>> > implementation is mandated—catalogs (e.g.,
> Polaris) can implement
> >>>> >>>>>>>>>>>>>>>>> > storage/TTL/replay as they choose.
> >>>> >>>>>>>>>>>>>>>>> >
> >>>> >>>>>>>>>>>>>>>>> > *Standards alignment:* uses the industry-standard
> header name and matches
> >>>> >>>>>>>>>>>>>>>>> > the IETF HTTPAPI Idempotency-Key draft
> >>>> >>>>>>>>>>>>>>>>> > <https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header>
> >>>> >>>>>>>>>>>>>>>>> > semantics.
> >>>> >>>>>>>>>>>>>>>>> >
> >>>> >>>>>>>>>>>>>>>>> > *Compatibility:* fully backward compatible.
> Servers that don’t support it
> >>>> >>>>>>>>>>>>>>>>> > can ignore the header; clients can detect support
> via capability discovery.
> >>>> >>>>>>>>>>>>>>>>> >
> >>>> >>>>>>>>>>>>>>>>> > Here is the proposal
> >>>> >>>>>>>>>>>>>>>>> > <https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0>.
> >>>> >>>>>>>>>>>>>>>>> > Looking forward to your thoughts.
> >>>> >>>>>>>>>>>>>>>>> >
> >>>> >>>>>>>>>>>>>>>>> > Thanks,
> >>>> >>>>>>>>>>>>>>>>> >
> >>>> >>>>>>>>>>>>>>>>> > Huaxin
> >>>> >>>>>>>>>>>>>>>>> >
>
