Re: [DISCUSS] Iceberg REST Catalog Idempotency

yun zou Wed, 29 Oct 2025 19:19:50 -0700

Hi All,

It sounds like the idea is to use the timestamp component to help
determine whether a key should expire. However, I’m not clear on how
exactly the timestamp would be used for this purpose. When the server
decides to expire a key, will it rely on the server-receive timestamp,
the client request timestamp, or the key creation timestamp?


If the timestamp component within the key is intended to represent the
key creation time, that raises a couple of concerns:
1. A key could be created well before it’s actually used.
2. Clock skew between the client and server could lead to inconsistent
expiration behavior.

If the server is responsible for managing the key lifecycle, it’s
generally more robust and consistent to rely on the server clock for
expiration decisions rather than client-provided timestamps.
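A minimal sketch of the server-clock approach argued for above. It assumes an in-memory store and an injectable clock so that expiry can be tested; `KeyRecord` and `IdempotencyStore` are hypothetical names, not Iceberg or Polaris APIs. The server stores its own timestamp in an explicit field and never parses one out of the key, so the key can stay an opaque string.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class KeyRecord:
    # Explicit server-side field: the server's own clock reading when the
    # result was stored. Nothing is parsed out of the key itself.
    recorded_at: float
    response: Optional[str] = None


class IdempotencyStore:
    """Hypothetical store that expires keys using only the server's clock."""

    def __init__(self, lifetime_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lifetime_seconds = lifetime_seconds
        self.clock = clock
        self.records: Dict[str, KeyRecord] = {}

    def lookup(self, key: str) -> Optional[KeyRecord]:
        record = self.records.get(key)
        if record is None:
            return None
        # Age is measured on the server clock, so client clock skew and
        # keys generated long before use do not affect expiry.
        if self.clock() - record.recorded_at >= self.lifetime_seconds:
            del self.records[key]
            return None
        return record

    def remember(self, key: str, response: str) -> None:
        self.records[key] = KeyRecord(recorded_at=self.clock(), response=response)
```

Because the lifetime starts from the server's own reading, a key generated well before first use, or one sent by a client with a skewed clock, expires on the same schedule as any other.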

Additionally, if the timestamp is an important piece of information,
it might be cleaner to make it an explicit field instead of
overloading the key itself with multiple purposes. Having a separate,
well-defined field would make the specification clearer and easier to
maintain.

From the client’s perspective, requiring the use of UUIDv7 introduces
unnecessary constraints on implementation. That said, clients are free
to adopt UUIDv7 if they prefer. Since the server ultimately manages
expiration, it’s generally better to keep the client logic simple and
decoupled from server-side decisions.

Best Regards,
Yun

On Wed, Oct 29, 2025 at 12:50 PM Dmitri Bourlatchkov <[email protected]> wrote:
>
> Hi All,
>
> From my POV (and I may be repeating what I put in GH comments), the main 
> point in using UUID v7 is specifying that a timestamp should be part of the 
> idempotency key. As previously discussed, having this timestamp is beneficial 
> to server implementations.
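For reference, RFC 9562 lays out a UUIDv7 as a 48-bit big-endian Unix timestamp in milliseconds, a 4-bit version (7), 12 random bits, a 2-bit variant (0b10), and 62 random bits. A minimal sketch that builds one with the standard library and reads the timestamp back, without depending on a version-specific `uuid` helper; `make_uuid7` and `uuid7_unix_ms` are hypothetical names used only for illustration.

```python
import secrets
import time
import uuid
from typing import Optional


def make_uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7: ts(48) | ver(4) | rand_a(12) | var(2) | rand_b(62)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand_a = secrets.randbits(12)
    rand_b = secrets.randbits(62)
    value = (unix_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                         # bits 79..76: version 7
    value |= rand_a << 64                      # bits 75..64: random
    value |= 0b10 << 62                        # bits 63..62: RFC 9562 variant
    value |= rand_b                            # bits 61..0: random
    return uuid.UUID(int=value)


def uuid7_unix_ms(u: uuid.UUID) -> int:
    """Return the embedded millisecond Unix timestamp of a UUIDv7."""
    if u.version != 7:
        raise ValueError("not a version 7 UUID")
    return u.int >> 80
```

The embedded timestamp is whatever the client's clock said at generation time, which is the source of the skew concern raised at the top of this thread.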
>
> The IETF Idempotency Key draft v7 [1] allows servers to require specific ID 
> generation algorithms.
>
> We could have a custom ID format, but UUID v7 is already defined and fits 
> this use case.
>
> If for some reason UUID v7 becomes "weak" in the future, such an event will 
> have a much greater impact than the REST Catalog API. In any case, if that 
> happens, nothing prevents revising the REST API spec to allow for stronger 
> ID generators.
>
> [1] 
> https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-07#name-client
>
> Cheers,
> Dmitri.
>
> On Mon, Oct 27, 2025 at 2:33 PM Yufei Gu <[email protected]> wrote:
>>
>> +1 on option 2: don’t mandate a specific key format.
>>
>> Concerns with option 1 (UUIDv7-mandatory):
>> 1. Overspecification risk. If UUIDv7 shows weaknesses later, we’re stuck 
>> with a brittle contract.
>> 2. Unnecessary constraints. It binds both client and server implementations. 
>> One of IRC’s goals is to simplify client work; forcing UUIDv7 limits client 
>> choices for marginal gain (the embedded timestamp).
>>
>> Here are existing implementations for reference:
>>
>> Stripe[1]: recommends UUIDv4 but does not enforce a format for idempotency 
>> keys.
>> AWS EC2[2]: accepts any unique, case-sensitive string up to 64 ASCII 
>> characters for the client token.
>>
>> I'd propose to treat the idempotency key as an opaque string with basic 
>> requirements and guidance (e.g., “unique string values; UUIDv4 or v7 are 
>> fine”) but avoid making the format mandatory. This keeps the API 
>> future-proof and client-friendly while preserving server-side flexibility.
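A minimal sketch of the "opaque string with basic requirements" approach. The concrete rules (1 to 64 characters, printable ASCII without spaces, case-sensitive) are assumptions loosely modeled on the EC2 client-token guidance cited above, not anything the Iceberg spec defines; `is_acceptable_key` and `MAX_KEY_LENGTH` are hypothetical.

```python
MAX_KEY_LENGTH = 64  # assumption: mirrors the EC2 client-token limit


def is_acceptable_key(key: str) -> bool:
    """Accept any opaque, case-sensitive key of printable ASCII characters."""
    if not 1 <= len(key) <= MAX_KEY_LENGTH:
        return False
    # Printable ASCII only (0x21..0x7E): no spaces, control, or non-ASCII chars.
    return all(0x21 <= ord(ch) <= 0x7E for ch in key)
```

Both UUIDv4 and UUIDv7 strings pass this check, so clients stay free to pick either, and the server never needs to interpret the key.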
>>
>> 1. https://docs.stripe.com/api/idempotent_requests
>> 2. https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html
>>
>> Yufei
>>
>>
>> On Mon, Oct 27, 2025 at 9:53 AM huaxin gao <[email protected]> wrote:
>>>
>>> Hi Yun,
>>> Thanks for the thoughtful feedback!
>>>
>>> Yes, the key itself is expected to be globally unique. You’re also right 
>>> that we don’t need to mandate UUIDs to achieve that; other schemes can 
>>> provide global uniqueness.
>>>
>>> I have chosen UUID because several folks in the community prefer it as a 
>>> common, interoperable choice. That said, I agree that mandating UUIDv7 adds 
>>> constraints on clients without clear spec-level benefit.
>>>
>>> I also agree we should separate spec from implementation; details like the 
>>> key generation method can live in implementation guidance.
>>>
>>> From your note, it sounds like you support Option 2 
>>> (version-agnostic)—i.e., require a “globally unique idempotency key” and 
>>> accept any RFC 9562 UUID (with v7 as a non-normative recommendation), while 
>>> leaving timestamp/expiry mechanics to the server-side doc. I’ll count this 
>>> as a +1 for Option 2.
>>>
>>> Thanks,
>>>
>>> Huaxin
>>>
>>>
>>> On Fri, Oct 24, 2025 at 7:00 PM yun zou <[email protected]> wrote:
>>>>
>>>> Sorry, I accidentally sent the email before it was complete; please ignore my
>>>> previous email. Sorry for the noise and inconvenience.
>>>>
>>>> Hi Huaxin,
>>>>
>>>> This is a really interesting and valuable proposal — it provides a
>>>> great way to address the issue of duplicate client requests. Thank you
>>>> for proposing and driving this forward!
>>>>
>>>> One point that isn’t entirely clear to me is how the server uniquely
>>>> identifies each request.  Are we relying solely on the idempotency-key
>>>> being globally unique, or is there an additional identifier such as
>>>> clientId + idempotency-key? Based on the current discussion, it sounds
>>>> like the proposal expects the key itself to be globally unique, likely
>>>> through the use of a UUID, but I’d like to double-check my
>>>> understanding.
>>>>
>>>> If we are indeed relying on the client to generate a globally unique
>>>> ID, that approach makes sense. However, it doesn’t seem necessary to
>>>> mandate the use of UUIDs, as there are other valid methods for
>>>> achieving global uniqueness. Imposing a further restriction to UUIDv7
>>>> would place additional constraints on the client implementation.
>>>>
>>>> From a specification perspective, I think it would be better to
>>>> separate the spec from the implementation. In other words, we should
>>>> make it clear that the key must be globally unique, but we don’t need
>>>> to specify that it must be a UUID or UUIDv7.
>>>>
>>>> Best Regards,
>>>> Yun
>>>>
>>>> On Fri, Oct 24, 2025 at 4:41 PM huaxin gao <[email protected]> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > Thank you for taking the time to review my proposal and PR—I really 
>>>> > appreciate the input.
>>>> >
>>>> > There’s one remaining issue I’d like to settle. In the Iceberg Catalog 
>>>> > Community sync, many preferred mandating UUIDv7 for the idempotency key. 
>>>> > At the same time, there are some concerns:
>>>> >
>>>> > If we need a timestamp, it should be a separate field; we shouldn’t use 
>>>> > the UUIDv7 timestamp.
>>>> >
>>>> > If we use the UUID timestamp for expiry, we’d have to require keys to be 
>>>> > generated at request time, which feels over-engineered.
>>>> >
>>>> > If we want to use the UUIDv7 timestamp, it should be for debugging only.
>>>> >
>>>> > Based on that, here’s a draft update to the spec:
>>>> >
>>>> > Key Requirements:
>>>> > - Key format: UUIDv7 in string format as defined in RFC 9562.
>>>> >   See 
>>>> > https://datatracker.ietf.org/doc/html/rfc9562#name-example-of-a-uuidv7-value.
>>>> > - The idempotency key must be globally unique (no reuse across different 
>>>> > operations).
>>>> > - Catalogs SHOULD NOT expire keys before the end of the advertised token 
>>>> > lifetime.
>>>> > - If Idempotency-Key is used, clients MUST reuse the same key when 
>>>> > retrying the same
>>>> >   logical operation and MUST generate a new key for a different 
>>>> > operation.
>>>> > - Server behavior: Servers MUST validate that the key is a 
>>>> > syntactically valid UUIDv7 (per RFC 9562).
>>>> >   Servers MUST NOT make behavioral decisions based on the UUID’s 
>>>> > internal timestamp fields.
>>>> >   The idempotency key is an opaque, unique identifier used only for 
>>>> > lookup/deduplication.
>>>> >
>>>> > This reads a bit awkward to me: we mandate UUIDv7 but prohibit using its 
>>>> > timestamp, which seems to undercut the reason to require v7 in the first 
>>>> > place.
>>>> >
>>>> > I’d appreciate feedback on whether we should:
>>>> >
>>>> > Option 1 — Require v7.
>>>> > Keep UUIDv7 required, with the server restrictions above (syntactic v7 
>>>> > validation only; no behavioral decisions based on the embedded 
>>>> > timestamp).
>>>> >
>>>> > Option 2 — Version-agnostic.
>>>> > Make the client spec version-agnostic (require RFC 9562 UUID textual 
>>>> > form; allow v7 as a recommendation). Leave any timestamp/lifetime 
>>>> > mechanics to a server-side (Polaris idempotency) document.
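To make the two options concrete, here is a minimal sketch of the only server-side check each would need, assuming the canonical RFC 9562 textual form (8-4-4-4-12 hex digits, case-insensitive on input, RFC variant bits). `validate_key` and its `require_v7` flag are hypothetical: `require_v7=True` models Option 1 and `False` models Option 2. In neither case does it read the embedded timestamp.

```python
import re

# Canonical RFC 9562 text form; group 1 captures the version nibble and the
# [89ab] nibble enforces the RFC variant (binary 10xx).
_UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-([0-9a-f])[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def validate_key(key: str, require_v7: bool) -> bool:
    """Syntactic check only; after this the key is treated as opaque."""
    match = _UUID_RE.fullmatch(key)  # fullmatch rejects trailing newlines, etc.
    if match is None:
        return False
    if require_v7:
        return match.group(1) == "7"
    return True
```

The two options differ by a single comparison, which illustrates the awkwardness noted above: mandating v7 while forbidding use of its timestamp buys only a version-nibble check.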
>>>> >
>>>> > Thanks again for the thoughtful discussion.
>>>> >
>>>> > Best,
>>>> >
>>>> > Huaxin
>>>> >
>>>> >
>>>> > On Mon, Sep 29, 2025 at 5:47 PM Dmitri Bourlatchkov <[email protected]> 
>>>> > wrote:
>>>> >>
>>>> >> Hi Huaxin,
>>>> >>
>>>> >> Sorry about the delay. I posted some comments on 
>>>> >> https://github.com/apache/iceberg/pull/14196 Some of them I might have 
>>>> >> mentioned on the doc too, so apologies if they got answered in the doc 
>>>> >> and I missed it.
>>>> >>
>>>> >> Cheers,
>>>> >> Dmitri.
>>>> >>
>>>> >> On Thu, Sep 25, 2025 at 12:27 PM huaxin gao <[email protected]> 
>>>> >> wrote:
>>>> >>>
>>>> >>> Thank you all for taking the time to review and discuss! I’ve 
>>>> >>> responded to all questions and updated the proposal. If there are no 
>>>> >>> additional concerns, I’ll proceed to start a VOTE thread.
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Huaxin
>>>> >>>
>>>> >>> On Mon, Sep 22, 2025 at 1:30 AM Maninder Parmar 
>>>> >>> <[email protected]> wrote:
>>>> >>>>
>>>> >>>> +1 for low-level retries, which ensure that an idempotency key is 
>>>> >>>> never committed twice. I also agree that canonicalizing a request 
>>>> >>>> body that the client may change during conflict resolution and 
>>>> >>>> retry would be hard to get right.
>>>> >>>>
>>>> >>>> On Sat, Sep 20, 2025 at 5:58 AM Dennis Huo <[email protected]> wrote:
>>>> >>>>>
>>>> >>>>> +1 to this mostly targeting "low-level" retry semantics. 
>>>> >>>>> Expanding on that, though, I'd say even "client-side retries" really 
>>>> >>>>> have two distinct flavors:
>>>> >>>>>
>>>> >>>>> A. Business-logic-agnostic retries, e.g. in a common low-level HTTP 
>>>> >>>>> client library - behaviorally, these should behave largely the same 
>>>> >>>>> as "network infra retries". The key distinction is that in this case 
>>>> >>>>> any content hashing would be *post* serialization and even agnostic 
>>>> >>>>> to request-body content-type (i.e. not JSON-specific).
>>>> >>>>> B. Application-specific retries, such as when the Iceberg client 
>>>> >>>>> rebases on a new snapshot.
>>>> >>>>>
>>>> >>>>> I think this aligns with what Peter and others mentioned earlier 
>>>> >>>>> where trying to canonicalize the *semantic* content of a request is 
>>>> >>>>> probably brittle/risky. And as Yufei mentions, case 2.B (client-side 
>>>> >>>>> real application-layer retries) should use a new 
>>>> >>>>> idempotency-key if the retry ever happens at the layer that 
>>>> >>>>> requires re-serializing JSON.
>>>> >>>>>
>>>> >>>>> Overall though I agree making the content-hash checking optional is 
>>>> >>>>> a good idea.
>>>> >>>>>
>>>> >>>>> On Fri, Sep 19, 2025 at 4:33 PM huaxin gao <[email protected]> 
>>>> >>>>> wrote:
>>>> >>>>>>
>>>> >>>>>> Thanks, Peter and Yufei. I agree the main use case is 
>>>> >>>>>> network‑infrastructure retries. To keep the specification simple 
>>>> >>>>>> and move the proposal forward, let’s make the baseline key‑only 
>>>> >>>>>> idempotency. If there’s demand, we can add an optional 
>>>> >>>>>> payload‑binding mode (canonical JSON + SHA‑256), advertised via 
>>>> >>>>>> /v1/config.
>>>> >>>>>>
>>>> >>>>>> Thanks,
>>>> >>>>>>
>>>> >>>>>> Huaxin
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> On Fri, Sep 19, 2025 at 1:31 PM Yufei Gu <[email protected]> 
>>>> >>>>>> wrote:
>>>> >>>>>>>
>>>> >>>>>>> "Network infrastructure retries" would be the dominant use case. 
>>>> >>>>>>> I would NOT recommend that clients retry with the same idempotency 
>>>> >>>>>>> key if they regenerated the request; instead, they should reload 
>>>> >>>>>>> before retrying in that case.
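A minimal sketch of the client behavior described here, assuming a hypothetical `send(body, key)` transport callable that returns an HTTP status and raises `ConnectionError` when no response arrives. The key is generated once per logical request and reused only while the identical serialized bytes are resent; a client that reloads and rebuilds the request would call `commit` again and get a fresh key.

```python
import uuid
from typing import Callable


def commit(body: bytes, send: Callable[[bytes, str], int],
           max_attempts: int = 3) -> int:
    """Resend the same bytes with the same Idempotency-Key on transport failures."""
    key = str(uuid.uuid4())  # one key per logical request; any unique scheme works
    last_error: Exception = ConnectionError("no attempts made")
    for _ in range(max_attempts):
        try:
            return send(body, key)  # identical body and key on every attempt
        except ConnectionError as err:
            last_error = err
    raise last_error
```

Keeping key generation inside `commit` ties the key's scope to one serialized request, which is exactly the boundary the retry advice above draws.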
>>>> >>>>>>>
>>>> >>>>>>> Yufei
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> On Fri, Sep 19, 2025 at 2:05 AM Péter Váry 
>>>> >>>>>>> <[email protected]> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>> Hi Huaxin,
>>>> >>>>>>>>
>>>> >>>>>>>> Could you clarify the specific use cases we intend to support 
>>>> >>>>>>>> regarding retry checking? Here are a couple of possibilities I 
>>>> >>>>>>>> had in mind:
>>>> >>>>>>>>
>>>> >>>>>>>> Network infrastructure retries – where the exact same request is 
>>>> >>>>>>>> retried.
>>>> >>>>>>>> Client-side retries – where the client regenerates the request 
>>>> >>>>>>>> using the same program logic, resulting in identical content.
>>>> >>>>>>>>
>>>> >>>>>>>> If there are no security or other concerns, I’d suggest keeping 
>>>> >>>>>>>> the specification simple and avoiding mechanisms that surface 
>>>> >>>>>>>> client-side implementation errors. The cleanest approach might be 
>>>> >>>>>>>> to ignore the request content and rely solely on a user-provided 
>>>> >>>>>>>> key.
>>>> >>>>>>>>
>>>> >>>>>>>> Alternatively, we could include an optional error code in the 
>>>> >>>>>>>> response, which implementations may use to signal conflicts. The 
>>>> >>>>>>>> actual conflict detection logic can be left to the 
>>>> >>>>>>>> implementations—we don’t need to define it in the specification. 
>>>> >>>>>>>> If we go this route, we should also offer a way to disable these 
>>>> >>>>>>>> checks, since there will inevitably be cases where semantically 
>>>> >>>>>>>> identical requests are incorrectly flagged as conflicting.
>>>> >>>>>>>>
>>>> >>>>>>>> Thanks,
>>>> >>>>>>>> Peter
>>>> >>>>>>>>
>>>> >>>>>>>> On Fri, Sep 19, 2025 at 1:38 AM huaxin gao <[email protected]> 
>>>> >>>>>>>> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>> Thanks Steven for the +1 and for raising the fingerprint 
>>>> >>>>>>>>> question! Great points!
>>>> >>>>>>>>>
>>>> >>>>>>>>> What we need to protect against:
>>>> >>>>>>>>>
>>>> >>>>>>>>> Same logical request, different bytes across retries (pretty vs 
>>>> >>>>>>>>> compact JSON, map key order, ...).
>>>> >>>>>>>>> Accidental key reuse with a changed payload.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Options and tradeoffs:
>>>> >>>>>>>>>
>>>> >>>>>>>>> Exact byte checksum (e.g., SHA‑256 over raw body)
>>>> >>>>>>>>>
>>>> >>>>>>>>> Pro: trivial, fast
>>>> >>>>>>>>> Con: too strict; benign diffs cause false mismatches
>>>> >>>>>>>>>
>>>> >>>>>>>>> Canonical JSON over full request, then hash (proposed)
>>>> >>>>>>>>>
>>>> >>>>>>>>> Pro: stable across whitespace/key order; simple to implement for 
>>>> >>>>>>>>> typed payloads
>>>> >>>>>>>>> Con: slightly more work than a raw checksum.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Checksum of selected fields / field-by-field match
>>>> >>>>>>>>>
>>>> >>>>>>>>> Pro: can be faster for huge payloads; can ignore noisy fields
>>>> >>>>>>>>> Con: could miss legitimate differences
>>>> >>>>>>>>>
>>>> >>>>>>>>> Request digest/signature
>>>> >>>>>>>>>
>>>> >>>>>>>>> Pro: very strong
>>>> >>>>>>>>> Con: heavyweight
>>>> >>>>>>>>>
>>>> >>>>>>>>> Maybe we could make this configurable:
>>>> >>>>>>>>>
>>>> >>>>>>>>> canonical-json-sha256 (default)
>>>> >>>>>>>>> raw-bytes-sha256 (strict)
>>>> >>>>>>>>> trust-client-key (no fingerprint check)
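A minimal sketch of the three modes listed above. The mode names come from the list; the `fingerprint` function itself is hypothetical. "Canonical JSON" here means Python's `json.dumps` with sorted keys and compact separators, which is an assumption and not a full canonicalization scheme such as RFC 8785 (JCS); for example, it does not treat `1` and `1.0` as equal.

```python
import hashlib
import json
from typing import Optional


def fingerprint(raw_body: bytes, mode: str) -> Optional[str]:
    """Return a request fingerprint, or None when the key alone is trusted."""
    if mode == "trust-client-key":
        return None  # no payload check; the Idempotency-Key is authoritative
    if mode == "raw-bytes-sha256":
        data = raw_body  # strict: any byte difference is a mismatch
    elif mode == "canonical-json-sha256":
        # Re-serialize so whitespace and object key order do not change the hash.
        parsed = json.loads(raw_body)
        data = json.dumps(parsed, sort_keys=True, separators=(",", ":"),
                          ensure_ascii=False).encode("utf-8")
    else:
        raise ValueError(f"unknown fingerprint mode: {mode}")
    return hashlib.sha256(data).hexdigest()
```

The `1` vs `1.0` case is a small example of the brittleness raised elsewhere in this thread: the canonical form absorbs whitespace and key order, but numeric formatting and null-vs-missing still need explicit rules.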
>>>> >>>>>>>>>
>>>> >>>>>>>>> On the IETF draft status:
>>>> >>>>>>>>>
>>>> >>>>>>>>> I have also noted the draft’s expiry. We will align with its 
>>>> >>>>>>>>> semantics for now and can adjust if a new version lands.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Thanks,
>>>> >>>>>>>>>
>>>> >>>>>>>>> Huaxin
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Thu, Sep 18, 2025 at 4:01 PM Steven Wu <[email protected]> 
>>>> >>>>>>>>> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> +1 for the feature that can make retry safe for 500s and 
>>>> >>>>>>>>>> improve the client fault-tolerance of transient server failures.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Peter and Dmitri raised a good question on the fingerprint. 
>>>> >>>>>>>>>> The IETF draft doesn't actually define the fingerprint algorithm. We 
>>>> >>>>>>>>>> could also go with a simple checksum of the entire request payload, 
>>>> >>>>>>>>>> which would be cheap to compute. Do we anticipate any 
>>>> >>>>>>>>>> scenarios where clients may serialize the payload into 
>>>> >>>>>>>>>> different bytes during retries?
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>    *  Checksum of the entire request payload.
>>>> >>>>>>>>>>    *  Checksum of selected element(s) in the request payload.
>>>> >>>>>>>>>>    *  Field value match for each field in the request payload.
>>>> >>>>>>>>>>    *  Field value match for selected element(s) in the request 
>>>> >>>>>>>>>> payload.
>>>> >>>>>>>>>>    *  Request digest/signature
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> BTW, the IETF draft seems to have expired without approval
>>>> >>>>>>>>>> https://datatracker.ietf.org/doc/draft-ietf-httpapi-idempotency-key-header/
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> On Thu, Sep 18, 2025 at 3:46 PM huaxin gao 
>>>> >>>>>>>>>> <[email protected]> wrote:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Thanks Peter and Dmitri for the thoughtful feedback! I really 
>>>> >>>>>>>>>>> appreciate you taking a close look at my proposal. I agree 
>>>> >>>>>>>>>>> that "semantic equality" is tricky; that's why the scope here 
>>>> >>>>>>>>>>> is intentionally narrow.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Just to clarify scope: I’m not trying to solve general 
>>>> >>>>>>>>>>> semantic equivalence. For these specific, typed request 
>>>> >>>>>>>>>>> payloads, I serialize to a deterministic JSON and hash it. 
>>>> >>>>>>>>>>> That normalizes benign diffs (map order, whitespace) without 
>>>> >>>>>>>>>>> trying to infer meaning. The goal is a stable fingerprint so 
>>>> >>>>>>>>>>> that if a key is accidentally reused with a changed payload, 
>>>> >>>>>>>>>>> we surface that instead of silently diverging.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> To make this feel less brittle, I’ll add tests for the 
>>>> >>>>>>>>>>> practical cases (ordering/whitespace, nested maps, a clear 
>>>> >>>>>>>>>>> null‑vs‑missing rule, numeric formatting), plus end‑to‑end 
>>>> >>>>>>>>>>> tests in the in‑memory REST fixture with failure injection 
>>>> >>>>>>>>>>> (in‑flight dup, finalize failure -> reconcile, etc.). Happy to 
>>>> >>>>>>>>>>> walk through these if helpful.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> I’m also open to adding a config switch for “trust‑client‑key 
>>>> >>>>>>>>>>> only” if that’s preferred in some environments. My intent is 
>>>> >>>>>>>>>>> to stay aligned with the IETF Idempotency‑Key guidance (first 
>>>> >>>>>>>>>>> request wins; conflicting reuse is rejected, and reusing a key 
>>>> >>>>>>>>>>> with a different request payload is rejected via an 
>>>> >>>>>>>>>>> idempotency fingerprint) while keeping things as simple as 
>>>> >>>>>>>>>>> possible and protecting us from accidental key misuse. Would 
>>>> >>>>>>>>>>> love to align on the lightest approach that meets those goals.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Thanks,
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Huaxin
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> On Thu, Sep 18, 2025 at 6:17 AM Dmitri Bourlatchkov 
>>>> >>>>>>>>>>> <[email protected]> wrote:
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Hi All,
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> I agree that checking request contents is almost redundant in 
>>>> >>>>>>>>>>>> this case.
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> If Idempotency-Key values have good randomness, 
>>>> >>>>>>>>>>>> collisions are very unlikely on the server side. Given that, 
>>>> >>>>>>>>>>>> any content checks the server performs are essentially 
>>>> >>>>>>>>>>>> validating that clients correctly reuse the generated 
>>>> >>>>>>>>>>>> Idempotency-Key value. (This is mostly the same as my comment 
>>>> >>>>>>>>>>>> in the related Polaris discussion.)
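For a sense of scale, here is a back-of-the-envelope estimate that assumes UUIDv4 keys (122 random bits) drawn from a well-seeded random source. By the standard birthday approximation, the probability of any collision among $n$ keys is:

$$
p \approx 1 - e^{-n(n-1)/2^{123}} \approx \frac{n^2}{2^{123}},
\qquad
n = 10^{9} \;\Rightarrow\; p \approx \frac{10^{18}}{1.06 \times 10^{37}} \approx 9.4 \times 10^{-20}.
$$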
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> I'd like to propose making the content check optional so that 
>>>> >>>>>>>>>>>> servers may or may not implement it according to their design 
>>>> >>>>>>>>>>>> principles and constraints and emphasizing that clients 
>>>> >>>>>>>>>>>> should use unique keys (e.g. UUIDs)... basically going with 
>>>> >>>>>>>>>>>> option 2 from Peter's email.
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> I believe this is in line with the SHOULD word used for this 
>>>> >>>>>>>>>>>> case in the IETF draft [1] (section 2.7).
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> [1] 
>>>> >>>>>>>>>>>> https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-06
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Thanks,
>>>> >>>>>>>>>>>> Dmitri.
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> On Thu, Sep 18, 2025 at 7:56 AM Péter Váry 
>>>> >>>>>>>>>>>> <[email protected]> wrote:
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Thanks Huaxin for the proposal, and sorry for the late 
>>>> >>>>>>>>>>>>> review - I had a bit of a busy week.
>>>> >>>>>>>>>>>>> I have one main question, which I have also added as a 
>>>> >>>>>>>>>>>>> comment to the doc:
>>>> >>>>>>>>>>>>> - Why do we try to compare the request contents when the 
>>>> >>>>>>>>>>>>> Idempotency-Key is the same for the requests? The comparison 
>>>> >>>>>>>>>>>>> algorithm is a bit complicated, and seems brittle to me. 
>>>> >>>>>>>>>>>>> Inconsistent field ordering, map entry order, and maybe even 
>>>> >>>>>>>>>>>>> differences in upper/lower case letters could still mean 
>>>> >>>>>>>>>>>>> technically the same request.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> In my previous roles (admittedly more than 10 years ago) I 
>>>> >>>>>>>>>>>>> worked extensively on APIs like this, and we never 
>>>> >>>>>>>>>>>>> really succeeded in creating a good enough "are these two 
>>>> >>>>>>>>>>>>> requests semantically the same" check.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> I would simplify these requirements, unless there are 
>>>> >>>>>>>>>>>>> serious arguments for the existence of these checks:
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Either check for exact matches - without any magic - this 
>>>> >>>>>>>>>>>>> could be used for detecting issues where the duplication 
>>>> >>>>>>>>>>>>> happens on the network side, or
>>>> >>>>>>>>>>>>> Rely entirely on the clients to provide the correct 
>>>> >>>>>>>>>>>>> Idempotency-Key.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> I would prefer the 2nd.
>>>> >>>>>>>>>>>>> Otherwise I agree with the contents of the proposal. It is 
>>>> >>>>>>>>>>>>> nicely done!
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> On Thu, Sep 18, 2025 at 2:54 AM Yufei Gu <[email protected]> 
>>>> >>>>>>>>>>>>> wrote:
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> Thanks for the proposal. It's a nice feature to make retry 
>>>> >>>>>>>>>>>>>> more reliable and efficient. Left some comments.
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> Yufei
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> On Mon, Sep 15, 2025 at 3:53 PM Kevin Liu 
>>>> >>>>>>>>>>>>>> <[email protected]> wrote:
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Thanks for writing up the proposal! Makes sense to add 
>>>> >>>>>>>>>>>>>>> idempotency to mutation requests.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> It would be helpful to add this feature to both the 
>>>> >>>>>>>>>>>>>>> catalog test framework and the iceberg-rest-fixture. The 
>>>> >>>>>>>>>>>>>>> latter is used by the subprojects for testing and would 
>>>> >>>>>>>>>>>>>>> come in handy when we want to test out the client 
>>>> >>>>>>>>>>>>>>> implementation.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> For other reviewers, the Stripe documentation on 
>>>> >>>>>>>>>>>>>>> idempotency was a helpful read, 
>>>> >>>>>>>>>>>>>>> https://docs.stripe.com/api/idempotent_requests.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>>>> Kevin Liu
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> On Mon, Sep 15, 2025 at 11:38 AM Szehon Ho 
>>>> >>>>>>>>>>>>>>> <[email protected]> wrote:
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> Hi,
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> Sounds like fairly standard practice and makes sense to 
>>>> >>>>>>>>>>>>>>>> me on first read.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> Thanks,
>>>> >>>>>>>>>>>>>>>> Szehon
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> On Mon, Sep 15, 2025 at 10:09 AM Russell Spitzer 
>>>> >>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> I think based on the feedback on the proposal and in 
>>>> >>>>>>>>>>>>>>>>> recent syncs we should probably move forward with the 
>>>> >>>>>>>>>>>>>>>>> actual Spec Change PR so we can see what this looks like 
>>>> >>>>>>>>>>>>>>>>> and move on to a discussion of how the Catalog test 
>>>> >>>>>>>>>>>>>>>>> framework should test this.
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> On 2025/08/22 18:26:23 huaxin gao wrote:
>>>> >>>>>>>>>>>>>>>>> > Hi all,
>>>> >>>>>>>>>>>>>>>>> >
>>>> >>>>>>>>>>>>>>>>> > I’d like to propose a change to Iceberg’s REST API to 
>>>> >>>>>>>>>>>>>>>>> > make mutation
>>>> >>>>>>>>>>>>>>>>> > requests safely retryable.
>>>> >>>>>>>>>>>>>>>>> >
>>>> >>>>>>>>>>>>>>>>> > *The Problem*
>>>> >>>>>>>>>>>>>>>>> > If a POST mutation (e.g., updateTable) succeeds in the 
>>>> >>>>>>>>>>>>>>>>> > catalog but the
>>>> >>>>>>>>>>>>>>>>> > client doesn’t receive the response (timeout, 
>>>> >>>>>>>>>>>>>>>>> > connection closed, etc.), a
>>>> >>>>>>>>>>>>>>>>> > second attempt can hit 409 Conflict. The client 
>>>> >>>>>>>>>>>>>>>>> > interprets the 409 as a
>>>> >>>>>>>>>>>>>>>>> > failed commit and deletes the associated metadata 
>>>> >>>>>>>>>>>>>>>>> > files, causing
>>>> >>>>>>>>>>>>>>>>> > catalog/storage inconsistency.
>>>> >>>>>>>>>>>>>>>>> >
>>>> >>>>>>>>>>>>>>>>> > *The Proposed Solution*
>>>> >>>>>>>>>>>>>>>>> > Introduces an optional Idempotency-Key HTTP header on 
>>>> >>>>>>>>>>>>>>>>> > REST mutation
>>>> >>>>>>>>>>>>>>>>> > endpoints and has the Iceberg client pass it through.
>>>> >>>>>>>>>>>>>>>>> >
>>>> >>>>>>>>>>>>>>>>> > *Semantics *(first processed request wins):
>>>> >>>>>>>>>>>>>>>>> >
>>>> >>>>>>>>>>>>>>>>> >    - Same key + same canonical payload -> return the original
>>>> >>>>>>>>>>>>>>>>> >      result (no re-execution).
>>>> >>>>>>>>>>>>>>>>> >    - Same key + different payload -> 422 (Unprocessable Content).
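A minimal single-process sketch of these semantics, assuming an in-memory dict and a hypothetical `execute` callback that performs the mutation. A SHA-256 of the raw body stands in for the canonical-payload fingerprint, and a real catalog would also need durable storage, handling of in-flight duplicates, and expiry. Under key-only idempotency, the fingerprint comparison (and the 422 branch) would simply be dropped.

```python
import hashlib
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (HTTP status, body)


class IdempotentEndpoint:
    """Hypothetical wrapper: the first processed request for a key wins."""

    def __init__(self, execute: Callable[[bytes], Response]) -> None:
        self.execute = execute
        # key -> (payload fingerprint, response of the first processed request)
        self.results: Dict[str, Tuple[str, Response]] = {}

    def handle(self, key: str, body: bytes) -> Response:
        digest = hashlib.sha256(body).hexdigest()
        stored = self.results.get(key)
        if stored is not None:
            stored_digest, stored_response = stored
            if stored_digest != digest:
                return 422, "Idempotency-Key reused with a different payload"
            return stored_response  # replay; the mutation is not re-executed
        response = self.execute(body)
        self.results[key] = (digest, response)
        return response
```

In the failure described above, the retried commit now receives the original success instead of a 409, so the client has no reason to delete its metadata files.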
>>>> >>>>>>>>>>>>>>>>> >
>>>> >>>>>>>>>>>>>>>>> > *Capability discovery:* catalogs can advertise support 
>>>> >>>>>>>>>>>>>>>>> > and retention so
>>>> >>>>>>>>>>>>>>>>> > clients know when a retry is safe, e.g.
>>>> >>>>>>>>>>>>>>>>> >
>>>> >>>>>>>>>>>>>>>>> > {
>>>> >>>>>>>>>>>>>>>>> >   "idempotency-tokens-respected": true,
>>>> >>>>>>>>>>>>>>>>> >   "idempotency-token-lifetime": "30m"
>>>> >>>>>>>>>>>>>>>>> > }
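A minimal client-side sketch of acting on these two properties. It assumes they arrive as the JSON map shown above and that the lifetime uses a simple `<number><s|m|h>` format; the duration format and the `parse_lifetime` and `retry_is_safe` helpers are assumptions for illustration, not spec-defined.

```python
from typing import Mapping

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}  # assumed duration units


def parse_lifetime(text: str) -> int:
    """Parse a duration such as '30m' into seconds (assumed format)."""
    value, unit = text[:-1], text[-1:]
    if unit not in _UNIT_SECONDS or not (value.isascii() and value.isdigit()):
        raise ValueError(f"unsupported duration: {text!r}")
    return int(value) * _UNIT_SECONDS[unit]


def retry_is_safe(config: Mapping[str, object], elapsed_seconds: float) -> bool:
    """Reuse the same key only if the catalog honors keys and it has not expired."""
    if config.get("idempotency-tokens-respected") is not True:
        return False
    lifetime = config.get("idempotency-token-lifetime")
    if not isinstance(lifetime, str):
        return False
    return elapsed_seconds < parse_lifetime(lifetime)
```

A catalog that omits the properties is treated as not supporting idempotency, which matches the backward-compatibility story: the client simply falls back to its existing behavior.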
>>>> >>>>>>>>>>>>>>>>> >
>>>> >>>>>>>>>>>>>>>>> > *Scope in Iceberg:* update the OpenAPI to include the 
>>>> >>>>>>>>>>>>>>>>> > header, and add
>>>> >>>>>>>>>>>>>>>>> > client pass-through + honoring capability discovery. 
>>>> >>>>>>>>>>>>>>>>> > No server
>>>> >>>>>>>>>>>>>>>>> > implementation is mandated—catalogs (e.g., Polaris) 
>>>> >>>>>>>>>>>>>>>>> > can implement
>>>> >>>>>>>>>>>>>>>>> > storage/TTL/replay as they choose.
>>>> >>>>>>>>>>>>>>>>> >
>>>> >>>>>>>>>>>>>>>>> > *Standards alignment:* uses the industry-standard 
>>>> >>>>>>>>>>>>>>>>> > header name and matches
>>>> >>>>>>>>>>>>>>>>> > the IETF HTTPAPI Idempotency-Key draft
>>>> >>>>>>>>>>>>>>>>> > <https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header>
>>>> >>>>>>>>>>>>>>>>> > semantics.
>>>> >>>>>>>>>>>>>>>>> >
>>>> >>>>>>>>>>>>>>>>> > *Compatibility:* fully backward compatible. Servers 
>>>> >>>>>>>>>>>>>>>>> > that don’t support it
>>>> >>>>>>>>>>>>>>>>> > can ignore the header; clients can detect support via 
>>>> >>>>>>>>>>>>>>>>> > capability discovery.
>>>> >>>>>>>>>>>>>>>>> >
>>>> >>>>>>>>>>>>>>>>> > Here is the proposal
>>>> >>>>>>>>>>>>>>>>> > <https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0>.
>>>> >>>>>>>>>>>>>>>>> > Looking forward to your thoughts.
>>>> >>>>>>>>>>>>>>>>> >
>>>> >>>>>>>>>>>>>>>>> > Thanks,
>>>> >>>>>>>>>>>>>>>>> >
>>>> >>>>>>>>>>>>>>>>> > Huaxin
>>>> >>>>>>>>>>>>>>>>> >