i Yufei, Huaxin, Thanks for the thoughtful replies, and for opening the issue for `createTableStaged` [1]. I’m glad we’re converging on the handler-level direction.
I agree benchmarking can run in parallel with implementation. For my own review, I’d prefer to wait until reproducible benchmark results are posted (using the scope from my April 14 note, or with any scope changes called out). Happy to review once that is available. Best, Robert [1] https://github.com/apache/polaris/issues/4200 On Tue, Apr 14, 2026 at 7:54 PM huaxin gao <[email protected]> wrote: > Hi Robert, > > Thanks for the feedback. Glad the handler-level design addresses the > concerns. > > I also feel we should converge on the design and implementation first, and > run benchmarks in parallel. The access patterns will be different with the > handler-level approach, so benchmarking the current filter-based code > wouldn't be representative. > > I'll open a GitHub issue to track createTableStaged support, and I'll open > a fresh PR for the handler-level implementation. > > Best, > > Huaxin > > On Tue, Apr 14, 2026 at 10:30 AM Yufei Gu <[email protected]> wrote: > > > Hi Robert, > > > > Thanks for the thorough review and for clearly laying out the concerns, > > this is very helpful. > > > > For point 1(Q3), I do not think this needs to be a hard prerequisite > before > > looking at the implementation. > > > > The handler level design and correctness concerns are largely independent > > from the exact performance characteristics. It would be more productive > to > > first converge on the right architecture, then validate and iterate on > > performance with concrete measurements. Also, while the store is reused, > > the access pattern and critical sections change with the handler level > > approach. Measuring the current implementation in isolation may not > reflect > > the actual behavior of the final design. > > > > We already have a similar situation with the NoSQL benchmark, where the > > code has been merged for a while but the benchmarking work is still in > > progress. That has not blocked design or code review, and we are > iterating > > on performance as we go. > > > > I agree that realistic benchmarking is important. But I would suggest we > > treat this as a parallel track, not a gate. Otherwise we risk blocking > > progress on the design without knowing whether the measured overhead is > > even representative of the final system. > > > > Happy to help define a benchmark plan once the design is settled. > > > > Yufei > > > > > > On Tue, Apr 14, 2026 at 6:28 AM Robert Stupp <[email protected]> wrote: > > > > > Hi Huaxin, > > > > > > yeah, the handler-level approach is the right call, I'm glad you went > > with > > > it. > > > > > > To be explicit about what your proposal aims to resolve: > > > > > > - Authorization bypass is fixed. The check now runs after > resource-level > > > authorization. > > > - No credentials in the database. Fresh credentials re-vended on replay > > via > > > the normal vending path. > > > - Phase 2 architecture problem gone. The handler has the principal, > > storage > > > config, and IAM access that the filter never could have. > > > - Caller identity is in the binding. > > > > > > Three things still need adressing: > > > > > > First, and this is a hard prerequisite before I look at any > > implementation > > > code: Q3 from April 5 is still open. The store is being reused as-is, > so > > > the overhead can be measured right now without new handler code. 3+ DB > > > round-trips per mutation on the catalog connection pool, write-hot > table, > > > polling loops under duplicate traffic - those numbers need to come > from a > > > realistic setup: networked database under concurrent load, not a local > > > Postgres. The latency concerns are specifically about Aurora and > > > CockroachDB, and the polling storm only shows up under concurrent > > duplicate > > > traffic. Please share the methodology so the results are reproducible. > > > > > > Second: please open a GitHub issue for the createTableStaged deferral. > > > > > > Third: since the filter is going away entirely and the critical > sections > > > move to the handlers, does it make sense to close #3803 and open a > fresh > > > PR? The old diff is mostly code that's being deleted, and it carries a > > lot > > > of review history. A clean PR would be easier for everyone to review - > > just > > > reference the old one for context. > > > > > > Happy to do a fresh review once the new implementation is up. > > > > > > Best, > > > Robert > > > > > > > > > On Tue, Apr 14, 2026 at 1:44 AM huaxin gao <[email protected]> > > wrote: > > > > > > > Hi Robert, > > > > > > > > Here is a revised design proposal for the idempotency replay > mechanism. > > > I'd > > > > appreciate your feedback. Thanks! > > > > > > > > *Moving idempotency to handler level* > > > > > > > > The idempotency check moves from the HTTP filter into the handler > > > methods, > > > > after authorization. The filter is removed. The caller's principal > hash > > > is > > > > included in the idempotency binding and checked on replay. > > > > > > > > *Credential-bearing mutations (createTable)* > > > > > > > > On replay, instead of returning a stored response, the handler calls > > > > buildLoadTableResponseWithDelegationCredentials with the existing > > table's > > > > metadata. This vends fresh credentials for the current caller through > > the > > > > normal credential vending path. No credentials are stored in the > > > database — > > > > the idempotency record only stores the key, principal hash, table > > > > identifier, and status code. > > > > > > > > *Non-credential mutations (dropTable, renameTable, namespace/view > > > > operations)* > > > > > > > > These do not vend credentials, so there is no leakage concern. On > > replay, > > > > the handler returns the stored status code and minimal response body. > > > > > > > > *createTableStaged* > > > > > > > > Staged tables are not persisted to the catalog, so the method can > > safely > > > > re-run on retry. Idempotency support for this endpoint can be > deferred. > > > > > > > > *What is reused from the current implementation* > > > > > > > > The IdempotencyStore interface, database schema, configuration, and > > > > background purge logic are reused. The main change is moving the > > > > check/finalize logic from the filter into the handler. > > > > > > > > Best, > > > > > > > > Huaxin > > > > > > > > On Sun, Apr 5, 2026 at 11:53 AM huaxin gao <[email protected]> > > > wrote: > > > > > > > > > Hi Robert, > > > > > > > > > > Thanks for the follow-up. > > > > > > > > > > I appreciate the clarification. I read the listed items as fix > > options > > > > and > > > > > picked option 2, and the Phase 1/Phase 2 split was purely to keep > the > > > PR > > > > at > > > > > a reviewable size, not a design choice. > > > > > > > > > > I agree that the caller's identity should be part of the binding. > > That > > > is > > > > > straightforward to add. > > > > > > > > > > On the architecture question, whether re-vending can work within > the > > > > > filter or needs a handler-level approach, I want to give this > proper > > > > > thought rather than rush an answer. I may not be able to get back > to > > > you > > > > on > > > > > this until next week (the week of 4/13). > > > > > > > > > > Thanks, > > > > > > > > > > Huaxin > > > > > > > > > > On Sat, Apr 4, 2026 at 11:42 PM Robert Stupp <[email protected]> > wrote: > > > > > > > > > >> Hi Huaxin, > > > > >> > > > > >> Thanks for responding. A few corrections and clarifications. > > > > >> > > > > >> My PR review described three consequences of the missing caller > > > > >> identity binding and the credential leakage attack vector. Those > > > > >> were not suggestions for a Phase 1/Phase 2 split. The offline > > > > >> decision to split into phases, and what each phase contains, was > > > > >> yours [1]. > > > > >> > > > > >> On the phasing: the concern is not PR size. The concern is that > > > > >> Phase 1 alone ships a full authorization bypass. The filter > replays > > > > >> cached responses before Polaris's resource-level authorization > > > > >> runs. Stripping credentials does not fix that. Any authenticated > > > > >> user with a valid idempotency key can receive a cached response > > > > >> for a resource they are not authorized to access. That is the case > > > > >> with or without credentials in the response. > > > > >> > > > > >> "We will not release it until entire idempotency coding is > > > > >> finished" is reassuring, but the architecture question remains: > > > > >> can the generic filter approach support re-vending at all? My > > > > >> email explained why I believe it cannot, since it would require > > > > >> the filter to authorize the caller, resolve storage locations, > > > > >> call cloud IAM, and manage credential caching. That is the catalog > > > > >> handler's job, not a filter's. > > > > >> > > > > >> The questions from my original email that still need answers: > > > > >> > > > > >> How does the binding prevent cross-principal cache hits without > > > > >> including the caller's identity? > > > > >> > > > > >> How does Phase 2 re-vend credentials from a generic HTTP filter > > > > >> without reimplementing the catalog handler's authorization and > > > > >> credential vending logic? > > > > >> > > > > >> Has the operational overhead (3 DB round-trips per mutation on > > > > >> the catalog's connection pool) been benchmarked on any backend? > > > > >> > > > > >> I'd like to resolve these on this thread before more code is > > > > >> written. > > > > >> > > > > >> Best, > > > > >> Robert > > > > >> > > > > >> [1] > > > https://github.com/apache/polaris/pull/3803#issuecomment-4180969838 > > > > >> > > > > >> On Sun, Apr 5, 2026 at 7:55 AM huaxin gao <[email protected] > > > > > > wrote: > > > > >> > > > > >> > Hi Robert, > > > > >> > > > > > >> > Thanks for raising this on the mailing list. I acknowledge the > > > design > > > > >> > concerns you listed. > > > > >> > > > > > >> > To clarify: I picked your option 2 (strip credentials and > re-vend > > on > > > > >> > replay) you suggested in the PR review. The offline discussion > > with > > > > >> Yufei > > > > >> > and Prashant was simply to verify that I picked the best option > > from > > > > all > > > > >> > the three suggestions you listed in the review. > > > > >> > > > > > >> > The reason I split it into two phases is that the current PR is > > > > already > > > > >> > around 3000 lines. Adding re-vending logic on top would make it > > very > > > > >> hard > > > > >> > to review. This is a big feature and we need to do it piece by > > > piece. > > > > We > > > > >> > will not release it until entire idempotency coding is finished. > > > > >> > > > > > >> > I'm happy to discuss the architecture further on this thread, > > > whether > > > > >> the > > > > >> > filter approach can support re-vending, or whether we need a > > > > >> handler-level > > > > >> > design. I may be slow to respond next week but I'll follow up > > after > > > > >> that. > > > > >> > > > > > >> > Thanks, > > > > >> > > > > > >> > Huaxin > > > > >> > > > > > >> > On Sat, Apr 4, 2026 at 9:34 AM Robert Stupp <[email protected]> > > wrote: > > > > >> > > > > > >> > > Hi all, > > > > >> > > > > > > >> > > I need to raise some fundamental design concerns with the > > current > > > > >> > > Idempotency-Key implementation in PR #3803 [1]. Some of these > > were > > > > >> > > raised in November 2025 [2] and were acknowledged [3] but > never > > > > >> > > addressed in the implementation. > > > > >> > > > > > > >> > > I raised the caller identity concern on the > > > > >> > > mailing list in November 2025 [2]. In December 2025 [4], I > > > > >> > > reiterated that all existing implementations I could find do > > > > >> > > request fingerprinting including the request body, and warned > > that > > > > >> > > the idempotency subsystem itself must not become a source of > > > > >> > > failures. The PR was opened in February 2026 without > addressing > > > > >> > > either point. > > > > >> > > > > > > >> > > The credential leakage consequences surfaced during code > review > > > > >> > > [5]. The response was an offline decision between three > > > > contributors, > > > > >> > > without mailing list discussion [6]. The decision was to split > > > > >> > > into Phase 1 (strip credentials, which produces broken > responses > > > > >> > > but does not fix the authorization bypass) and Phase 2 > (re-vend > > > > >> > > on replay, which would require a complete redesign and > > > > >> > > reimplementation). Details below. > > > > >> > > > > > > >> > > I think we need to resolve these at the design level before > more > > > > >> > > code is written. > > > > >> > > > > > > >> > > I will not do further code reviews on this PR until these > design > > > > >> > > concerns are resolved. > > > > >> > > > > > > >> > > > > > > >> > > 1. Caller identity is not part of the idempotency binding > > > > >> > > > > > > >> > > The caller's identity is not part of the binding. The > > > > >> > > filter replays cached responses before Polaris's > resource-level > > > > >> > > authorization runs. Any authenticated user with a valid > > > idempotency > > > > >> > > key can receive a cached response for a resource they are not > > > > >> > > authorized to access, including vended storage credentials. > This > > > > >> > > is a full authorization bypass. > > > > >> > > > > > > >> > > Idempotency keys are HTTP headers that end up in access logs, > > > > >> > > observability tooling, debug output. They cannot be a security > > > > >> > > boundary. I described the attack vector in my PR review [5]. > > > > >> > > > > > > >> > > The proposed Phase 2 [6] says it will "re-authorize the user > and > > > > >> > > re-vend fresh credentials on replay." That addresses > credentials > > > > >> > > but not the binding itself. > > > > >> > > > > > > >> > > > > > > >> > > 2. Credential stripping breaks Polaris's deployment model > > > > >> > > > > > > >> > > Without mailing list discussion, the decision was made offline > > > > >> > > to strip credentials before storing ("Phase 1") [6]. > > > > >> > > > > > > >> > > Polaris's purpose is to vend down-scoped credentials. Clients > > have > > > > >> > > no ambient storage access. > > > > >> > > > > > > >> > > A stripped response returns a successful response ("table > > > created") > > > > >> > > to the client, which then fails hard because it cannot access > > the > > > > >> > > object storage. Query engines (Spark, Trino, etc.) will fail > on > > > > >> > > the first storage operation against that table. Query engine > > users > > > > >> > > have no way to resolve this, they don't control the REST > catalog > > > > >> > > interaction. For stage-create, the client gets a successful > > > > >> > > response telling it to write data but has no credentials to > > write > > > > >> > > with. No fallback exists. > > > > >> > > > > > > >> > > The stripped response is spec-compliant, but spec-compliant > and > > > > >> > > operationally correct are not the same thing. > > > > >> > > > > > > >> > > > > > > >> > > 3. Phase 2 is architecturally incompatible with Phase 1 > > > > >> > > > > > > >> > > The filter is a generic HTTP filter below the application > layer. > > > It > > > > >> > > treats responses as opaque JSON. Credential vending happens > deep > > > > >> > > inside IcebergCatalogHandler, scoped to specific table > locations > > > > >> > > and principals, through cloud IAM calls (STS AssumeRole, GCS > > token > > > > >> > > exchange). > > > > >> > > > > > > >> > > Re-vending on replay would require the filter to resolve the > > > > >> > > table's storage location, authorize the caller, call cloud IAM > > to > > > > >> > > mint scoped credentials, and manage credential caching. That > is > > > > >> > > not a filter, that is reimplementing the catalog handler's > > > > >> > > authorization and credential vending logic. > > > > >> > > > > > > >> > > Phase 1 and Phase 2 need to be designed and shipped together. > > > > >> > > > > > > >> > > > > > > >> > > 4. Operational overhead > > > > >> > > > > > > >> > > Every idempotent mutation adds 3 database round-trips (INSERT, > > > > UPDATE, > > > > >> > > heartbeat UPDATEs), each borrowing its own connection > > > > >> > > (autoCommit=true, no reuse), sharing the catalog's connection > > pool > > > > >> > > and RDBMS. This added latency increases the probability of > > client > > > > >> > > timeouts, which are exactly the transient failures that > > > idempotency > > > > >> > > is supposed to prevent. > > > > >> > > > > > > >> > > The idempotency table is high-churn and write-hot, competing > > with > > > > >> > > catalog operations for connection pool capacity. The overhead > > gets > > > > >> > > worse on managed and distributed database backends. Duplicate > > > > >> > > requests trigger polling loops with repeated database reads. > > > > >> > > > > > > >> > > No benchmark data has been presented for any backend. > > > > >> > > > > > > >> > > > > > > >> > > What needs to change > > > > >> > > > > > > >> > > The current generic-filter approach cannot support these > > > > >> > > requirements. This needs a different design. > > > > >> > > > > > > >> > > A new design must address the following: > > > > >> > > > > > > >> > > a) The idempotency binding MUST include the caller's principal > > > > >> > > identity. > > > > >> > > b) Credential-bearing responses must not be stored as-is. On > > > > >> > > replay, the handler must build a fresh response with fresh > > > > >> > > credentials for the current caller. > > > > >> > > c) The operational overhead needs to be benchmarked. > > > > >> > > > > > > >> > > I am happy to review code once the design addresses these > > points. > > > > >> > > > > > > >> > > Best, > > > > >> > > Robert > > > > >> > > > > > > >> > > [1] https://github.com/apache/polaris/pull/3803 > > > > >> > > [2] > > > > https://lists.apache.org/thread/qqrgr2bzsp69629jdj8kf39m10pzwy6l > > > > >> > > (Nov 24, 2025: "Building a service-internal key from the > > > client > > > > >> > > provided idempotency key plus more data from the request > > gives > > > > >> > > better uniqueness. That more data can come from: operation > > id, > > > > >> > > all operation parameters, including the whole payload, > > quite a > > > > >> > > few HTTP request headers like user-agent, authorization, > > host, > > > > >> > > accept-* and also the client's address.") > > > > >> > > [3] > > > > https://lists.apache.org/thread/7vy44mxsq2cgtp8gsj0r8pk8blp37h5b > > > > >> > > (Dec 8, 2025: "I agree that a bare Idempotency-Key + > entity > > > > >> > > identifier is not enough to protect against buggy or > > malicious > > > > >> > > clients; fingerprinting the full request would be > > stronger.") > > > > >> > > [4] > > > > https://lists.apache.org/thread/jt3gmpnjh6z490dvxorjs6ms00kvo264 > > > > >> > > (Dec 8, 2025: "We should avoid failing Polaris if this > > > subsystem > > > > >> > > fails, or letting this subsystem be a reason for its > > existence > > > > >> > > (aka retry due to timeouts because the idempotent-request > > > > >> > > subsystem hangs).") > > > > >> > > [5] > > > > >> > > > > > > >> > > > > > > https://github.com/apache/polaris/pull/3803#pullrequestreview-4051690055 > > > > >> > > [6] > > > > >> > https://github.com/apache/polaris/pull/3803#issuecomment-4180969838 > > > > >> > > > > > > >> > > > > > >> > > > > > > > > > > > > > > >
