I'm +1 on this, though I wanted to raise the alternative of also achieving
this by having the server send back presigned URLs for the file locations.
To be clear, I don't think these are mutually exclusive approaches, and as
I mentioned I'm +1 on a path for leveraging catalog-vended storage
credentials as done in this PR; I just wanted to think through the
tradeoffs.

I think the clearest benefit of the proposed approach is that many
catalogs already have mechanisms to vend credentials to clients, so this
change and the other one for refreshing credentials for a given plan are
likely not a heavy lift for *servers* to achieve. The complexity in this
approach will largely be in the client implementation, where we're going
to have to work through some FileIO scoping challenges for a given plan.
In the end, it's all doable, but some level of complexity is shifted to
the client (handling the refreshing, the scoping, and any caching on top
of that).
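
To make the client-side piece concrete, here is a minimal sketch of the
refreshing/scoping/caching described above. It is only an illustration
under stated assumptions, not what the PR or any Iceberg client implements:
`ScopedCredential`, `PlanCredentialCache`, the `refresh` callback, and the
60-second refresh margin are all hypothetical names and values.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ScopedCredential:
    """Hypothetical shape of one vended credential (not from the spec)."""
    prefix: str               # storage prefix the credential is valid for
    config: Dict[str, str]    # opaque properties handed to the FileIO
    expires_at: float         # expiry as epoch seconds


class PlanCredentialCache:
    """Caches credentials per plan and refreshes them shortly before expiry."""

    def __init__(
        self,
        refresh: Callable[[str], List[ScopedCredential]],
        clock: Callable[[], float] = time.time,
        refresh_margin_s: float = 60.0,
    ) -> None:
        self._refresh = refresh          # assumed to call the catalog
        self._clock = clock
        self._margin = refresh_margin_s
        self._by_plan: Dict[str, List[ScopedCredential]] = {}

    def credential_for(self, plan_id: str, path: str) -> ScopedCredential:
        creds = self._by_plan.get(plan_id)
        now = self._clock()
        # Refresh when nothing is cached or any credential is about to expire.
        if creds is None or any(c.expires_at - now <= self._margin for c in creds):
            creds = self._refresh(plan_id)
            self._by_plan[plan_id] = creds
        # Scoping: pick the credential with the longest matching prefix.
        matches = [c for c in creds if path.startswith(c.prefix)]
        if not matches:
            raise LookupError(f"no credential covers {path!r} for plan {plan_id!r}")
        return max(matches, key=lambda c: len(c.prefix))
```

Even in this small form the client has to decide when to refresh, how to
key the cache, and how to pick among overlapping prefixes, which is the
complexity I'm referring to.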

Presigned URLs are supported by all the major object storage providers, as
far as I could tell. Clients would have to change in order to distinguish
between the expected object storage URI structures and presigned URLs, but
I think that overall the client-side complexity for scoping is reduced
compared to the credential vending approach. In this approach the
complexity shifts to the server, which needs to sign a URL for each
object. With a large number of files, that likely puts a lot of additional
load on the server (CPU-bound signing). Also, if later on there's a desire
to extend the protocol to say "Hey, read everything in this directory",
then a scoped credential for that is desirable (required?).
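
As an illustration of the client-side change mentioned above, here is a
rough sketch of telling a presigned URL apart from a plain object storage
URI. It assumes a query-parameter heuristic based on the signature
parameters used by S3 SigV4 presigned URLs (`X-Amz-Signature`), GCS V4
signed URLs (`X-Goog-Signature`), and Azure SAS tokens (`sig`). The
function name `is_presigned_url` is hypothetical, and an explicit marker in
the protocol response would be more robust than sniffing the URL.

```python
from urllib.parse import parse_qs, urlsplit

# Signature query parameters used by the providers named above
# (an assumption-based heuristic, not an exhaustive list).
_SIGNATURE_PARAMS = {"x-amz-signature", "x-goog-signature", "sig"}


def is_presigned_url(location: str) -> bool:
    """Return True if `location` looks like an HTTP(S) presigned URL."""
    parts = urlsplit(location)
    # Plain object storage URIs use schemes such as s3://, gs://, abfss://.
    if parts.scheme not in ("http", "https"):
        return False
    # Compare parameter names case-insensitively.
    params = {name.lower() for name in parse_qs(parts.query, keep_blank_values=True)}
    return bool(params & _SIGNATURE_PARAMS)
```

A client could use a check like this to route presigned locations to a
plain HTTP reader and everything else to the regular FileIO.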

My TL;DR analysis is that credential vending in scan planning is probably
net better for larger-scale scans and is also a lighter lift for server
implementations today, while presigned URLs are probably better in terms
of making it easy for a wide variety of clients to integrate. In the end,
I don't think the two approaches are incompatible with each other, and I
don't see any one-way doors, so I think it's entirely reasonable to start
with the proposed approach. I wonder what others think!

Thanks,
Amogh Jahagirdar



On Wed, Nov 12, 2025 at 7:49 AM Eduard Tudenhöfner <[email protected]>
wrote:

> Hey everyone,
>
> For server-side scan planning we missed adding storage credentials, hence
> I'm proposing to add them to the response of the */plan* endpoint.
>
> The OpenAPI changes can be seen in PR #14563
> <https://github.com/apache/iceberg/pull/14563>.
>
> Looking forward to your thoughts and feedback.
>
> Thanks,
> Eduard
>
