AFAIK, there is no bulk API for generating pre-signed URLs; they have to be generated one by one. Even with parallelization, that can still be slow for larger server-side planning.
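
To make the one-by-one point concrete, here is a rough Python sketch of what a server would be doing per planned file. It is only an illustration under assumptions: the HMAC-based `presign` function is a simplified stand-in for a provider's real signing scheme (it is not SigV4 or any vendor's API), and `SIGNING_KEY`, `presign_all`, the `storage.example.com` host, and the `X-Expires`/`X-Signature` parameter names are all made up for the example.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote, urlencode

# Hypothetical signing key; a real server would use its storage credentials.
SIGNING_KEY = b"example-secret"


def presign(path: str, expires_in: int = 3600) -> str:
    """Sign a single object path (simplified stand-in for a real presign call)."""
    payload = f"GET\n{path}\n{expires_in}".encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    query = urlencode({"X-Expires": expires_in, "X-Signature": signature})
    return f"https://storage.example.com{quote(path)}?{query}"


def presign_all(paths, max_workers: int = 8):
    """There is no bulk call: every path gets its own signing operation."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so urls[i] corresponds to paths[i].
        return list(pool.map(presign, paths))


# Hypothetical file paths standing in for the data files of a planned scan.
paths = [f"/warehouse/db/table/data/file-{i}.parquet" for i in range(1000)]
urls = presign_all(paths)
```

The pool spreads the calls across workers, but the total signing work still grows linearly with the number of files in the plan, which is the cost I am worried about for large scans.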
Amogh has a valid concern on client integration. Is there a PoC showing how this can be plumbed through on the client side in iceberg-core?

On Thu, Nov 13, 2025 at 3:09 AM Amogh Jahagirdar <[email protected]> wrote:

> I'm +1 on this, though I did want to bring up a point on also achieving
> this via the server sending back presigned URLs for the file locations. To
> be clear, I don't think these are mutually exclusive approaches and like I
> mentioned I'm +1 on a path for leveraging catalog vended storage
> credentials as done in this PR; I just wanted to think through the
> tradeoffs.
>
> I think the clearest benefit for the proposed approach is that many
> catalogs already have the mechanisms to vend credentials to clients, so
> this and the other change for refreshing credentials for a given plan is
> likely not a heavy lift for *servers* to achieve. I think the complexity
> will largely be on the client implementation in this approach, where we're
> going to have to work through some FileIO scoping challenges for a given
> plan. In the end, it's all doable but it is some level of complexity
> shifted to the client (handling the refreshing/scoping/any caching on top
> of that).
>
> Presigned URLs are supported by all the major object storage providers as
> far as I checked. Clients would have to change in order to distinguish
> between expected object storage URI structures and presigned URLs, but I
> think that overall the client side complexity for scoping is reduced
> compared to the credential vending approach. I think in this approach
> complexity is shifted to the server where the server needs to sign the
> objects. One could imagine at large scale of files, there's likely a lot of
> additional load on the server (CPU bound signing). Also later on, if
> there's desire to be able to extend the protocol to say "Hey read
> everything in this directory", then a scoped credential for that is
> desirable (required?).
>
> My TLDR analysis is that credential vending in scan planning is probably
> net better for larger scale scans, and is also a lighter lift for server
> implementations today while presigned URLs is probably better in terms of
> making it easy for a wide variety of clients to integrate. In the end, I
> don't think the 2 approaches are incompatible with each other and I don't
> see any one way doors so I think it's entirely reasonable to start with the
> proposed approach. Wonder what others think!
>
> Thanks,
> Amogh Jahagirdar
>
> On Wed, Nov 12, 2025 at 7:49 AM Eduard Tudenhöfner <[email protected]> wrote:
>
>> Hey everyone,
>>
>> For server-side scan planning we missed adding storage credentials, hence
>> I'm proposing to add them to the response of the */plan* endpoint.
>>
>> The OpenAPI changes can be seen in PR #14563
>> <https://github.com/apache/iceberg/pull/14563>.
>>
>> Looking forward to your thoughts and feedback.
>>
>> Thanks,
>> Eduard
