AFAIK, there is no bulk API for generating pre-signed URLs; they have to be
generated one by one. Even with parallelization, that can still be slow
when server-side planning returns a large number of files.
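
A minimal sketch of why the cost grows with the file count. It assumes a
hypothetical HMAC-based signer (`presign`, `SECRET_KEY`, and the
`storage.example.com` host are illustrative only); it is not any provider's
actual signing scheme and not code from the PR.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote, urlencode

# Assumption: a stand-in for a real storage secret, for illustration only.
SECRET_KEY = b"hypothetical-signing-key"


def presign(path: str, expires_at: int) -> str:
    """Sign one object path (simplified stand-in for a provider's scheme)."""
    message = f"GET\n{path}\n{expires_at}".encode()
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires_at, "signature": signature})
    return f"https://storage.example.com{quote(path)}?{query}"


def presign_all(paths: list[str], expires_at: int, workers: int = 8) -> list[str]:
    """There is no bulk call here: every path costs one signing operation."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so URLs line up with the paths.
        return list(pool.map(lambda p: presign(p, expires_at), paths))


# Hypothetical file paths such as a scan plan might reference.
paths = [f"/warehouse/db/tbl/data/file-{i}.parquet" for i in range(1000)]
urls = presign_all(paths, expires_at=1763000000)
```

The pool only spreads the work out; the total number of signing operations
still equals the number of files in the plan. In CPython, threads are also
unlikely to speed up CPU-bound hashing much, so a real server would probably
need a different concurrency model.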

Amogh raises a valid concern about client integration. Is there a PoC showing
how this can be plumbed through on the client side in iceberg-core?

On Thu, Nov 13, 2025 at 3:09 AM Amogh Jahagirdar <[email protected]> wrote:

> I'm +1 on this, though I did want to bring up a point on also achieving
> this via the server sending back presigned URLs for the file locations. To
> be clear, I don't think these are mutually exclusive approaches and like I
> mentioned I'm +1 on a path for leveraging catalog vended storage
> credentials as done in this PR; I just wanted to think through the
> tradeoffs.
>
> I think the clearest benefit for the proposed approach is that many
> catalogs already have the mechanisms to vend credentials to clients, so
> this and the other change for refreshing credentials for a given plan is
> likely not a heavy lift for *servers* to achieve. I think the complexity
> will largely be on the client implementation in this approach, where we're
> going to have to work through some FileIO scoping challenges for a given
> plan. In the end, it's all doable but it is some level of complexity
> shifted to the client (handling the refreshing/scoping/any caching on top
> of that).
>
> Presigned URLs are supported by all the major object storage providers as
> far as I checked. Clients would have to change in order to distinguish
> between expected object storage URI structures and presigned URLs, but I
> think that overall the client side complexity for scoping is reduced
> compared to the credential vending approach. I think in this approach
> complexity is shifted to the server where the server needs to sign the
> objects. One could imagine at large scale of files, there's likely a lot of
> additional load on the server (CPU bound signing). Also later on, if
> there's desire to be able to extend the protocol to say "Hey read
> everything in this directory", then a scoped credential for that is
> desirable (required?).
>
> My TLDR analysis is that credential vending in scan planning is probably
> net better for larger scale scans, and is also a lighter lift for server
> implementations today, while presigned URLs are probably better in terms of
> making it easy for a wide variety of clients to integrate. In the end, I
> don't think the 2 approaches are incompatible with each other and I don't
> see any one-way doors, so I think it's entirely reasonable to start with the
> proposed approach. Wonder what others think!
>
> Thanks,
> Amogh Jahagirdar
>
>
>
> On Wed, Nov 12, 2025 at 7:49 AM Eduard Tudenhöfner <
> [email protected]> wrote:
>
>> Hey everyone,
>>
>> For server-side scan planning we missed adding storage credentials, hence
>> I'm proposing to add them to the response of the */plan* endpoint.
>>
>> The OpenAPI changes can be seen in PR #14563
>> <https://github.com/apache/iceberg/pull/14563>.
>>
>> Looking forward to your thoughts and feedback.
>>
>> Thanks,
>> Eduard
>>
>
