xuang7 opened a new pull request, #4004:
URL: https://github.com/apache/texera/pull/4004

   <!--
   Thanks for sending a pull request (PR)! Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: 
        [Contributing to 
Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
     2. Ensure you have added or run the appropriate tests for your PR
     3. If the PR is work in progress, mark it as a draft on GitHub.
     4. Please write your PR title to summarize what this PR proposes; we
        follow the Conventional Commits style for PR titles as well.
     5. Be sure to keep the PR description updated to reflect all changes.
   -->
   
   ### What changes were proposed in this PR?
   <!--
   Please clarify what changes you are proposing. The purpose of this section 
   is to outline the changes. Here are some tips for you:
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
     3. If it is a refactoring, clarify what has been changed.
     4. It would be helpful to include a before-and-after comparison using 
        screenshots or GIFs.
     5. Please consider writing useful notes for better and faster reviews.
   -->
   This PR introduces on-demand (batch) presigning for multipart uploads to 
reduce failures caused by expired pre-signed URLs. Previously, all part URLs 
were pre-signed up front using an experimental LakeFS API. For long uploads, 
URLs for later parts could expire before they were used (after 15 minutes 
locally or 30 minutes on the server), causing the upload to fail midway. The 
revised implementation uses the LakeFS API only for the initial batch of URLs, 
then presigns subsequent batches directly with `S3Presigner`.
   
   **Changes (Backend)**
   - Adds a new method, `presignUploadParts`, which uses `s3Presigner` to 
sign a specific list of provided `partNumbers`.
   - The `/multipart-upload` endpoint coordinates the new signing flow:
     - `type=init`: now signs and returns only the first batch of URLs 
(e.g., parts 1-100 or 1-20).
     - `type=sign` (new operation): receives a `pendingParts` list and 
`physicalAddress` from the client, calls the new 
`S3StorageClient.presignUploadParts` to sign the requested batch, and returns 
the new URLs.
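
   The batching contract above can be sketched as follows. This is an illustrative sketch only, not the actual Texera backend code: the helper name `partBatches` is hypothetical, and it merely shows which `partNumbers` each signing call (`type=init` for the first batch, `type=sign` for the rest) would cover, given a total part count and a batch size.

   ```typescript
   // Hypothetical helper: split partNumbers 1..totalParts into signing batches.
   // Names are illustrative; the real endpoint receives pendingParts from the client.
   function partBatches(totalParts: number, urlBatchSize: number): number[][] {
     const batches: number[][] = [];
     for (let start = 1; start <= totalParts; start += urlBatchSize) {
       const end = Math.min(start + urlBatchSize - 1, totalParts);
       const batch: number[] = [];
       for (let p = start; p <= end; p++) batch.push(p);
       batches.push(batch);
     }
     return batches;
   }

   // type=init would sign batches[0]; each later type=sign call signs the next
   // batch just before those parts are uploaded, so no URL sits unused long
   // enough to expire.
   const batches = partBatches(250, 100);
   console.log(batches.length); // 3 batches: 1-100, 101-200, 201-250
   ```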
   
   **Changes (Frontend)**
   - Switched the multipart upload pipeline to RxJS `expand` to implement 
stateless, recursive fetching of pre-signed URL batches:
     - It starts by calling `type=init` to get the first batch.
     - When more URLs are needed, it calls `type=sign` for the next range of 
`pendingParts` and streams those URLs.
   - Introduced a `urlBatchSize` variable (configurable, default: 100) to 
control how many URLs are requested in each `init` and `sign` call.
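
   The recursive fetching that RxJS `expand` provides can be sketched, under assumptions, as the loop below. This is not the actual Texera service code: `presignedUrls` and `fetchBatch` are hypothetical stand-ins for the real service and its HTTP calls to `type=init` / `type=sign`, shown here as a plain async generator so the batching logic is visible without RxJS.

   ```typescript
   // Hypothetical sketch: yield pre-signed URLs batch by batch.
   // fetchBatch stands in for the HTTP call (type=init for the first batch,
   // type=sign for every subsequent pendingParts batch).
   async function* presignedUrls(
     totalParts: number,
     urlBatchSize: number,
     fetchBatch: (parts: number[]) => Promise<string[]>
   ): AsyncGenerator<string> {
     // First batch of pending part numbers: 1..min(urlBatchSize, totalParts).
     let next = Array.from(
       { length: Math.min(urlBatchSize, totalParts) },
       (_, i) => i + 1
     );
     while (next.length > 0) {
       const urls = await fetchBatch(next);
       for (const url of urls) yield url; // stream URLs to the uploader
       // Compute the next pendingParts range, stopping at totalParts.
       const lastPart = next[next.length - 1];
       next = [];
       for (let p = lastPart + 1; p <= Math.min(lastPart + urlBatchSize, totalParts); p++) {
         next.push(p);
       }
     }
   }
   ```

   With a mocked signer (`async (parts) => parts.map((p) => `url-${p}`)`), requesting 250 parts with a batch size of 100 yields 250 URLs across three fetches, which mirrors what `expand` does: each emitted batch feeds the request for the next one until `pendingParts` is empty.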
   
   ### Any related issues, documentation, discussions?
   <!--
   Please use this section to link other resources if not mentioned already.
     1. If this PR fixes an issue, please include `Fixes #1234`, `Resolves 
#1234`
        or `Closes #1234`. If it is only related, simply mention the issue 
number.
     2. If there is design documentation, please add the link.
     3. If there is a discussion in the mailing list, please add the link.
   -->
   Fixes #3837
   This resolves URL expiration for pending parts. Fully handling 
interruptions during part uploads would additionally require resumable uploads.
   
   ### How was this PR tested?
   <!--
   If tests were added, say they were added here. Or simply mention that if the 
PR 
   is tested with existing test cases.  Make sure to include/update test cases 
that
   check the changes thoroughly including negative and positive cases if 
possible.
   If it was tested in a way different from regular unit tests, please clarify 
how
   you tested step by step, ideally copy and paste-able, so that other 
reviewers can
   test and check, and descendants can verify in the future. If tests were not 
added, 
   please describe why they were not added and/or why it was difficult to add. 
   -->
   Tested with the existing automated test cases and with local manual tests.
   Additional test scenarios are to be added.
   
   ### Was this PR authored or co-authored using generative AI tooling?
   <!--
   If generative AI tooling has been used in the process of authoring this PR, 
   please include the phrase: 'Generated-by: ' followed by the name of the tool 
   and its version. If no, write 'No'. 
   Please refer to the [ASF Generative Tooling 
Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
