carloea2 opened a new pull request, #4114:
URL: https://github.com/apache/texera/pull/4114

   ### What changes were proposed in this PR?
   
   * **DB / schema**
   
     * Add `dataset_upload_session` and `dataset_upload_session_part` tables to track multipart upload sessions, per-part status, and S3/LakeFS metadata.
   * **Backend (`DatasetResource`)**
   
     * Multipart upload API (partially new):
   
       * `POST /dataset/multipart-upload?type=init` → creates a LakeFS multipart session, stores it in the DB, and returns an `uploadToken`.
       * `POST /dataset/multipart-upload/part?token=...&partNumber=...` → streams a single part to its presigned URL, with row-level locking and `PENDING/UPLOADING/COMPLETED` state transitions.
       * `POST /dataset/multipart-upload?type=finish|abort` → completes or aborts the LakeFS multipart upload and cleans up the DB records.
     * Keep existing access control and dataset permissions enforced on all new endpoints.
   * **Frontend service (`dataset.service.ts`)**
   
     * Main changes in `multipartUpload(...)`:
   
       * Calls `init` to get `uploadToken`.
       * Uploads file parts concurrently via `/multipart-upload/part`, streaming each part.
   
   * **Frontend component (`dataset-detail.component.ts`)**
   
     * Use `uploadToken` to cancel/abort uploads.
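
   For illustration, the two new tables could be modeled roughly as below. This is a minimal TypeScript sketch, not the PR's schema: the column names (`uploadToken`, `lakefsUploadId`, `numParts`, `etag`) and the `planSession` helper are hypothetical. It shows `type=init` producing one session row plus one `PENDING` row per part, while respecting the S3 multipart limits of a 5 MiB minimum part size (except the last part) and at most 10,000 parts.

   ```typescript
   // Hypothetical row shapes; column names are assumptions, not the PR's schema.
   type PartStatus = "PENDING" | "UPLOADING" | "COMPLETED";

   interface UploadSessionRow {
     uploadToken: string;    // returned to the client by `type=init`
     datasetId: number;
     filePath: string;
     lakefsUploadId: string; // id of the LakeFS multipart upload (assumed column)
     numParts: number;
   }

   interface UploadSessionPartRow {
     uploadToken: string;    // references UploadSessionRow.uploadToken
     partNumber: number;     // 1-based, as in S3 multipart uploads
     status: PartStatus;
     etag: string | null;    // set once the part is stored
   }

   const MIN_PART_SIZE = 5 * 1024 * 1024; // S3 minimum for all parts except the last
   const MAX_PARTS = 10000;               // S3 maximum number of parts per upload

   // Builds the session row and one PENDING row per part for a new upload.
   function planSession(
     uploadToken: string,
     datasetId: number,
     filePath: string,
     lakefsUploadId: string,
     fileSize: number,
     partSize: number
   ): { session: UploadSessionRow; parts: UploadSessionPartRow[] } {
     if (partSize < MIN_PART_SIZE) {
       throw new Error(`part size ${partSize} is below the 5 MiB minimum`);
     }
     const numParts = Math.max(1, Math.ceil(fileSize / partSize));
     if (numParts > MAX_PARTS) {
       throw new Error(`${numParts} parts exceeds the limit of ${MAX_PARTS}`);
     }
     const parts: UploadSessionPartRow[] = [];
     for (let partNumber = 1; partNumber <= numParts; partNumber++) {
       parts.push({ uploadToken, partNumber, status: "PENDING", etag: null });
     }
     return { session: { uploadToken, datasetId, filePath, lakefsUploadId, numParts }, parts };
   }
   ```

   For example, a 12 MiB file with 5 MiB parts yields a session with `numParts = 3`, and parts 1 to 3 all start as `PENDING`.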
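
   The per-part status handling in `/multipart-upload/part` and the precondition for `type=finish` can be viewed as a small state machine. The sketch below is illustrative only: the `transition`/`completionList` names, the transition table, and the `UPLOADING → PENDING` retry edge are assumptions, and in a real backend the check-and-update would run inside a DB transaction holding a row lock.

   ```typescript
   type PartStatus = "PENDING" | "UPLOADING" | "COMPLETED";

   interface PartRow {
     partNumber: number;
     status: PartStatus;
     etag: string | null;
   }

   // Allowed status changes. UPLOADING -> PENDING (retry after a failed stream)
   // is an assumption, not something the PR states.
   const ALLOWED: Record<PartStatus, PartStatus[]> = {
     PENDING: ["UPLOADING"],
     UPLOADING: ["COMPLETED", "PENDING"],
     COMPLETED: [],
   };

   // In a real endpoint, this check-and-update would run in one DB transaction
   // after locking the part row (e.g. PostgreSQL `SELECT ... FOR UPDATE`), so two
   // concurrent requests for the same part cannot both move it to UPLOADING.
   function transition(row: PartRow, next: PartStatus, etag: string | null = null): PartRow {
     if (ALLOWED[row.status].indexOf(next) === -1) {
       throw new Error(`part ${row.partNumber}: ${row.status} -> ${next} is not allowed`);
     }
     if (next === "COMPLETED" && etag === null) {
       throw new Error(`part ${row.partNumber}: COMPLETED requires an ETag`);
     }
     return { partNumber: row.partNumber, status: next, etag: next === "COMPLETED" ? etag : null };
   }

   // `type=finish` proceeds only when every part is COMPLETED; it returns the
   // (partNumber, ETag) pairs, sorted by part number, that an S3-style
   // complete-multipart-upload call expects.
   function completionList(parts: PartRow[]): { partNumber: number; etag: string }[] {
     const missing = parts.filter((p) => p.status !== "COMPLETED").map((p) => p.partNumber);
     if (missing.length > 0) {
       throw new Error(`parts not completed: ${missing.join(", ")}`);
     }
     return parts
       .slice()
       .sort((a, b) => a.partNumber - b.partNumber)
       .map((p) => ({ partNumber: p.partNumber, etag: p.etag as string }));
   }
   ```

   Making `COMPLETED` terminal means a duplicate or late request for an already-stored part is rejected instead of silently overwriting its ETag.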
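
   The concurrent part upload in `multipartUpload(...)` can be approximated with a worker pool. This sketch assumes a hypothetical `uploadPart` callback standing in for one `POST /dataset/multipart-upload/part?token=...&partNumber=...` request, and uses plain Promises rather than whatever HTTP/RxJS plumbing `dataset.service.ts` actually uses. The `isCancelled` hook models the cancel path: once it returns `true`, no new parts start, and the caller would then send `type=abort` with the `uploadToken`.

   ```typescript
   interface PartRange {
     partNumber: number; // 1-based
     start: number;      // inclusive byte offset
     end: number;        // exclusive byte offset
   }

   // Hypothetical stand-in for one `POST .../multipart-upload/part?token=...&partNumber=...` call.
   type UploadPartFn = (partNumber: number, start: number, end: number) => Promise<void>;

   function splitIntoParts(fileSize: number, partSize: number): PartRange[] {
     const parts: PartRange[] = [];
     for (let start = 0, partNumber = 1; start < fileSize; start += partSize, partNumber++) {
       parts.push({ partNumber, start, end: Math.min(start + partSize, fileSize) });
     }
     return parts;
   }

   // Uploads every part with at most `concurrency` requests in flight. Workers stop
   // taking new parts once `isCancelled()` returns true or any part fails.
   async function uploadAllParts(
     parts: PartRange[],
     uploadPart: UploadPartFn,
     concurrency: number,
     isCancelled: () => boolean = () => false
   ): Promise<void> {
     let next = 0; // index of the next part to hand out (single-threaded, so no race)
     let failed = false;

     async function worker(): Promise<void> {
       while (!failed && next < parts.length && !isCancelled()) {
         const part = parts[next++];
         try {
           await uploadPart(part.partNumber, part.start, part.end);
         } catch (err) {
           failed = true; // stop the other workers from starting new parts
           throw err;
         }
       }
     }

     const workers: Promise<void>[] = [];
     for (let i = 0; i < Math.min(concurrency, parts.length); i++) {
       workers.push(worker());
     }
     await Promise.all(workers);
   }
   ```

   A fixed pool of workers pulling from a shared index keeps up to `concurrency` requests in flight until the queue drains, instead of uploading in fixed batches where one slow part holds up the whole batch.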
   
   ---
   
   ### Any related issues, documentation, discussions?
   
   Closes #4110 
   
   ---
   
   ### How was this PR tested?
   
   * Manually uploaded large files (single and multiple) via the dataset detail page, and checked:
   
     * Progress, speed, and ETA updates.
     * Abort behavior (UI state + DB session cleanup).
     * Successful completion path (all parts `COMPLETED`, LakeFS object present, dataset version creation works).

   No unit tests have been added yet.

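
   The progress, speed, and ETA values checked during manual testing reduce to simple arithmetic. A minimal sketch, assuming the UI uses the average rate since the upload started (the PR does not say how speed is computed or smoothed); `computeProgress` is a hypothetical name.

   ```typescript
   interface ProgressSnapshot {
     percent: number;           // 0..100
     bytesPerSecond: number;    // average rate since the upload started
     etaSeconds: number | null; // null until a rate is known
   }

   // Hypothetical helper: derives progress, speed, and ETA from running totals.
   function computeProgress(uploadedBytes: number, totalBytes: number, elapsedMs: number): ProgressSnapshot {
     const percent = totalBytes === 0 ? 100 : Math.min(100, (uploadedBytes / totalBytes) * 100);
     const bytesPerSecond = elapsedMs > 0 ? uploadedBytes / (elapsedMs / 1000) : 0;
     const remainingBytes = Math.max(0, totalBytes - uploadedBytes);
     const etaSeconds = bytesPerSecond > 0 ? remainingBytes / bytesPerSecond : null;
     return { percent, bytesPerSecond, etaSeconds };
   }
   ```

   For example, 25 MiB of 100 MiB uploaded after 5 s gives 25%, 5 MiB/s, and an ETA of 15 s. The overall average keeps the ETA stable; a sliding-window rate would react faster to bandwidth changes at the cost of a noisier estimate.
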
   ---
   
   ### Was this PR authored or co-authored using generative AI tooling?
   
   Yes, partially (GPT).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
