carloea2 opened a new issue, #5938: URL: https://github.com/apache/texera/issues/5938
### Task Summary

Improve dataset upload retry behavior so batch uploads distinguish between incomplete multipart uploads and files that already exist in the dataset.

| Case | Expected behavior |
| --- | --- |
| Active multipart upload session exists for the same path | Prompt the user to resume or restart the incomplete upload. |
| A file with the same path and size already exists in committed or staged dataset files | Prompt the user to upload again or skip the matching file. |

The completed-file prompt should use cautious wording because matching by path and size does not prove byte-for-byte equality.

Implementation should include:

- A backend dataset-scoped check for candidate upload paths and sizes.
- Frontend logic that checks active multipart sessions first, then checks existing matching files.
- Support for mixed retry batches where one file resumes and another file can be skipped.
- Tests for multipart resume behavior, completed-file skip behavior, backend committed/staged matches, and invalid or unauthorized requests.

Related discussion: #5744
Related PR: #5929

### Task Type

- [ ] Refactor / Cleanup
- [ ] DevOps / Deployment / CI
- [x] Testing / QA
- [ ] Documentation
- [ ] Performance
- [x] Other
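
To make the intended check order concrete, here is a minimal TypeScript sketch of how a retry batch might be classified per file. Every name in it (`UploadCandidate`, `RetryAction`, `classifyRetryBatch`, and the shape of the inputs) is hypothetical and not taken from the Texera codebase; it assumes the frontend has already fetched the paths with active multipart sessions and the committed or staged files for the dataset.

```typescript
// Hypothetical types for illustration only; not the actual Texera API.
interface UploadCandidate {
  path: string;
  size: number; // size in bytes
}

type RetryAction =
  | "prompt-resume-or-restart" // an active multipart session exists for this path
  | "prompt-upload-again-or-skip" // a committed or staged file matches path and size
  | "upload"; // nothing matched, upload normally

interface RetryDecision {
  path: string;
  action: RetryAction;
}

// Classify each file in the batch independently, so one file can resume
// while another in the same batch is offered a skip.
function classifyRetryBatch(
  candidates: UploadCandidate[],
  activeSessionPaths: string[],
  existingFiles: UploadCandidate[]
): RetryDecision[] {
  return candidates.map((file): RetryDecision => {
    // 1. Active multipart sessions are checked first.
    if (activeSessionPaths.indexOf(file.path) !== -1) {
      return { path: file.path, action: "prompt-resume-or-restart" };
    }
    // 2. Then look for an existing file with the same path and size.
    //    A match only suggests a duplicate; it does not prove the
    //    contents are byte-for-byte identical.
    const matches = existingFiles.some(
      (existing) => existing.path === file.path && existing.size === file.size
    );
    if (matches) {
      return { path: file.path, action: "prompt-upload-again-or-skip" };
    }
    // 3. Otherwise treat it as a new upload.
    return { path: file.path, action: "upload" };
  });
}
```

Because each file is classified on its own, a mixed batch falls out naturally: a path with an active session always gets the resume prompt, even if a file with the same path and size also exists. The linear lookups keep the sketch short; a real implementation would likely index the inputs for large batches.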
