carloea2 commented on issue #4058: URL: https://github.com/apache/texera/issues/4058#issuecomment-3578098518
**1. Concrete bypass evidence**

On hub.texera.io with `singleFileUploadMaxSizeMiB = 10 GiB`, I patched the JS size check in devtools and uploaded a **12.92 GB** `testtexera.zip`. The upload used multipart + presigned PUTs, `multipartUpload?type=finish` returned 200, and the dataset shows `Version Size: 12.92 GB`. So a user who bypasses the frontend can exceed the configured limit today.

**2. POST, part limits, and URL count**

Presigned POST `content-length-range` only protects a *single* request, while our large-file path uses multipart PUTs. lakeFS/S3 enforce only **per-part** bounds (e.g., 5 MiB–5 GiB, up to 10,000 parts). Because parts can vary in size, limiting the **number of presigned URLs** alone does **not** enforce a total max size. Relying on a “final size” header from the client would again trust the frontend.

**3. FixedSize / MaxSize parts**

As far as I can see, we can’t tell lakeFS/S3 “every part must be exactly X bytes”; they only enforce min/max per part. So a “fixed-size parts = enforced limit” scheme isn’t reliable without extra server logic, and using FixedSize/MaxSize parts alone doesn’t seem feasible.

**4. Finish-time check (simple backend fix)**

First step I propose: on `multipartUpload?type=finish`, read the object size from lakeFS/S3 and reject anything over `singleFileUploadMaxSizeMiB` (and optionally delete/abort). This trusts only object-store metadata. Downside: we discover violations *after* all bytes are uploaded, so bandwidth and temporary storage are still consumed.

**5. Watcher approach (pros/cons)**

A watcher that periodically calls `ListParts` and aborts when `bytes_uploaded > limit` has the following trade-offs:

* **Pros:** detects oversize uploads earlier; natural place for future per-user/dataset quotas.
* **Cons:** extra DB table + background job, more control-plane calls, new failure modes.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
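To make points 2 and 4 of the comment concrete, here is a rough Python sketch. It is illustrative only and not Texera code: `max_total_bytes`, `check_finished_upload`, and `FinishDecision` are hypothetical names, the 5 GiB per-part ceiling is the example bound quoted in the comment, and the object size is assumed to have been read from lakeFS/S3 metadata by the caller rather than supplied by the client.

```python
from dataclasses import dataclass

MIB = 1024 * 1024
GIB = 1024 * MIB

# Per-part upper bound quoted in the comment (S3-style multipart limit).
MAX_PART_BYTES = 5 * GIB


@dataclass(frozen=True)
class FinishDecision:
    """Hypothetical result type for the finish-time check."""
    accepted: bool
    reason: str


def max_total_bytes(num_presigned_urls: int) -> int:
    """Largest total upload possible when only the URL count is capped.

    Each part may be as large as MAX_PART_BYTES, so the count alone
    bounds the total very loosely.
    """
    return num_presigned_urls * MAX_PART_BYTES


def check_finished_upload(object_size_bytes: int, max_size_mib: int) -> FinishDecision:
    """Finish-time check: compare the store-reported size to the limit.

    object_size_bytes is assumed to come from object-store metadata,
    never from a client-supplied header.
    """
    limit_bytes = max_size_mib * MIB
    if object_size_bytes > limit_bytes:
        return FinishDecision(
            False,
            f"object is {object_size_bytes} bytes, limit is {limit_bytes} bytes",
        )
    return FinishDecision(True, "within limit")


if __name__ == "__main__":
    limit_mib = 10 * 1024  # 10 GiB expressed in MiB

    # Three presigned URLs already allow more than the 10 GiB limit.
    print(max_total_bytes(3) > limit_mib * MIB)

    # The 12.92 GB upload from point 1, assuming decimal gigabytes.
    print(check_finished_upload(12_920_000_000, limit_mib))
```

With a 10 GiB limit, three presigned URLs already permit up to 15 GiB in total, which is why capping the URL count is not enough on its own. The finish-time check would reject the 12.92 GB object, although only after all of its bytes have been uploaded.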
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
