carloea2 commented on issue #4058:
URL: https://github.com/apache/texera/issues/4058#issuecomment-3573607898

   After talking with @chenlica I realized that even with backend checks added 
for `uploadOneFileToDataset`, the **multipart + presigned URL path** still has 
a gap: a user could already have uploaded e.g. 1 TB to lakeFS/S3 before we 
notice anything at `multipart-upload?type=finish`, which is too late from a 
bandwidth / resource perspective.
   
   ### Why we can’t enforce inline like single-file upload
   
   For single-file uploads, the HTTP body streams **through** `DatasetResource` 
and we can do:
   
   ```scala
   totalBytesRead += read
   if (totalBytesRead > maxLimit)
     throw new WebApplicationException(413) // 413 Payload Too Large
   ```
   
   For multipart uploads with **presigned PUTs**:
   
   * The browser talks **directly** to S3/lakeFS (S3 multipart API: initiate → 
upload part → complete).
   * Our backend only:
   
     * Creates presigned URLs (`type=init`), and
     * Gets metadata on `type=finish`.
   * The data stream never passes through our JVM, so we have **no byte counter 
to increment** and no way to return a 413 mid-stream.
   * S3/lakeFS don’t expose an app-specific size cap inside presigned **PUT** 
URLs; POST policies with `content-length-range` don’t apply to our current 
multipart flow.
   
   So an “inline” limit identical to `uploadOneFileToDataset` is fundamentally 
impossible with presigned multipart PUTs.
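   For contrast, browser-based **POST** uploads to S3 *can* be capped, because 
the signed POST policy document may carry a `content-length-range` condition. A 
sketch of such a policy (bucket name, key prefix, and the 1 MiB cap are all 
illustrative):
   
   ```json
   {
     "expiration": "2025-01-01T00:00:00Z",
     "conditions": [
       {"bucket": "example-bucket"},
       ["starts-with", "$key", "uploads/"],
       ["content-length-range", 0, 1048576]
     ]
   }
   ```
   
   No equivalent condition exists for the presigned multipart **PUT** calls we 
issue, which is exactly the gap described above.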
   
   ### Proposed direction: server-side watcher
   
   Idea (not implemented yet; looking for feasibility feedback):
   
   * Add a small DB table tracking active multipart uploads: `(upload_id, did, 
uid, repository_name, file_path, physical_address, status, bytes_uploaded, 
max_bytes, timestamps)`.
   * On `multipart-upload?type=init`, insert a row with `status = ACTIVE` and 
`max_bytes = single_file_upload_max_size_mib` (converted to bytes).
   * Background watcher:
   
     * Periodically (e.g. every 10–60s) calls `ListParts` (or lakeFS 
equivalent) for each ACTIVE upload and sums **real part sizes** from the object 
store.
     * Updates `bytes_uploaded`.
     * If `bytes_uploaded > max_bytes`, calls `abortPresignedMultipartUploads` 
and marks `status = LIMIT_EXCEEDED`.
   * On `type=finish`, if status is `LIMIT_EXCEEDED` or `ABORTED` we fail 
immediately instead of completing the upload.
   
   This does **not** trust any client-supplied sizes; it only trusts lakeFS/S3 
metadata.
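   One watcher tick for a single upload can be sketched as plain Scala. This is 
a feasibility sketch only: `listPartSizes` and `abort` are hypothetical 
stand-ins for `ListParts` and `abortPresignedMultipartUploads`, not Texera's 
actual API.
   
   ```scala
   // Hypothetical sketch of the watcher's per-upload check (names illustrative).
   object MultipartWatchSketch {
     sealed trait Status
     case object Active        extends Status
     case object LimitExceeded extends Status
   
     // One row of the proposed tracking table (simplified).
     final case class TrackedUpload(
         uploadId: String,
         maxBytes: Long,
         bytesUploaded: Long = 0L,
         status: Status = Active
     )
   
     // One watcher tick: sum real part sizes from the object store and
     // abort the upload if it exceeds the cap.
     def checkOnce(
         upload: TrackedUpload,
         listPartSizes: String => Seq[Long], // stand-in for ListParts
         abort: String => Unit               // stand-in for the abort call
     ): TrackedUpload = {
       val total = listPartSizes(upload.uploadId).sum // store-side sizes only
       if (total > upload.maxBytes) {
         abort(upload.uploadId)
         upload.copy(bytesUploaded = total, status = LimitExceeded)
       } else {
         upload.copy(bytesUploaded = total)
       }
     }
   }
   ```
   
   The key property is that `bytesUploaded` comes from summing part sizes 
reported by the object store, never from anything the client sends.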
   
   ### Downsides / tradeoffs
   
   * **Overshoot is still possible**: if a client uploads many parts quickly 
between watcher ticks, we only detect the excess at the next tick. The overshoot 
is roughly bounded by:
   
     * part size × the number of parts a client can upload between two checks, 
plus whatever lands during the watcher's own processing latency.
     * Shorter intervals reduce overshoot but increase control-plane calls 
(`ListParts` per active upload).
   * **Extra complexity**: background job, DB tracking, and a dependency on 
S3/lakeFS metadata being available and fast.
   * **Failure modes**: if the watcher is down, we still catch oversized 
uploads at `finish` (object size check + rollback), but we can’t avoid 
bandwidth cost.
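   To make the overshoot bound concrete with purely illustrative numbers: with 
16 MiB parts, a client sustaining 4 parts/s, and a 30 s watcher interval, up to 
16 MiB × 4 × 30 ≈ 1.875 GiB can land between two checks. A hypothetical helper 
for this back-of-envelope bound:
   
   ```scala
   // Hypothetical back-of-envelope bound on bytes uploaded between two
   // watcher ticks (all parameter names are illustrative, not Texera config).
   object OvershootSketch {
     def maxOvershootBytes(
         partSizeBytes: Long,
         partsPerSecond: Double,
         watcherIntervalSeconds: Double
     ): Long =
       math.ceil(partSizeBytes * partsPerSecond * watcherIntervalSeconds).toLong
   }
   ```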
   
   ### Future benefit
   
   If we go this route, the same table + watcher make it much easier later to 
add **per-user / per-dataset quotas** (e.g., total in-flight bytes, per-user 
caps). I haven’t started coding this yet because I’d like feedback on whether 
this approach sounds acceptable from a design / ops standpoint.
   
   ### Tiny diagram
   
   ```mermaid
   sequenceDiagram
       participant Client
       participant Texera as DatasetResource
       participant Watcher
       participant LakeFS
   
       Client->>Texera: multipart-upload?type=init
       Texera->>LakeFS: initiateMultipart
       LakeFS-->>Texera: uploadId + presigned URLs
       Texera->>DB: INSERT upload (ACTIVE, max_bytes)
       Texera-->>Client: uploadId + URLs
   
       Client->>LakeFS: PUT parts (direct)
   
       loop every N seconds
         Watcher->>DB: SELECT ACTIVE uploads
         Watcher->>LakeFS: ListParts(uploadId)
         LakeFS-->>Watcher: part sizes
         Watcher->>DB: UPDATE bytes_uploaded
         alt bytes_uploaded > max_bytes
           Watcher->>LakeFS: abortMultipartUpload()
           Watcher->>DB: status = LIMIT_EXCEEDED
         end
       end
   
       Client->>Texera: multipart-upload?type=finish
       Texera->>DB: read status
       alt LIMIT_EXCEEDED
         Texera-->>Client: 413 Payload Too Large
    else
      Texera->>LakeFS: completeMultipartUpload()
      LakeFS-->>Texera: final size
      alt final size > max_bytes
        Texera->>LakeFS: rollback (delete object)
        Texera-->>Client: 413 Payload Too Large
      else
        Texera-->>Client: 200 OK
      end
    end
   ```
   

