suyashj1231 commented on issue #3842: URL: https://github.com/apache/texera/issues/3842#issuecomment-4772478564
@aicam I took this issue and spent some time on the `AccessDenied` bug from the 10/23 notes. I managed to reproduce it on a `single-node` deployment, and from what I can tell it's a SigV4 host mismatch, not anything about the file itself.

The whiteboard plan (presign-url carrying a filename header, fetched by the browser's download manager) is the right idea, but I hit two things that change where the presign has to happen.

One, the LakeFS S3 gateway at `lakefs:8000` ignores `response-content-disposition`. Presigning a GET through the gateway gives me a 200 but no `Content-Disposition` header, so the filename never gets set. Presigning straight against MinIO does set it: `Content-Disposition: attachment; filename="Iris.csv"`, correct name. So the filename header on the board only sticks if we presign against MinIO, not the gateway.

Two, the `AccessDenied` is a signed-host mismatch. SigV4 signs the `Host` header, so the URL has to be signed with the same endpoint the browser actually hits. Signing internal and fetching external fails:

```
A) signed host texera-minio:9000, fetched at localhost:9000 -> 403 SignatureDoesNotMatch
B) signed host localhost:9000, fetched at localhost:9000 -> 200 OK, Content-Disposition set
```

That fits why it breaks on one deployment and not another. When the backend's S3 endpoint and the browser-facing endpoint are the same host it works, when they differ you get the rejection. Region (`us-west-2` vs `us-east-1`) made no difference in my tests.

So the version that actually holds up: presign directly against MinIO with `response-content-disposition`, signed with the external pre-signed endpoint. Right now `file-service` only knows the internal `STORAGE_S3_ENDPOINT`, so it'd need the external one too, same idea as LakeFS's `BLOCKSTORE_S3_PRE_SIGNED_ENDPOINT`. I can take a shot at that if it sounds right to you.
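To make the host dependence concrete, here is a rough stdlib-only sketch of SigV4 query-string presigning for a path-style GET. It is an illustration, not what `file-service` does today; a real change would most likely go through the AWS SDK's presigner. `presign_get`, the bucket name, and the credentials are placeholders I made up for the example.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get(host, bucket, key, filename, access_key, secret_key,
                region="us-east-1", expires=900, now=None, scheme="http"):
    """Hypothetical helper: build a SigV4 presigned GET URL (path-style)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date = now.strftime("%Y%m%d")
    scope = f"{date}/{region}/s3/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
        # Asks the object store to send this Content-Disposition on the response.
        "response-content-disposition": f'attachment; filename="{filename}"',
    }
    # Canonical query string: percent-encoded names and values, sorted by name.
    query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}"
        for k, v in sorted(params.items())
    )
    uri = f"/{bucket}/{quote(key, safe='/')}"

    # The host (including port) is part of the canonical request, so it is
    # covered by the signature.
    canonical_request = "\n".join(
        ["GET", uri, query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: date -> region -> service -> "aws4_request".
    k = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    for part in (region, "s3", "aws4_request"):
        k = _hmac(k, part)
    signature = hmac.new(k, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    return f"{scheme}://{host}{uri}?{query}&X-Amz-Signature={signature}"


if __name__ == "__main__":
    # Placeholder bucket and credentials, fixed timestamp so the two URLs
    # differ only in the host they were signed for.
    t = datetime.datetime(2025, 10, 23, tzinfo=datetime.timezone.utc)
    args = ("example-bucket", "Iris.csv", "Iris.csv", "example-access", "example-secret")
    internal = presign_get("texera-minio:9000", *args, now=t)
    external = presign_get("localhost:9000", *args, now=t)
    print(internal.rsplit("=", 1)[1])
    print(external.rsplit("=", 1)[1])
```

The two printed signatures differ even though everything except the host is identical. If the URL is signed for `texera-minio:9000` but the browser fetches it via `localhost:9000`, the server recomputes the signature over `host:localhost:9000` and gets a different value, which is case A above.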
