nkemnitz opened a new pull request, #782:
URL: https://github.com/apache/arrow-rs-object-store/pull/782

   # Which issue does this PR close?
   
   Part of #774 (does not fully close it — see below).
   
   # Rationale for this change
   
   GCS serves large objects stored with `Content-Encoding: gzip` using chunked
   transfer with no `Content-Length` (and decompressive transcoding when the
   client does not accept gzip encoding). `ObjectStore::get`/`head` on GCS
   required `Content-Length` unconditionally and failed with
   `Generic { store: "GCS", source: Header { source: MissingContentLength } }`,
   even though a chunked, self-delimiting body is a valid response (RFC 9112
   §6.2 forbids `Content-Length` alongside `Transfer-Encoding: chunked`).
   
   # What changes are included in this PR?
   
   `HeaderConfig` gains `stored_size_header: Option<&'static str>`. When
   `Content-Length` is absent, `header_meta` reads the object size from this
   header. GCS sets it to `x-goog-stored-content-length` (always present); S3,
   Azure and the HTTP store leave it `None`, so a missing `Content-Length`
   stays a hard error for them.
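
The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the `HeaderConfig` here carries only the new field, and `object_size`, `HeaderError`, and the `HashMap`-based header map are hypothetical stand-ins for the crate's `header_meta` and its real header and error types.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down stand-in for the crate's `HeaderConfig`;
/// only the field added by this PR is modelled.
struct HeaderConfig {
    stored_size_header: Option<&'static str>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum HeaderError {
    MissingContentLength,
    InvalidSize(String),
}

/// Resolve the object size: prefer `content-length`, and fall back to the
/// store-specific header only when the store configured one.
/// Header names are assumed to be lowercased already.
fn object_size(
    headers: &HashMap<String, String>,
    cfg: &HeaderConfig,
) -> Result<u64, HeaderError> {
    let raw = match headers.get("content-length") {
        Some(value) => value,
        // No Content-Length: consult the fallback header, if any.
        // Stores that leave it `None` still get a hard error.
        None => cfg
            .stored_size_header
            .and_then(|name| headers.get(name))
            .ok_or(HeaderError::MissingContentLength)?,
    };
    raw.parse::<u64>()
        .map_err(|_| HeaderError::InvalidSize(raw.clone()))
}
```

With `stored_size_header: Some("x-goog-stored-content-length")`, a chunked response that carries only that header resolves to a size, while the same response under a config with `None` still yields `MissingContentLength`.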
   
   # Are there any user-facing changes?
   
   `get()`/`head()` now succeed on chunked gzip GCS objects. On a
   server-decompressed (transcoded) read, `ObjectMeta.size` is the *stored*
   (compressed) size, since the decompressed length is not known without
   reading the body; on a passthrough read (`Accept-Encoding: gzip`) it is
   exact.
   
   **Not fully resolved:** some transcoded GCS responses (default reads without
   `Accept-Encoding: gzip`) also omit the ETag entirely and still fail with
   `MissingEtag`. Left for a follow-up.
   

