nkemnitz opened a new pull request, #782:
URL: https://github.com/apache/arrow-rs-object-store/pull/782
# Which issue does this PR close?
Part of #774 (does not fully close it — see below).
# Rationale for this change
GCS serves large objects stored with `Content-Encoding: gzip` using chunked
transfer with no `Content-Length` (and decompressive transcoding when the
client does not accept gzip encoding). `ObjectStore::get`/`head` on GCS
required `Content-Length` unconditionally and failed with
`Generic { store: "GCS", source: Header { source: MissingContentLength } }`,
even though a chunked, self-delimiting body is a valid response (RFC 9112 §6.2
forbids `Content-Length` alongside `Transfer-Encoding: chunked`).
# What changes are included in this PR?
`HeaderConfig` gains `stored_size_header: Option<&'static str>`. When
`Content-Length` is absent, `header_meta` reads the object size from this
header. GCS sets it to `x-goog-stored-content-length` (always present); S3,
Azure and the HTTP store leave it `None`, so a missing `Content-Length` stays
a hard error for them.
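
For reviewers who want the fallback rule in isolation, here is a minimal,
standalone sketch of it. It is not the code in this PR: the `HeaderConfig`
below is a simplified stand-in that models only the new field, and
`object_size`, `HeaderError::InvalidSize` and the lowercase-keyed `HashMap` of
headers are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the crate's `HeaderConfig` (assumption: only the
/// new field is modelled here).
struct HeaderConfig {
    stored_size_header: Option<&'static str>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum HeaderError {
    MissingContentLength,
    InvalidSize(String),
}

/// Prefer `content-length`; fall back to the configured stored-size header.
/// Assumes header names are already lowercased in `headers`.
fn object_size(
    headers: &HashMap<String, String>,
    cfg: &HeaderConfig,
) -> Result<u64, HeaderError> {
    let raw = match headers.get("content-length") {
        Some(value) => value,
        // No Content-Length: consult the fallback header only if the store
        // configured one; otherwise this remains a hard error.
        None => cfg
            .stored_size_header
            .and_then(|name| headers.get(name))
            .ok_or(HeaderError::MissingContentLength)?,
    };
    raw.trim()
        .parse::<u64>()
        .map_err(|_| HeaderError::InvalidSize(raw.clone()))
}
```

With `stored_size_header: None`, the sketch behaves as before the change: a
missing `Content-Length` is still reported as `MissingContentLength`.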
# Are there any user-facing changes?
`get()`/`head()` now succeed on chunked gzip GCS objects. On a
server-decompressed (transcoded) read, `ObjectMeta.size` is the *stored*
(compressed) size, since the decompressed length is not known without reading
the body; on a passthrough read (`Accept-Encoding: gzip`) it is exact.
**Not fully resolved:** some transcoded GCS responses (default reads without
`Accept-Encoding: gzip`) also omit the ETag entirely and still fail with
`MissingEtag`. Left for a follow-up.
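
Because `ObjectMeta.size` on a transcoded read is the stored size, a caller
that needs the decompressed length has to count bytes while consuming the
body. A rough illustration using only the standard library follows;
`drained_len` is a hypothetical helper, and a blocking `std::io::Read` stands
in for the response body here purely as an assumption to keep the sketch
self-contained.

```rust
use std::io::{self, Read};

/// Consume `body` to the end and return how many bytes it actually yielded.
/// `io::copy` returns the number of bytes copied; `io::sink()` discards them.
fn drained_len<R: Read>(mut body: R) -> io::Result<u64> {
    io::copy(&mut body, &mut io::sink())
}
```

The result can differ from the reported size on a transcoded read and should
match it on a passthrough read.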