[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arturo Bernal updated HTTPCLIENT-2422:
--------------------------------------
    Fix Version/s: 5.7-alpha1

> DecompressingEntity in 5.4+ eagerly creates decompression stream, causing 
> ZipException on empty/invalid bodies (regression from 5.2 lazy behavior)
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HTTPCLIENT-2422
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2422
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (classic)
>    Affects Versions: 5.4, 5.5, 5.6
>            Reporter: Sneha Murganoor
>            Priority: Critical
>             Fix For: 5.7-alpha1
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In 5.2, DecompressingEntity.getContent() returned a 
> LazyDecompressingInputStream that deferred GZIPInputStream creation to the 
> first read() call. This allowed responses with Content-Encoding: gzip but 
> empty or non-gzip bodies to be handled gracefully: either the stream was 
> never read, or the error surfaced during read(), where callers could handle 
> it.
> In 5.4+, DecompressingEntity (moved to 
> org.apache.hc.client5.http.entity.compress) was rewritten to eagerly call 
> decoder.apply(super.getContent()) in getContent(). This immediately creates 
> GZIPInputStream, which reads the gzip magic bytes in its constructor. If the 
> body is empty (e.g., chunked transfer with zero-length body) or not actually 
> compressed, this throws ZipException: Not in GZIP format at getContent() time 
> — before the caller has any opportunity to handle it.
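
The eager failure described above can be illustrated without HttpClient. The following is a minimal sketch using only the JDK; the class and method names (EagerGzipDemo, constructOnly) are made up for illustration and are not part of any library. It shows that merely constructing a GZIPInputStream, with no read() call, already fails when the bytes are not gzip data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class EagerGzipDemo {

    /**
     * Constructs a GZIPInputStream over the given bytes without reading from it.
     * Returns "none" if construction succeeds, otherwise the simple class name
     * of the IOException thrown by the constructor.
     */
    static String constructOnly(byte[] body) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return "none";
        } catch (IOException e) {
            // The constructor parses the gzip header, so bad input fails here.
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Non-gzip bytes: the magic-number check in the constructor fails.
        byte[] notGzip = "plain text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(constructOnly(notGzip)); // prints "ZipException"

        // Empty input: the constructor also fails with an IOException subclass.
        System.out.println(constructOnly(new byte[0]));
    }
}
```

Any code path that builds the decompression stream inside getContent() therefore inherits this behaviour, which matches the stack trace reported for 5.4+.
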
> Reproduction:
> A backend sends:
> {quote}
> HTTP/1.1 200 OK
> Content-Encoding: gzip
> Transfer-Encoding: chunked
> 0\r\n\r\n
> (Empty chunked body with Content-Encoding: gzip header.)
> {quote}
> In 5.2: entity.getContent() succeeds, returns LazyDecompressingInputStream. 
> Caller reads EOF without error.
> In 5.4+: entity.getContent() throws java.util.zip.ZipException: Not in GZIP 
> format.
> Stack trace:
> {quote}
> java.util.zip.ZipException: Not in GZIP format
>     at java.base/java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:197)
>     at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:81)
>     at org.apache.hc.client5.http.entity.compress.DecompressingEntity.getContent(DecompressingEntity.java:63)
> {quote}
> Context:
> HTTPCLIENT-1432 reported the same class of issue (ZipException on 304 
> responses with Content-Encoding: gzip). It was fixed in 4.5.5 and 5.0 Beta1 
> by using LazyDecompressingInputStream. The 5.4 rewrite of DecompressingEntity 
> removed lazy initialization, reintroducing this failure mode.
> While the backend is arguably misbehaving by sending Content-Encoding: gzip 
> with no body, this is common in practice (web servers that add the header 
> unconditionally regardless of whether compression occurred). The 5.2 behavior 
> was more resilient to this.
> Suggested fix:
> Restore lazy stream initialization in DecompressingEntity.getContent(): 
> either defer decoder.apply() to the first read(), or detect an empty 
> underlying stream before attempting decompression.
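
The first option in the suggested fix, deferring decoder creation to the first read, might look roughly like the sketch below. It is not the HttpClient implementation and not the actual fix. StreamDecoder, LazyDecodingInputStream and firstReadOutcome are hypothetical names introduced here, and the JDK's GZIPInputStream stands in for whatever decoder the entity would apply.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class LazyDecodingDemo {

    /** Hypothetical decoder factory that is allowed to throw IOException. */
    interface StreamDecoder {
        InputStream wrap(InputStream raw) throws IOException;
    }

    /** Wraps a raw stream and creates the decoding stream only on first use. */
    static final class LazyDecodingInputStream extends InputStream {
        private final InputStream raw;
        private final StreamDecoder decoder;
        private InputStream decoded; // stays null until the first read

        // Note: no "throws IOException"; construction never touches the body.
        LazyDecodingInputStream(InputStream raw, StreamDecoder decoder) {
            this.raw = raw;
            this.decoder = decoder;
        }

        private InputStream decoded() throws IOException {
            if (decoded == null) {
                decoded = decoder.wrap(raw); // header parsing happens here
            }
            return decoded;
        }

        @Override
        public int read() throws IOException {
            return decoded().read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return decoded().read(b, off, len);
        }

        @Override
        public void close() throws IOException {
            // Close without forcing decoder creation on an unread stream.
            if (decoded != null) {
                decoded.close();
            } else {
                raw.close();
            }
        }
    }

    /** Helper: gzip-compresses a string. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    /** Reads the whole body lazily; returns "ok:<text>" or the exception name. */
    static String firstReadOutcome(byte[] body) {
        try (InputStream in = new LazyDecodingInputStream(
                new ByteArrayInputStream(body), GZIPInputStream::new)) {
            return "ok:" + new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstReadOutcome(gzip("hello")));  // prints "ok:hello"
        System.out.println(firstReadOutcome(
                "plain text".getBytes(StandardCharsets.US_ASCII))); // prints "ZipException"
    }
}
```

With this shape, obtaining the stream never throws. A malformed body still produces an IOException, but only when the caller reads, and a caller that closes the stream unread never triggers decoding at all. If the decoder throws, a later read would retry on a partially consumed raw stream; a real implementation would need to decide how to handle that case.
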



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
