macdoor opened a new pull request, #27965:
URL: https://github.com/apache/flink/pull/27965

   ## Purpose
   
   Fix `ConnectionClosedException: Premature end of Content-Length delimited message body` when reading large Parquet files from S3-compatible storage (e.g., MinIO) via `flink-s3-fs-native`, especially with `seek()` / ranged GETs.
   
   ## Root cause
   
   Closing a partially consumed `GetObject` response via `BufferedInputStream.close()` makes Apache HttpClient drain the remaining body so that the connection can be reused. If the connection ends before all `Content-Length` bytes have arrived, the drain fails with the `ConnectionClosedException` above. When the rest of the body will not be read, AWS SDK v2 recommends calling `ResponseInputStream.abort()` instead.
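
   The failure mode can be illustrated with a small, self-contained Java simulation. It uses neither the AWS SDK nor Apache HttpClient: `DrainVersusAbortDemo`, `SimulatedBody` and `readThenRelease` are hypothetical names, and a plain `IOException` stands in for HttpClient's `ConnectionClosedException`. Here `close()` models drain-on-close, and `abort()` models the idea behind `ResponseInputStream.abort()`: drop the connection without reading the rest of the body.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Self-contained simulation (no AWS SDK or Apache HttpClient involved) of
 * draining versus aborting a partially read response body. All names are hypothetical.
 */
final class DrainVersusAbortDemo {

    /** A body that declares contentLength bytes, but whose connection dies after deliverable bytes. */
    static final class SimulatedBody extends InputStream {
        private final long contentLength;
        private final long deliverable;
        private long position;
        private boolean aborted;

        SimulatedBody(long contentLength, long deliverable) {
            this.contentLength = contentLength;
            this.deliverable = deliverable;
        }

        @Override
        public int read() throws IOException {
            if (aborted) {
                throw new IOException("stream was aborted");
            }
            if (position >= contentLength) {
                return -1; // the whole declared body has been consumed
            }
            if (position >= deliverable) {
                // Same message as the HttpClient error quoted in the PR description.
                throw new IOException("Premature end of Content-Length delimited message body");
            }
            position++;
            return 0;
        }

        /** Drain-on-close: read the remainder so the connection could be reused. */
        @Override
        public void close() throws IOException {
            if (aborted) {
                return; // nothing to drain once the connection has been dropped
            }
            while (read() != -1) {
                // discard the remaining bytes
            }
        }

        /** Abort: drop the connection without reading the remainder. */
        void abort() {
            aborted = true;
        }
    }

    /** Reads n bytes of a 100-byte body that breaks at byte 50, then releases it; returns null or the error. */
    static String readThenRelease(long n, boolean useAbort) {
        SimulatedBody body = new SimulatedBody(100, 50);
        try {
            for (long i = 0; i < n; i++) {
                body.read();
            }
            if (useAbort) {
                body.abort();
            }
            body.close(); // a no-op after abort(), a full drain otherwise
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("close(): " + readThenRelease(10, false));
        System.out.println("abort(): " + readThenRelease(10, true));
    }
}
```

   Running `main` prints `close(): Premature end of Content-Length delimited message body` and then `abort(): null`. Drain-on-close only fails here because the simulated connection breaks at byte 50. `abort()` never tries to read the remaining bytes, so it cannot hit that failure.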
   
   ## Change
   
   - On reopen and on `close()`, when `position < contentLength` (i.e., part of the body is still unread), call `abort()` on the `ResponseInputStream` instead of relying on drain-on-close.
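
   The following is a minimal sketch of this release policy. It assumes that each reopen requests the object from the current position to its end, so `position < contentLength` means the open body still has unread bytes. `AbortableBody`, `SeekableObjectStream`, `RecordingBody` and the `opener` function are hypothetical. This is not the code in the PR, and `AbortableBody` only mimics the `close()`/`abort()` pair that `ResponseInputStream` offers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

/** Minimal sketch of the release policy; every type here is hypothetical, not Flink or AWS SDK code. */
final class SeekableObjectStreamSketch {

    /** Stand-in for a body that, like ResponseInputStream, supports both close() and abort(). */
    abstract static class AbortableBody extends InputStream {
        abstract void abort();
    }

    /** Seekable reader that lazily reopens the object from the current position after seek(). */
    static final class SeekableObjectStream extends InputStream {
        private final LongFunction<AbortableBody> opener; // assumed: opens from an offset to the end
        private final long contentLength;
        private long position;
        private AbortableBody current;

        SeekableObjectStream(LongFunction<AbortableBody> opener, long contentLength) {
            this.opener = opener;
            this.contentLength = contentLength;
        }

        void seek(long newPosition) throws IOException {
            release(); // on reopen: release the old body before moving the position
            position = newPosition;
        }

        @Override
        public int read() throws IOException {
            if (position >= contentLength) {
                return -1;
            }
            if (current == null) {
                current = opener.apply(position); // reopen at the current position
            }
            int b = current.read();
            if (b >= 0) {
                position++;
            }
            return b;
        }

        @Override
        public void close() throws IOException {
            release();
        }

        private void release() throws IOException {
            if (current == null) {
                return;
            }
            AbortableBody body = current;
            current = null;
            if (position < contentLength) {
                body.abort(); // bytes remain unread: drop the connection instead of draining it
            } else {
                body.close(); // fully consumed: nothing is left to drain
            }
        }
    }

    /** Test double that records how each opened body was released. */
    static final class RecordingBody extends AbortableBody {
        private final List<String> log;
        private final long start;
        RecordingBody(List<String> log, long start) {
            this.log = log;
            this.start = start;
        }
        @Override public int read() { return 0; }
        @Override void abort() { log.add("abort@" + start); }
        @Override public void close() { log.add("close@" + start); }
    }

    /** Reads one byte, seeks forward, reads to the end, and closes; returns the release log. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        try (SeekableObjectStream in = new SeekableObjectStream(start -> new RecordingBody(log, start), 8)) {
            in.read();  // opens a body at offset 0
            in.seek(6); // 7 bytes of that body are unread, so it is aborted
            in.read();  // reopens at offset 6
            in.read();  // position reaches contentLength (8)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return log; // close() has already run via try-with-resources
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

   `demo()` returns `[abort@0, close@6]`. The body opened at offset 0 is aborted on `seek(6)` because 7 of its bytes were never read. The body reopened at offset 6 is read to the end, so a normal `close()` is safe: there is nothing left to drain.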
   
   ## Testing
   
   - A local `mvn` compile was not run in this environment: the Flink 2.4-SNAPSHOT reactor dependencies could not be resolved against the configured Maven mirror. Relying on Azure Pipelines CI to build and test `flink-s3-fs-native`.
   
   JIRA: https://issues.apache.org/jira/browse/FLINK-39484


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
