[
https://issues.apache.org/jira/browse/FLINK-39533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Samrat Deb updated FLINK-39533:
-------------------------------
Description:
NativeS3InputStream currently calls close() on the underlying
ResponseInputStream during seek(), skip(), and close() operations. Apache
HttpClient's close() implementation drains all remaining bytes from the
response body to enable HTTP connection reuse.
For large S3 objects where only a small portion has been read (for example, a
large state file, or a seek within a columnar format), this drain loop reads
and discards potentially gigabytes of data over the network, causing severe
latency during seek/close operations.
was: NativeS3InputStream currently calls close() on the underlying
ResponseInputStream during seek(), skip(), and close() operations. Apache
HttpClient's close() implementation drains all remaining bytes from the
response body to enable HTTP connection reuse.
> Use abort() instead of drain on close/seek when remaining bytes exceed
> threshold in NativeS3InputStream
> -------------------------------------------------------------------------------------------------------
>
> Key: FLINK-39533
> URL: https://issues.apache.org/jira/browse/FLINK-39533
> Project: Flink
> Issue Type: Technical Debt
> Components: Connectors / FileSystem
> Affects Versions: 2.3.0
> Reporter: Samrat Deb
> Priority: Major
> Fix For: 2.4.0
>
>
> NativeS3InputStream currently calls close() on the underlying
> ResponseInputStream during seek(), skip(), and close() operations. Apache
> HttpClient's close() implementation drains all remaining bytes from the
> response body to enable HTTP connection reuse.
>
>
> For large S3 objects where only a small portion has been read (for example, a
> large state file, or a seek within a columnar format), this drain loop reads
> and discards potentially gigabytes of data over the network, causing severe
> latency during seek/close operations.
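
The approach named in the issue title (abort when the remaining bytes exceed a threshold, otherwise drain) could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual NativeS3InputStream change: AbortableBody, InMemoryBody, release() and the threshold value are hypothetical names invented here, and AbortableBody merely stands in for a response stream that offers an abort() operation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class DrainOrAbortSketch {

    /** Which path was taken when the stream was released. */
    enum Action { DRAINED, ABORTED }

    /** Hypothetical stand-in for a response stream that supports abort(). */
    interface AbortableBody {
        InputStream body();
        void abort();
    }

    /** In-memory implementation used only to exercise the sketch. */
    static final class InMemoryBody implements AbortableBody {
        private final InputStream in;
        boolean aborted = false;

        InMemoryBody(byte[] remaining) {
            this.in = new ByteArrayInputStream(remaining);
        }

        @Override public InputStream body() { return in; }
        @Override public void abort() { aborted = true; }
    }

    /**
     * Releases the stream. If more than thresholdBytes are still unread,
     * abort instead of reading them; otherwise drain the rest and close,
     * which is the case where connection reuse is cheap to preserve.
     */
    static Action release(AbortableBody stream, long remainingBytes, long thresholdBytes) {
        if (remainingBytes > thresholdBytes) {
            stream.abort();
            return Action.ABORTED;
        }
        try (InputStream in = stream.body()) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Discard the few remaining bytes.
            }
            return Action.DRAINED;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed threshold of 1024 bytes, chosen only for the demo.
        InMemoryBody small = new InMemoryBody(new byte[512]);
        System.out.println(release(small, 512, 1024));   // prints DRAINED

        InMemoryBody large = new InMemoryBody(new byte[4096]);
        System.out.println(release(large, 4096, 1024));  // prints ABORTED
    }
}
```

The trade-off in this sketch is that an aborted connection cannot be returned to the pool, so the threshold would need to be large enough that draining is still preferred whenever it costs less than opening a new connection.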