[ 
https://issues.apache.org/jira/browse/FLINK-39533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18076728#comment-18076728
 ] 

Gabor Somogyi commented on FLINK-39533:
---------------------------------------

[{{4eaf745}}|https://github.com/apache/flink/commit/4eaf745eef72453b1a7c59794be8360c7e129d8f]
 on release-2.3

> Use abort() instead of drain on close/seek when remaining bytes exceed 
> threshold in NativeS3InputStream
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-39533
>                 URL: https://issues.apache.org/jira/browse/FLINK-39533
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem
>    Affects Versions: 2.3.0
>            Reporter: Samrat Deb
>            Assignee: Samrat Deb
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.4.0
>
>
> NativeS3InputStream currently calls close() on the underlying 
> ResponseInputStream during seek(), skip(), and close() operations. Apache 
> HttpClient's close() implementation drains all remaining bytes from the 
> response body to enable HTTP connection reuse.
> For large S3 objects where only a small portion has been read (for example, a 
> large state file, or seeks within a columnar format), this drain loop reads 
> and discards potentially gigabytes of data over the network, causing severe 
> latency during seek/close operations.
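
The trade-off described above can be sketched roughly as follows. This is only an illustration, not the actual NativeS3InputStream change: the class name, the helper methods, the 64 KiB threshold, and the Runnable used as a stand-in for abort() are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Illustrative sketch only; not the actual NativeS3InputStream code. */
final class AbortOrDrainSketch {

    /** Hypothetical threshold; the issue does not state the real value. */
    static final long DRAIN_THRESHOLD_BYTES = 64L * 1024;

    /** True when more unread bytes remain than we are willing to drain. */
    static boolean shouldAbort(long remainingBytes, long thresholdBytes) {
        return remainingBytes > thresholdBytes;
    }

    /** Reads and discards the rest of the body; returns the bytes discarded. */
    static long drain(InputStream body) {
        byte[] buffer = new byte[8192];
        long discarded = 0;
        try {
            int n;
            while ((n = body.read(buffer)) != -1) {
                discarded += n;
            }
            return discarded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Releases a partially read body. The abortAction parameter is a
     * hypothetical stand-in for whatever drops the connection without
     * reading further (the abort() named in the issue title).
     * Returns true if the abort path was taken, false if the body was drained.
     */
    static boolean release(
            InputStream body, long remainingBytes, long thresholdBytes, Runnable abortAction) {
        if (shouldAbort(remainingBytes, thresholdBytes)) {
            abortAction.run(); // give up connection reuse, skip the download
            return true;
        }
        drain(body); // small tail: cheap to read, keeps the connection reusable
        return false;
    }

    public static void main(String[] args) {
        // Small tail: draining 10,000 bytes is cheaper than a new connection.
        InputStream smallTail = new ByteArrayInputStream(new byte[10_000]);
        boolean aborted = release(smallTail, 10_000L, DRAIN_THRESHOLD_BYTES, () -> { });
        System.out.println("small tail aborted: " + aborted);

        // Large tail: pretend roughly 1 GB is still unread on the wire.
        InputStream largeTail = new ByteArrayInputStream(new byte[0]);
        aborted = release(largeTail, 1_000_000_000L, DRAIN_THRESHOLD_BYTES, () -> { });
        System.out.println("large tail aborted: " + aborted);
    }
}
```

The design choice is a cost comparison: draining keeps the pooled HTTP connection reusable but costs one read of every remaining byte, while aborting discards the connection and costs a new connection setup on the next request. A threshold on the remaining bytes picks whichever is likely cheaper.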



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
