[ 
https://issues.apache.org/jira/browse/FLINK-39533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samrat Deb updated FLINK-39533:
-------------------------------
    Description: 
NativeS3InputStream currently calls close() on the underlying 
ResponseInputStream during seek(), skip(), and close() operations. Apache 
HttpClient's close() implementation drains all remaining bytes from the 
response body to enable HTTP connection reuse.

For large S3 objects where only a small portion was read (for example, a large
state file, or a seek within a columnar format), this drain loop reads and
discards potentially gigabytes of data over the network, causing severe latency
during seek/close operations.

  was:NativeS3InputStream currently calls close() on the underlying 
ResponseInputStream during seek(), skip(), and close() operations. Apache 
HttpClient's close() implementation drains all remaining bytes from the 
response body to enable HTTP connection reuse.


> Use abort() instead of drain on close/seek when remaining bytes exceed 
> threshold in NativeS3InputStream
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-39533
>                 URL: https://issues.apache.org/jira/browse/FLINK-39533
>             Project: Flink
>          Issue Type: Technical Debt
>          Components: Connectors / FileSystem
>    Affects Versions: 2.3.0
>            Reporter: Samrat Deb
>            Priority: Major
>             Fix For: 2.4.0
>
>
> NativeS3InputStream currently calls close() on the underlying 
> ResponseInputStream during seek(), skip(), and close() operations. Apache 
> HttpClient's close() implementation drains all remaining bytes from the 
> response body to enable HTTP connection reuse.
>
> For large S3 objects where only a small portion was read (for example, a large
> state file, or a seek within a columnar format), this drain loop reads and
> discards potentially gigabytes of data over the network, causing severe latency
> during seek/close operations.
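
The abort-versus-drain decision named in the issue title could be sketched
roughly as follows. This is only an illustration under stated assumptions, not
the actual NativeS3InputStream change: the AbortableStream interface, the
RecordingStream fake, the release() helper, and the 1 MiB threshold are all
hypothetical. A real response stream's close() would also throw IOException,
which is omitted here for brevity.

```java
public class AbortOrDrainSketch {

    /** Hypothetical stand-in for a response stream that supports abort(). */
    interface AbortableStream {
        /** Drops the connection without reading the rest of the body. */
        void abort();

        /** May drain the remaining body so the connection can be reused. */
        void close();
    }

    /** Assumed threshold; the issue does not state a concrete value. */
    static final long DRAIN_THRESHOLD_BYTES = 1024L * 1024L;

    /** True when more bytes remain unread than the threshold allows draining. */
    static boolean shouldAbort(long contentLength, long position, long threshold) {
        long remaining = Math.max(0L, contentLength - position);
        return remaining > threshold;
    }

    /** Releases the stream by abort() or close() and reports which was used. */
    static String release(
            AbortableStream stream, long contentLength, long position, long threshold) {
        if (shouldAbort(contentLength, position, threshold)) {
            stream.abort(); // give up the connection instead of draining it
            return "abort";
        }
        stream.close(); // few bytes left: draining keeps the connection reusable
        return "close";
    }

    /** Fake stream that only records which method was called. */
    static class RecordingStream implements AbortableStream {
        boolean aborted;
        boolean closed;

        @Override
        public void abort() {
            aborted = true;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) {
        long fiveGiB = 5L * 1024 * 1024 * 1024;

        // Only 4 KiB of a 5 GiB object was read before the seek/close.
        System.out.println(
                release(new RecordingStream(), fiveGiB, 4096L, DRAIN_THRESHOLD_BYTES));

        // Only 512 bytes remain unread.
        System.out.println(
                release(new RecordingStream(), fiveGiB, fiveGiB - 512L, DRAIN_THRESHOLD_BYTES));
    }
}
```

The trade-off in this sketch is that abort() gives up the pooled HTTP connection,
so it is chosen only when the unread remainder is larger than the threshold.
Below the threshold, draining is assumed to be cheaper than opening a new
connection.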



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
