
[jira] [Updated] (HADOOP-13047) S3a Forward seek in stream length to be configurable

2016-04-24 Thread Rajesh Balamohan (JIRA)


 [ https://issues.apache.org/jira/browse/HADOOP-13047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HADOOP-13047:
--
Attachment: HADOOP-13047.WIP.2.patch

Agreed. I created "fs.s3a.readahead.buffer.size", which can be used to
configure the buffer size. I initially thought of reusing io.file.buffer.size
in S3AFileSystem, but it is better to track S3A readahead separately.
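
As a rough illustration of keeping the S3A readahead setting separate from
io.file.buffer.size, a lookup could look like the sketch below. It is an
assumption, not the patch: java.util.Properties stands in for Hadoop's
Configuration, and the class name, method name and 64 KB default are
hypothetical.

```java
import java.util.Properties;

public class ReadaheadConfig {
    // Key proposed in this comment; deliberately not derived from io.file.buffer.size.
    static final String READAHEAD_KEY = "fs.s3a.readahead.buffer.size";
    // Hypothetical default, chosen for illustration only.
    static final long DEFAULT_READAHEAD = 64 * 1024;

    /** Returns the S3A readahead size; io.file.buffer.size is never consulted. */
    static long readaheadBufferSize(Properties conf) {
        // Properties is a stand-in for Hadoop's Configuration in this sketch.
        String value = conf.getProperty(READAHEAD_KEY);
        if (value == null) {
            return DEFAULT_READAHEAD;
        }
        long size = Long.parseLong(value.trim());
        if (size < 0) {
            throw new IllegalArgumentException(READAHEAD_KEY + " must be >= 0, got " + size);
        }
        return size;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("io.file.buffer.size", "4096");
        System.out.println(readaheadBufferSize(conf)); // default: 65536
        conf.setProperty(READAHEAD_KEY, "1048576");
        System.out.println(readaheadBufferSize(conf)); // configured: 1048576
    }
}
```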

I created the patch, which includes HADOOP-13028. The main changes are in
seekInputStream. The patch should be much simpler once HADOOP-13028 is
checked in.
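
The change to seekInputStream can be pictured roughly as follows. This is a
hypothetical sketch, not the patch itself: the class, fields and helper
methods are assumptions, and a byte array stands in for the S3 object. The
idea from the issue is that a forward seek within a configurable range is
served by reading and discarding bytes from the open stream, instead of the
threshold being only the bytes currently available in the inner stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ForwardSeekSketch {
    private final byte[] object;        // stand-in for the remote S3 object
    private final long readaheadRange;  // hypothetical: value of fs.s3a.readahead.buffer.size
    private InputStream wrapped;        // stand-in for the open HTTP body stream
    private long pos;                   // current position within the object
    private int reopenCount;            // number of close + reopen cycles

    ForwardSeekSketch(byte[] object, long readaheadRange) {
        this.object = object;
        this.readaheadRange = readaheadRange;
        this.wrapped = new ByteArrayInputStream(object);
    }

    /** Moves to targetPos, skipping forward in-stream when the gap is small enough. */
    void seekInputStream(long targetPos) throws IOException {
        long diff = targetPos - pos;
        if (diff == 0) {
            return;
        }
        if (diff > 0 && diff <= readaheadRange) {
            // Short forward seek: read and discard instead of opening a new connection.
            long remaining = diff;
            byte[] scratch = new byte[8192];
            while (remaining > 0) {
                int n = wrapped.read(scratch, 0, (int) Math.min(scratch.length, remaining));
                if (n < 0) {
                    break; // EOF: position ends where the data ran out
                }
                remaining -= n;
            }
            pos = targetPos - remaining;
            return;
        }
        // Backward seek or long forward seek: close and reopen at the new offset.
        reopen(targetPos);
    }

    private void reopen(long targetPos) throws IOException {
        wrapped.close();
        // A real client would issue a ranged GET here; this sketch slices the array.
        int start = (int) Math.min(targetPos, object.length);
        wrapped = new ByteArrayInputStream(object, start, object.length - start);
        pos = start;
        reopenCount++;
    }

    long getPos() { return pos; }
    int getReopenCount() { return reopenCount; }

    public static void main(String[] args) throws IOException {
        ForwardSeekSketch s = new ForwardSeekSketch(new byte[1 << 20], 64 * 1024);
        s.seekInputStream(10_000);  // within range: skipped in-stream
        s.seekInputStream(500_000); // beyond range: reopen
        s.seekInputStream(100);     // backward: reopen
        System.out.println(s.getPos() + " reopens=" + s.getReopenCount()); // 100 reopens=2
    }
}
```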

> S3a Forward seek in stream length to be configurable
>
> Key: HADOOP-13047
> URL: https://issues.apache.org/jira/browse/HADOOP-13047
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Attachments: HADOOP-13047.WIP.2.patch, HADOOP-13047.WIP.patch
>
>
> Even with lazy seek, tests can show that a short-distance forward seek
> sometimes triggers a close + reopen, because the threshold for the seek is
> simply the number of bytes available in the inner stream.
> A configurable threshold would allow data to be read and discarded before
> falling back to that seek. This should be beneficial over long-haul networks,
> where the time to set up the TCP channel is high and TCP slow start means that
> bandwidth ramps up slowly. In such deployments, it will be better to read
> forward than to re-open, though the exact "best" number will vary with client
> and endpoint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HADOOP-13047) S3a Forward seek in stream length to be configurable

2016-04-22 Thread Rajesh Balamohan (JIRA)


 [ https://issues.apache.org/jira/browse/HADOOP-13047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HADOOP-13047:
--
Attachment: HADOOP-13047.WIP.patch

Attaching the high-level WIP patch. Based on statistics gathered on the amount
of data read so far and the time taken to connect, it should be possible to
decide whether to establish a new connection or to keep reading from the
existing stream (as in the case you pointed out earlier). The WIP patch tries
to address this scenario. It might not be possible to use something like
Hadoop's ReadaheadPool directly, as that is based on FileDescriptor.
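
One way to read "based on statistics gathered" is to compare the measured cost
of opening a connection with the time needed to read the skipped bytes at the
observed throughput. The sketch below is an assumption about how such a
heuristic could look; the class, its methods and the smoothing factor are
hypothetical and not taken from the WIP patch.

```java
public class SeekCostEstimator {
    // Hypothetical smoothing factor for the exponential moving averages.
    private static final double ALPHA = 0.2;
    private double avgConnectNanos = -1; // observed time to open a new stream
    private double avgBytesPerNano = -1; // observed read throughput on open streams

    void recordConnect(long nanos) {
        avgConnectNanos = avgConnectNanos < 0
            ? nanos : ALPHA * nanos + (1 - ALPHA) * avgConnectNanos;
    }

    void recordRead(long bytes, long nanos) {
        if (nanos <= 0) {
            return; // ignore unusable samples
        }
        double rate = (double) bytes / nanos;
        avgBytesPerNano = avgBytesPerNano < 0
            ? rate : ALPHA * rate + (1 - ALPHA) * avgBytesPerNano;
    }

    /** True if reading forward is expected to cost no more than reopening. */
    boolean shouldReadForward(long bytesToSkip, long fallbackThreshold) {
        if (avgConnectNanos < 0 || avgBytesPerNano <= 0) {
            // No statistics yet: fall back to a fixed, configured threshold.
            return bytesToSkip <= fallbackThreshold;
        }
        double readForwardNanos = bytesToSkip / avgBytesPerNano;
        return readForwardNanos <= avgConnectNanos;
    }

    public static void main(String[] args) {
        SeekCostEstimator est = new SeekCostEstimator();
        est.recordConnect(50_000_000L);            // 50 ms to open a connection
        est.recordRead(10_000_000L, 100_000_000L); // 10 MB in 100 ms (about 100 MB/s)
        System.out.println(est.shouldReadForward(1_000_000L, 65_536)); // ~10 ms: true
        System.out.println(est.shouldReadForward(8_000_000L, 65_536)); // ~80 ms: false
    }
}
```

This simple model ignores the reduced throughput right after a reopen (TCP slow
start), which would tilt the decision further towards reading forward.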

