[ 
https://issues.apache.org/jira/browse/HADOOP-19229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887574#comment-17887574
 ] 

Steve Loughran commented on HADOOP-19229:
-----------------------------------------

This is exactly what "fs.s3a.vectored.read.min.seek.size" does. We set it to
4K; maybe we should review it. The Facebook Velox paper says that 20 kB is
better for cloud storage.
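
As a rough illustration of how a minimum seek size decides which ranges get coalesced, here is a minimal sketch. It is not the Hadoop implementation: the class `RangeMergeSketch`, its `Range` type and the `merge` method are hypothetical names for this example, and the rule "merge when the gap is below the minimum seek and the merged length stays within the maximum merged size" is an assumption about the intended behaviour.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of gap-based range coalescing; not the Hadoop code. */
class RangeMergeSketch {

  /** A byte range covering [offset, offset + length). */
  static final class Range {
    final long offset;
    final long length;

    Range(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }

    long end() {
      return offset + length;
    }
  }

  /**
   * Merge ranges whose gap is smaller than minSeek, as long as the
   * merged range does not grow beyond maxMerged bytes.
   */
  static List<Range> merge(List<Range> input, long minSeek, long maxMerged) {
    List<Range> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparingLong((Range r) -> r.offset));
    List<Range> out = new ArrayList<>();
    Range current = null;
    for (Range r : sorted) {
      if (current == null) {
        current = r;
        continue;
      }
      long gap = r.offset - current.end();
      long mergedLength = Math.max(current.end(), r.end()) - current.offset;
      if (gap < minSeek && mergedLength <= maxMerged) {
        // Cheaper to read through the gap than to issue a second request.
        current = new Range(current.offset, mergedLength);
      } else {
        out.add(current);
        current = r;
      }
    }
    if (current != null) {
      out.add(current);
    }
    return out;
  }

  public static void main(String[] args) {
    long minSeek = 4 * 1024;        // the 4K value mentioned above
    long maxMerged = 1024 * 1024;   // assumed cap on a merged read
    List<Range> ranges = new ArrayList<>();
    ranges.add(new Range(0, 8));
    ranges.add(new Range(8 + 1024 * 1024, 8)); // 1 MB gap after the first range
    System.out.println("ranges after merge: " + merge(ranges, minSeek, maxMerged).size());
  }
}
```

With a threshold like this, the two 8-byte ranges separated by 1 MB stay as separate reads, while ranges only a few hundred bytes apart are combined into one.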

> Vector IO on cloud storage: experiment to see what a good minimum seek size 
> should be
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-19229
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19229
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.1
>            Reporter: Steve Loughran
>            Priority: Major
>
> Vector IO has a max size to coalesce ranges, but it also needs a maximum gap
> between ranges to justify the merge. Right now we could have a read where two
> vectors of size 8 bytes can be merged with a 1 MB gap between them, and
> that's wasteful.
> We could also consider an "efficiency" metric which looks at the ratio of
> bytes-read to bytes-discarded. Not sure what we'd do with it, but we could
> track it as an IOStat.
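
One possible shape for the "efficiency" metric from the description, as a hedged sketch only. The issue does not define a formula, so the class `ReadEfficiencySketch` and both methods are hypothetical; the assumption here is that "discarded" means bytes fetched from the store that no caller asked for.

```java
/** Hypothetical sketch of a read-efficiency figure; the issue defines no formula. */
class ReadEfficiencySketch {

  /** Bytes fetched from the store that no caller requested. */
  static long discardedBytes(long requestedBytes, long fetchedBytes) {
    if (requestedBytes < 0 || fetchedBytes < requestedBytes) {
      throw new IllegalArgumentException(
          "expected 0 <= requestedBytes <= fetchedBytes");
    }
    return fetchedBytes - requestedBytes;
  }

  /** Fraction of fetched bytes that were requested; 1.0 means nothing was wasted. */
  static double usefulFraction(long requestedBytes, long fetchedBytes) {
    discardedBytes(requestedBytes, fetchedBytes); // reuse the argument validation
    return fetchedBytes == 0 ? 1.0 : (double) requestedBytes / fetchedBytes;
  }

  public static void main(String[] args) {
    // The case from the description: two 8-byte ranges merged across a 1 MB gap.
    long requested = 2 * 8;
    long fetched = 8 + 1024 * 1024 + 8;
    System.out.println("discarded bytes: " + discardedBytes(requested, fetched));
    System.out.println("useful fraction: " + usefulFraction(requested, fetched));
  }
}
```

For the merged read in the description, almost every fetched byte is discarded, which is the kind of outcome such a statistic would make visible.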



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
