[ https://issues.apache.org/jira/browse/HADOOP-14965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-14965:
------------------------------------
    Attachment: HADOOP-14965-001.patch

Patch 001, in sync with git commit df2fc957aece43478e5b. Uploading for Yetus to play with.

Testing: S3 Ireland, {{-Ds3guard -Ddynamodb -Dscale}}. The scale tests include the one that measures input stream performance, and that test is touched by this patch, so it is critical that it is executed.

> s3a input stream "normal" fadvise mode to be adaptive
> -----------------------------------------------------
>
>                 Key: HADOOP-14965
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14965
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-14965-001.patch
>
>
> HADOOP-14535 added seek optimisation to wasb, but rather than require the
> caller to declare sequential vs random, it works it out for itself:
> # it defaults to sequential IO with lazy seek;
> # if the caller ever seeks backwards, it switches to random IO.
> This means that on the usage pattern of columnar stores (go to the end of
> the file, read the summary, then go to the columns and work forwards), it
> will switch to random IO after that first backward seek, at the cost of one
> aborted HTTP connection.
> Where this should benefit most is in downstream apps that work with
> different data sources in the same object store, or run off the same app
> config, but have different read patterns. I'm seeing exactly this in some of
> my Spark tests, where it's near impossible to set things up so that .gz
> files are read sequentially but ORC data is read with random IO.
> I propose: "normal" fadvise => adaptive, "sequential" => sequential always,
> "random" => random from the outset.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
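The adaptive "normal" policy proposed above can be sketched as a small, one-way state machine. This is a hypothetical Java sketch, not the code in HADOOP-14965-001.patch: the class AdaptiveReadPolicy, its Mode enum, and its seek/read/isRandomIO methods are illustrative names. It assumes lazy seek (a seek only records the target position) and that a "normal" stream flips to random IO permanently on the first backward seek, while "sequential" and "random" never change.

```java
/**
 * Hypothetical sketch of the adaptive "normal" fadvise policy described in
 * HADOOP-14965. Not the S3A implementation; all names are illustrative.
 */
final class AdaptiveReadPolicy {

    /** Declared read mode, mirroring the three fadvise values in the issue. */
    public enum Mode { SEQUENTIAL, NORMAL, RANDOM }

    private final Mode declared;
    private boolean randomIO;      // effective policy right now
    private long nextReadPos = 0;  // where the next read will start

    AdaptiveReadPolicy(Mode declared) {
        this.declared = declared;
        // "random" is random from the outset; the other two start sequential.
        this.randomIO = (declared == Mode.RANDOM);
    }

    /** Lazy seek: record the target; "normal" switches on a backward seek. */
    void seek(long targetPos) {
        if (targetPos < 0) {
            throw new IllegalArgumentException("negative seek: " + targetPos);
        }
        if (declared == Mode.NORMAL && !randomIO && targetPos < nextReadPos) {
            // First backward seek: switch to random IO for the rest of the
            // stream's life. A real stream would abort its open HTTP GET here.
            randomIO = true;
        }
        nextReadPos = targetPos;
    }

    /** Record that len bytes were read at the current position. */
    void read(int len) {
        if (len < 0) {
            throw new IllegalArgumentException("negative length: " + len);
        }
        nextReadPos += len;
    }

    boolean isRandomIO() { return randomIO; }

    long getPos() { return nextReadPos; }
}
```

Walking the columnar pattern from the description: seek(980) is a forward move from position 0, so the stream stays sequential; after read(20), seek(100) moves backwards and the policy flips to random IO, which is the single aborted connection the issue mentions. Making the switch one-way keeps the policy simple and avoids flapping between modes on mixed access patterns.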