[ https://issues.apache.org/jira/browse/HADOOP-16241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828863#comment-16828863 ]
Sahil Takiar commented on HADOOP-16241:
---------------------------------------

Attached a flamegraph of an Impala full table scan of {{web_returns}}. I'm planning to address some of the SSL decrypt overhead in HADOOP-16050. The other main bottleneck comes from the lazy seeks, which are ultimately dominated by SSL handshakes.

> S3AInputStream PositionReadable should perform ranged read on dedicated
> stream
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-16241
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16241
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: Impala-TPCDS-scans.zip,
> impala-web_returns-scan-flamegraph.svg
>
>
> The current implementation of {{PositionedReadable}} in {{S3AInputStream}}
> is pretty close to the default implementation in {{FSInputStream}}.
> This JIRA proposes overriding the
> {{read(long position, byte[] buffer, int offset, int length)}} method and
> re-implementing the
> {{readFully(long position, byte[] buffer, int offset, int length)}} method
> in S3A.
> The new implementation would perform a "ranged read" on a dedicated object
> stream (rather than the shared one). Prototypes have shown this to bring a
> considerable performance improvement to readers that are only interested in
> reading a random chunk of the file at a time (e.g. Impala, although I would
> assume HBase would benefit from this as well).
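
As an illustration of the "ranged read on a dedicated stream" idea, here is a minimal sketch in plain Java. It is not the S3A code: {{RangedSource}}, {{InMemorySource}} and {{DedicatedStreamPread}} are hypothetical names, and an in-memory byte array stands in for the S3 object so that the example runs without any S3 or Hadoop dependency. The point it tries to show is that each positioned read opens its own short-lived stream over exactly the requested range, so no shared stream or shared position is touched.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for an object store that can open a stream over a byte range. */
interface RangedSource {
    /** Opens a new, independent stream covering [start, start + length) of the object. */
    InputStream openRange(long start, int length) throws IOException;
}

/** In-memory RangedSource, used only to make the sketch runnable. */
final class InMemorySource implements RangedSource {
    private final byte[] data;

    InMemorySource(byte[] data) {
        this.data = data;
    }

    @Override
    public InputStream openRange(long start, int length) {
        // Clamp the requested range to the object size, as a ranged GET would.
        int from = (int) Math.min(start, data.length);
        int to = (int) Math.min(start + length, data.length);
        return new ByteArrayInputStream(data, from, to - from);
    }
}

/** Positioned reads that never touch a shared stream or a shared file position. */
final class DedicatedStreamPread {
    private final RangedSource source;

    DedicatedStreamPread(RangedSource source) {
        this.source = source;
    }

    /** Reads up to length bytes starting at position; returns bytes read, or -1 at EOF. */
    int read(long position, byte[] buffer, int offset, int length) throws IOException {
        if (position < 0) {
            throw new IllegalArgumentException("negative position: " + position);
        }
        if (length == 0) {
            return 0;
        }
        // One dedicated stream per call, closed when the method returns.
        try (InputStream in = source.openRange(position, length)) {
            int total = 0;
            while (total < length) {
                int n = in.read(buffer, offset + total, length - total);
                if (n < 0) {
                    break; // range exhausted (end of object)
                }
                total += n;
            }
            return total == 0 ? -1 : total;
        }
    }

    /** Like read, but fails with EOFException unless all length bytes are available. */
    void readFully(long position, byte[] buffer, int offset, int length) throws IOException {
        int n = read(position, buffer, offset, length);
        if (n < length) {
            throw new EOFException("reached end of object before reading " + length + " bytes");
        }
    }
}
```

Because {{read}} keeps all of its state in local variables, several threads can call it on the same {{DedicatedStreamPread}} instance without a lock, provided the underlying source allows concurrent range opens (an assumption of this sketch).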
> Setting {{fs.s3a.experimental.input.fadvise}} to {{RANDOM}} is helpful for
> clients that rely on pread, but has a few drawbacks:
> * Unless the client explicitly sets fadvise to RANDOM, it will get at
> least one connection reset when the backwards seek is issued (after which
> fadvise automatically switches to RANDOM)
> * Data is only read in 64 KB chunks, so for a large read, several GET
> requests must be issued to S3 to fetch the data; while the 64 KB chunk value
> is configurable, it is hard to set a reasonable value for variable-length
> preads
> * If the readahead value is too big, closing the input stream can take
> considerable time because the stream has to be drained of data before it can
> be closed
> The new implementation of {{PositionedReadable}} would issue a
> {{GetObjectRequest}} with the range specified by {{position}} and the size of
> the given buffer. The data would be read from the {{S3ObjectInputStream}},
> and that stream would be closed at the end of the method. This stream would
> be independent of the {{wrappedStream}} currently maintained by S3A.
> This brings the following benefits:
> * The {{PositionedReadable}} methods can be thread-safe without a
> {{synchronized}} block, which allows clients to concurrently call pread
> methods on the same {{S3AInputStream}} instance
> * preads will request all the data at once rather than requesting it in
> chunks via the readahead logic
> * Avoids performing potentially expensive seeks when performing preads
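
To put rough numbers on the chunking drawback and on the proposed range, here is a small worked sketch in Java. {{PreadRequestMath}}, {{chunkedGetCount}} and {{rangeHeader}} are hypothetical helpers, not Hadoop or AWS SDK APIs. The GET count assumes, as the description suggests, that each request fetches exactly one readahead-sized chunk; the actual S3A behaviour may differ. The range string follows the standard HTTP {{Range}} header form, where both byte offsets are inclusive.

```java
/** Hypothetical helpers illustrating the request arithmetic behind a pread. */
final class PreadRequestMath {
    private PreadRequestMath() {
    }

    /**
     * Estimated number of GETs when a read of the given length is served in
     * fixed-size chunks (assumption: one GET per chunk). This is ceil(length / chunkSize).
     */
    static long chunkedGetCount(long length, long chunkSize) {
        if (length < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and chunkSize > 0");
        }
        return (length + chunkSize - 1) / chunkSize;
    }

    /**
     * HTTP Range header value for a single ranged read of length bytes at position.
     * The last-byte offset is inclusive, hence the "- 1".
     */
    static String rangeHeader(long position, int length) {
        if (position < 0 || length <= 0) {
            throw new IllegalArgumentException("position must be >= 0 and length > 0");
        }
        return "bytes=" + position + "-" + (position + length - 1);
    }

    public static void main(String[] args) {
        // A 1 MiB pread served in 64 KiB chunks: 16 GETs under the one-GET-per-chunk assumption.
        System.out.println(chunkedGetCount(1L << 20, 64L * 1024)); // prints 16
        // The same data as a single ranged read of 500 bytes at offset 1000.
        System.out.println(rangeHeader(1000, 500)); // prints bytes=1000-1499
    }
}
```

Under these assumptions, the single ranged request replaces all of the chunked requests for one pread, which is the second benefit listed in the description.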