Bhalchandra Pandit created HADOOP-18028:
-------------------------------------------
Summary: improve S3 read speed using prefetching & caching
Key: HADOOP-18028
URL: https://issues.apache.org/jira/browse/HADOOP-18028
Project: Hadoop Common
Issue Type: Improvement
Components: fs/s3
Reporter: Bhalchandra Pandit

I work at Pinterest, where I developed a technique that greatly improves read
throughput from the S3 file system. It helps the sequential read case (such as
reading a SequenceFile) and also significantly improves throughput in the
random access case (such as reading Parquet). The technique has made our data
processing jobs at Pinterest considerably more efficient.

I would like to contribute this feature to Apache Hadoop. More details on the
technique are available in a blog post I wrote recently:
[https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]