[jira] [Updated] (HADOOP-18028) High performance S3A input stream with prefetching & caching

2022-03-25 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18028:

Summary: High performance S3A input stream with prefetching & caching  
(was: improve S3 read speed using prefetching & caching)

> High performance S3A input stream with prefetching & caching
> 
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Reporter: Bhalchandra Pandit
> Priority: Major
> Labels: pull-request-available
> Time Spent: 12.5h
> Remaining Estimate: 0h
>
> I work for Pinterest, where I developed a technique that vastly improves 
> read throughput from the S3 file system. It helps both the sequential read 
> case (such as reading a SequenceFile) and the random-access case (such as 
> reading Parquet). The technique has significantly improved the efficiency of 
> data processing jobs at Pinterest. 
>  
> I would like to contribute this feature to Apache Hadoop. More details on 
> the technique are available in a blog post I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18028) High performance S3A input stream with prefetching & caching

2023-04-28 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18028:

Fix Version/s: 3.4.0, 3.3.9

> High performance S3A input stream with prefetching & caching
> 
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Reporter: Bhalchandra Pandit
> Assignee: Bhalchandra Pandit
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
> Time Spent: 14.5h
> Remaining Estimate: 0h
>






[jira] [Updated] (HADOOP-18028) High performance S3A input stream with prefetching & caching

2024-01-16 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HADOOP-18028:

Fix Version/s: (was: 3.4.0)

> High performance S3A input stream with prefetching & caching
> 
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Reporter: Bhalchandra Pandit
> Assignee: Bhalchandra Pandit
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.9
>
> Time Spent: 14.5h
> Remaining Estimate: 0h
>


