[
https://issues.apache.org/jira/browse/HADOOP-19348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901489#comment-17901489
]
ASF GitHub Bot commented on HADOOP-19348:
-----------------------------------------
fuatbasik opened a new pull request, #7192:
URL: https://github.com/apache/hadoop/pull/7192
### Description of PR
Integrate Analytics Accelerator Library for Amazon S3
This PR is the initial integration of the Analytics Accelerator
Library for Amazon S3 into S3A. It introduces a new
S3ASeekableStream and modifies S3AFileSystem. Use of the Analytics
Accelerator Library is controlled by a configuration option and is off by
default.
### How was this patch tested?
Added a new integration test, ITestS3AS3SeekableStream, and ran all
hadoop-aws tests with the following property set:
<property>
<name>fs.s3a.analytics.accelerator.enabled</name>
<value>true</value>
</property>
Some tests still fail; we are working to address them.
### For code changes:
- [X] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [X] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [X] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [N/A] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
> S3A: Add support for analytics-accelerator-s3
> ---------------------------------------------
>
> Key: HADOOP-19348
> URL: https://issues.apache.org/jira/browse/HADOOP-19348
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.2
> Reporter: Ahmar Suhail
> Priority: Major
>
> S3 recently released [Analytics Accelerator Library for Amazon
> S3|https://github.com/awslabs/analytics-accelerator-s3] as an Alpha release.
> The library provides an input stream, with an initial goal of improving
> performance for Apache Spark workloads on Parquet datasets.
> For example, it implements optimisations such as footer prefetching, and so
> avoids the multiple GETs that S3AInputStream currently makes for the footer
> bytes and PageIndex structures.
> The library also tracks the columns currently being read by a query using the
> Parquet metadata, and then prefetches these bytes when Parquet files with the
> same schema are opened.
> This ticket tracks the work required for the basic initial integration. More
> work remains, such as VectoredIO support, which we will identify and follow
> up on.
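
To illustrate why footer prefetching reduces the number of GETs, here is a minimal, self-contained sketch. It is not the library's implementation and does not use its API: `CountingStore` and `TailCachingReader` are hypothetical names, the object store is simulated in memory, and the 64-byte tail size is an arbitrary assumption. The idea shown is only that one ranged request for the end of the object can serve the later small footer reads from memory.

```java
import java.util.Arrays;

public class FooterPrefetchSketch {

    /** Hypothetical stand-in for an object store; counts ranged GET requests. */
    static final class CountingStore {
        private final byte[] object;
        private int getCount = 0;

        CountingStore(byte[] object) {
            this.object = object;
        }

        long length() {
            return object.length;
        }

        /** Simulates one ranged GET for bytes [start, end). */
        byte[] getRange(long start, long end) {
            getCount++;
            return Arrays.copyOfRange(object, (int) start, (int) end);
        }

        int getCount() {
            return getCount;
        }
    }

    /** Hypothetical reader: fetches the object's tail once, then serves tail reads from memory. */
    static final class TailCachingReader {
        private final CountingStore store;
        private final long tailStart;
        private final byte[] tail;

        TailCachingReader(CountingStore store, int tailSize) {
            this.store = store;
            this.tailStart = Math.max(0, store.length() - tailSize);
            // The single prefetch GET, issued when the reader is opened.
            this.tail = store.getRange(tailStart, store.length());
        }

        byte[] read(long start, long end) {
            if (start >= tailStart) {
                // Entirely inside the cached tail: no request is issued.
                return Arrays.copyOfRange(tail, (int) (start - tailStart), (int) (end - tailStart));
            }
            // Outside the cached tail: fall back to a ranged GET.
            return store.getRange(start, end);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        CountingStore store = new CountingStore(data);
        TailCachingReader reader = new TailCachingReader(store, 64);

        // Parquet-style access pattern: trailing 8 bytes first, then the footer body.
        reader.read(1016, 1024);
        reader.read(980, 1016);

        System.out.println("GETs issued: " + store.getCount()); // prints "GETs issued: 1"
    }
}
```

Without the cached tail, each of the two footer reads above would be its own ranged GET. A read that starts before the cached tail still goes to the store, so only footer-style access benefits in this sketch.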