Ahmar Suhail created HADOOP-19348:
-------------------------------------
Summary: Add support for analytics-accelerator-s3
Key: HADOOP-19348
URL: https://issues.apache.org/jira/browse/HADOOP-19348
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Reporter: Ahmar Suhail
S3 recently released analytics-accelerator-s3
(https://github.com/awslabs/analytics-accelerator-s3) as an Alpha release. It
is an input stream whose initial goal is to improve performance for Apache
Spark workloads on Parquet datasets.
For example, it implements optimisations such as footer prefetching, which
avoids the multiple GET requests S3AInputStream currently makes to read the
footer bytes and PageIndex structures.
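To show why footer prefetching saves requests, here is a minimal Java sketch. It is not taken from analytics-accelerator-s3. It relies only on the Parquet file layout: a file ends with the footer, then a 4-byte little-endian footer length, then the magic bytes "PAR1". One ranged GET of the object's tail usually covers the whole footer, and a second GET is needed only when the footer is larger than the prefetch window. FooterPrefetchSketch, SimulatedStore, readFooter, fakeParquet and the prefetch sizes are all hypothetical, and the in-memory store stands in for S3. The sketch covers only the footer, not the PageIndex structures.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of Parquet footer prefetching (not the library's code). */
class FooterPrefetchSketch {

    /** Hypothetical in-memory stand-in for S3 that counts ranged GET requests. */
    static final class SimulatedStore {
        private final byte[] object;
        int getRequests = 0;

        SimulatedStore(byte[] object) {
            this.object = object;
        }

        long length() {
            return object.length;
        }

        /** Returns bytes [start, end) and counts one GET request. */
        byte[] get(long start, long end) {
            getRequests++;
            return Arrays.copyOfRange(object, (int) start, (int) end);
        }
    }

    static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /**
     * Reads the footer with one tail GET of prefetchSize bytes when the footer
     * fits in that window; otherwise issues one extra GET for the missing prefix.
     */
    static byte[] readFooter(SimulatedStore store, int prefetchSize) {
        long len = store.length();
        long tailStart = Math.max(0, len - prefetchSize);
        byte[] tail = store.get(tailStart, len);

        // Last 8 bytes: 4-byte little-endian footer length, then "PAR1".
        int n = tail.length;
        if (n < 8 || !Arrays.equals(Arrays.copyOfRange(tail, n - 4, n), MAGIC)) {
            throw new IllegalArgumentException("not a Parquet file");
        }
        int footerLen = ByteBuffer.wrap(tail, n - 8, 4)
                .order(ByteOrder.LITTLE_ENDIAN).getInt();
        long footerStart = len - 8 - footerLen;
        if (footerLen < 0 || footerStart < MAGIC.length) {
            throw new IllegalArgumentException("corrupt footer length");
        }

        if (footerStart >= tailStart) {
            // Footer is already inside the prefetched tail: no extra request.
            int off = (int) (footerStart - tailStart);
            return Arrays.copyOfRange(tail, off, off + footerLen);
        }
        // Footer is larger than the window: fetch only the bytes we are missing.
        byte[] head = store.get(footerStart, tailStart);
        byte[] footer = new byte[footerLen];
        System.arraycopy(head, 0, footer, 0, head.length);
        System.arraycopy(tail, 0, footer, head.length, footerLen - head.length);
        return footer;
    }

    /** Hypothetical helper: builds a Parquet-shaped byte array for demonstration. */
    static byte[] fakeParquet(int dataLen, byte[] footer) {
        ByteBuffer buf = ByteBuffer.allocate(MAGIC.length + dataLen + footer.length + 8)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.put(MAGIC).put(new byte[dataLen]).put(footer)
                .putInt(footer.length).put(MAGIC);
        return buf.array();
    }
}
```

With a 100,000-byte data section, a 1,000-byte footer and a 64 KiB window, readFooter issues a single GET. Shrinking the window to 512 bytes forces the fallback path and a second GET. Both paths return the same footer bytes.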
The library also uses the Parquet metadata to track which columns a query is
currently reading, and then prefetches those columns' bytes when Parquet files
with the same schema are opened.
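The column-tracking idea can be pictured as a map from a schema fingerprint to the set of columns read for that schema. The following Java sketch is illustrative only and is not the library's data structure. ColumnAccessTracker, recordRead, columnsToPrefetch and the schema key (column names joined with commas) are hypothetical simplifications.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: remember which columns were read for each schema so
 * that, when another file with the same schema is opened, those columns can
 * be prefetched. Not the analytics-accelerator-s3 implementation.
 */
class ColumnAccessTracker {
    // Hypothetical schema fingerprint -> columns read by queries on that schema.
    private final Map<String, Set<String>> columnsBySchema = new ConcurrentHashMap<>();

    /** Hypothetical schema key: ordered column names joined with commas. */
    static String schemaKey(List<String> columns) {
        return String.join(",", columns);
    }

    /** Records that a query read the given column from a file with this schema. */
    void recordRead(String schemaKey, String column) {
        columnsBySchema
                .computeIfAbsent(schemaKey, k -> ConcurrentHashMap.newKeySet())
                .add(column);
    }

    /** Returns a snapshot of the columns worth prefetching for this schema. */
    Set<String> columnsToPrefetch(String schemaKey) {
        return Set.copyOf(columnsBySchema.getOrDefault(schemaKey, Set.of()));
    }
}
```

A concurrent map and concurrent key sets are used because several input streams may record reads at the same time. columnsToPrefetch returns an immutable snapshot, so a caller planning prefetches does not observe later updates.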
This ticket tracks the work required for the basic initial integration. More
work remains, such as vectored IO support, which we will identify and follow
up on.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)