[
https://issues.apache.org/jira/browse/HADOOP-19348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931191#comment-17931191
]
ASF GitHub Bot commented on HADOOP-19348:
-----------------------------------------
steveloughran commented on code in PR #7433:
URL: https://github.com/apache/hadoop/pull/7433#discussion_r1973722346
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -636,7 +636,8 @@ public void initialize(URI name, Configuration originalConf)
     // If encryption method is set to CSE-KMS or CSE-CUSTOM then CSE is enabled.
     isCSEEnabled = CSEUtils.isCSEEnabled(getS3EncryptionAlgorithm().getMethod());
-    isAnalyticsAccelaratorEnabled = StreamIntegration.determineInputStreamType(conf).equals(InputStreamType.Analytics);
+    isAnalyticsAccelaratorEnabled =
+        StreamIntegration.determineInputStreamType(conf)
Review Comment:
nit: spelling
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAnalyticsAcceleratorStreamReading.java:
##########
@@ -141,37 +144,37 @@ public void testMalformedParquetFooter() throws IOException {
    * can contain multiple row groups, this allows for further parallelisation, as each row group
    * can be processed independently.
    */
-  @Test
-  public void testMultiRowGroupParquet() throws Throwable {
+  @Test
+  public void testMultiRowGroupParquet() throws Throwable {
     describe("A parquet file is read successfully");
     Path dest = path("multi_row_group.parquet");
-    File file = new File("src/test/resources/multi_row_group.parquet");
-    Path sourcePath = new Path(file.toURI().getPath());
-    getFileSystem().copyFromLocalFile(false, true, sourcePath, dest);
+    File file = new File("src/test/resources/multi_row_group.parquet");
+    Path sourcePath = new Path(file.toURI().getPath());
+    getFileSystem().copyFromLocalFile(false, true, sourcePath, dest);

-    FileStatus fileStatus = getFileSystem().getFileStatus(dest);
+    FileStatus fileStatus = getFileSystem().getFileStatus(dest);

-    byte[] buffer = new byte[3000];
-    IOStatistics ioStats;
+    byte[] buffer = new byte[3000];
+    IOStatistics ioStats;

-    try (FSDataInputStream inputStream = getFileSystem().open(dest)) {
-      ioStats = inputStream.getIOStatistics();
-      inputStream.readFully(buffer, 0, (int) fileStatus.getLen());
-    }
+    try (FSDataInputStream inputStream = getFileSystem().open(dest)) {
+      ioStats = inputStream.getIOStatistics();
+      inputStream.readFully(buffer, 0, (int) fileStatus.getLen());
+    }

-    verifyStatisticCounterValue(ioStats, STREAM_READ_ANALYTICS_OPENED, 1);
+    verifyStatisticCounterValue(ioStats, STREAM_READ_ANALYTICS_OPENED, 1);

-    try (FSDataInputStream inputStream = getFileSystem().openFile(dest)
-        .must(FS_OPTION_OPENFILE_READ_POLICY,FS_OPTION_OPENFILE_READ_POLICY_PARQUET)
-        .build().get()) {
-      ioStats = inputStream.getIOStatistics();
-      inputStream.readFully(buffer, 0, (int) fileStatus.getLen());
-    }
+    try (FSDataInputStream inputStream = getFileSystem().openFile(dest)
+        .must(FS_OPTION_OPENFILE_READ_POLICY,
+            FS_OPTION_OPENFILE_READ_POLICY_PARQUET)
+        .build().get()) {
+      ioStats = inputStream.getIOStatistics();
+      inputStream.readFully(buffer, 0, (int) fileStatus.getLen());
+    }

-    verifyStatisticCounterValue(ioStats, STREAM_READ_ANALYTICS_OPENED, 1);
-  }
+    verifyStatisticCounterValue(ioStats, STREAM_READ_ANALYTICS_OPENED, 1);
Review Comment:
add a check for the filesystem iostats too, to make sure it trickles up
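One possible shape for that extra assertion, sketched here as a fragment for the end of the test method. It assumes the test's filesystem instance aggregates stream statistics and exposes them through `getIOStatistics()`; the expected value of 2 (one per stream opened above) is an assumption about how the counters aggregate, not verified output:

```java
// Hedged sketch: also assert the counter surfaces in the
// filesystem-level statistics, i.e. that stream stats trickle up.
// The expected value (2: one per analytics stream opened above)
// is an assumption about aggregation, not a verified result.
verifyStatisticCounterValue(getFileSystem().getIOStatistics(),
    STREAM_READ_ANALYTICS_OPENED, 2);
```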
> S3A: Add initial support for analytics-accelerator-s3
> -----------------------------------------------------
>
> Key: HADOOP-19348
> URL: https://issues.apache.org/jira/browse/HADOOP-19348
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.2
> Reporter: Ahmar Suhail
> Assignee: Ahmar Suhail
> Priority: Major
> Labels: pull-request-available
>
> S3 recently released [Analytics Accelerator Library for Amazon
> S3|https://github.com/awslabs/analytics-accelerator-s3] as an Alpha release,
> which is an input stream, with an initial goal of improving performance for
> Apache Spark workloads on Parquet datasets.
> For example, it implements optimisations such as footer prefetching, and so
> avoids the multiple GETs S3AInputStream currently makes for the footer bytes
> and PageIndex structures.
> The library also tracks columns currently being read by a query using the
> parquet metadata, and then prefetches these bytes when parquet files with the
> same schema are opened.
> This ticket tracks the work required for the basic initial integration. There
> is still more work to be done, such as VectoredIO support etc, which we will
> identify and follow up with.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)