[ https://issues.apache.org/jira/browse/HADOOP-19559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004157#comment-18004157 ]
ASF GitHub Bot commented on HADOOP-19559: ----------------------------------------- fuatbasik commented on code in PR #7763: URL: https://github.com/apache/hadoop/pull/7763#discussion_r2195042590 ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Statistic.java: ########## @@ -459,6 +459,14 @@ public enum Statistic { "Gauge of active memory in use", TYPE_GAUGE), + ANALYTICS_GET_REQUESTS( + StreamStatisticNames.STREAM_READ_ANALYTICS_GET_REQUESTS, + "GET requests made by analytics streams", Review Comment: This is nit but i believe an important one. We have to make sure we are giving right name, description as well as doing it in the right place. 1/ why are these are here at these lines? This is neither lexicographic nor following a logical separation. We probably need to move these up to Object IO section. 2/ Next is the name for Stream read we had `STREAM_READ_OPENED` and we added `STREAM_READ_ANALYTICS_OPENED` so following the same schema we probably need to re-name this to `OBJECT_GET_ANALYTICS_REQUESTS` (there is no GET equivalent currently because stream read opens = gets. Likewise for HEAD, we have `OBJECT_METADATA_REQUESTS`, which we can add `OBJECT_METADATA_ANALYTICS_REQUESTS`. 3/ we should make sure the messages are similar to others. For example, we should call this `Count of object get requests made by analytics stream`. and for head `Count of requests for object metadata made by analytics stream` ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/streams/AnalyticsStream.java: ########## @@ -63,13 +68,23 @@ public AnalyticsStream(final ObjectReadParameters parameters, @Override public int read() throws IOException { throwIfClosed(); + if (getPos() >= lengthLimit) { + return -1; // EOF reached due to length limit Review Comment: do we need to close underlying stream here before returning? We might check this from other Stream implementations. ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/streams/AnalyticsRequestCallback.java: ########## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.s3a.impl.streams; + +import org.apache.hadoop.fs.s3a.statistics.S3AInputStreamStatistics; +import org.apache.hadoop.fs.statistics.DurationTracker; +import software.amazon.s3.analyticsaccelerator.util.RequestCallback; + +/** + * Implementation of AAL's RequestCallback interface that tracks analytics operations. + */ +public class AnalyticsRequestCallback implements RequestCallback { + private final S3AInputStreamStatistics statistics; + + /** + * Create a new callback instance. + * @param statistics the statistics to update + */ + public AnalyticsRequestCallback(S3AInputStreamStatistics statistics) { + this.statistics = statistics; + } + + @Override + public void onGetRequest() { + statistics.incrementAnalyticsGetRequests(); + // Update ACTION_HTTP_GET_REQUEST statistic + DurationTracker tracker = statistics.initiateGetRequest(); Review Comment: this duration will be always very close to 0 right? Why do we need to start and close tracker? ########## hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/ITestCommitOperationCost.java: ########## @@ -174,10 +173,7 @@ private void abortActiveStream() throws IOException { public void testCostOfCreatingMagicFile() throws Throwable { describe("Files created under magic paths skip existence checks and marker deletes"); - // Analytics accelerator currently does not support IOStatistics, this will be added as - // part of https://issues.apache.org/jira/browse/HADOOP-19364 - skipIfAnalyticsAcceleratorEnabled(getConfiguration(), - "Analytics Accelerator currently does not support stream statistics"); + Review Comment: nit: extra whitespace ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/streams/AnalyticsRequestCallback.java: ########## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.s3a.impl.streams; + +import org.apache.hadoop.fs.s3a.statistics.S3AInputStreamStatistics; +import org.apache.hadoop.fs.statistics.DurationTracker; +import software.amazon.s3.analyticsaccelerator.util.RequestCallback; + +/** + * Implementation of AAL's RequestCallback interface that tracks analytics operations. + */ +public class AnalyticsRequestCallback implements RequestCallback { + private final S3AInputStreamStatistics statistics; + + /** + * Create a new callback instance. + * @param statistics the statistics to update + */ + public AnalyticsRequestCallback(S3AInputStreamStatistics statistics) { + this.statistics = statistics; + } + + @Override + public void onGetRequest() { + statistics.incrementAnalyticsGetRequests(); + // Update ACTION_HTTP_GET_REQUEST statistic + DurationTracker tracker = statistics.initiateGetRequest(); + tracker.close(); + } + + @Override + public void onHeadRequest() { + statistics.incrementAnalyticsHeadRequests(); Review Comment: now on GET you are updating both Analytics request counter and ACTION_HTTP_GET_REQUEST (with tracker). Why are you not updating OBJECT_METADATA_REQUESTS here? > S3A: Analytics accelerator for S3 to be enabled by default > ---------------------------------------------------------- > > Key: HADOOP-19559 > URL: https://issues.apache.org/jira/browse/HADOOP-19559 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/s3 > Affects Versions: 3.5.0, 3.4.2 > Reporter: Ahmar Suhail > Priority: Major > Labels: pull-request-available > > Make "analytics" the default input stream in S3A. > Goals > * Parquet performance through applications running queries over the data > (spark etc) > * Performance for other formats good as/better than today. Examples: avro > manifests in iceberg, ORC in hive/spark > * Performance for other uses as good as today (whole-file/sequential reads of > parquet data in distcp etc) > * better resilience to bad uses (incomplete reads not retaining http streams, > buffer allocations on long-retained data) > * efficient on applications like Impala, which caches parquet footers itself, > and uses unbuffer() to discard all stream-side resources. Maybe just throw > alway all state on unbuffer() and stop trying to be sophisticated, or support > some new openFile flag which can be used to disable footer parsing -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org