Re: [PR] HADOOP-19140. [ABFS, S3A] Add IORateLimiter API [hadoop]

2024-09-04 Thread via GitHub


steveloughran commented on code in PR #6703:
URL: https://github.com/apache/hadoop/pull/6703#discussion_r1743444084


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/IORateLimiter.java:
##
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs;
+
+import java.time.Duration;
+import javax.annotation.Nullable;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+
+/**
+ * An optional interface for classes that provide rate limiters.
+ * For a filesystem source, the operation name SHOULD be one of
+ * those listed in
+ * {@link org.apache.hadoop.fs.statistics.StoreStatisticNames}
+ * if the operation is listed there.
+ * 
+ * This interface is intended to be exported by FileSystems so that
+ * applications wishing to perform bulk operations may request access
+ * to a rate limiter which is shared across all threads interacting
+ * with the store.
+ * That is: the rate limiting is global to the specific instance of the
+ * object implementing this interface.
+ * 
+ * It is not expected to be shared with other instances of the same
+ * class, or across processes.
+ * 
+ * This means it is primarily of benefit when limiting bulk operations
+ * which can overload an (object) store from a small pool of threads.
+ * Examples of this can include:
+ * 
+ *   Bulk delete operations
+ *   Bulk rename operations
+ *   Completing many in-progress uploads
+ *   Deep and wide recursive treewalks
+ *   Reading/prefetching many blocks within a file
+ * 
+ * In cluster applications, it is more likely that rate limiting is
+ * useful during job commit operations, or processes with many threads.
+ */
+@InterfaceAudience.Public
+@InterfaceStability.Unstable
+public interface IORateLimiter {
+
+  /**
+   * Acquire IO capacity.
+   * 
+   * The implementation may assign different costs to the different
+   * operations.
+   * 
+   * If there is not enough space, the permits will be acquired,
+   * but the subsequent call will block until the capacity has been
+   * refilled.
+   * 
+   * The path parameter is used to support stores where there may be different throttling
+   * under different paths.
+   * @param operation operation being performed. Must not be null, may be "",
+   * should be from {@link org.apache.hadoop.fs.statistics.StoreStatisticNames}
+   * where there is a matching operation.
+   * @param source path for operations.
+   * Use "/" for root/store-wide operations.
+   * @param dest destination path for rename operations or any other operation which
+   * takes two paths.
+   * @param requestedCapacity capacity to acquire.
+   * Must be greater than or equal to 0.
+   * @return time spent waiting for output.
+   */
+  Duration acquireIOCapacity(
+  String operation,
+  Path source,

Review Comment:
   really good q. will comment below
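The semantics described in the javadoc above — an oversized request is granted immediately, and the *next* caller pays the price by blocking until the deficit is repaid — can be sketched as a small token bucket. This is an illustrative stand-in only, not Hadoop's `RateLimitingFactory` implementation; the class name and internals here are hypothetical.

```java
import java.time.Duration;

// Minimal token-bucket sketch of the acquireIOCapacity() contract:
// a request larger than the remaining balance still succeeds at once,
// driving the balance negative; subsequent calls block until refilled.
public class TokenBucketSketch {
  private final double permitsPerSecond;
  private double balance;                 // may go negative after an over-commit
  private long lastRefillNanos = System.nanoTime();

  public TokenBucketSketch(int permitsPerSecond) {
    if (permitsPerSecond < 0) {
      throw new IllegalArgumentException("negative capacity");
    }
    this.permitsPerSecond = permitsPerSecond;
    this.balance = permitsPerSecond;
  }

  /** Acquire capacity; returns the time spent blocked. */
  public synchronized Duration acquire(int requested) {
    if (requested < 0) {
      throw new IllegalArgumentException("negative requestedCapacity");
    }
    if (requested == 0) {
      return Duration.ZERO;               // zero-cost requests are short-cut
    }
    refill();
    long waitedMs = 0;
    while (balance <= 0) {                // an earlier over-commit: block until repaid
      try {
        wait(10);                         // timed wait; no notify is expected
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
      waitedMs += 10;
      refill();
    }
    balance -= requested;                 // may drive the balance negative
    return Duration.ofMillis(waitedMs);
  }

  private void refill() {
    long now = System.nanoTime();
    double added = (now - lastRefillNanos) / 1e9 * permitsPerSecond;
    balance = Math.min(permitsPerSecond, balance + added);
    lastRefillNanos = now;
  }
}
```

The shape mirrors the tests quoted later in this thread: the first excess acquire is not delayed, the follow-up one is.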
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19140. [ABFS, S3A] Add IORateLimiter API [hadoop]

2024-09-02 Thread via GitHub


anujmodi2021 commented on code in PR #6703:
URL: https://github.com/apache/hadoop/pull/6703#discussion_r1741499620


##
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestIORateLimiter.java:
##
@@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs;
+
+import java.time.Duration;
+
+import org.assertj.core.api.Assertions;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.fs.impl.IORateLimiterSupport;
+import org.apache.hadoop.test.AbstractHadoopTestBase;
+import org.apache.hadoop.util.RateLimiting;
+import org.apache.hadoop.util.RateLimitingFactory;
+
+import static org.apache.hadoop.fs.statistics.StoreStatisticNames.OP_DELETE;
+import static org.apache.hadoop.fs.statistics.StoreStatisticNames.OP_DELETE_BULK;
+import static org.apache.hadoop.fs.statistics.StoreStatisticNames.OP_DELETE_DIR;
+import static org.apache.hadoop.test.LambdaTestUtils.intercept;
+
+/**
+ * Test IO rate limiting in {@link RateLimiting} and {@link IORateLimiter}.
+ * 
+ * This includes: illegal arguments, and what if more capacity
+ * is requested than is available.
+ */
+public class TestIORateLimiter extends AbstractHadoopTestBase {
+
+  private static final Logger LOG = LoggerFactory.getLogger(
+  TestIORateLimiter.class);
+
+  public static final Path ROOT = new Path("/");
+
+  @Test
+  public void testAcquireCapacity() {
+final int size = 10;
+final RateLimiting limiter = RateLimitingFactory.create(size);
+// do a chain of requests
+limiter.acquire(0);
+limiter.acquire(1);
+limiter.acquire(2);
+
+// now ask for more than is allowed. This MUST work.
+final int excess = size * 2;
+limiter.acquire(excess);
+assertDelayed(limiter, excess);
+  }
+
+  @Test
+  public void testNegativeCapacityRejected() throws Throwable {
+final RateLimiting limiter = RateLimitingFactory.create(1);
+intercept(IllegalArgumentException.class, () ->
+limiter.acquire(-1));
+  }
+
+  @Test
+  public void testNegativeLimiterCapacityRejected() throws Throwable {
+intercept(IllegalArgumentException.class, () ->
+RateLimitingFactory.create(-1));
+  }
+
+  /**
+   * This is a key behavior: it is acceptable to ask for more capacity
+   * than the caller has, the initial request must be granted,
+   * but the followup request must be delayed until enough capacity
+   * has been restored.
+   */
+  @Test
+  public void testAcquireExcessCapacity() {
+
+// create a small limiter
+final int size = 10;
+final RateLimiting limiter = RateLimitingFactory.create(size);
+
+// now ask for more than is allowed. This MUST work.
+final int excess = size * 2;
+// first attempt gets more capacity than arrives every second.
+assertNotDelayed(limiter, excess);
+// second attempt will block
+assertDelayed(limiter, excess);
+// third attempt will block
+assertDelayed(limiter, size);
+// as these are short-cut, no delays.
+assertNotDelayed(limiter, 0);
+  }
+
+  @Test
+  public void testIORateLimiterWithLimitedCapacity() {
+final int size = 10;
+final IORateLimiter limiter = IORateLimiterSupport.createIORateLimiter(size);
+// this size will use more than can be allocated in a second.
+final int excess = size * 2;
+// first attempt gets more capacity than arrives every second.
+assertNotDelayed(limiter, OP_DELETE_DIR, excess);
+// second attempt will block
+assertDelayed(limiter, OP_DELETE_BULK, excess);
+// third attempt will block
+assertDelayed(limiter, OP_DELETE, size);
+// as zero capacity requests are short-cut, no delays, ever.
+assertNotDelayed(limiter, "", 0);
+  }
+
+  /**
+   * Verify the unlimited rate limiter really is unlimited.
+   */
+  @Test
+  public void testIORateLimiterWithUnlimitedCapacity() {
+final IORateLimiter limiter = IORateLimiterSupport.unlimited();
+// this size will use more than can be allocated in a second.
+
+assertNotDelayed(limiter, "1", 100_000);
+assertNotDelayed(limiter, "2", 100_000);
+  }
+
+  @Test
+  public void testUnlimitedRejectsNegativeCapacity() th

Re: [PR] HADOOP-19140. [ABFS, S3A] Add IORateLimiter API [hadoop]

2024-09-02 Thread via GitHub


anujmodi2021 commented on code in PR #6703:
URL: https://github.com/apache/hadoop/pull/6703#discussion_r1741496480


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/IORateLimiter.java:
##
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs;
+
+import java.time.Duration;
+import javax.annotation.Nullable;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+
+/**
+ * An optional interface for classes that provide rate limiters.
+ * For a filesystem source, the operation name SHOULD be one of
+ * those listed in
+ * {@link org.apache.hadoop.fs.statistics.StoreStatisticNames}
+ * if the operation is listed there.
+ * 
+ * This interface is intended to be exported by FileSystems so that
+ * applications wishing to perform bulk operations may request access
+ * to a rate limiter which is shared across all threads interacting
+ * with the store.
+ * That is: the rate limiting is global to the specific instance of the
+ * object implementing this interface.
+ * 
+ * It is not expected to be shared with other instances of the same
+ * class, or across processes.
+ * 
+ * This means it is primarily of benefit when limiting bulk operations
+ * which can overload an (object) store from a small pool of threads.
+ * Examples of this can include:
+ * 
+ *   Bulk delete operations
+ *   Bulk rename operations
+ *   Completing many in-progress uploads
+ *   Deep and wide recursive treewalks
+ *   Reading/prefetching many blocks within a file
+ * 
+ * In cluster applications, it is more likely that rate limiting is
+ * useful during job commit operations, or processes with many threads.
+ */
+@InterfaceAudience.Public
+@InterfaceStability.Unstable
+public interface IORateLimiter {
+
+  /**
+   * Acquire IO capacity.
+   * 
+   * The implementation may assign different costs to the different
+   * operations.
+   * 
+   * If there is not enough space, the permits will be acquired,
+   * but the subsequent call will block until the capacity has been
+   * refilled.
+   * 
+   * The path parameter is used to support stores where there may be different throttling
+   * under different paths.
+   * @param operation operation being performed. Must not be null, may be "",
+   * should be from {@link org.apache.hadoop.fs.statistics.StoreStatisticNames}
+   * where there is a matching operation.
+   * @param source path for operations.
+   * Use "/" for root/store-wide operations.
+   * @param dest destination path for rename operations or any other operation which
+   * takes two paths.
+   * @param requestedCapacity capacity to acquire.
+   * Must be greater than or equal to 0.
+   * @return time spent waiting for output.
+   */
+  Duration acquireIOCapacity(
+  String operation,
+  Path source,

Review Comment:
   Just to understand this better...
   If we have a list of paths on which we are attempting a bulk operation, and
   the only common prefix for them is the root itself, should we acquire IO
   capacity for each individual path or for the root path itself?
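One hypothetical way a caller could approach this question is to compute the deepest common ancestor of the batch and make a single capacity request against it, falling back to "/" when nothing deeper is shared. The helper below is only an illustration of that idea; it is not part of the PR, and the class name is made up.

```java
import java.util.Arrays;
import java.util.List;

public final class CommonAncestor {
  // Deepest common ancestor of a batch of absolute, "/"-separated paths;
  // falls back to "/" when only the root is shared.
  public static String of(List<String> paths) {
    String[] prefix = paths.get(0).split("/");
    int common = prefix.length;
    for (String p : paths) {
      String[] parts = p.split("/");
      int i = 0;
      while (i < common && i < parts.length && prefix[i].equals(parts[i])) {
        i++;
      }
      common = i;
    }
    String joined = String.join("/", Arrays.copyOf(prefix, common));
    return joined.isEmpty() ? "/" : joined;
  }
}
```

With this, a bulk delete of `/a/b/c` and `/a/b/d` would acquire once against `/a/b`, while a batch spanning `/a/...` and `/b/...` would acquire against "/".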






Re: [PR] HADOOP-19140. [ABFS, S3A] Add IORateLimiter API [hadoop]

2024-04-23 Thread via GitHub


hadoop-yetus commented on PR #6703:
URL: https://github.com/apache/hadoop/pull/6703#issuecomment-2073517148

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 31s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  46m 17s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 52s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  17m 12s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 15s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 14s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 50s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   2m 35s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  38m 40s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 55s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 46s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  16m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m  7s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  16m  7s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 49s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   2m 52s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  38m 43s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 54s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 58s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 232m 44s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6703 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 1e9683e47802 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d2e146e4180311a52a94240922e3daf8f94ec8bd |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/2/testReport/ |
   | Max. process+thread count | 2038 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   



Re: [PR] HADOOP-19140. [ABFS, S3A] Add IORateLimiter API [hadoop]

2024-04-11 Thread via GitHub


steveloughran commented on code in PR #6703:
URL: https://github.com/apache/hadoop/pull/6703#discussion_r1561220101


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/IORateLimiterSupport.java:
##
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.impl;
+
+import org.apache.hadoop.fs.IORateLimiter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.RateLimiting;
+import org.apache.hadoop.util.RateLimitingFactory;
+
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+
+/**
+ * Implementation support for {@link IORateLimiter}.
+ */
+public final class IORateLimiterSupport {

Review Comment:
   With the op name and path you can be clever:
   * limit by path
   * use the operation name and apply a "multiplier" of actual IO, to include
     the extra operations made (rename: list, copy, delete). For S3, separate
     read/write IO capacities would need to be requested.
   * consider some operations free and give them a cost of 0
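The "multiplier" idea from the bullets above could be sketched as a table of per-operation costs. The operation names and weights below are assumptions for illustration, not values defined anywhere in the PR.

```java
import java.util.Map;

public final class OperationCost {
  // Hypothetical per-operation IO cost multipliers, illustrating the
  // "rename = list + copy + delete" idea: a rename is charged as several
  // underlying store operations. Names and weights are made up.
  private static final Map<String, Integer> COSTS = Map.of(
      "op_delete", 1,
      "op_delete_bulk", 5,
      "op_rename", 3,       // list + copy + delete
      "op_mkdirs", 1);

  public static int cost(String operation, int requestedCapacity) {
    // Unknown operations default to a multiplier of 1; an implementation
    // could also mark some operations free with a multiplier of 0.
    return COSTS.getOrDefault(operation, 1) * requestedCapacity;
  }
}
```

An implementation of `acquireIOCapacity()` could pass the scaled value to its underlying limiter rather than the caller-supplied capacity.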
   






Re: [PR] HADOOP-19140. [ABFS, S3A] Add IORateLimiter API [hadoop]

2024-04-11 Thread via GitHub


steveloughran commented on code in PR #6703:
URL: https://github.com/apache/hadoop/pull/6703#discussion_r1561216836


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/IORateLimiter.java:
##
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs;
+
+import java.time.Duration;
+import javax.annotation.Nullable;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+
+/**
+ * An optional interface for classes that provide rate limiters.
+ * For a filesystem source, the operation name SHOULD be one of
+ * those listed in
+ * {@link org.apache.hadoop.fs.statistics.StoreStatisticNames}
+ * if the operation is listed there.
+ * 
+ * This interface is intended to be exported by FileSystems so that
+ * applications wishing to perform bulk operations may request access
+ * to a rate limiter which is shared across all threads interacting
+ * with the store.
+ * That is: the rate limiting is global to the specific instance of the
+ * object implementing this interface.
+ * 
+ * It is not expected to be shared with other instances of the same
+ * class, or across processes.
+ * 
+ * This means it is primarily of benefit when limiting bulk operations
+ * which can overload an (object) store from a small pool of threads.
+ * Examples of this can include:
+ * 
+ *   Bulk delete operations
+ *   Bulk rename operations
+ *   Completing many in-progress uploads
+ *   Deep and wide recursive treewalks
+ *   Reading/prefetching many blocks within a file
+ * 
+ * In cluster applications, it is more likely that rate limiting is
+ * useful during job commit operations, or processes with many threads.
+ */
+@InterfaceAudience.Public
+@InterfaceStability.Unstable
+public interface IORateLimiter {
+
+  /**
+   * Acquire IO capacity.
+   * 
+   * The implementation may assign different costs to the different
+   * operations.
+   * 
+   * If there is not enough space, the permits will be acquired,
+   * but the subsequent call will block until the capacity has been
+   * refilled.
+   * 
+   * The path parameter is used to support stores where there may be different throttling
+   * under different paths.
+   * @param operation operation being performed. Must not be null, may be "",
+   * should be from {@link org.apache.hadoop.fs.statistics.StoreStatisticNames}
+   * where there is a matching operation.
+   * @param source path for operations.
+   * Use "/" for root/store-wide operations.
+   * @param dest destination path for rename operations or any other operation which
+   * takes two paths.
+   * @param requestedCapacity capacity to acquire.
+   * Must be greater than or equal to 0.
+   * @return time spent waiting for output.
+   */
+  Duration acquireIOCapacity(
+  String operation,
+  Path source,

Review Comment:
   S3 throttling does, as it is per prefix.
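Since the comment notes that S3 throttles per prefix, an implementation might keep one limiter per path prefix rather than a single global one. The sketch below is a stand-in using a plain counter in place of a real limiter; everything here is hypothetical, not code from the PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class PerPrefixLimiters {
  // One "limiter" per top-level path prefix; a real implementation would
  // hold a RateLimiting instance per prefix and block when it is exhausted,
  // instead of this simple running total.
  private final Map<String, int[]> acquired = new ConcurrentHashMap<>();

  /** Record capacity against the path's prefix; returns the running total. */
  public int acquire(String path, int capacity) {
    String prefix = topLevel(path);
    int[] counter = acquired.computeIfAbsent(prefix, k -> new int[1]);
    synchronized (counter) {
      counter[0] += capacity;   // a real limiter would block here when exhausted
      return counter[0];
    }
  }

  /** First path component, e.g. "/a" for "/a/b/c"; "/x" stays "/x". */
  static String topLevel(String path) {
    int second = path.indexOf('/', 1);
    return second < 0 ? path : path.substring(0, second);
  }
}
```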






Re: [PR] HADOOP-19140. [ABFS, S3A] Add IORateLimiter API [hadoop]

2024-04-09 Thread via GitHub


mukund-thakur commented on code in PR #6703:
URL: https://github.com/apache/hadoop/pull/6703#discussion_r1558044600


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/IORateLimiter.java:
##
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs;
+
+import java.time.Duration;
+import javax.annotation.Nullable;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+
+/**
+ * An optional interface for classes that provide rate limiters.
+ * For a filesystem source, the operation name SHOULD be one of
+ * those listed in
+ * {@link org.apache.hadoop.fs.statistics.StoreStatisticNames}
+ * if the operation is listed there.
+ * 
+ * This interface is intended to be exported by FileSystems so that
+ * applications wishing to perform bulk operations may request access
+ * to a rate limiter which is shared across all threads interacting
+ * with the store.
+ * That is: the rate limiting is global to the specific instance of the
+ * object implementing this interface.
+ * 
+ * It is not expected to be shared with other instances of the same
+ * class, or across processes.
+ * 
+ * This means it is primarily of benefit when limiting bulk operations
+ * which can overload an (object) store from a small pool of threads.
+ * Examples of this can include:
+ * 
+ *   Bulk delete operations
+ *   Bulk rename operations
+ *   Completing many in-progress uploads
+ *   Deep and wide recursive treewalks
+ *   Reading/prefetching many blocks within a file
+ * 
+ * In cluster applications, it is more likely that rate limiting is
+ * useful during job commit operations, or processes with many threads.
+ */
+@InterfaceAudience.Public
+@InterfaceStability.Unstable
+public interface IORateLimiter {
+
+  /**
+   * Acquire IO capacity.
+   * 
+   * The implementation may assign different costs to the different
+   * operations.
+   * 
+   * If there is not enough space, the permits will be acquired,
+   * but the subsequent call will block until the capacity has been
+   * refilled.
+   * 
+   * The path parameter is used to support stores where there may be different throttling
+   * under different paths.
+   * @param operation operation being performed. Must not be null, may be "",
+   * should be from {@link org.apache.hadoop.fs.statistics.StoreStatisticNames}
+   * where there is a matching operation.
+   * @param source path for operations.
+   * Use "/" for root/store-wide operations.
+   * @param dest destination path for rename operations or any other operation which
+   * takes two paths.
+   * @param requestedCapacity capacity to acquire.
+   * Must be greater than or equal to 0.
+   * @return time spent waiting for output.
+   */
+  Duration acquireIOCapacity(
+  String operation,
+  Path source,

Review Comment:
   A multi-delete operation takes a list of paths. Although we have a concept
   of the base path, I don't think the S3 client requires every path to be
   under the base path.






Re: [PR] HADOOP-19140. [ABFS, S3A] Add IORateLimiter API [hadoop]

2024-04-05 Thread via GitHub


mukund-thakur commented on code in PR #6703:
URL: https://github.com/apache/hadoop/pull/6703#discussion_r1554373273


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/IORateLimiterSupport.java:
##
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.impl;
+
+import org.apache.hadoop.fs.IORateLimiter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.RateLimiting;
+import org.apache.hadoop.util.RateLimitingFactory;
+
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+
+/**
+ * Implementation support for {@link IORateLimiter}.
+ */
+public final class IORateLimiterSupport {

Review Comment:
   This is just a wrapper on top of RestrictedRateLimiting with extra
   operation name validation, right?
   I think this can be extended to limit per operation.






Re: [PR] HADOOP-19140. [ABFS, S3A] Add IORateLimiter API [hadoop]

2024-04-03 Thread via GitHub


hadoop-yetus commented on PR #6703:
URL: https://github.com/apache/hadoop/pull/6703#issuecomment-2035351221

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|:--------|:-------:|:-------:|
   | +0 :ok: |  reexec  |   0m 19s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 47s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   8m 56s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   8m  7s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m  3s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 48s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 23s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 53s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 30s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   8m 30s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m  6s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   8m  6s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 56s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 35s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 19s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  16m 31s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 42s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 138m 38s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6703 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux e24358cb7c53 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 58fb6a3036d824f0c201c7dbdf18b542cc6576d8 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/1/testReport/ |
   | Max. process+thread count | 2150 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   



[PR] HADOOP-19140. [ABFS, S3A] Add IORateLimiter API [hadoop]

2024-04-03 Thread via GitHub


steveloughran opened a new pull request, #6703:
URL: https://github.com/apache/hadoop/pull/6703

   
   Adds an API (pulled from #6596) to let callers request IO capacity for a named operation, with optional source and destination paths.
   
   The first use of this would be the bulk delete operation of #6494: there would be some throttling within the S3A code which sets the maximum number of writes per bucket, and for a bulk delete the caller would ask for as many capacity units as there were entries.
   
   Added new store operations for delete_bulk and delete_dir.
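   As a rough illustration of the intended usage, here is a self-contained sketch of how a bulk delete might consume such an API. The interface name, method signature, and operation name below are assumptions modelled on this description (named operation, optional source and dest paths, requested capacity), not the committed Hadoop interface.

```java
import java.time.Duration;

/**
 * Sketch: a bulk delete asking a rate limiter for one capacity unit
 * per entry before issuing the store request.
 */
public class BulkDeleteSketch {

  /** Hypothetical stand-in for the PR's rate limiter interface. */
  interface IORateLimiter {
    /** Block until capacity is granted; return the time spent waiting. */
    Duration acquireIOCapacity(String operation, String source,
        String dest, int requestedCapacity);
  }

  /** Delete a page of keys, first asking for keys.length units. */
  static Duration bulkDelete(IORateLimiter limiter, String bucket,
      String[] keys) {
    // Ask for as many capacity units as there are entries, as the
    // description above proposes for the bulk delete operation.
    Duration waited = limiter.acquireIOCapacity(
        "delete_bulk", bucket, null, keys.length);
    // ... issue the actual bulk DELETE request here ...
    return waited;
  }

  public static void main(String[] args) {
    // A no-op limiter that grants every request immediately.
    IORateLimiter unlimited = (op, src, dest, n) -> Duration.ZERO;
    Duration waited = bulkDelete(unlimited,
        "s3a://example-bucket", new String[] {"a", "b", "c"});
    if (!waited.isZero()) {
      throw new AssertionError("unexpected wait: " + waited);
    }
    System.out.println("ok");   // prints "ok"
  }
}
```

   Because the limiter is shared across all threads of the filesystem instance, many concurrent bulk deletes would each block in `acquireIOCapacity` until the shared budget allows them through.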
   
   ### How was this patch tested?
   
   New tests.
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   

