[jira] [Commented] (HADOOP-19140) [ABFS, S3A] Add IORateLimiter api to hadoop common
[ https://issues.apache.org/jira/browse/HADOOP-19140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879167#comment-17879167 ] ASF GitHub Bot commented on HADOOP-19140: - steveloughran commented on PR #6703: URL: https://github.com/apache/hadoop/pull/6703#issuecomment-2328444967

@anujmodi2021 For the work on the manifest committer I was asking for some IOPs per rename, so that if there wasn't enough capacity, only the over-capacity renames blocked. It also allows for incremental IO: you don't have to block acquiring up front, just ask as you go along.

It gets a bit more complex for S3, where directory operations are mimicked file by file. There we'd ask for 2 read and 1 write ops per file rename (HEAD (read) + COPY (read + write)), and for the bulk delete we'd ask for the same number of writes as the size of the delete list. That is already done in its implementation of BulkDelete.

Note that the AWS SDK does split up large COPY operations into multipart copies, so really the IO capacity is (2 * file-size / block-size), but as these copies can be so slow I'm not worrying about it. We'd need to replace that bit of the SDK, something we've discussed.

FYI I've let this work lapse as other things took priority; if you want to take it up, feel free to do so.

> [ABFS, S3A] Add IORateLimiter api to hadoop common
> --
>
> Key: HADOOP-19140
> URL: https://issues.apache.org/jira/browse/HADOOP-19140
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs, fs/azure, fs/s3
> Affects Versions: 3.4.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
> Labels: pull-request-available
>
> Create a rate limiter API in hadoop common from which code (initially the manifest
> committer and bulk delete) can request IO capacity for a specific operation.
> This can be exported by filesystems to support shared rate limiting across
> all threads.
>
> pulled from HADOOP-19093 PR

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
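The per-operation cost model described in the comment above (2 read + 1 write ops per S3 file rename via HEAD + COPY, and one write per entry in a bulk delete list) can be written down as a small sketch. The class and method names here are hypothetical illustrations, not part of the PR:

```java
// Sketch of the IO cost accounting described in the comment above.
// All names here are hypothetical; only the ratios (2 reads + 1 write per
// file rename, 1 write per bulk-delete entry) come from the discussion.
final class S3IoCostSketch {
  static final int RENAME_READ_OPS = 2;   // HEAD (read) + COPY (read side)
  static final int RENAME_WRITE_OPS = 1;  // COPY (write side)

  /** Read capacity needed to rename the given number of files, one by one. */
  static int renameReadCapacity(int files) {
    return RENAME_READ_OPS * files;
  }

  /** Write capacity needed to rename the given number of files, one by one. */
  static int renameWriteCapacity(int files) {
    return RENAME_WRITE_OPS * files;
  }

  /** Write capacity for a bulk delete: one write per entry in the delete list. */
  static int bulkDeleteWriteCapacity(int deleteListSize) {
    return deleteListSize;
  }
}
```

Asking for capacity per file, rather than for the whole directory up front, is what allows the incremental, non-blocking style the comment argues for.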
[ https://issues.apache.org/jira/browse/HADOOP-19140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878722#comment-17878722 ] ASF GitHub Bot commented on HADOOP-19140: - anujmodi2021 commented on code in PR #6703: URL: https://github.com/apache/hadoop/pull/6703#discussion_r1741499620

## hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestIORateLimiter.java:

@@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs;
+
+import java.time.Duration;
+
+import org.assertj.core.api.Assertions;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.fs.impl.IORateLimiterSupport;
+import org.apache.hadoop.test.AbstractHadoopTestBase;
+import org.apache.hadoop.util.RateLimiting;
+import org.apache.hadoop.util.RateLimitingFactory;
+
+import static org.apache.hadoop.fs.statistics.StoreStatisticNames.OP_DELETE;
+import static org.apache.hadoop.fs.statistics.StoreStatisticNames.OP_DELETE_BULK;
+import static org.apache.hadoop.fs.statistics.StoreStatisticNames.OP_DELETE_DIR;
+import static org.apache.hadoop.test.LambdaTestUtils.intercept;
+
+/**
+ * Test IO rate limiting in {@link RateLimiting} and {@link IORateLimiter}.
+ *
+ * This includes: illegal arguments, and what if more capacity
+ * is requested than is available.
+ */
+public class TestIORateLimiter extends AbstractHadoopTestBase {
+
+  private static final Logger LOG = LoggerFactory.getLogger(
+      TestIORateLimiter.class);
+
+  public static final Path ROOT = new Path("/");
+
+  @Test
+  public void testAcquireCapacity() {
+    final int size = 10;
+    final RateLimiting limiter = RateLimitingFactory.create(size);
+    // do a chain of requests
+    limiter.acquire(0);
+    limiter.acquire(1);
+    limiter.acquire(2);
+
+    // now ask for more than is allowed. This MUST work.
+    final int excess = size * 2;
+    limiter.acquire(excess);
+    assertDelayed(limiter, excess);
+  }
+
+  @Test
+  public void testNegativeCapacityRejected() throws Throwable {
+    final RateLimiting limiter = RateLimitingFactory.create(1);
+    intercept(IllegalArgumentException.class, () ->
+        limiter.acquire(-1));
+  }
+
+  @Test
+  public void testNegativeLimiterCapacityRejected() throws Throwable {
+    intercept(IllegalArgumentException.class, () ->
+        RateLimitingFactory.create(-1));
+  }
+
+  /**
+   * This is a key behavior: it is acceptable to ask for more capacity
+   * than the caller has, the initial request must be granted,
+   * but the followup request must be delayed until enough capacity
+   * has been restored.
+   */
+  @Test
+  public void testAcquireExcessCapacity() {
+
+    // create a small limiter
+    final int size = 10;
+    final RateLimiting limiter = RateLimitingFactory.create(size);
+
+    // now ask for more than is allowed. This MUST work.
+    final int excess = size * 2;
+    // first attempt gets more capacity than arrives every second.
+    assertNotDelayed(limiter, excess);
+    // second attempt will block
+    assertDelayed(limiter, excess);
+    // third attempt will block
+    assertDelayed(limiter, size);
+    // as these are short-cut, no delays.
+    assertNotDelayed(limiter, 0);
+  }
+
+  @Test
+  public void testIORateLimiterWithLimitedCapacity() {
+    final int size = 10;
+    final IORateLimiter limiter = IORateLimiterSupport.createIORateLimiter(size);
+    // this size will use more than can be allocated in a second.
+    final int excess = size * 2;
+    // first attempt gets more capacity than arrives every second.
+    assertNotDelayed(limiter, OP_DELETE_DIR, excess);
+    // second attempt will block
+    assertDelayed(limiter, OP_DELETE_BULK, excess);
+    // third attempt will block
+    assertDelayed(limiter, OP_DELETE, size);
+    // as zero capacity requests are short-cut, no delays, ever.
+    assertNotDelayed(limiter, "", 0);
+  }
+
+  /**
+   * Verify the unlimited rate limiter really is unlimited.
+   */
+  @Test
+  public void testIORateLimiterWithUnlimitedCapacity() {
+    final IORateLimiter limiter = IORate
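The behaviour these tests assert (an over-capacity request is granted immediately but drives the limiter into debt so the next caller is delayed; zero-capacity requests are short-cut; negative values are rejected) can be sketched as a minimal token bucket in plain Java. This is an illustrative stand-in with hypothetical names, not the Hadoop RateLimiting implementation:

```java
// Minimal token-bucket sketch of the semantics asserted above.
// An excess acquire is granted at once but leaves a negative balance,
// so the *next* acquire reports a delay. Instead of sleeping, this sketch
// returns the wait in milliseconds so the behaviour is easy to assert.
final class TokenBucketSketch {
  private final double permitsPerMillisecond;
  private double balance;        // may go negative after an excess acquire
  private long lastRefillMillis;

  TokenBucketSketch(int capacityPerSecond) {
    if (capacityPerSecond < 0) {
      throw new IllegalArgumentException("negative capacity: " + capacityPerSecond);
    }
    this.permitsPerMillisecond = capacityPerSecond / 1000.0;
    this.balance = capacityPerSecond;
    this.lastRefillMillis = System.currentTimeMillis();
  }

  /** @return milliseconds the caller would have to wait (0 == not delayed). */
  synchronized long acquire(int requested) {
    if (requested < 0) {
      throw new IllegalArgumentException("negative request: " + requested);
    }
    // zero-capacity requests (and a zero-capacity "unlimited" bucket) are short-cut
    if (requested == 0 || permitsPerMillisecond == 0) {
      return 0;
    }
    long now = System.currentTimeMillis();
    balance += (now - lastRefillMillis) * permitsPerMillisecond;
    lastRefillMillis = now;
    long wait = balance >= 0 ? 0
        : (long) Math.ceil(-balance / permitsPerMillisecond);
    balance -= requested;    // always granted; the debt delays later callers
    return wait;
  }
}
```

With a capacity of 10 permits/second, `acquire(20)` returns 0 the first time and roughly a one-second wait the second time, matching the assertNotDelayed/assertDelayed pattern in the test.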
[ https://issues.apache.org/jira/browse/HADOOP-19140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878718#comment-17878718 ] ASF GitHub Bot commented on HADOOP-19140: - anujmodi2021 commented on code in PR #6703: URL: https://github.com/apache/hadoop/pull/6703#discussion_r1741496480

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/IORateLimiter.java:

@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs;
+
+import java.time.Duration;
+import javax.annotation.Nullable;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+
+/**
+ * An optional interface for classes that provide rate limiters.
+ * For a filesystem source, the operation name SHOULD be one of
+ * those listed in
+ * {@link org.apache.hadoop.fs.statistics.StoreStatisticNames}
+ * if the operation is listed there.
+ *
+ * This interface is intended to be exported by FileSystems so that
+ * applications wishing to perform bulk operations may request access
+ * to a rate limiter which is shared across all threads interacting
+ * with the store.
+ * That is: the rate limiting is global to the specific instance of the
+ * object implementing this interface.
+ *
+ * It is not expected to be shared with other instances of the same
+ * class, or across processes.
+ *
+ * This means it is primarily of benefit when limiting bulk operations
+ * which can overload an (object) store from a small pool of threads.
+ * Examples of this can include:
+ *
+ * Bulk delete operations
+ * Bulk rename operations
+ * Completing many in-progress uploads
+ * Deep and wide recursive treewalks
+ * Reading/prefetching many blocks within a file
+ *
+ * In cluster applications, it is more likely that rate limiting is
+ * useful during job commit operations, or processes with many threads.
+ */
+@InterfaceAudience.Public
+@InterfaceStability.Unstable
+public interface IORateLimiter {
+
+  /**
+   * Acquire IO capacity.
+   *
+   * The implementation may assign different costs to the different
+   * operations.
+   *
+   * If there is not enough space, the permits will be acquired,
+   * but the subsequent call will block until the capacity has been
+   * refilled.
+   *
+   * The path parameter is used to support stores where there may be different throttling
+   * under different paths.
+   * @param operation operation being performed. Must not be null, may be "",
+   * should be from {@link org.apache.hadoop.fs.statistics.StoreStatisticNames}
+   * where there is a matching operation.
+   * @param source path for operations.
+   * Use "/" for root/store-wide operations.
+   * @param dest destination path for rename operations or any other operation which
+   * takes two paths.
+   * @param requestedCapacity capacity to acquire.
+   * Must be greater than or equal to 0.
+   * @return time spent waiting for output.
+   */
+  Duration acquireIOCapacity(
+      String operation,
+      Path source,

Review Comment: Just to understand this better... If we have a list of paths on which we are attempting a bulk operation, and the only common prefix for them is the root itself, should we acquire IO capacity for each individual path or for the root path itself?
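One possible answer to the question above is to charge a bulk operation once against the deepest common parent of its paths, falling back to "/" when the paths share only the root. The helper below is a hypothetical sketch of that policy over plain string paths, not something the PR specifies:

```java
// Hypothetical helper: find the deepest common parent of a set of
// absolute, slash-separated paths, so a bulk operation can be charged
// once against it ("/" when only the root is shared).
final class CommonParentSketch {

  /** Longest common path prefix of a list of absolute paths; "/" if none. */
  static String commonParent(java.util.List<String> paths) {
    if (paths.isEmpty()) {
      return "/";
    }
    String[] first = paths.get(0).split("/");
    int common = first.length;
    for (String p : paths) {
      String[] parts = p.split("/");
      int i = 0;
      while (i < common && i < parts.length && first[i].equals(parts[i])) {
        i++;
      }
      common = i;
    }
    StringBuilder sb = new StringBuilder();
    for (int i = 1; i < common; i++) {   // element 0 is "" for absolute paths
      sb.append('/').append(first[i]);
    }
    return sb.length() == 0 ? "/" : sb.toString();
  }
}
```

Under this policy, a bulk delete of paths scattered across the whole store would make a single acquireIOCapacity call against "/" rather than one call per path; whether that is the right trade-off is exactly what the review thread is debating.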
[ https://issues.apache.org/jira/browse/HADOOP-19140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840237#comment-17840237 ] ASF GitHub Bot commented on HADOOP-19140: - hadoop-yetus commented on PR #6703: URL: https://github.com/apache/hadoop/pull/6703#issuecomment-2073517148

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 0m 31s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 46m 17s | | trunk passed |
| +1 :green_heart: | compile | 17m 52s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | compile | 17m 12s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | checkstyle | 1m 15s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 39s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 14s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 0m 50s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 2m 35s | | trunk passed |
| +1 :green_heart: | shadedclient | 38m 40s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 55s | | the patch passed |
| +1 :green_heart: | compile | 16m 46s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javac | 16m 46s | | the patch passed |
| +1 :green_heart: | compile | 16m 7s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | javac | 16m 7s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 14s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 34s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 4s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 0m 49s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 2m 52s | | the patch passed |
| +1 :green_heart: | shadedclient | 38m 43s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 19m 54s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 58s | | The patch does not generate ASF License warnings. |
| | | 232m 44s | | |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6703 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 1e9683e47802 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / d2e146e4180311a52a94240922e3daf8f94ec8bd |
| Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/2/testReport/ |
| Max. process+thread count | 2038 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
[ https://issues.apache.org/jira/browse/HADOOP-19140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836245#comment-17836245 ] ASF GitHub Bot commented on HADOOP-19140: - steveloughran commented on code in PR #6703: URL: https://github.com/apache/hadoop/pull/6703#discussion_r1561220101

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/IORateLimiterSupport.java:

@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.impl;
+
+import org.apache.hadoop.fs.IORateLimiter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.RateLimiting;
+import org.apache.hadoop.util.RateLimitingFactory;
+
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+
+/**
+ * Implementation support for {@link IORateLimiter}.
+ */
+public final class IORateLimiterSupport {

Review Comment: with the op name and path you can be clever:
* limit by path
* use operation name and have a "multiplier" of actual io, to include extra operations made (rename: list, copy, delete). for s3, separate read/write io capacities would need to be requested.
* consider some free and give a cost of 0
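The "multiplier" idea above could look something like the following sketch, where an operation name maps to the read/write ops it really costs, unlisted operations get a default, and some operations are free. The operation names echo StoreStatisticNames, but all costs and class/method names here are illustrative assumptions, not values from the PR:

```java
// Sketch of a per-operation IO cost table ("multiplier" idea from the
// review comment). Costs are illustrative assumptions only.
final class OpCostSketch {

  static final class Cost {
    final int reads;
    final int writes;
    Cost(int reads, int writes) {
      this.reads = reads;
      this.writes = writes;
    }
  }

  /** A free operation: cost of 0, never rate limited. */
  static final Cost FREE = new Cost(0, 0);

  private static final java.util.Map<String, Cost> COSTS =
      new java.util.HashMap<>();
  static {
    // a rename hides extra IO: list/HEAD, COPY, then DELETE
    COSTS.put("op_rename", new Cost(2, 2));
    COSTS.put("op_delete", new Cost(0, 1));
    COSTS.put("op_get_file_status", new Cost(1, 0));
  }

  /** Cost of an operation; unknown operations default to one read. */
  static Cost costOf(String operation) {
    return COSTS.getOrDefault(operation, new Cost(1, 0));
  }
}
```

Keeping reads and writes as separate capacities matches the observation that, for S3, read and write throttling would have to be requested separately.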
[ https://issues.apache.org/jira/browse/HADOOP-19140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836243#comment-17836243 ] ASF GitHub Bot commented on HADOOP-19140: - steveloughran commented on code in PR #6703: URL: https://github.com/apache/hadoop/pull/6703#discussion_r1561216836

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/IORateLimiter.java:

(review comment on acquireIOCapacity(String operation, Path source, ...); the quoted diff is the same as in the earlier comment on this file)

Review Comment: s3 throttling does, as it is per prefix.
[ https://issues.apache.org/jira/browse/HADOOP-19140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835491#comment-17835491 ] ASF GitHub Bot commented on HADOOP-19140: - mukund-thakur commented on code in PR #6703: URL: https://github.com/apache/hadoop/pull/6703#discussion_r1558044600

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/IORateLimiter.java:

(review comment on acquireIOCapacity(String operation, Path source, ...); the quoted diff is the same as in the earlier comment on this file)

Review Comment: A multi-delete operation takes a list of paths. Although we have a concept of the base path, I don't think the S3 client cares whether every path is under the base path.
[ https://issues.apache.org/jira/browse/HADOOP-19140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834454#comment-17834454 ] ASF GitHub Bot commented on HADOOP-19140: - mukund-thakur commented on code in PR #6703: URL: https://github.com/apache/hadoop/pull/6703#discussion_r1554373273

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/IORateLimiterSupport.java:

(review comment on the IORateLimiterSupport class; the quoted diff is the same as in the earlier comment on this file)

Review Comment: This is just a wrapper on top of RestrictedRateLimiting with extra operation name validation, right? I think this can be extended to limit per operation.
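Extending the support class to "limit per operation", as suggested in the comment above, might keep an independent permit budget per operation name. The sketch below uses a fixed semaphore budget rather than a refilling rate limiter, purely to illustrate the per-operation isolation; all names are hypothetical:

```java
// Sketch of per-operation limiting: each operation name gets its own
// permit budget, so exhausting one operation's budget does not block
// others. A fixed Semaphore stands in for a refilling rate limiter here.
final class PerOperationLimiterSketch {

  private final java.util.concurrent.ConcurrentHashMap<String,
      java.util.concurrent.Semaphore> budgets =
          new java.util.concurrent.ConcurrentHashMap<>();
  private final int permitsPerOperation;

  PerOperationLimiterSketch(int permitsPerOperation) {
    this.permitsPerOperation = permitsPerOperation;
  }

  /** @return true if the capacity was available without blocking. */
  boolean tryAcquire(String operation, int requested) {
    java.util.concurrent.Semaphore budget = budgets.computeIfAbsent(
        operation,
        op -> new java.util.concurrent.Semaphore(permitsPerOperation));
    return budget.tryAcquire(requested);
  }
}
```

A real extension would presumably create one RateLimiting instance per operation name in the same lazy fashion, so that, say, bulk deletes could be throttled independently of renames.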
[ https://issues.apache.org/jira/browse/HADOOP-19140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833715#comment-17833715 ]

ASF GitHub Bot commented on HADOOP-19140:
-----------------------------------------

hadoop-yetus commented on PR #6703:
URL: https://github.com/apache/hadoop/pull/6703#issuecomment-2035351221

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|:--------|:-------:|:-------:|
| +0 :ok: | reexec | 0m 19s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 32m 47s | | trunk passed |
| +1 :green_heart: | compile | 8m 56s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | compile | 8m 7s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | checkstyle | 0m 44s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 3s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 48s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 0m 34s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 1m 23s | | trunk passed |
| +1 :green_heart: | shadedclient | 20m 53s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 30s | | the patch passed |
| +1 :green_heart: | compile | 8m 30s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javac | 8m 30s | | the patch passed |
| +1 :green_heart: | compile | 8m 6s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | javac | 8m 6s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 35s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 56s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 43s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 0m 37s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 1m 35s | | the patch passed |
| +1 :green_heart: | shadedclient | 21m 19s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 16m 31s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 42s | | The patch does not generate ASF License warnings. |
| | | 138m 38s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6703 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux e24358cb7c53 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 58fb6a3036d824f0c201c7dbdf18b542cc6576d8 |
| Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/1/testReport/ |
| Max. process+thread count | 2150 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HADOOP-19140) [ABFS, S3A] Add IORateLimiter api to hadoop common
[ https://issues.apache.org/jira/browse/HADOOP-19140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833665#comment-17833665 ]

ASF GitHub Bot commented on HADOOP-19140:
-----------------------------------------

steveloughran opened a new pull request, #6703:
URL: https://github.com/apache/hadoop/pull/6703

Adds an API (pulled from #6596) to allow callers to request IO capacity for a named operation, with optional source and dest paths.

The first use of this would be the bulk delete operation of #6494: there would be some throttling within the S3A code to set a maximum number of writes per bucket, and for a bulk delete the caller would ask for as many write operations as there were entries.

Added new store operations for delete_bulk and delete_dir.

### How was this patch tested?

New tests.

### For code changes:

- [X] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
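From the caller's side, the bulk-delete throttling pattern described in the PR text above might look like the following. This is purely illustrative: the `delete_bulk` operation name is taken from the comment, while the `IORateLimiter` shape, `acquireIOCapacity`, and the `throttle` helper are hypothetical stand-ins, not code from the patch.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class BulkDeleteThrottleExample {
  /** Assumed limiter shape; the real API may also take source/dest paths. */
  interface IORateLimiter {
    Duration acquireIOCapacity(String operation, int requestedCapacity);
  }

  /**
   * Hypothetical helper: request one write of capacity per key in the
   * delete list before issuing the bulk delete, as the PR text describes.
   */
  static Duration throttle(IORateLimiter limiter, List<String> keys) {
    return limiter.acquireIOCapacity("delete_bulk", keys.size());
  }
}
```

Because capacity is requested per operation rather than per request batch, a filesystem exporting one shared limiter would automatically throttle bulk deletes issued from many threads against the same bucket budget.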