[jira] [Commented] (HADOOP-18435) Remove usage of fs.s3a.executor.capacity
[ https://issues.apache.org/jira/browse/HADOOP-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611287#comment-17611287 ]

Viraj Jasani commented on HADOOP-18435:
---

I agree that the bounded thread pool is used by more use cases now, and also that we do need an executor in the store context. But we already have consumers of

{code:java}
public ExecutorService createThrottledExecutor(int capacity) {
  return new SemaphoredDelegatingExecutor(executor, capacity, true);
}
{code}

Nothing calls the no-argument *_createThrottledExecutor()_*, and that method is the only code that reads StoreContext#executorCapacity. Only *_createThrottledExecutor(int capacity)_* is in use.

So as long as the config _fs.s3a.executor.capacity_ is documented as the capacity of the executor queues other than block upload, users will tend to tune it and then see no difference in behavior. Use cases such as prefetching, vectored IO and huge file writes with S3ABlockOutputStream do use SemaphoredDelegatingExecutor over a bounded thread pool. However, none of them uses _fs.s3a.executor.capacity_ to determine that capacity.

> Remove usage of fs.s3a.executor.capacity
>
> Key: HADOOP-18435
> URL: https://issues.apache.org/jira/browse/HADOOP-18435
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
>
> When s3guard was part of s3a, DynamoDBMetadataStore was the only consumer of
> StoreContext that used throttled executor provided by StoreContext, which
> internally uses fs.s3a.executor.capacity to determine executor capacity for
> SemaphoredDelegatingExecutor. With the removal of s3guard from s3a, we should
> also remove fs.s3a.executor.capacity and its usages as it's no longer being
> used by any StoreContext consumers. The config's existence and its
> description can be really confusing for the users.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[ https://issues.apache.org/jira/browse/HADOOP-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611281#comment-17611281 ]

Viraj Jasani commented on HADOOP-18435:
---

[~mthakur] [~ste...@apache.org] Even after HADOOP-18347 was merged, the config *_fs.s3a.executor.capacity_* is still not used anywhere. It only confuses users. We can apply this patch and run the tests with the scale profile to confirm this:

{code:java}
diff --git a/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml b/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
index 17cd228dc1b..de7a9b84759 100644
--- a/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
+++ b/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
@@ -1523,7 +1523,7 @@
   <name>fs.s3a.executor.capacity</name>
-  <value>16</value>
+  <value>0</value>
   <description>The maximum number of submitted tasks which is a single
     operation (e.g. rename(), delete()) may submit simultaneously for
     execution -excluding the IO-heavy block uploads, whose capacity
diff --git a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
index 856d1dfb97b..807ba23913d 100644
--- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
+++ b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
@@ -417,7 +417,7 @@ private Constants() {
    * upload, where {@link #FAST_UPLOAD_ACTIVE_BLOCKS} is used instead.
    * Value: {@value}
    */
-  public static final int DEFAULT_EXECUTOR_CAPACITY = 16;
+  public static final int DEFAULT_EXECUTOR_CAPACITY = 0;
   // Private | PublicRead | PublicReadWrite | AuthenticatedRead |
   // LogDeliveryWrite | BucketOwnerRead | BucketOwnerFullControl
diff --git a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
index 3e6f2322d3b..ca4abaf0850 100644
--- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
+++ b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
@@ -783,7 +783,7 @@ private void initThreadPools(Configuration conf) {
         name + "-unbounded"));
     unboundedThreadPool.allowCoreThreadTimeOut(true);
     executorCapacity = intOption(conf,
-        EXECUTOR_CAPACITY, DEFAULT_EXECUTOR_CAPACITY, 1);
+        EXECUTOR_CAPACITY, DEFAULT_EXECUTOR_CAPACITY, 0);
     if (prefetchEnabled) {
       final S3AInputStreamStatistics s3AInputStreamStatistics =
           statisticsContext.newInputStreamStatistics();
{code}

This patch has no impact because, as I mentioned in an earlier comment, the config is read only by:

{code:java}
/**
 * Create a new executor with the capacity defined in
 * {@link #executorCapacity}.
 * @return a new executor for exclusive use by the caller.
 */
public ExecutorService createThrottledExecutor() {
  return createThrottledExecutor(executorCapacity);
}
{code}

The executorCapacity field here is derived from the config *_fs.s3a.executor.capacity_*. Even after HADOOP-18347 was merged, this has not changed. When s3guard was part of s3a, DynamoDBMetadataStore was the only consumer of the method above (i.e. createThrottledExecutor()). Now that s3guard has been removed, the config's presence only adds confusion.
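The patch also lowers the minimum passed to `intOption` from 1 to 0. Another comment on this issue notes that the minimum had to be adjusted before 0 could be used at all. The sketch below is a minimal stand-in for that kind of bounded option lookup. It assumes values below the minimum are rejected with an exception, and it uses a plain `Map` in place of Hadoop's `Configuration`. The class name and the error message are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a bounded int option lookup: read the value,
 * fall back to a default when unset, and reject anything below `min`.
 * A Map replaces Hadoop's Configuration purely for illustration.
 */
final class IntOptionSketch {

  static int intOption(Map<String, String> conf, String key, int defVal, int min) {
    String raw = conf.get(key);
    int v = (raw == null) ? defVal : Integer.parseInt(raw.trim());
    if (v < min) {
      throw new IllegalArgumentException(
          "Value of " + key + ": " + v + " is below the minimum value " + min);
    }
    return v;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("fs.s3a.executor.capacity", "0");

    // With a minimum of 1, a configured value of 0 is rejected at startup.
    try {
      intOption(conf, "fs.s3a.executor.capacity", 16, 1);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    // With the minimum lowered to 0, as in the patch, 0 is accepted.
    System.out.println("accepted: " + intOption(conf, "fs.s3a.executor.capacity", 16, 0));
  }
}
```

Because the lookup only validates and returns the number, accepting 0 here says nothing about whether any executor is ever sized from it. That is the point the test run is meant to check.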
[ https://issues.apache.org/jira/browse/HADOOP-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600957#comment-17600957 ]

Mukund Thakur commented on HADOOP-18435:
---

Yes, it will be used in the future for the vectored IO work. JIRA: https://issues.apache.org/jira/browse/HADOOP-18347
[ https://issues.apache.org/jira/browse/HADOOP-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599164#comment-17599164 ]

Viraj Jasani commented on HADOOP-18435:
---

I came across this issue while testing HBase on S3: HBase WALs on HDFS and the actual HFiles on S3, accessed through S3A. I wanted to see how much performance could be gained by tuning fs.s3a.executor.capacity. I tried a small increase and eventually a very large one, but observed no significant difference. When I looked into the usages of fs.s3a.executor.capacity, I realized that it is not used anywhere, as described in my earlier comment.

To confirm that I had not missed any use case, I set fs.s3a.executor.capacity to 0. This required lowering the minimum allowed value from 1 to 0. I then ran all S3A tests with -Dscale, and not a single test failed. HBase did not run into any issues either.
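The sketch below is a simplified, hypothetical model of a semaphore-throttled executor. It shows why a zero capacity is a useful probe. Each throttled view gets `capacity` permits over a shared pool, so at most that many of its tasks are in flight at once. With zero permits, nothing can ever be submitted through it.

If the real executor likewise waits for a permit, any code path sized from the config would stall when the config is 0. A fully passing suite therefore suggests that no such path exists. This is not Hadoop's SemaphoredDelegatingExecutor. The class and method names are made up. `tryAcquire` with a timeout replaces a blocking acquire so that the zero-permit case fails fast instead of hanging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical throttled view over a shared pool: at most `capacity` tasks in flight. */
final class ThrottledSubmitter {
  private final ExecutorService delegate;
  private final Semaphore permits;

  ThrottledSubmitter(ExecutorService delegate, int capacity) {
    this.delegate = delegate;
    this.permits = new Semaphore(capacity, true); // fair: waiting callers served FIFO
  }

  /** Submits the task, or returns null if no permit frees up within timeoutMs. */
  Future<?> submit(Runnable task, long timeoutMs) throws InterruptedException {
    if (!permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
      return null;
    }
    try {
      return delegate.submit(() -> {
        try {
          task.run();
        } finally {
          permits.release(); // slot is freed when the task finishes
        }
      });
    } catch (RuntimeException e) {
      permits.release(); // delegate rejected the task: hand the permit back
      throw e;
    }
  }

  /** Runs `tasks` short tasks through a throttled view and reports peak concurrency. */
  static int peakConcurrency(int capacity, int tasks) throws Exception {
    ExecutorService shared = Executors.newFixedThreadPool(8);
    try {
      ThrottledSubmitter throttled = new ThrottledSubmitter(shared, capacity);
      AtomicInteger inFlight = new AtomicInteger();
      AtomicInteger peak = new AtomicInteger();
      List<Future<?>> futures = new ArrayList<>();
      for (int i = 0; i < tasks; i++) {
        Future<?> f = throttled.submit(() -> {
          peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
          try {
            Thread.sleep(20); // simulate some IO
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
          inFlight.decrementAndGet();
        }, 5_000);
        if (f == null) {
          throw new IllegalStateException("timed out waiting for a permit");
        }
        futures.add(f);
      }
      for (Future<?> f : futures) {
        f.get();
      }
      return peak.get();
    } finally {
      shared.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("peak with capacity 2: " + peakConcurrency(2, 10));
    ExecutorService shared = Executors.newSingleThreadExecutor();
    try {
      // Zero permits: nothing can ever be submitted through this view.
      Future<?> f = new ThrottledSubmitter(shared, 0).submit(() -> { }, 100);
      System.out.println("zero-capacity submit succeeded: " + (f != null));
    } finally {
      shared.shutdownNow();
    }
  }
}
```

The submitting thread itself waits for a permit, so back-pressure lands on the caller rather than on an unbounded queue. That is the behavior a per-operation capacity is meant to provide.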
[ https://issues.apache.org/jira/browse/HADOOP-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599093#comment-17599093 ]

Viraj Jasani commented on HADOOP-18435:
---

To be more specific, the method below is no longer called anywhere, so no consumer can make use of this config. The config still exists in the codebase and in the default site configuration (core-default.xml), so users may try tuning its value. In practice, doing so has no effect.

{code:java}
/**
 * Create a new executor with the capacity defined in
 * {@link #executorCapacity}.
 * @return a new executor for exclusive use by the caller.
 */
public ExecutorService createThrottledExecutor() {
  return createThrottledExecutor(executorCapacity);
}
{code}
[ https://issues.apache.org/jira/browse/HADOOP-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598856#comment-17598856 ]

Viraj Jasani commented on HADOOP-18435:
---

The executor is available in the store context. What I meant is that the config fs.s3a.executor.capacity is not used anywhere. Its current description says that it sets the capacity of the executor queues for operations other than block upload, but in reality nothing reads it. By contrast, S3ABlockOutputStream uses fs.s3a.fast.upload.active.blocks as the permit count of its SemaphoredDelegatingExecutor.
[ https://issues.apache.org/jira/browse/HADOOP-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598838#comment-17598838 ]

Steve Loughran commented on HADOOP-18435:
---

1. We do need an executor in the store context.
2. The prefetching and vectored IO code both need to use a bounded thread pool; there are outstanding JIRAs for that.
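A bounded thread pool caps both the number of worker threads and the amount of queued work. The sketch below builds one from the JDK's `ThreadPoolExecutor`. It is a generic illustration, not the pool S3A actually creates; the class name, thread count and queue size are arbitrary assumptions. When the threads and the queue are both full, `CallerRunsPolicy` runs the task on the submitting thread, which slows the producer down instead of letting the queue grow without limit.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: a pool bounded in both threads and queued tasks. */
final class BoundedPoolSketch {

  static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        threads, threads,                         // core == max: never grows past `threads`
        60L, TimeUnit.SECONDS,                    // idle keep-alive
        new ArrayBlockingQueue<>(queueCapacity),  // bounded work queue
        new ThreadPoolExecutor.CallerRunsPolicy()); // saturated: run in the caller
    pool.allowCoreThreadTimeOut(true);            // let idle threads exit
    return pool;
  }

  public static void main(String[] args) throws InterruptedException {
    ThreadPoolExecutor pool = newBoundedPool(4, 16);
    for (int i = 0; i < 100; i++) {
      pool.execute(() -> { /* e.g. fetch one block or byte range */ });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    System.out.println("largest pool size: " + pool.getLargestPoolSize());
  }
}
```

The rejection policy is a design choice. S3A's own pools may handle saturation differently, so treat `CallerRunsPolicy` here as one option rather than what the prefetching or vectored IO code uses.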