[jira] [Commented] (HADOOP-18435) Remove usage of fs.s3a.executor.capacity

2022-09-29 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611287#comment-17611287
 ] 

Viraj Jasani commented on HADOOP-18435:
---

I agree that the bounded thread pool is used by more use cases now, and also that 
we do need an executor in the store context. But we already have consumers of 
{code:java}
public ExecutorService createThrottledExecutor(int capacity) {
  return new SemaphoredDelegatingExecutor(executor,
  capacity, true);
} {code}
We don't have any consumers of StoreContext#executorCapacity, and hence 
*_createThrottledExecutor()_* is of no use; only *_createThrottledExecutor(int 
capacity)_* is being used.

Hence, when we say that the config _fs.s3a.executor.capacity_ represents the 
capacity of the executor queues other than block upload, users will tend to 
tune the executor queues with it but will not see any difference in the 
behavior of those use cases.

 

Use cases like prefetch, vectored IO, and huge file writes with 
S3ABlockOutputStream do use SemaphoredDelegatingExecutor with the bounded 
thread pool; however, none of them use _fs.s3a.executor.capacity_ to 
determine the bounded thread pool capacity.
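
To make the point about caller-supplied capacity concrete, here is a minimal sketch of a semaphore-throttled view over a shared pool. It is not Hadoop's SemaphoredDelegatingExecutor; the class and method names (ThrottledExecutorSketch, Throttled, peakConcurrency) are hypothetical, and the sketch assumes a permit is taken at submit time and returned when the task finishes. The capacity is whatever the caller passes in, so no shared setting influences it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class ThrottledExecutorSketch {

  /** Hypothetical semaphore-bounded view over a shared pool (illustration only). */
  static final class Throttled {
    private final ExecutorService shared;
    private final Semaphore permits;

    Throttled(ExecutorService shared, int capacity) {
      this.shared = shared;
      // Fair semaphore: waiting submitters are served in arrival order.
      this.permits = new Semaphore(capacity, true);
    }

    /** Blocks the submitter until a permit is free, then hands the task to the pool. */
    Future<?> submit(Runnable task) throws InterruptedException {
      permits.acquire();
      try {
        return shared.submit(() -> {
          try {
            task.run();
          } finally {
            permits.release(); // permit returns only when the task has finished
          }
        });
      } catch (RuntimeException e) {
        permits.release(); // the pool rejected the task, so give the permit back
        throw e;
      }
    }
  }

  /** Runs short tasks through a throttle and reports the peak number running at once. */
  static int peakConcurrency(int capacity, int tasks) throws Exception {
    ExecutorService shared = Executors.newFixedThreadPool(8);
    try {
      Throttled throttled = new Throttled(shared, capacity);
      AtomicInteger running = new AtomicInteger();
      AtomicInteger peak = new AtomicInteger();
      List<Future<?>> futures = new ArrayList<>();
      for (int i = 0; i < tasks; i++) {
        futures.add(throttled.submit(() -> {
          int now = running.incrementAndGet();
          peak.accumulateAndGet(now, Math::max);
          try {
            Thread.sleep(20);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
          running.decrementAndGet();
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // wait for completion and surface any task failure
      }
      return peak.get();
    } finally {
      shared.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("peak with capacity 2: " + peakConcurrency(2, 16));
    System.out.println("peak with capacity 4: " + peakConcurrency(4, 16));
  }
}
```

The peak never exceeds the capacity handed to the constructor, which is why tuning an unrelated config key cannot change the observed concurrency.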

> Remove usage of fs.s3a.executor.capacity
> 
>
> Key: HADOOP-18435
> URL: https://issues.apache.org/jira/browse/HADOOP-18435
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>
> When s3guard was part of s3a, DynamoDBMetadataStore was the only consumer of 
> StoreContext that used the throttled executor provided by StoreContext, which 
> internally uses fs.s3a.executor.capacity to determine the executor capacity for 
> SemaphoredDelegatingExecutor. With the removal of s3guard from s3a, we should 
> also remove fs.s3a.executor.capacity and its usages, as it is no longer 
> used by any StoreContext consumers. The config's existence and its 
> description can be really confusing for users.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18435) Remove usage of fs.s3a.executor.capacity

2022-09-29 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611281#comment-17611281
 ] 

Viraj Jasani commented on HADOOP-18435:
---

[~mthakur] [~ste...@apache.org] even after HADOOP-18347 is merged, I still see 
that the config *_fs.s3a.executor.capacity_* is not used anywhere and is merely 
creating confusion for users.

We can apply this patch and run the tests with the scale profile to confirm this:
{code:java}
diff --git a/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml b/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
index 17cd228dc1b..de7a9b84759 100644
--- a/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
+++ b/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
@@ -1523,7 +1523,7 @@
 
 <property>
   <name>fs.s3a.executor.capacity</name>
-  <value>16</value>
+  <value>0</value>
   <description>The maximum number of submitted tasks which is a single
     operation (e.g. rename(), delete()) may submit simultaneously for
     execution -excluding the IO-heavy block uploads, whose capacity
diff --git a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
index 856d1dfb97b..807ba23913d 100644
--- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
+++ b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
@@ -417,7 +417,7 @@ private Constants() {
    * upload, where {@link #FAST_UPLOAD_ACTIVE_BLOCKS} is used instead.
    * Value: {@value}
    */
-  public static final int DEFAULT_EXECUTOR_CAPACITY = 16;
+  public static final int DEFAULT_EXECUTOR_CAPACITY = 0;
 
   // Private | PublicRead | PublicReadWrite | AuthenticatedRead |
   // LogDeliveryWrite | BucketOwnerRead | BucketOwnerFullControl
diff --git a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
index 3e6f2322d3b..ca4abaf0850 100644
--- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
+++ b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
@@ -783,7 +783,7 @@ private void initThreadPools(Configuration conf) {
             name + "-unbounded"));
     unboundedThreadPool.allowCoreThreadTimeOut(true);
     executorCapacity = intOption(conf,
-        EXECUTOR_CAPACITY, DEFAULT_EXECUTOR_CAPACITY, 1);
+        EXECUTOR_CAPACITY, DEFAULT_EXECUTOR_CAPACITY, 0);
     if (prefetchEnabled) {
       final S3AInputStreamStatistics s3AInputStreamStatistics =
           statisticsContext.newInputStreamStatistics();
 {code}
This patch would have no impact because, as I mentioned in the above comment, 
this config is only used by:
{code:java}
/**
 * Create a new executor with the capacity defined in
 * {@link #executorCapacity}.
 * @return a new executor for exclusive use by the caller.
 */
public ExecutorService createThrottledExecutor() {
  return createThrottledExecutor(executorCapacity);
}{code}
The executorCapacity here is derived from the config 
{*}_fs.s3a.executor.capacity_{*}.

Hence, even after merging HADOOP-18347, the fact remains that when s3guard was 
part of s3a, DynamoDBMetadataStore was the only consumer of the above method 
(i.e. createThrottledExecutor()). Now that s3guard is no longer present, this 
config's presence is just creating more confusion.
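
The reasoning can be illustrated with a small sketch. The classes below are hypothetical stand-ins, not the real StoreContext: the sketch keeps a configured capacity and exposes the same two overloads, but returns the permit count instead of a real executor so the effect is easy to observe. It assumes, as described in this thread, that every remaining caller passes its own capacity.

```java
class UnusedCapacitySketch {

  /** Hypothetical stand-in for StoreContext; not the Hadoop class. */
  static final class ContextSketch {
    // Stands in for the value read from fs.s3a.executor.capacity.
    private final int executorCapacity;
    private int noArgCalls;

    ContextSketch(int executorCapacity) {
      this.executorCapacity = executorCapacity;
    }

    /** The only path on which the configured capacity is ever read. */
    int createThrottledExecutor() {
      noArgCalls++;
      return createThrottledExecutor(executorCapacity);
    }

    /** Returns the permit count the executor would be created with. */
    int createThrottledExecutor(int capacity) {
      return capacity;
    }

    int noArgCalls() {
      return noArgCalls;
    }
  }

  /** A caller in the style described above: it supplies its own capacity. */
  static int callerWithOwnCapacity(ContextSketch ctx, int ownCapacity) {
    return ctx.createThrottledExecutor(ownCapacity);
  }

  public static void main(String[] args) {
    for (int configured : new int[] {0, 16, 1024}) {
      ContextSketch ctx = new ContextSketch(configured);
      int effective = callerWithOwnCapacity(ctx, 4);
      System.out.println("configured=" + configured
          + " effective=" + effective
          + " noArgCalls=" + ctx.noArgCalls());
    }
  }
}
```

Whatever value is configured, the effective capacity is the one the caller passed, and the no-argument overload is never reached.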




[jira] [Commented] (HADOOP-18435) Remove usage of fs.s3a.executor.capacity

2022-09-06 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600957#comment-17600957
 ] 

Mukund Thakur commented on HADOOP-18435:


Yes, it will be used in the future for the vectored IO work. The Jira is here: 
https://issues.apache.org/jira/browse/HADOOP-18347

 




[jira] [Commented] (HADOOP-18435) Remove usage of fs.s3a.executor.capacity

2022-09-01 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599164#comment-17599164
 ] 

Viraj Jasani commented on HADOOP-18435:
---

I came across this issue while testing HBase on S3 (HBase WAL on HDFS and the 
actual HFiles on S3, accessed using S3A). I wanted to see how much performance 
improvement could be achieved by tuning fs.s3a.executor.capacity. I tried a 
small bump, and then eventually a very high bump, but observed no significant 
difference. When I looked into the usages of fs.s3a.executor.capacity, I 
realized that it is not being used anywhere, as per the above comment.

Just to confirm that I have not missed any use case, I set 
fs.s3a.executor.capacity to 0 (adjusting the minimum value from 1 to 0) and ran 
all S3A tests with -Dscale; not a single test failed. HBase doesn't face any 
issues either.
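
The minimum had to be lowered before 0 could be supplied, which suggests the option reader rejects values below its minimum. Below is a minimal sketch of that kind of check. It is not the S3A helper itself: a plain Map stands in for the Hadoop Configuration, and the exception type and message are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class IntOptionSketch {

  /**
   * Reads an int setting, falls back to a default when the key is absent,
   * and rejects any value below the given minimum.
   */
  static int intOption(Map<String, String> conf, String key, int defVal, int min) {
    String raw = conf.get(key);
    int v = (raw == null) ? defVal : Integer.parseInt(raw.trim());
    if (v < min) {
      throw new IllegalArgumentException(
          String.format("Value of %s: %d is below the minimum value %d", key, v, min));
    }
    return v;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("fs.s3a.executor.capacity", "0");

    // With a minimum of 1, a configured 0 is refused.
    try {
      intOption(conf, "fs.s3a.executor.capacity", 16, 1);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected with min=1: " + e.getMessage());
    }

    // With the minimum lowered to 0, the same value is accepted.
    System.out.println("accepted with min=0: "
        + intOption(conf, "fs.s3a.executor.capacity", 16, 0));
  }
}
```

Under this assumption, a test run with the value 0 only exercises the rest of the code if the minimum is relaxed first; otherwise initialization would fail before any test logic ran.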




[jira] [Commented] (HADOOP-18435) Remove usage of fs.s3a.executor.capacity

2022-09-01 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599093#comment-17599093
 ] 

Viraj Jasani commented on HADOOP-18435:
---

To be more specific, this method is no longer in use, so no consumer can make 
use of this config. Because the config still exists in the codebase as well as 
in the default site config, users may try configuring its value, but doing so 
has no impact in reality.
{code:java}
/**
 * Create a new executor with the capacity defined in
 * {@link #executorCapacity}.
 * @return a new executor for exclusive use by the caller.
 */
public ExecutorService createThrottledExecutor() {
  return createThrottledExecutor(executorCapacity);
} {code}




[jira] [Commented] (HADOOP-18435) Remove usage of fs.s3a.executor.capacity

2022-09-01 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598856#comment-17598856
 ] 

Viraj Jasani commented on HADOOP-18435:
---

The executor is available in the store context; what I meant to say is that the 
config fs.s3a.executor.capacity is not being used anywhere. As of today, the 
description of fs.s3a.executor.capacity says that it represents the capacity of 
the executor queues for operations other than block upload, but in reality 
nothing reads the config.

On the other hand, fs.s3a.fast.upload.active.blocks is being used as the permit 
count in SemaphoredDelegatingExecutor by S3ABlockOutputStream.




[jira] [Commented] (HADOOP-18435) Remove usage of fs.s3a.executor.capacity

2022-09-01 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598838#comment-17598838
 ] 

Steve Loughran commented on HADOOP-18435:
-

1. we do need an executor in store context
2. and the prefetching and vector code both need to be using a bounded thread 
pool; outstanding JIRAs there
