[jira] [Commented] (HADOOP-16618) increase the default number of threads and http connections in S3A

2022-01-16 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476818#comment-17476818
 ] 

Steve Loughran commented on HADOOP-16618:
-

* there's a fixed thread pool and an unlimited pool, both for doing work on 
behalf of calling threads
* all HTTP IO which takes place in blocking calls happens in the calling 
threads themselves, not in either pool.

If you have a Hive or Spark worker process with 128 threads, it'll be using at 
least that many connections when reading data. When writing, blocks can be 
queued for async upload. Job commit in the S3A committer is another 
many-thread operation.
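
A minimal sketch of the point above, not S3A code: each calling thread holds one connection for the duration of a blocking call, so the number of connections in demand follows the number of callers rather than the size of the bounded thread pool. The class name `CallerConnectionDemo` and the method `simulate` are hypothetical, and a plain `Semaphore` stands in for the HTTP connection pool. The 128 callers and the limit of 96 are the figures quoted in this thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names exist in Hadoop.
class CallerConnectionDemo {

    /**
     * Starts callerThreads threads that each try to take one connection
     * from a pool of maxConnections and hold it until every caller has
     * made its attempt. Returns {connections held, callers left without}.
     */
    static int[] simulate(int callerThreads, int maxConnections) {
        Semaphore pool = new Semaphore(maxConnections);
        AtomicInteger held = new AtomicInteger();
        AtomicInteger starved = new AtomicInteger();
        CountDownLatch attempted = new CountDownLatch(callerThreads);
        CountDownLatch release = new CountDownLatch(1);

        Thread[] callers = new Thread[callerThreads];
        for (int i = 0; i < callerThreads; i++) {
            callers[i] = new Thread(() -> {
                // Assumption: a real HTTP client would likely wait here or
                // time out; this sketch only counts the shortfall.
                boolean got = pool.tryAcquire();
                if (got) {
                    held.incrementAndGet();
                } else {
                    starved.incrementAndGet();
                }
                attempted.countDown();
                try {
                    release.await(); // simulate a long blocking read
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    if (got) {
                        pool.release();
                    }
                }
            });
            callers[i].start();
        }
        try {
            attempted.await();   // every caller has tried exactly once
            int[] result = {held.get(), starved.get()};
            release.countDown(); // let the callers finish
            for (Thread t : callers) {
                t.join();
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }
}
```

With `simulate(128, 96)` all 96 connections are held and 32 callers are left without one, while `simulate(10, 96)` leaves 86 connections idle. This says nothing about the extra connections the upload and commit pools would need on top of the callers.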

> increase the default number of threads and http connections in S3A
> --
>
> Key: HADOOP-16618
> URL: https://issues.apache.org/jira/browse/HADOOP-16618
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.1
>Reporter: Steve Loughran
>Priority: Major
>
> Enable bigger thread and http pools in the S3A connector, especially now that 
> the transfer manager is doing parallel block transfer, as is rename().
> We can do a lot more with parallelism from a single thread, and for 
> applications with multiple worker threads, we need bigger defaults.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16618) increase the default number of threads and http connections in S3A

2022-01-14 Thread Ahmar Suhail (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476069#comment-17476069
 ] 

Ahmar Suhail commented on HADOOP-16618:
---

I started looking at this ticket. The default values are currently:
* DEFAULT_MAXIMUM_CONNECTIONS = 96
* DEFAULT_MAX_THREADS = 10
* DEFAULT_SOCKET_SEND_BUFFER = 8 * 1024

Do you have any ideas for what the new numbers should be, or for how we can 
find them? I tried tweaking the values and ran a couple of performance tests, 
see [concurrent 
renames|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AConcurrentOps.java#L164]
 and [read 
file|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java#L244],
 but I don't know if that gives us enough data.

I'd also like to understand what advantage a connection pool of size 96 
gives. From what I understand, if max_threads is set to 10, we'll only ever 
use a maximum of 10 connections?
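
For anyone trying candidate values in a test run, here is a rough sketch of how the three defaults map onto S3A options. The property names are given as I recall them from the S3A documentation and should be checked against the Hadoop version in use. The class `S3ATuningSketch` and its `overrides` method are hypothetical, and a plain `Map` stands in for a Hadoop `Configuration` so the sketch stays self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of Hadoop.
class S3ATuningSketch {

    // Defaults as quoted in the comment above.
    static final int DEFAULT_MAXIMUM_CONNECTIONS = 96;
    static final int DEFAULT_MAX_THREADS = 10;
    static final int DEFAULT_SOCKET_SEND_BUFFER = 8 * 1024;

    /**
     * Builds the key/value overrides for one candidate setting.
     * Assumption: these keys correspond to the three constants; verify
     * them against the S3A documentation before relying on them.
     */
    static Map<String, String> overrides(int connections,
                                         int threads,
                                         int sendBufferBytes) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("fs.s3a.connection.maximum", Integer.toString(connections));
        conf.put("fs.s3a.threads.max", Integer.toString(threads));
        conf.put("fs.s3a.socket.send.buffer",
                Integer.toString(sendBufferBytes));
        return conf;
    }
}
```

Calling `overrides(DEFAULT_MAXIMUM_CONNECTIONS, DEFAULT_MAX_THREADS, DEFAULT_SOCKET_SEND_BUFFER)` reproduces the current defaults. Each entry could then be copied into the configuration used by a scale test, one candidate at a time.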

 




[jira] [Commented] (HADOOP-16618) increase the default number of threads and http connections in S3A

2021-01-06 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259830#comment-17259830
 ] 

Steve Loughran commented on HADOOP-16618:
-

also: socket buffer sizes
