[ 
https://issues.apache.org/jira/browse/HADOOP-16246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16817049#comment-16817049
 ] 

Steve Loughran commented on HADOOP-16246:
-----------------------------------------

It was the AWS TransferManager which created problems here. We'd have to review 
the library to see whether it is now safe to use with a bounded pool. 

I could imagine making it possible to set an upper limit on that 
no-longer-unbounded pool, just to catch thread overload. Then you could turn it 
on to see whether deadlocks still surface. Though as usual, one more config 
option == one more way to get the system misconfigured.
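To make the idea concrete, here is a minimal sketch of such a pool: bounded maximum size, unbounded work queue. The class and method names are hypothetical, not the actual S3AFileSystem code.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolSketch {
    /**
     * Bounded maximum pool size, unbounded work queue: thread count is capped
     * at maxThreads, while extra work waits in the queue instead of spawning
     * ever more s3a-transfer-unbounded threads.
     */
    public static ThreadPoolExecutor newTransferPool(int maxThreads,
                                                     long keepAliveSecs) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,
                keepAliveSecs, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());  // unbounded, per the SDK docs' warning
        pool.allowCoreThreadTimeOut(true);     // let idle threads exit
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newTransferPool(10, 60);
        System.out.println(pool.getMaximumPoolSize()); // prints 10
        pool.shutdown();
    }
}
```

With core == max and an unbounded {{LinkedBlockingQueue}}, submissions are never rejected; they just wait, which is what the SDK documentation asks for.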

> Unbounded thread pool maximum pool size in S3AFileSystem TransferManager
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-16246
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16246
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Greg Kinman
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I have something running in production that is running up against {{ulimit}} 
> while trying to create {{s3a-transfer-unbounded}} threads.
> Relevant background: https://issues.apache.org/jira/browse/HADOOP-13826.
> Before that change, the thread pool used in the {{TransferManager}} had both 
> a reasonably small maximum pool size and work queue capacity.
> After that change, the thread pool has both a maximum pool size and work 
> queue capacity of {{Integer.MAX_VALUE}}.
> This seems like a pretty bad idea, because now we have, practically speaking, 
> no bound on the number of threads that might get created. I understand the 
> change was made in response to experiencing deadlocks and at the warning of 
> the documentation, which I will repeat here:
> {quote}It is not recommended to use a single threaded executor or a thread 
> pool with a bounded work queue as control tasks may submit subtasks that 
> can't complete until all sub tasks complete. Using an incorrectly configured 
> thread pool may cause a deadlock (I.E. the work queue is filled with control 
> tasks that can't finish until subtasks complete but subtasks can't execute 
> because the queue is filled).
> {quote}
> The documentation only warns against having a bounded _work queue_, not 
> against having a bounded _maximum pool size_. And this seems fine, as having 
> an unbounded work queue sounds ok. Having an unbounded maximum pool size, 
> however, does not.
> I will also note that this constructor is now deprecated and suggests using 
> {{TransferManagerBuilder}} instead, which by default creates a fixed thread 
> pool of size 10: 
> [https://github.com/aws/aws-sdk-java/blob/1.11.534/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/TransferManagerUtils.java#L59].
> I suggest we make a small change here and keep the maximum pool size at 
> {{maxThreads}}, which defaults to 10, while keeping the work queue as is 
> (unbounded).
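For reference, the control-task/subtask deadlock the SDK documentation warns about can be reproduced with a toy single-threaded executor. This is a hypothetical standalone sketch, not S3A or SDK code; the timeout is only there so the demo terminates instead of hanging.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DeadlockDemo {
    /** Returns "deadlocked" when the control task starves its own subtask. */
    public static String demo() throws Exception {
        // One worker thread: the control task occupies it while waiting on a
        // subtask that can never be scheduled -- the deadlock the docs describe.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> control = pool.submit(() -> {
                Future<String> subtask = pool.submit(() -> "done");
                return subtask.get();  // blocks: no free thread for the subtask
            });
            try {
                return control.get(1, TimeUnit.SECONDS); // never succeeds here
            } catch (TimeoutException e) {
                return "deadlocked";
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints "deadlocked"
    }
}
```

Note the deadlock comes from the bounded thread supply plus blocking dependencies between tasks, not from the queue itself, which is why capping {{maxThreads}} while leaving the queue unbounded still needs the SDK's control/subtask behavior reviewed.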



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
