[ 
https://issues.apache.org/jira/browse/TEZ-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933832#comment-15933832
 ] 

Siddharth Seth commented on TEZ-1187:
-------------------------------------

The main intent of the jira was to make sure that fetcher threads get re-used, 
since quite a few of these are created per Input and then discarded after 
completion. Also, there's control over the number of threads per input, but not 
over the total thread count for fetchers, and the number of inputs varies by 
query / stage of a query. There isn't really any co-ordination to limit the 
count (Note: nothing stops malicious code from creating a large number of 
threads).
There can, of course, be other Inputs/Outputs which create a large number of 
threads and discard them - this would be useful in such scenarios as well.

I think there are 2 major use cases.
1) 1 task per container. Container may or may not be re-used - Regular Tez Job 
Execution
2) Multiple parallel tasks per container - typically external services, but 
support could be added directly in Tez at some point (exists in 
tez-ext-services-tests)

Within each task, the following threads exist
1) Framework threads (used to initialize inputs/outputs, RPC thread to the AM, 
EventRoutingThreads)
2) Framework thread - which runs the actual task
3) User level threads - created by a Task. Can be created by the Processor, or 
any of the Input(s) / Output(s) which run as part of the task. The fetcher 
threads fall into this category.

Initially, I don't think we should try replacing any of the framework threads. 
Instead, expose the ExecutorService to user code (*Context), and let select 
parts of the user code make use of it (mostly the Fetchers - which can be a 
separate jira).

After looking at the patch, and some offline discussion with [~harishjp], 
there are several things that need to be considered (and handled eventually)
1) Should threads be limited per task? If so, is there a model by which a task 
can share threads between its executors? (Something like the 
MemoryDistributor to share available memory?)
2) Should threads be limited per Input/Processor/Output belonging to a single 
task?
3) Is there an overall thread limit?
4) What happens if the overall thread limit is hit? It's very likely that a 
task will hang if some of its Inputs / Outputs get access to shared threads, 
while others do not.
5) How will threads be oversubscribed?

I think the initial model is to create as many threads as requested, and keep 
them around for some time so that they can be re-used. This works nicely - the 
thread count is no worse than today, and threads get re-used. Probably the best 
approach to get started, and possibly for the longer term.
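
A minimal sketch of that initial model, using only java.util.concurrent. The 
class and method names are hypothetical and not taken from the patch. The pool 
never refuses a request for a thread, and idle threads linger for a keep-alive 
period so that a later request can pick them up. This is essentially what 
Executors.newCachedThreadPool() does, with the keep-alive made configurable.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Tez or of the patch under review.
class ReusableThreadPoolSketch {

  // Unbounded pool: creates a thread whenever no idle one is available,
  // and lets idle threads exit after keepAliveSeconds.
  static ThreadPoolExecutor newReusablePool(long keepAliveSeconds) {
    return new ThreadPoolExecutor(
        0,                                  // keep no threads when fully idle
        Integer.MAX_VALUE,                  // never refuse a new thread
        keepAliveSeconds, TimeUnit.SECONDS, // how long an idle thread lingers
        new SynchronousQueue<Runnable>());  // direct hand-off, no queueing
  }

  // Submits n tasks that all block at the same time and returns how many
  // threads the pool created to run them.
  static int threadsCreatedFor(int n) {
    ThreadPoolExecutor pool = newReusablePool(60);
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < n; i++) {
        pool.execute(() -> {
          try {
            release.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      return pool.getPoolSize();
    } finally {
      release.countDown(); // unblock the tasks so the threads can finish
      pool.shutdown();
    }
  }
}
```

Because nothing is queued, a request never waits behind another one, which 
avoids the hang described in 4) at the cost of having no upper bound on the 
thread count.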

Another thing that eventually needs to be looked at is the security aspect. 
Under what UGI / security policies do individual threads run? With ThreadPools, 
I believe they are created under the running UGI. The UGI does not get updated 
when a thread is re-used, though.

On the patch:
- In terms of the API: if the interface accepts a thread name, it should accept 
a pattern instead. Maybe skip thread naming, and leave that responsibility to 
the user thread? Also, could the number of threads be dropped from the 
parameter list?
- Convert TezSharedExecutor into an interface, with different implementations 
for single container single task, single container multiple tasks, tests etc.
- TestTaskExecution2 has far more changes than required. It doesn't need to use 
the TezExecutorService.
- Avoid using the TezSharedExecutor for framework threads. (Changes to 
LogicalIOProcessorRuntimeTask, etc)
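
As a rough illustration of the interface suggestion above, the split could 
look something like the following. Every name here is hypothetical. This is 
not the interface from the patch, and it follows the suggestion to leave 
thread naming and thread counts out of the parameter list.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical interface; names are illustrative only.
interface SharedExecutorProvider {
  // Executor that user code (Inputs, Outputs, Processor) submits work to.
  ExecutorService getExecutor();

  // Stops all threads; intended to be called when the container goes away.
  void shutdown();
}

// Possible implementation for the "single container, single task" case:
// threads are created on demand and re-used while they are still idle.
class SingleTaskExecutorProvider implements SharedExecutorProvider {
  private final ExecutorService pool = Executors.newCachedThreadPool();

  @Override
  public ExecutorService getExecutor() {
    return pool;
  }

  @Override
  public void shutdown() {
    pool.shutdownNow();
  }
}

// Possible implementation for tests: a single thread keeps the execution
// order deterministic.
class SingleThreadTestExecutorProvider implements SharedExecutorProvider {
  private final ExecutorService pool = Executors.newSingleThreadExecutor();

  @Override
  public ExecutorService getExecutor() {
    return pool;
  }

  @Override
  public void shutdown() {
    pool.shutdownNow();
  }
}
```

A real implementation would probably hand out a wrapper rather than the raw 
pool, so that one Input calling shutdown() cannot stop threads that other 
Inputs / Outputs depend on.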

One more change which would be very useful for single container, multiple 
executors is to track resource usage per task. This could be handled by 
using the SharedExecutor, or a wrapped executor, for framework threads and 
linking them to a task. A SharedExecutor with (Framework threads + User 
threads) could potentially do this. Alternatively, just a simple wrapped 
executor. Anyway - that's a separate jira.
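
A minimal sketch of the "simple wrapped executor" idea, assuming that run 
count and wall-clock run time are the resources of interest. The class and 
method names are hypothetical. Each task gets its own Executor view over the 
shared pool, and every Runnable submitted through that view is accounted to 
the task's id.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of Tez: links work run on a shared executor
// back to the task that submitted it.
class TaskUsageTracker {
  private final ConcurrentMap<String, AtomicLong> runNanos =
      new ConcurrentHashMap<>();
  private final ConcurrentMap<String, AtomicLong> runCount =
      new ConcurrentHashMap<>();

  // Returns an Executor that forwards to the shared one and records usage
  // under taskId once each piece of work finishes, even if it throws.
  Executor executorFor(String taskId, Executor shared) {
    return work -> shared.execute(() -> {
      long start = System.nanoTime();
      try {
        work.run();
      } finally {
        add(runNanos, taskId, System.nanoTime() - start);
        add(runCount, taskId, 1L);
      }
    });
  }

  long completedRuns(String taskId) {
    return get(runCount, taskId);
  }

  long totalRunNanos(String taskId) {
    return get(runNanos, taskId);
  }

  private static void add(ConcurrentMap<String, AtomicLong> map,
                          String key, long delta) {
    map.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(delta);
  }

  private static long get(ConcurrentMap<String, AtomicLong> map, String key) {
    AtomicLong value = map.get(key);
    return value == null ? 0L : value.get();
  }
}
```

Wall-clock time is only a stand-in here. Per-thread CPU time or memory would 
need a different measurement, but the linking of work to a task id would stay 
the same.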

> Share Thread pools between different tasks
> ------------------------------------------
>
>                 Key: TEZ-1187
>                 URL: https://issues.apache.org/jira/browse/TEZ-1187
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Harish Jaiprakash
>         Attachments: TEZ-1187.02.patch, TEZ-1187.03.patch, TEZ-1187.04.patch, 
> TEZ-1187.WIP.01.patch
>
>
> Thread pools are used all over for fetchers, sort etc. When running a single 
> task - this is already a problem, and gets worse when running multiple tasks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
