[
https://issues.apache.org/jira/browse/HADOOP-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642881#action_12642881
]
Vivek Ratan commented on HADOOP-4513:
-------------------------------------
Yes, we need to make sure jobs are initialized asynchronously (so that
initTasks() is not called synchronously from within a heartbeat) and as early
as possible (so that a job is already initialized by the time we consider
running it). We also want only a few waiting jobs to be initialized at any
given time, so that their memory footprint stays low. I suggest we use an
enhanced version of EagerTaskInitializationListener, so that jobs are
initialized asynchronously in a separate thread. The difference is that we
would enforce some of the limits described in HADOOP-4428: a limit on the
total number of initialized waiting jobs (maybe 10 per queue), as well as a
limit on initialized jobs per user per queue (maybe 3 per user per queue).
The modified EagerTaskInitializationListener thread enforces these limits and
only initializes jobs as necessary.
> Capacity scheduler should initialize tasks asynchronously
> ---------------------------------------------------------
>
> Key: HADOOP-4513
> URL: https://issues.apache.org/jira/browse/HADOOP-4513
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched
> Affects Versions: 0.19.0
> Reporter: Hemanth Yamijala
> Assignee: Sreekanth Ramakrishnan
>
> Currently, the capacity scheduler initializes tasks on demand, as opposed to
> the eager initialization technique used by the default scheduler. This is
> done in order to save JT memory footprint. However, the initialization is
> done in the {{assignTasks}} API, which is not a good idea, as task
> initialization can be a time-consuming operation. This JIRA is to move the
> initialization out of the {{assignTasks}} API and do it asynchronously.