[
https://issues.apache.org/jira/browse/HADOOP-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650899#action_12650899
]
Hemanth Yamijala commented on HADOOP-4513:
------------------------------------------
This is looking good. Some of my previous comments aren't incorporated though.
If you are ok with the suggestions, can you please incorporate them. If you
have any reason not to make them, please mention that here.
- In JobInitializationPoller, the constructor is still redundantly calling
super.
- The purpose of Data structures such as {{jobQueues, initializedJobs and
threadsToQueueMap}} should be documented. This is moderately complex code and
documentation will help to maintain it easier.
- In an offline discussion with Devaraj, we figured that there is no need to
join these threads. Hence, I think we should make these daemon threads, and not
need to join. We can have a separate JIRA to address the same issue in the
EagerTaskInitializer later.
- {{assignThreadsToQueue}} should be {{assignThreadsToQueues}}.
- The comments on {{getJobsToInitialize}} are helpful. I think it would help
even more if they are inline with the code, so one can read them while looking
at the code itself.
- In JobInitializationThread.run, the code {{ if (!startIniting) { break; }}}
is redundant, as this is already handled above.
- JobInitializationThread's shutdown is still called after interrupting. It
should be before.
- Documentation is needed for getSleepInterval and getMaxWorkerThreads
I also looked at the test cases in detail. A few comments on them:
- Broadly, we should remove the sleep calls everywhere. Instead, let us test by
waiting for the first condition that we want to verify to occur. For e.g. in
{{testJobInitializationPoller1()}}, after the first set of jobs is submitted,
we will wait until the size of initialized jobs reaches 6. After this, the test
code will verify that the correct six jobs are initialized. This way, we remove
all timing issues. In case the condition never reaches, the test will timeout
and fail. Hence our purpose will still be served.
- There are a few methods which are being used exclusively for testing. Request
that there be a comment on these methods that they are meant for testing. This
will help others to take an extra look at this code if they want to use it in
non-test code.
- For FakeJobInProgressWithLongInit, use signalling rather than busy waiting to
signal the jobs to finish initing.
- initingJobId is not synchronized consistently.
- instead of isIniting, a better helper API is getInitingJob(queue), and do the
verification in the tests. I can think of other reasons this API can be useful,
e.g. for logging.
- Make capacity 100.0 in the tests if there's only one queue to prevent
confusion. Currently it is set to 50.
- In testJobInitializationPoller3, before you set priority of the job for u4 as
high, verify it is not in the initialized list already. Also verify that the
size of the initialized jobs exceeds the limit, i.e. it is 7 and not 6.
- I think there should also be a steps to verify that compeleted and running
jobs are being removed from the jobQueue in {{JobQueueManager}}.
- Finally, can we rename the tests as testJobInitialization(),
testMultiQueueJobInitialization(), testHighPriorityJobInitialization(). I think
that conveys more meaning.
> Capacity scheduler should initialize tasks asynchronously
> ---------------------------------------------------------
>
> Key: HADOOP-4513
> URL: https://issues.apache.org/jira/browse/HADOOP-4513
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched
> Affects Versions: 0.19.0
> Reporter: Hemanth Yamijala
> Assignee: Sreekanth Ramakrishnan
> Fix For: 0.19.1
>
> Attachments: HADOOP-4513-1.patch, HADOOP-4513-2.patch,
> HADOOP-4513-3.patch, HADOOP-4513-4.patch
>
>
> Currently, the capacity scheduler initializes tasks on demand, as opposed to
> the eager initialization technique used by the default scheduler. This is
> done in order to save JT memory footprint. However, the initialization is
> done in the {{assignTasks}} API which is not a good idea as task
> initialization could be a time consuming operation. This JIRA is to move out
> the initialization outside the {{assignTasks}} API and do it asynchronously.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.