[ 
https://issues.apache.org/jira/browse/HADOOP-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650899#action_12650899
 ] 

Hemanth Yamijala commented on HADOOP-4513:
------------------------------------------

This is looking good. Some of my previous comments aren't incorporated though. 
If you are ok with the suggestions, can you please incorporate them. If you 
have any reason not to make them, please mention that here.

- In JobInitializationPoller, the constructor is still redundantly calling 
super.
- The purpose of Data structures such as {{jobQueues, initializedJobs and 
threadsToQueueMap}} should be documented. This is moderately complex code and 
documentation will help to maintain it easier.
- In an offline discussion with Devaraj, we figured that there is no need to 
join these threads. Hence, I think we should make these daemon threads, and not 
need to join. We can have a separate JIRA to address the same issue in the 
EagerTaskInitializer later.
- {{assignThreadsToQueue}} should be {{assignThreadsToQueues}}. 
- The comments on {{getJobsToInitialize}} are helpful. I think it would help 
even more if they are inline with the code, so one can read them while looking 
at the code itself.
- In JobInitializationThread.run, the code {{ if (!startIniting) { break; }}} 
is redundant, as this is already handled above.
- JobInitializationThread's shutdown is still called after interrupting. It 
should be before.
- Documentation is needed for getSleepInterval and getMaxWorkerThreads

I also looked at the test cases in detail. A few comments on them:
- Broadly, we should remove the sleep calls everywhere. Instead, let us test by 
waiting for the first condition that we want to verify to occur. For e.g. in 
{{testJobInitializationPoller1()}}, after the first set of jobs is submitted, 
we will wait until the size of initialized jobs reaches 6. After this, the test 
code will verify that the correct six jobs are initialized. This way, we remove 
all timing issues. In case the condition never reaches, the test will timeout 
and fail. Hence our purpose will still be served.
- There are a few methods which are being used exclusively for testing. Request 
that there be a comment on these methods that they are meant for testing. This 
will help others to take an extra look at this code if they want to use it in 
non-test code.
- For FakeJobInProgressWithLongInit, use signalling rather than busy waiting to 
signal the jobs to finish initing.
- initingJobId is not synchronized consistently.
- instead of isIniting, a better helper API is getInitingJob(queue), and do the 
verification in the tests. I can think of other reasons this API can be useful, 
e.g. for logging.
- Make capacity 100.0 in the tests if there's only one queue to prevent 
confusion. Currently it is set to 50.
- In testJobInitializationPoller3, before you set priority of the job for u4 as 
high, verify it is not in the initialized list already. Also verify that the 
size of the initialized jobs exceeds the limit, i.e. it is 7 and not 6.
- I think there should also be a steps to verify that compeleted and running 
jobs are being removed from the jobQueue in {{JobQueueManager}}.
- Finally, can we rename the tests as testJobInitialization(), 
testMultiQueueJobInitialization(), testHighPriorityJobInitialization(). I think 
that conveys more meaning.


> Capacity scheduler should initialize tasks asynchronously
> ---------------------------------------------------------
>
>                 Key: HADOOP-4513
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4513
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Hemanth Yamijala
>            Assignee: Sreekanth Ramakrishnan
>             Fix For: 0.19.1
>
>         Attachments: HADOOP-4513-1.patch, HADOOP-4513-2.patch, 
> HADOOP-4513-3.patch, HADOOP-4513-4.patch
>
>
> Currently, the capacity scheduler initializes tasks on demand, as opposed to 
> the eager initialization technique used by the default scheduler. This is 
> done in order to save JT memory footprint. However, the initialization is 
> done in the {{assignTasks}} API which is not a good idea as task 
> initialization could be a time consuming operation. This JIRA is to move out 
> the initialization outside the {{assignTasks}} API and do it asynchronously.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to