[ https://issues.apache.org/jira/browse/HADOOP-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659087#action_12659087 ]

Hemanth Yamijala commented on HADOOP-4830:
------------------------------------------

Some comments:

ControlledMapReduceJob:
- All paths should be created relative to the build directory. Something like 
new Path(System.getProperty("test.build.data","/tmp"), "signalFileDir-...")
- Do we really need to create the temp file? Is it only for generating a unique 
random number? Can we use Random for the same?
- Rather than splitting the string to get the task id, we can use the TaskID 
classes for the same. The hierarchy extends to ID, which will give you the 
number. Also, rather than calling it TaskID, which has a specific meaning, can 
we call it taskNumber or something similar?
- getTasksCounts: can't we directly use finishedMaps or finishedReduces?
- assertNTasksRunningAtSteadyState: The 5-second time limit brings in timing 
dependencies that should be avoided if we can. Ideally, checking that two 
consecutive heartbeat cycles don't change the running counts should be enough. 
Can we check the state of the JT or the scheduler to get this information?
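The first two points above could be combined: root the signal directory under 
test.build.data and derive its unique suffix from Random instead of creating a 
temp file. A minimal sketch in plain Java (java.io.File stands in for the 
Hadoop Path class, and the directory name is illustrative, not the patch's 
actual code):

```java
import java.io.File;
import java.util.Random;

public class SignalDirs {
    // Root test artifacts under the build dir, falling back to /tmp when
    // test.build.data is unset, and use Random for a unique suffix rather
    // than creating a temp file. (Names here are illustrative only.)
    static File uniqueSignalDir(Random random) {
        String base = System.getProperty("test.build.data", "/tmp");
        return new File(base, "signalFileDir-" + Integer.toHexString(random.nextInt()));
    }

    public static void main(String[] args) {
        System.out.println(uniqueSignalDir(new Random()).getPath());
    }
}
```

The same File (or Path) value can then be reused for every signal file the 
controlled job writes, so nothing lands outside the build tree.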
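The steady-state suggestion above could be implemented without a fixed wait by 
polling until two consecutive observations of the running-task count agree, 
i.e. a full cycle passes with no change. A generic sketch (method and supplier 
names are hypothetical, and modern Java is used for brevity):

```java
import java.util.function.IntSupplier;

public class SteadyState {
    // Poll until two consecutive readings of the running-task count are
    // equal, meaning a whole polling interval passed without any change.
    static int waitForSteadyCount(IntSupplier runningTasks, long pollMillis)
            throws InterruptedException {
        int prev = runningTasks.getAsInt();
        while (true) {
            Thread.sleep(pollMillis);
            int cur = runningTasks.getAsInt();
            if (cur == prev) {
                return cur;
            }
            prev = cur;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated counts: ramps up 1 -> 2 -> 3, then stays at 3.
        int[] seq = {1, 2, 3, 3};
        int[] i = {0};
        IntSupplier fake = () -> seq[Math.min(i[0]++, seq.length - 1)];
        System.out.println(waitForSteadyCount(fake, 1)); // prints 3
    }
}
```

In the test, the supplier would read the running map/reduce counts from the JT 
or scheduler state; a surrounding timeout would still be needed to fail fast if 
steady state is never reached.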

ClusterWithCapacityScheduler:
- Write the capacity scheduler configuration in a path relative to 
test.build.data
- The default values for the job-initialization-related properties are now 
fixed, so this can be removed.
- Please review the log level of the log statements. I think some of them will 
be too verbose for the INFO level, e.g. which keys we're writing to the 
scheduler conf.
- Please add a clearer comment on why 
fs.getRawFileSystem().setConf(config); is required after setting the 
configuration on the local file system object.

TestClusterWithCapacityScheduler doesn't seem specifically needed: many of the 
tests will exercise this code, and it will be very obvious if it doesn't work. 
This is unlike TestControlledMapReduceJob, which is a simple test that can 
easily be verified for correctness with the default scheduler.


TestQueueCapacities:
- Since we're trying to use MiniMR, can we by default have more than one 
tasktracker - say 4 - and scale all the task counts suitably? Likewise, I 
think the number of reduces should be non-zero for most cases. This would make 
the tests closer to reality, and since we're using controlled execution, it 
would not really change the test logic, right?
- I think we fixed the reclaim-capacity time limit to be in seconds rather 
than milliseconds.
- Related to the multiple-queue tests, can we also have a test where jobs are 
submitted to different queues - all below the queues' capacities - and make 
sure they are all running? This will exercise some specific code paths related 
to job initialization, considering multiple queues for scheduling jobs, etc.
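For that multiple-queue test, the MiniMR cluster's configuration would declare 
the queues and split the capacity between them. A sketch of the relevant 
properties (queue names are hypothetical, and the property names are as I 
recall them from the current capacity scheduler; please verify against the 
scheduler's conf):

```xml
<!-- mapred-site.xml: declare two queues (names are illustrative) -->
<property>
  <name>mapred.queue.names</name>
  <value>q1,q2</value>
</property>

<!-- capacity-scheduler.xml: split the cluster capacity between them -->
<property>
  <name>mapred.capacity-scheduler.queue.q1.guaranteed-capacity</name>
  <value>50</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.q2.guaranteed-capacity</name>
  <value>50</value>
</property>
```

The test would then submit one controlled job per queue, each sized below its 
queue's capacity, and assert that all of them reach the running state.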

> Have end to end tests based on MiniMRCluster to verify that queue capacities 
> are honoured.
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4830
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4830
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Vinod K V
>            Assignee: Vinod K V
>         Attachments: HADOOP-4830-20081222-svn.2
>
>
> At present, we only have unit tests that make use of FakeTaskManager and that 
> only test the proper functionality of capacity scheduler in isolation. Many 
> issues unearthed recently proved that this is not enough and that it is 
> required to have end-to-end tests so that real JT is brought into the picture 
> and with that the interaction of the scheduler with JT. This issue, along 
> with a few other related jiras, should automate and replace the end-to-end tests that 
> are now manually done by QA, using MiniMRCluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
