[
https://issues.apache.org/jira/browse/HADOOP-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659087#action_12659087
]
Hemanth Yamijala commented on HADOOP-4830:
------------------------------------------
Some comments:
ControlledMapReduceJob:
- All paths should be created relative to the build directory, something like
new Path(System.getProperty("test.build.data", "/tmp"), "signalFileDir-...")
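To illustrate the suggested pattern, here is a minimal sketch using plain java.io.File in place of Hadoop's Path (the resolution logic is the same); the directory name is illustrative:

```java
import java.io.File;

public class BuildRelativePaths {
    public static void main(String[] args) {
        // Resolve the test data directory from the test.build.data system
        // property, falling back to /tmp when it is not set, as suggested.
        String base = System.getProperty("test.build.data", "/tmp");
        // "signalFileDir-test" is a hypothetical directory name for this sketch.
        File signalFileDir = new File(base, "signalFileDir-test");
        System.out.println(signalFileDir.getPath());
    }
}
```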
- Do we really need to create the temp file? Is it only used to generate a
unique random number? Can we use Random for the same?
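A minimal sketch of that alternative, assuming the temp file is only there to produce a unique suffix (class and method names here are hypothetical):

```java
import java.util.Random;

public class UniqueSuffix {
    // Generate a unique-ish suffix without touching the filesystem,
    // replacing the temp-file trick with java.util.Random as suggested.
    static String uniqueSuffix() {
        Random random = new Random();
        // A non-negative random int is enough to avoid collisions between
        // concurrently running tests in practice.
        return String.valueOf(random.nextInt(Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        System.out.println("signalFileDir-" + uniqueSuffix());
    }
}
```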
- Rather than splitting the string to extract the task id, we can use the
TaskID classes to get the same information. The hierarchy extends to ID, which
will give you the number. Also, rather than calling it TaskID, which has a
specific meaning, can we call it taskNumber or something similar?
- getTasksCounts: can't we directly use finishedMaps or finishedReduces?
- assertNTasksRunningAtSteadyState: The 5-second time limit introduces timing
dependencies that should be avoided if we can. Ideally, if we can check that
two consecutive heartbeat cycles don't change the running counts, that should
be enough. Can we check the state of the JT or the scheduler to get this
information?
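The consecutive-sample idea above could be sketched as follows. The IntSupplier stands in for however the test queries the JT/scheduler for the running-task count; the method and parameter names are hypothetical:

```java
import java.util.function.IntSupplier;

public class SteadyState {
    /**
     * Returns true once two consecutive samples of the running-task count,
     * taken one poll interval apart, are equal (i.e. the count did not change
     * across a heartbeat cycle), or false after maxPolls attempts.
     */
    static boolean waitForSteadyCount(IntSupplier runningTasks,
                                      long pollIntervalMs,
                                      int maxPolls) throws InterruptedException {
        int previous = runningTasks.getAsInt();
        for (int i = 0; i < maxPolls; i++) {
            Thread.sleep(pollIntervalMs);
            int current = runningTasks.getAsInt();
            if (current == previous) {
                return true;
            }
            previous = current;
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stub supplier: counts ramp up 1, 2, 3 and then hold steady at 3.
        int[] samples = {1, 2, 3, 3, 3};
        int[] idx = {0};
        IntSupplier stub = () -> samples[Math.min(idx[0]++, samples.length - 1)];
        System.out.println(waitForSteadyCount(stub, 10, 10)); // prints true
    }
}
```

This removes the fixed wall-clock wait: the test finishes as soon as the count stabilizes, and maxPolls still bounds the total wait time.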
ClusterWithCapacityScheduler:
- Write the capacity scheduler configuration to a path relative to
test.build.data.
- The default values for the job-initialization-related properties are now
fixed, so this can be removed.
- Please review the log level of the log statements. I think some of them will
be too verbose for the INFO level, e.g. which keys we're writing to the
scheduler conf.
- Please add a clearer comment on why fs.getRawFileSystem().setConf(config);
is required after setting the configuration on the local file system object.
TestClusterWithCapacityScheduler doesn't seem specifically needed. A lot of
the tests will exercise this code, and it will be very obvious if it doesn't
work. TestControlledMapReduceJob, in contrast, is a simple test that can
easily be verified for correctness with the default scheduler.
TestQueueCapacities:
- Since we're trying to use MiniMR, can we by default have more than 1
tasktracker - say 4 - and suitably scale all the task counts? Likewise, I also
think the number of reduces should be non-zero for most cases. This would make
the tests closer to reality. And since we're using controlled execution, it
would really not make a difference to the test logic, right?
- I think we fixed the reclaim-capacity time limit to be in seconds rather
than milliseconds.
- Related to the multiple-queue tests: can we also have a test where jobs are
submitted to different queues - all below each queue's capacity - and make
sure they are all running? This will exercise some specific code paths related
to job initialization, considering multiple queues for scheduling jobs, etc.
> Have end to end tests based on MiniMRCluster to verify that queue capacities
> are honoured.
> ------------------------------------------------------------------------------------------
>
> Key: HADOOP-4830
> URL: https://issues.apache.org/jira/browse/HADOOP-4830
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched
> Reporter: Vinod K V
> Assignee: Vinod K V
> Attachments: HADOOP-4830-20081222-svn.2
>
>
> At present, we only have unit tests that make use of FakeTaskManager and that
> only test the proper functionality of capacity scheduler in isolation. Many
> issues unearthed recently proved that this is not enough and that it is
> required to have end-to-end tests so that real JT is brought into the picture
> and with that the interaction of the scheduler with JT. This issue along with
> few other related jiras should automate and replace the end-to-end tests that
> are now manually done by QA, using MiniMRCluster.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.