[ https://issues.apache.org/jira/browse/HADOOP-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650991#action_12650991 ]
Hemanth Yamijala commented on HADOOP-4035:
------------------------------------------
Comments on the scheduling portion:
- The config value for the % of VMEM for RAM should be in the capacity
scheduler's conf, no? I am OK leaving it in JobConf as well, but in the
comments above it seemed like we were going to define this in the
CapacityScheduler config.
- killJobsWithInvalidRequirements looks at every single job in a queue's
waiting list, which would be a very costly operation. Instead, the scheduler
should perform this check just before it considers a job for scheduling,
perhaps by looking at the count of running maps/reduces to decide whether it
is looking at the job for the first time.
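A minimal sketch of the lazy check suggested in this point. JobStub and its fields are illustrative stand-ins, not the actual JobInProgress API; the idea is just to validate a job's requirements the first time the scheduler sees it, rather than scanning every waiting job.

```java
// Sketch only: JobStub stands in for JobInProgress; field names are assumed.
class LazyRequirementCheck {
    static class JobStub {
        long requestedVmem;   // memory the job asks for per task (illustrative)
        int runningMaps;      // tasks already launched
        int runningReduces;
        JobStub(long vmem) { this.requestedVmem = vmem; }
    }

    private final long maxAllowedVmem;

    LazyRequirementCheck(long maxAllowedVmem) {
        this.maxAllowedVmem = maxAllowedVmem;
    }

    // Called just before the scheduler considers a job. A job with no
    // running tasks is being looked at for the first time, so its
    // requirements are validated only then. Returns true if the job
    // should be rejected (killed) for invalid requirements.
    boolean shouldReject(JobStub job) {
        boolean firstLook = (job.runningMaps + job.runningReduces) == 0;
        return firstLook && job.requestedVmem > maxAllowedVmem;
    }
}
```

The check is O(1) per scheduling attempt instead of O(number of waiting jobs) per heartbeat.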
- In getTaskFromJob, we use a switch on the task type being Map/Reduce. We
should instead use the pattern of defining an abstract method in
TaskSchedulingMgr and implementing it appropriately for Map/Reduce in the
concrete TaskSchedulingMgr subclasses. So pendingTasks would be an API in
TaskSchedulingMgr.
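The abstract-method pattern described above could look roughly like this. The nested Job type and its fields are simplified stand-ins for JobInProgress, not the real scheduler types.

```java
// Sketch only: Job is a stand-in for JobInProgress.
abstract class TaskSchedulingMgr {
    // Each concrete manager reports pending tasks of its own type,
    // removing the need for a switch on Map/Reduce in getTaskFromJob().
    abstract int pendingTasks(Job job);

    static class Job {
        int pendingMaps;
        int pendingReduces;
        Job(int maps, int reduces) { pendingMaps = maps; pendingReduces = reduces; }
    }

    static class MapSchedulingMgr extends TaskSchedulingMgr {
        int pendingTasks(Job job) { return job.pendingMaps; }
    }

    static class ReduceSchedulingMgr extends TaskSchedulingMgr {
        int pendingTasks(Job job) { return job.pendingReduces; }
    }
}
```

Callers hold a TaskSchedulingMgr reference and never branch on the task type themselves.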
- If scheduling based on memory is disabled, we log a statement to that
effect at INFO level. This could get too verbose, since most jobs aren't
going to specify any memory requirements. Switch it to DEBUG level.
- Should we cache the jobId -> value mapping for running jobs? A HashMap
would be low on memory and probably more performant. Maybe we should consider
this after benchmarking the capacity scheduler with and without this patch.
- We should definitely cache the JobInProgress across the computation of
reservedPmem and reservedVmem. The lookup is actually done thrice: twice
while computing reservedPmem alone.
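A sketch of the caching suggested above: do the jobId lookup once, keep the result in a local, and derive both reservations from it. The Job type and its fields are illustrative, not the scheduler's actual ones.

```java
import java.util.Map;

// Sketch only: Job stands in for JobInProgress; the per-task memory
// fields and runningTasks count are assumed names.
class ReservationCalc {
    static class Job {
        long vmemPerTask = 512;
        long pmemPerTask = 256;
        int runningTasks = 2;
    }

    // Single lookup, reused for both the vmem and pmem reservations,
    // instead of hitting the jobId -> Job map three times.
    static long[] reservedMemory(Map<String, Job> jobs, String jobId) {
        Job job = jobs.get(jobId);
        if (job == null) return new long[] {0, 0};
        long reservedVmem = job.vmemPerTask * job.runningTasks;
        long reservedPmem = job.pmemPerTask * job.runningTasks;
        return new long[] {reservedVmem, reservedPmem};
    }
}
```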
- Rename usedUpVmem to getVmemReservedForTasks(taskTracker), and likewise for
usedUpPmem.
- isSchedulingBasedOnMemoryEnabled - for consistency, rename to
isSchedulingBasedOnVmemEnabled
- It seems a little complicated to have TaskLookupResult carry the decision
of whether to look up a job in the same queue or in another queue. A simpler
model would be for TaskLookupResult to state only the *reason* why it did not
find a task, and then let callers decide what to do next. This will make it
simpler to change scheduling decisions later if required.
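The simpler model might look like this: the result carries only a status, and callers interpret it. The enum constants here are illustrative guesses, not the patch's actual ones.

```java
// Sketch only: status names are assumed, not from the patch.
class TaskLookupResult {
    enum LookupStatus { TASK_FOUND, NO_TASK_IN_QUEUE, TT_MEMORY_INSUFFICIENT }

    private final LookupStatus status;

    private TaskLookupResult(LookupStatus status) { this.status = status; }

    static TaskLookupResult found() {
        return new TaskLookupResult(LookupStatus.TASK_FOUND);
    }

    static TaskLookupResult noTask(LookupStatus reason) {
        return new TaskLookupResult(reason);
    }

    // Callers inspect the status and decide themselves whether to try
    // another job, another queue, or give up for this heartbeat.
    LookupStatus getStatus() { return status; }
}
```

Because the result carries no "what to do next" decision, scheduling policy changes stay localized to the callers.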
- For better readability, I am wondering whether all the memory-related
scheduling logic should be moved to a separate class used by the
CapacityTaskScheduler, say a class like MemoryMatcher. It could have a
package-private method boolean matchesMemoryRequirements(JobInProgress jip,
TaskTrackerStatus ttStatus). Essentially, move everything starting from
TTHasEnoughMemoryForJob into this class. In the future, if these algorithms
prove reusable, the class could move out of the capacity scheduler into
mapred itself.
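A skeleton of the proposed MemoryMatcher. JobStub and TTStatus are stand-ins for JobInProgress and TaskTrackerStatus, and the free/required fields are assumptions; only the package-private method shape comes from the suggestion above.

```java
// Sketch only: JobStub/TTStatus stand in for JobInProgress and
// TaskTrackerStatus; the memory fields are illustrative.
class MemoryMatcher {
    static class JobStub {
        long requiredVmem;
        JobStub(long vmem) { requiredVmem = vmem; }
    }

    static class TTStatus {
        long freeVmem;
        TTStatus(long vmem) { freeVmem = vmem; }
    }

    // Package-private, as suggested: does this tasktracker have enough
    // free memory to run a task of this job? The TTHasEnoughMemoryForJob
    // logic would move in here.
    boolean matchesMemoryRequirements(JobStub jip, TTStatus ttStatus) {
        return jip.requiredVmem <= ttStatus.freeVmem;
    }
}
```

Keeping the class package-private inside the capacity scheduler leaves the door open to promoting it into mapred later without an API break.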
> Modify the capacity scheduler (HADOOP-3445) to schedule tasks based on memory
> requirements and task trackers free memory
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-4035
> URL: https://issues.apache.org/jira/browse/HADOOP-4035
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched
> Affects Versions: 0.19.0
> Reporter: Hemanth Yamijala
> Assignee: Vinod K V
> Priority: Blocker
> Fix For: 0.20.0
>
> Attachments: 4035.1.patch, HADOOP-4035-20080918.1.txt,
> HADOOP-4035-20081006.1.txt, HADOOP-4035-20081006.txt,
> HADOOP-4035-20081008.txt, HADOOP-4035-20081121.txt, HADOOP-4035-20081126.1.txt
>
>
> HADOOP-3759 introduced configuration variables that can be used to specify
> memory requirements for jobs, and also modified the tasktrackers to report
> their free memory. The capacity scheduler in HADOOP-3445 should schedule
> tasks based on these parameters. A task that is scheduled on a TT that uses
> more than the default amount of memory per slot can be viewed as effectively
> using more than one slot, as it would decrease the amount of free memory on
> the TT by more than the default amount while it runs. The scheduler should
> make the used capacity account for this additional usage while enforcing
> limits, etc.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.