[
https://issues.apache.org/jira/browse/HADOOP-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647577#action_12647577
]
Vinod K V commented on HADOOP-4035:
-----------------------------------
Some details about the configuration:
- pmem denotes the limits of a job and of the TT w.r.t. physical memory
- vmem denotes the limits of a job and of the TT w.r.t. virtual memory
h3. Job Configuration:
h4. Already addressed cases:
- If pmem and vmem are both specified, or both unspecified (disabled/job
doesn't care), we use them as they are.
- If pmem is not specified but vmem is, the above proposal is to set pmem to
a percentage (say P) of vmem.
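The two rules above can be sketched as a small resolution function. This is a minimal illustration, not Hadoop code: `resolve_job_memory` is a hypothetical helper, `None` is assumed to stand for "not specified", and `percent` is the P from the proposal.

```python
from typing import Optional, Tuple


def resolve_job_memory(pmem: Optional[int], vmem: Optional[int],
                       percent: float) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical sketch of the job-side rules already addressed.

    None means "not specified" (disabled / job doesn't care).
    """
    if (pmem is None) == (vmem is None):
        # Both specified or both unspecified: use them as they are.
        return pmem, vmem
    if pmem is None:
        # Only vmem specified: pmem becomes P percent of vmem.
        return int(vmem * percent / 100), vmem
    # Only pmem specified: not covered by these rules, left unchanged.
    return pmem, vmem
```

For example, `resolve_job_memory(None, 2048, 50)` yields `(1024, 2048)`.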
h4. Cases not addressed:
- The proposal doesn't address the edge case where pmem is specified but vmem
is not. I propose that, in a similar vein, we set vmem to (100/P) times pmem.
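Since the existing rule sets pmem to P percent of vmem, the proposed rule is simply its inverse (assuming 0 < P). The numbers in the second line, P = 50 and pmem = 1024 MB, are chosen purely for illustration:

```latex
\begin{aligned}
\text{pmem} = \frac{P}{100}\,\text{vmem}
  &\quad\Longrightarrow\quad \text{vmem} = \frac{100}{P}\,\text{pmem} \\
P = 50,\ \text{pmem} = 1024\ \text{MB}
  &\quad\Longrightarrow\quad \text{vmem} = \frac{100}{50} \times 1024\ \text{MB} = 2048\ \text{MB}
\end{aligned}
```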
h3. TT Configuration:
h4. Already addressed cases:
* If offsets for both vmem and pmem are specified,
** the TT takes care of overflowing tasks itself by doing virtual memory
management,
** the scheduler uses both vmem and pmem for scheduling, using the latter to
control thrashing.
* If neither offset is specified,
** the TT ignores overflowing tasks and disables virtual memory management,
** the scheduler does not schedule based on vmem or pmem and makes no attempt
to avoid thrashing or task overflow.
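The two addressed TT cases map the offsets to an all-or-nothing policy. A minimal sketch, assuming `None` means "offset not specified"; `TTMemoryPolicy` and `tt_memory_policy` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TTMemoryPolicy:
    tt_kills_overflowing_tasks: bool  # TT-side virtual memory management
    schedule_on_vmem: bool            # scheduler guards against task overflow
    schedule_on_pmem: bool            # scheduler guards against thrashing


def tt_memory_policy(vmem_offset: Optional[int],
                     pmem_offset: Optional[int]) -> TTMemoryPolicy:
    """Map the TT offsets (None = not specified) to the behaviour above."""
    if vmem_offset is not None and pmem_offset is not None:
        return TTMemoryPolicy(True, True, True)
    if vmem_offset is None and pmem_offset is None:
        return TTMemoryPolicy(False, False, False)
    # Exactly one offset specified: these are the cases not yet addressed.
    raise NotImplementedError("exactly one offset specified")
```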
h4. Cases not addressed:
* If the offset for vmem is specified but not the one for pmem, there are two
alternative approaches.
** We already compute and report the total pmem. So, take the pmem offset to
be zero and use the total available pmem and (vmem - vmemoffset) for
scheduling. vmemoffset defaults to zero, so we just need one more
configuration item and a field in TaskTrackerStatus to specify whether vmem is
enabled or disabled. Note that today we overload total vmem, and by extension
vmemoffset, to also enable or disable task memory management.
** Only use vmem for scheduling, ignore pmem, and thus make no attempt to
avoid possible thrashing.
* If the offset for pmem is specified but not the one for vmem, there are
again two alternative approaches.
** We already compute and report the total vmem. So, take the vmem offset to
be zero and use the total available vmem and (pmem - pmemoffset) for
scheduling.
** Only use pmem for scheduling to avoid possible thrashing, but ignore vmem
and thus make no attempt to prevent tasks from overflowing.
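In both mixed cases, the first alternative amounts to treating the missing offset as zero and keeping both checks, while the second drops one of the checks. A minimal sketch under those assumptions: `TTMemoryStatus` is a hypothetical stand-in for the memory fields of TaskTrackerStatus (the field names are invented, not the real class), and `task_fits` is a hypothetical admission check.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TTMemoryStatus:
    """Hypothetical stand-in for the memory fields of TaskTrackerStatus."""
    total_vmem: int
    total_pmem: int
    vmem_offset: Optional[int]  # None = not configured
    pmem_offset: Optional[int]  # None = not configured
    # Proposed explicit switch, instead of overloading total vmem/vmemoffset.
    vmem_enabled: bool = True


def schedulable_memory(s: TTMemoryStatus) -> Tuple[int, int]:
    """(vmem, pmem) usable by tasks under the first alternative.

    A missing offset is taken as zero, so the full reported total is used.
    """
    vmem_offset = s.vmem_offset if s.vmem_offset is not None else 0
    pmem_offset = s.pmem_offset if s.pmem_offset is not None else 0
    return s.total_vmem - vmem_offset, s.total_pmem - pmem_offset


def task_fits(task_vmem: int, task_pmem: int, free_vmem: int, free_pmem: int,
              check_vmem: bool = True, check_pmem: bool = True) -> bool:
    """Admission check; turning one check off models the second alternative."""
    if check_pmem and task_pmem > free_pmem:
        return False  # placing the task would risk thrashing
    if check_vmem and task_vmem > free_vmem:
        return False  # placing the task would risk it overflowing
    return True
```

With `TTMemoryStatus(8192, 4096, vmem_offset=1024, pmem_offset=None)`, `schedulable_memory` gives `(7168, 4096)`. A task needing 3072 vmem and 512 pmem is rejected against 2048 free vmem and 1024 free pmem, but accepted if `check_vmem=False`, which is exactly the overflow risk the second alternative accepts.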
In both cases above, I prefer the first solution to the second, as it makes a
best effort to prevent both thrashing and task overflow. Thoughts?
> Modify the capacity scheduler (HADOOP-3445) to schedule tasks based on memory
> requirements and task trackers free memory
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-4035
> URL: https://issues.apache.org/jira/browse/HADOOP-4035
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched
> Affects Versions: 0.19.0
> Reporter: Hemanth Yamijala
> Assignee: Vinod K V
> Priority: Blocker
> Fix For: 0.20.0
>
> Attachments: 4035.1.patch, HADOOP-4035-20080918.1.txt,
> HADOOP-4035-20081006.1.txt, HADOOP-4035-20081006.txt, HADOOP-4035-20081008.txt
>
>
> HADOOP-3759 introduced configuration variables that can be used to specify
> memory requirements for jobs, and also modified the tasktrackers to report
> their free memory. The capacity scheduler in HADOOP-3445 should schedule
> tasks based on these parameters. A task that is scheduled on a TT that uses
> more than the default amount of memory per slot can be viewed as effectively
> using more than one slot, as it would decrease the amount of free memory on
> the TT by more than the default amount while it runs. The scheduler should
> make the used capacity account for this additional usage while enforcing
> limits, etc.
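The idea of a high-memory task "effectively using more than one slot" can be made concrete with one plausible accounting rule. This sketch assumes rounding up to whole slots with a minimum of one. Both the rule and the helper names `effective_slots` and `used_capacity` are illustrative assumptions, not the scheduler's actual implementation.

```python
import math
from typing import Iterable


def effective_slots(task_mem_mb: int, default_mem_per_slot_mb: int) -> int:
    """Slots a task effectively occupies (hypothetical round-up rule)."""
    if default_mem_per_slot_mb <= 0:
        raise ValueError("default memory per slot must be positive")
    # A task never counts as less than one slot.
    return max(1, math.ceil(task_mem_mb / default_mem_per_slot_mb))


def used_capacity(task_mems_mb: Iterable[int], default_mem_per_slot_mb: int) -> int:
    """Total slots charged against capacity for the given running tasks."""
    return sum(effective_slots(m, default_mem_per_slot_mb) for m in task_mems_mb)
```

With 1024 MB per slot, tasks of 512, 1024 and 2500 MB would be charged 1 + 1 + 3 = 5 slots, so a queue's used capacity grows faster than its raw task count.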
--
This message is automatically generated by JIRA.