[ 
https://issues.apache.org/jira/browse/HADOOP-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647577#action_12647577
 ] 

Vinod K V commented on HADOOP-4035:
-----------------------------------

Some details about configuration:
 - pmem denotes the limits of job and TT w.r.t physical memory
 - vmem denotes the limits of job and TT w.r.t virtual memory

h3. Job Configuration:

h4. Already addressed cases:
 - If pmem and vmem are both specified, or both unspecified (disabled/job 
doesn't care), we use them as they are.
 - If pmem is not specified but vmem is, the above proposal is to 
set pmem to a percentage (say P) of vmem.

h4. Cases not addressed:
 - The proposal doesn't address the edge case where pmem is specified but vmem 
is not. I propose that, in a similar vein, we set vmem to (100/P) times 
pmem.
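
The job-side rules above can be sketched as follows. This is only an illustration of the proposal, not code from any attached patch: the class name, the -1 sentinel for "not specified", and the way P is passed in are all assumptions.

```java
// Hypothetical sketch; none of these names are existing Hadoop APIs.
final class JobMemoryLimits {
    // Assumed sentinel meaning "limit not specified / job doesn't care".
    static final long UNSPECIFIED = -1L;

    final long vmem;
    final long pmem;

    JobMemoryLimits(long vmem, long pmem) {
        this.vmem = vmem;
        this.pmem = pmem;
    }

    /**
     * Fills in whichever limit is missing. p is the assumed percentage P,
     * i.e. pmem expressed as a percentage of vmem.
     */
    static JobMemoryLimits resolve(long vmem, long pmem, int p) {
        if (p <= 0) {
            throw new IllegalArgumentException("P must be positive: " + p);
        }
        boolean hasVmem = vmem != UNSPECIFIED;
        boolean hasPmem = pmem != UNSPECIFIED;
        if (hasVmem == hasPmem) {
            // Both specified or both unspecified: use them as they are.
            return new JobMemoryLimits(vmem, pmem);
        }
        if (hasVmem) {
            // Only vmem given: pmem becomes P percent of vmem.
            return new JobMemoryLimits(vmem, vmem * p / 100);
        }
        // Only pmem given: vmem becomes (100/P) times pmem.
        return new JobMemoryLimits(pmem * 100 / p, pmem);
    }
}
```

Integer division truncates here; whether the derived limit should round up or down is left open.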

h3. TT Configuration:

h4. Already addressed cases:
 * If offsets for vmem and pmem are both specified,
      ** TT takes care of overflowing tasks itself by doing virtual memory 
management,
      ** Scheduler uses both vmem and pmem for scheduling, and uses the latter 
to control thrashing.
 * If offsets for both vmem and pmem are not specified,
      ** TT doesn't care about overflowing tasks and disables virtual memory 
management,
      ** Scheduler does not schedule based on vmem or pmem, and so cannot 
attempt to avoid thrashing/task overflow.

h4. Cases not addressed:

 * If the offset for vmem is specified but the one for pmem is not, we have two 
alternative approaches.
      ** We already calculate the total pmem for reporting. So, take the pmem 
offset to be zero and use the total pmem available and (vmem - vmemoffset) for 
scheduling. vmemoffset by default will be zero. So, we just need another 
configuration and a field in TaskTrackerStatus to specify whether vmem is 
disabled/enabled. Note that today we overload total vmem and by extension 
vmemoffset to also specify enabling/disabling of task memory management.
      ** Only use vmem for scheduling and don't care about pmem and thus make 
no attempt to avoid any possible thrashing.
 * If the offset for pmem is specified but the one for vmem is not, we again 
have two alternative approaches.
      ** We already calculate the total vmem for reporting. So, take the vmem 
offset to be zero and use the total vmem available and (pmem - pmemoffset) for 
scheduling.
      ** Only use pmem for scheduling to avoid any possible thrashing, but 
don't care about vmem and thus make no attempt to avoid any possible 
overflowing of tasks.

I prefer the first solution to the latter in both of the above cases, as we are 
making our best efforts to prevent thrashing and task overflowing in those 
solutions. Thoughts? 
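
To make the preferred (first) alternative concrete, here is a rough sketch of how a TT could derive the memory it offers for scheduling. Everything in it is hypothetical: the class, the -1 sentinel for an unspecified offset, and the explicit enabled flag (standing in for the extra configuration/TaskTrackerStatus field mentioned above) are assumptions, not existing code.

```java
// Hypothetical sketch; none of these names are existing Hadoop APIs.
final class TrackerMemory {
    // Assumed sentinel meaning "offset not specified".
    static final long UNSPECIFIED = -1L;

    // Explicit flag instead of overloading total vmem / vmemoffset.
    final boolean memoryManagementEnabled;
    final long schedulableVmem;
    final long schedulablePmem;

    TrackerMemory(boolean enabled, long schedulableVmem, long schedulablePmem) {
        this.memoryManagementEnabled = enabled;
        this.schedulableVmem = schedulableVmem;
        this.schedulablePmem = schedulablePmem;
    }

    /** Treats a missing offset as zero whenever the other offset is specified. */
    static TrackerMemory resolve(long totalVmem, long totalPmem,
                                 long vmemOffset, long pmemOffset) {
        boolean hasVmemOffset = vmemOffset != UNSPECIFIED;
        boolean hasPmemOffset = pmemOffset != UNSPECIFIED;
        if (!hasVmemOffset && !hasPmemOffset) {
            // Neither offset specified: no memory-based scheduling at all.
            return new TrackerMemory(false, UNSPECIFIED, UNSPECIFIED);
        }
        long vmem = totalVmem - (hasVmemOffset ? vmemOffset : 0L);
        long pmem = totalPmem - (hasPmemOffset ? pmemOffset : 0L);
        return new TrackerMemory(true, vmem, pmem);
    }
}
```

With this shape, the scheduler always sees both a vmem and a pmem figure once either offset is configured, which is what lets it keep guarding against both thrashing and task overflow.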


> Modify the capacity scheduler (HADOOP-3445) to schedule tasks based on memory 
> requirements and task trackers free memory
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4035
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4035
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: 4035.1.patch, HADOOP-4035-20080918.1.txt, 
> HADOOP-4035-20081006.1.txt, HADOOP-4035-20081006.txt, HADOOP-4035-20081008.txt
>
>
> HADOOP-3759 introduced configuration variables that can be used to specify 
> memory requirements for jobs, and also modified the tasktrackers to report 
> their free memory. The capacity scheduler in HADOOP-3445 should schedule 
> tasks based on these parameters. A task that is scheduled on a TT that uses 
> more than the default amount of memory per slot can be viewed as effectively 
> using more than one slot, as it would decrease the amount of free memory on 
> the TT by more than the default amount while it runs. The scheduler should 
> make the used capacity account for this additional usage while enforcing 
> limits, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
