[
https://issues.apache.org/jira/browse/HADOOP-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655130#action_12655130
]
Matei Zaharia commented on HADOOP-4768:
---------------------------------------
Thanks for the clarifications, Thomas, and sorry for the late reply (I'm
currently at a conference and have little time for email). I'm still somewhat
uncomfortable with modifying scheduler source because of the kind of precedent
it sets - people are allowed to change the internal logic of several existing
schedulers to support a meta-scheduler. I imagine the Capacity Scheduler folks
might have the same concern. Changing logic in an existing scheduler is okay if
you are trying to improve the way it handles its original goal, but more
dangerous if it might hinder it in how it can change to meet this goal in the
future (or cause it to have to break the way it interacts with you). This is
why I would really like it if things went through the "supported API" for the
fair and capacity schedulers, which at this point is the config file. If there are
problems with that not being reloaded frequently enough, it might be better to
improve that aspect of the schedulers. This is obviously just my feeling though
and I'd appreciate input from other people interested in scheduling.
As an example of the API point, note that both the fair and capacity schedulers
were planned in theory to have some extension points where you can plug in
classes to do certain things. For the fair scheduler, there is a way to add a
"weight adjuster" which can set job weights; I wouldn't mind providing a
similar thing for pool weights if needed. I'm not sure whether such extension
points exist in the capacity scheduler yet but I know they were part of the
plan. A related point is that it may be easier to support just one of the two
schedulers if this is worthwhile. It's somewhat unfortunate that there are two
multi-user schedulers right now but they do have different philosophies and
goals as far as I can tell from talking with Owen and Arun (the capacity
scheduler is more focused on hard guarantees and the fair scheduler on
flexibility, though these are converging somewhat).
One other thing I'd like to understand is how often you plan to change
allocations, and why this would cause any kind of performance degradation when
using config files (assuming they were reloaded frequently enough). I can't
imagine XPath being slow or I/O being a problem if you have a reasonable number
of users, unless you are really changing the file a few times per second.
Perhaps a process external to Hadoop is not the right solution, but even a
meta-scheduler that does not modify the APIs might work.
> Dynamic Priority Scheduler that allows queue shares to be controlled
> dynamically by a currency
> ----------------------------------------------------------------------------------------------
>
> Key: HADOOP-4768
> URL: https://issues.apache.org/jira/browse/HADOOP-4768
> Project: Hadoop Core
> Issue Type: New Feature
> Components: contrib/capacity-sched, contrib/fair-share
> Affects Versions: 0.20.0
> Reporter: Thomas Sandholm
> Assignee: Thomas Sandholm
> Fix For: 0.20.0
>
> Attachments: HADOOP-4768-capacity-scheduler.patch,
> HADOOP-4768-dynamic-scheduler.patch, HADOOP-4768-fairshare.patch,
> HADOOP-4768.patch
>
>
> Contribution based on work presented at the Hadoop User Group meeting in
> Santa Clara in September and the HadoopCamp in New Orleans in November.
> From README:
> This package implements dynamic priority scheduling for MapReduce jobs.
> Overview
> --------
> The purpose of this scheduler is to allow users to increase and decrease
> their queue priorities continuously to meet the requirements of their
> current workloads. The scheduler is aware of the current demand and makes
> it more expensive to boost the priority under peak usage times. Thus
> users who move their workload to low usage times are rewarded with
> discounts. Priorities can only be boosted within a limited quota.
> All users are given a quota or a budget which is deducted periodically
> in configurable accounting intervals. How much of the budget is
> deducted is determined by a per-user spending rate, which may
> be modified at any time directly by the user. The share of cluster slots
> allocated to a particular user is computed as that user's spending rate
> divided by the sum of all spending rates in the same accounting period.
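The share computation described above can be sketched in code. This is an illustrative example, not part of the patch; the class and method names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the share computation described above; the class
// and method names are illustrative, not part of the actual patch.
public class ShareSketch {

    // A queue's share is its spending rate divided by the sum of all
    // spending rates in the same accounting period.
    static Map<String, Float> computeShares(Map<String, Float> rates) {
        float total = 0f;
        for (float rate : rates.values()) {
            total += rate;
        }
        Map<String, Float> shares = new LinkedHashMap<String, Float>();
        for (Map.Entry<String, Float> e : rates.entrySet()) {
            shares.put(e.getKey(), total > 0f ? e.getValue() / total : 0f);
        }
        return shares;
    }

    public static void main(String[] args) {
        Map<String, Float> rates = new LinkedHashMap<String, Float>();
        rates.put("queue1", 4.5f);
        rates.put("queue2", 5.5f);
        // 4.5 / (4.5 + 5.5) = 0.45 and 5.5 / 10.0 = 0.55
        System.out.println(computeShares(rates));
    }
}
```

So a user who doubles their spending rate while everyone else holds steady roughly doubles their share, at the cost of draining their budget faster.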
> Configuration
> -------------
> This scheduler has been designed as a meta-scheduler on top of
> existing MapReduce schedulers, which are responsible for enforcing
> shares computed by the dynamic scheduler in the cluster. The configuration
> of this MapReduce scheduler does not have to change when deploying
> the dynamic scheduler.
> Hadoop Configuration (e.g. hadoop-site.xml):
> mapred.jobtracker.taskScheduler   Must be set to
>                                   org.apache.hadoop.mapred.DynamicPriorityScheduler
>                                   to use the dynamic scheduler.
> mapred.queue.names                All queues managed by the dynamic scheduler
>                                   must be listed here (comma-separated, no
>                                   spaces).
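As a sketch, the two properties above might be set in hadoop-site.xml as follows (the queue names queue1 and queue2 are illustrative, not from the patch):

```xml
<!-- Illustrative hadoop-site.xml fragment; queue names are examples only. -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.DynamicPriorityScheduler</value>
</property>
<property>
  <name>mapred.queue.names</name>
  <value>queue1,queue2</value>
</property>
```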
> Scheduler Configuration:
> mapred.dynamic-scheduler.scheduler   The Java class name of the MapReduce
>                                      scheduler that should enforce the
>                                      allocated shares. Has been tested with
>                                      org.apache.hadoop.mapred.FairScheduler
>                                      and
>                                      org.apache.hadoop.mapred.CapacityTaskScheduler.
> mapred.dynamic-scheduler.budgetfile  The full OS path of the file from which
>                                      the budgets are read. The syntax of this
>                                      file is:
>                                          <queueName> <budget>
>                                      separated by newlines, where the budget
>                                      can be specified as a Java float.
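For instance, a budget file covering two queues might look like this (queue names and amounts are illustrative):

```
queue1 100.0
queue2 250.5
```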
> mapred.dynamic-scheduler.spendfile   The full OS path of the file from which
>                                      the user/queue spending rate is read. It
>                                      allows the queue name to be placed into
>                                      the path at runtime, e.g.:
>                                          /home/%QUEUE%/.spending
>                                      Only the user(s) who submit jobs to the
>                                      specified queue should have write access
>                                      to this file. The syntax of the file is
>                                      just:
>                                          <spending rate>
>                                      where the spending rate is specified as
>                                      a Java float. If no spending rate is
>                                      specified, the rate defaults to
>                                      budget/1000.
> mapred.dynamic-scheduler.alloc       The allocation interval, at which the
>                                      scheduler rereads the spending rates and
>                                      recalculates the cluster shares.
>                                      Specified as seconds between
>                                      allocations. Default is 20 seconds.
> mapred.dynamic-scheduler.budgetset   Boolean which is true if the budget
>                                      should be deducted by the scheduler and
>                                      the updated budget written to the budget
>                                      file. Default is true. Setting this to
>                                      false is useful if there is a tool that
>                                      controls budgets and spending rates
>                                      externally to the scheduler.
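Putting the scheduler configuration together, a sketch of the relevant hadoop-site.xml entries might look like this (the file paths are illustrative placeholders, not from the patch):

```xml
<!-- Illustrative scheduler settings; file paths are examples only. -->
<property>
  <name>mapred.dynamic-scheduler.scheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.dynamic-scheduler.budgetfile</name>
  <value>/etc/hadoop/budgets</value>
</property>
<property>
  <name>mapred.dynamic-scheduler.spendfile</name>
  <value>/home/%QUEUE%/.spending</value>
</property>
<property>
  <name>mapred.dynamic-scheduler.alloc</name>
  <value>20</value>
</property>
<property>
  <name>mapred.dynamic-scheduler.budgetset</name>
  <value>true</value>
</property>
```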
> Runtime Configuration:
> mapred.scheduler.shares              The shares that should be allocated to
>                                      the specified queues. The configuration
>                                      property is a comma-separated list of
>                                      strings where the odd-positioned
>                                      elements are the queue names and the
>                                      even-positioned elements are the shares,
>                                      as Java floats, of the preceding queue
>                                      name. It is updated for all the queues
>                                      atomically in each allocation pass.
>                                      MapReduce schedulers such as the Fair
>                                      and CapacityTask schedulers are expected
>                                      to read this property periodically.
>                                      Example property value:
>                                          "queue1,45.0,queue2,55.0"
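A minimal sketch of how a consuming scheduler might parse this property value follows; the helper class is hypothetical and not part of the patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the alternating name/share format of the
// mapred.scheduler.shares property; not part of the actual patch.
public class SharesFormat {

    // Parses e.g. "queue1,45.0,queue2,55.0" into a queue-name -> share map.
    static Map<String, Float> parseShares(String value) {
        Map<String, Float> shares = new LinkedHashMap<String, Float>();
        String[] parts = value.split(",");
        for (int i = 0; i + 1 < parts.length; i += 2) {
            shares.put(parts[i], Float.parseFloat(parts[i + 1]));
        }
        return shares;
    }

    public static void main(String[] args) {
        System.out.println(parseShares("queue1,45.0,queue2,55.0"));
    }
}
```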
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.