Best way to limit the number of concurrent tasks per job on hadoop 0.20.2
Hi,

we would like to limit the maximum number of concurrent tasks per job on our hadoop 0.20.2 cluster. Will the Capacity Scheduler [1] allow us to do this? And does it work correctly on hadoop 0.20.2? (When we looked at it a few months ago, it seemed incompatible with hadoop 0.20.2.)

[1] http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html

Regards,
--
Renaud Delbru
Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2
The Capacity Scheduler (or a version of it) does ship with the 0.20 release of Hadoop and is usable. It can be used to define queues, each with a limited capacity; your jobs must then submit to the appropriate queue if you want them to use only the assigned fraction of the cluster for their processing.

On Tue, Jan 25, 2011 at 5:19 PM, Renaud Delbru wrote:
> [snip]

--
Harsh J
www.harshj.com
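For reference, a queue with a capped share is typically declared in mapred-site.xml and sized in conf/capacity-scheduler.xml. The queue name 'limited' and the 20% figure below are only illustrative:

```xml
<!-- mapred-site.xml: declare the queues known to the JobTracker -->
<property>
  <name>mapred.queue.names</name>
  <value>default,limited</value>
</property>

<!-- capacity-scheduler.xml: give the 'limited' queue 20% of the slots -->
<property>
  <name>mapred.capacity-scheduler.queue.limited.capacity</name>
  <value>20</value>
</property>
```

A job is then directed to the queue at submission time, e.g. with `-Dmapred.job.queue.name=limited` on the job's command line.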
Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2
Our experience with the Capacity Scheduler was not what we expected, nor what you describe. But it might be due to a misunderstanding of the configuration parameters. The problem is the following:

mapred.capacity-scheduler.queue.<queue-name>.capacity: Percentage of the number of slots in the cluster that are *guaranteed* to be available for jobs in this queue.

mapred.capacity-scheduler.queue.<queue-name>.minimum-user-limit-percent: Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if *there is competition for them*.

So, in fact, it seems that if there is no competition and the cluster is fully available, the scheduler will assign the full cluster to the job and will not limit the number of concurrent tasks. It seemed to us that the only way to enforce a strict limit was to use the Fair Scheduler of hadoop 0.21.0, which includes a new configuration parameter, 'maxMaps'.

Am I right, or did we miss something?

cheers
--
Renaud Delbru

On 25/01/11 15:20, Harsh J wrote:
> [snip]
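For anyone comparing the two schedulers: in the 0.21.0 Fair Scheduler, the hard cap mentioned above is set per pool in the allocations file. The pool name 'limited' and the slot counts below are only illustrative:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: cap the pool at 10 concurrent map tasks
     and 5 concurrent reduce tasks, regardless of idle capacity -->
<allocations>
  <pool name="limited">
    <maxMaps>10</maxMaps>
    <maxReduces>5</maxReduces>
  </pool>
</allocations>
```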
Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2
No, that is right. I had not assumed it was a very strict slot limit you were looking to impose for your jobs.

On Tue, Jan 25, 2011 at 9:27 PM, Renaud Delbru wrote:
> [snip]

--
Harsh J
www.harshj.com
Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2
As it seems that the capacity and fair schedulers in hadoop 0.20.2 do not allow a hard upper limit on the number of concurrent tasks, does anybody know of another solution to achieve this?

--
Renaud Delbru

On 25/01/11 11:49, Renaud Delbru wrote:
> [snip]
Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2
Hi Renaud,

Hopefully it'll be in the 0.20-security branch that Arun is trying to push. Related (very abstract) Jira:
https://issues.apache.org/jira/browse/MAPREDUCE-1872

Koji

On 1/25/11 12:48 PM, "Renaud Delbru" wrote:
> [snip]
Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2
Hi Koji,

thanks for sharing the information. Is the 0.20-security branch planned to become an official release at some point?

Cheers
--
Renaud Delbru

On 27/01/11 01:50, Koji Noguchi wrote:
> [snip]
Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2
On 27/01/11 10:51, Renaud Delbru wrote:
> Is the 0.20-security branch planned to become an official release at some point?

If you can play with the beta, you can see whether it works for you and, if not, get bugs fixed during the beta cycle:
http://people.apache.org/~acmurthy/hadoop-0.20.100-rc0/
Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2
Thanks, we will try to test it next week.

--
Renaud Delbru

On 27/01/11 11:31, Steve Loughran wrote:
> [snip]
Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2
On Jan 25, 2011, at 12:48 PM, Renaud Delbru wrote:
> As it seems that the capacity and fair schedulers in hadoop 0.20.2 do not
> allow a hard upper limit on the number of concurrent tasks, does anybody
> know of another solution to achieve this?

The specific change for the capacity scheduler has been backported to 0.20.2 as part of https://issues.apache.org/jira/browse/MAPREDUCE-1105. Note that you'll also need https://issues.apache.org/jira/browse/MAPREDUCE-1160, which fixes a logging bug in the JobTracker; otherwise your logs will fill up.
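If I read the Jira correctly, the backport adds a per-queue hard cap on top of the guaranteed capacity, so a queue can be prevented from expanding into idle slots. The sketch below uses the property name from later Capacity Scheduler releases; verify the exact key against the MAPREDUCE-1105 patch before relying on it, and the queue name 'limited' is only illustrative:

```xml
<!-- capacity-scheduler.xml: guarantee 20% of the slots to the 'limited'
     queue, but never let it grow beyond 30% of the cluster, even when
     idle slots are available elsewhere -->
<property>
  <name>mapred.capacity-scheduler.queue.limited.capacity</name>
  <value>20</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.limited.maximum-capacity</name>
  <value>30</value>
</property>
```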
Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2
Hi Allen,

thanks for pointing this out.

--
Renaud Delbru

On 28/01/11 17:34, Allen Wittenauer wrote:
> [snip]