Fair scheduler fairness question
I am learning how the fair scheduler manages jobs so that each job shares resources over time, but I don't know whether my understanding is correct. My scenario is that I have 3 data nodes, and the cluster is configured to use the fair scheduler with three pools (e.g. A, B, C). Each pool is configured with <maxRunningJobs>1</maxRunningJobs>. Now the clients submit 4 jobs (via submitJob()) to the 3 different pools:

- the first job is submitted to pool A
- the second job is submitted to pool B
- the third job is submitted to pool B
- the fourth job is submitted to pool C

So I expect that the first 3 jobs will occupy the free slots (the slots should be full at that point). Then the fourth job is submitted. Since the slots are full, and the fourth job also needs slots to execute, I expect the third job to be terminated (or killed) so that the fourth job can be launched.

Is my scenario correct? And if it is, is there any keyword I can search for in the logs to observe such activity (i.e. which job is being killed, e.g. the third job)?

Thanks for the help. I appreciate any advice.
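To make the setup concrete, here is a sketch of the allocation file I mean (conf/fair-scheduler.xml, referenced by the mapred.fairscheduler.allocation.file property); the pool names and the limit of 1 are just the example values from above:

```xml
<?xml version="1.0"?>
<!-- Sketch of a fair scheduler allocation file: three pools, each
     limited to one running job at a time (example values only). -->
<allocations>
  <pool name="A">
    <maxRunningJobs>1</maxRunningJobs>
  </pool>
  <pool name="B">
    <maxRunningJobs>1</maxRunningJobs>
  </pool>
  <pool name="C">
    <maxRunningJobs>1</maxRunningJobs>
  </pool>
</allocations>
```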
Re: Fair scheduler fairness question
On 3/10/10 7:38 AM, Neo Anderson javadeveloper...@yahoo.co.uk wrote:
> [original question snipped]

A lot of it depends upon timing. If there is a long enough pause between job 1 and job 2, job 1 will take every slot available to it. As job 1's slots finish, jobs 2 and 4 would get those slots. As job 2 finishes, job 3 will get its slots.

Slots are only freed by force if the scheduler you are using has preemption. I think some versions of fair share may have it. Entire jobs are never killed.
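As an aside, in the 0.20-era fair scheduler the pool a job lands in is taken from a jobconf property named by mapred.fairscheduler.poolnameproperty (it defaults to user.name). A common setup, sketched below with 'pool.name' as the property (a conventional choice, not a requirement), looks roughly like this in mapred-site.xml:

```xml
<!-- Sketch: enable the fair scheduler and tell it which jobconf
     property carries the pool name. -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>pool.name</value>
</property>
```

A job then picks its pool by setting that property, e.g. -Dpool.name=A on the command line or jobConf.set("pool.name", "A") in code.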
Re: Fair scheduler fairness question
--- On Wed, 10/3/10, Allen Wittenauer awittena...@linkedin.com wrote:
From: Allen Wittenauer awittena...@linkedin.com
Subject: Re: Fair scheduler fairness question
To: common-user@hadoop.apache.org
Date: Wednesday, 10 March, 2010, 16:06
> [earlier discussion snipped]
> Slots are only freed by force if the scheduler you are using has
> pre-emption. I think some versions of fair share may have it. Entire
> jobs are never killed.

At the moment I use Hadoop 0.20.2 and I cannot find any code related to a 'preempt' function; however, I read the JIRA MAPREDUCE-551, which seemed to say that the preemption feature had already been fixed in version 0.20.0.

Also, I can find some functions related to 'preemption', e.g. 'protected void preemptTasksIfNecessary()', in the patch. I am confused now: which function in version 0.20.2 (or 0.20.1) is used to preempt unnecessary tasks (so that slots can be freed for other tasks/jobs to run)? Thank you for your help.
Re: Fair scheduler fairness question
On 3/10/10 9:14 AM, Neo Anderson javadeveloper...@yahoo.co.uk wrote:
> At the moment I use hadoop 0.20.2 and I can not find code that relates
> to 'preempt' function; however, I read the jira MAPREDUCE-551 saying
> preempt function is already been fixed at version 0.20.0.

MR-551 says "fixed in 0.21" at the top. Reading the text shows that patches are available if you want to patch your own build of 0.20.
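If you do apply the MAPREDUCE-551 patch, note that preemption still has to be switched on explicitly; as I understand the patched fair scheduler, the relevant knobs look roughly like this (a sketch only; the timeout values are made-up examples, in seconds):

```xml
<!-- Sketch: in mapred-site.xml, preemption is off unless enabled. -->
<property>
  <name>mapred.fairscheduler.preemption</name>
  <value>true</value>
</property>

<!-- ...and fair-scheduler.xml needs timeouts before the scheduler
     will preempt (example values only). -->
<allocations>
  <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
  <pool name="C">
    <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
  </pool>
</allocations>
```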
Re: Fair scheduler fairness question
On Wed, Mar 10, 2010 at 9:18 AM, Allen Wittenauer awittena...@linkedin.com wrote:
> MR-551 says fixed in 0.21 at the top. Reading the text shows that
> patches are available if you want to patch your own build of 0.20.

If you'd rather not patch your own build of Hadoop, the fair scheduler preemption feature is also available in CDH2:
http://archive.cloudera.com/cdh/2/hadoop-0.20.1+169.56.tar.gz

-Todd

--
Todd Lipcon
Software Engineer, Cloudera