Re: Configure hadoop scheduler
Hi Merto,

a) Is the LATE scheduler a standalone scheduler, or is it integrated into the fair scheduler? If it is standalone, where can I find it and which Hadoop versions does it support?

LATE is actually part of the common scheduling code that all of the schedulers in Hadoop use, implemented in the JobInProgress class. However, as far as I know it was committed after 0.20 branched out and hasn't been incorporated in 0.20.20* yet. It's available in Hadoop 0.21, 0.22, and 0.23.

b) Is the pseudocode for fair scheduling in appendix A of the paper "Job scheduling for multi-user mapreduce cluster" still accurate for version 0.20.204?

I believe so. Actually, that version of the fair scheduler is very old (2+ years) because branch 0.20 is ages behind, but a much more modern version is available in 0.20.205. I really recommend using 0.20.205 for a better fair scheduler. That one contains preemption, delay scheduling, etc., and has been in production use for a pretty long time in Cloudera's Distribution for Hadoop, which backported the changes to a 0.20-based version pretty early.

c) Are delay scheduling and task preemption, as mentioned in your paper, available in version 0.20.204? I've checked the fair scheduler parameters but did not find such options.

They're in 0.20.205 or CDH (see above). For details on how they work you may also want to check out the fair scheduler design doc in src/contrib/fairscheduler/designdoc.

Matei

On 20 December 2011 18:03, Prashant Kommireddi prash1...@gmail.com wrote:
I am guessing you are trying to use the FairScheduler but you have specified CapacityScheduler in your configuration. [...]
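As a rough illustration of point c), the fragment below sketches how preemption and delay scheduling might be switched on in mapred-site.xml with the newer fair scheduler (0.20.205 or CDH). The property names are given as I recall them from the later fair scheduler documentation and the values are purely illustrative, so treat both as assumptions and check them against the design doc and the docs shipped with your version.

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml (fragment). Assumes the newer fair scheduler from
     0.20.205 or CDH; these options are not expected to work in 0.20.203/204. -->
<configuration>
  <property>
    <!-- Assumed name: enables preemption of tasks from over-scheduled pools. -->
    <name>mapred.fairscheduler.preemption</name>
    <value>true</value>
  </property>
  <property>
    <!-- Assumed name and illustrative value: how long (ms) a job waits
         for a data-local slot before accepting a non-local one. -->
    <name>mapred.fairscheduler.locality.delay</name>
    <value>5000</value>
  </property>
</configuration>
```

Preemption timeouts per pool are, as far as I know, set in the allocation file rather than here, so enabling the flag alone may not be enough to see tasks being preempted.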
Configure hadoop scheduler
Hi,

I am having problems changing the default Hadoop scheduler (I assume the default is the FIFO scheduler). I am following the guide in the hadoop/docs directory, but I cannot get it to work. The link for scheduler administration ( http://localhost:50030/scheduler ) returns an HTTP 404 error. In the UI, under scheduling information, I can see only one queue, named default.

The mapred-site.xml file is being read, because when I change the JobTracker port I can see the daemon running on the new port. I added the variable $HADOOP_CONFIG_DIR to .bashrc, but that did not solve the problem. I also tried rebuilding Hadoop, manually placing the fair scheduler jar in hadoop/lib, and changing the Hadoop classpath in hadoop-env.sh to point to the lib folder, all without success.

The only scheduler information in the JobTracker log is the following:

Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)

I have been working on this for several days and am running out of ideas. How can I fix it, and where can I check the parameters of the currently active scheduler?

Config files:
mapred-site.xml http://pastebin.com/HmDfWqE1
allocation.xml http://pastebin.com/Uexq7uHV

Tried versions: 0.20.203 and 0.20.204

Thank you
Re: Configure hadoop scheduler
Are you trying to use the capacity scheduler or the fair scheduler? Your mapred-site.xml says to use the capacity scheduler but then points to a fair scheduler allocation file. Take a look at http://hadoop.apache.org/common/docs/r0.20.204.0/fair_scheduler.html for setting up the fair scheduler or http://hadoop.apache.org/common/docs/r0.20.204.0/capacity_scheduler.html for the capacity scheduler.

It may also be good to remove the final elements (<final>true</final>) in mapred-site.xml. I'm not sure whether they can affect these settings, but they're certainly not necessary for the scheduler settings.

Matei

On Dec 20, 2011, at 11:51 AM, Merto Mertek wrote:
Hi, I am having problems with changing the default hadoop scheduler (i assume that the default scheduler is a FIFO scheduler). [...]
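To make the mismatch concrete, here is a minimal sketch of a mapred-site.xml in which the scheduler class and the allocation file both refer to the fair scheduler. It follows the property names used in the 0.20 fair scheduler guide linked above; the allocation file path is a placeholder, not taken from the original configuration.

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml (fragment): scheduler class and allocation file
     both belong to the fair scheduler. -->
<configuration>
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
  <property>
    <!-- Placeholder path (assumption); point this at your own file. -->
    <name>mapred.fairscheduler.allocation.file</name>
    <value>/path/to/conf/allocation.xml</value>
  </property>
</configuration>
```

With the fair scheduler jar on the JobTracker classpath and the JobTracker restarted, the page at http://localhost:50030/scheduler should then be served by the fair scheduler instead of returning 404.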
Re: Configure hadoop scheduler
I am guessing you are trying to use the FairScheduler, but you have specified CapacityScheduler in your configuration. You need to change the scheduler property (mapred.jobtracker.taskScheduler in the 0.20 line) to org.apache.hadoop.mapred.FairScheduler.

On Dec 20, 2011, at 8:51 AM, Merto Mertek masmer...@gmail.com wrote:
Hi, I am having problems with changing the default hadoop scheduler (i assume that the default scheduler is a FIFO scheduler). [...]
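Once the scheduler class is switched, pools are read from the allocation file. The fragment below is a small sketch of such a file, modeled on the example in the 0.20 fair scheduler guide; the pool name, user name, and numbers are hypothetical and only show the shape of the file.

```xml
<?xml version="1.0"?>
<!-- allocation.xml (sketch): names and values are illustrative only. -->
<allocations>
  <pool name="sample_pool">
    <!-- Guaranteed minimum number of map and reduce slots for this pool. -->
    <minMaps>5</minMaps>
    <minReduces>5</minReduces>
    <weight>2.0</weight>
  </pool>
  <user name="sample_user">
    <!-- Cap on concurrently running jobs for this user. -->
    <maxRunningJobs>6</maxRunningJobs>
  </user>
  <!-- Default cap on running jobs for users not listed above. -->
  <userMaxJobsDefault>3</userMaxJobsDefault>
</allocations>
```

If the file is picked up, the pools defined here should appear on the scheduler page alongside the default pool.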