Re: Configure hadoop scheduler

2011-12-21 Thread Matei Zaharia
Hi Merto,

 a) Is the LATE scheduler a standalone scheduler, or is it integrated into the
 fair scheduler? If it is standalone, where can I find it and which Hadoop
 version does it support?

LATE is actually part of the common scheduling code that all of the schedulers 
in Hadoop use, implemented in the JobInProgress class. However, as far as I 
know it was committed after the 0.20 branch was created and hasn't been 
incorporated into the 0.20.20x releases yet. It's available in Hadoop 0.21, 
0.22, and 0.23.
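
LATE's core heuristic can be illustrated roughly as follows: estimate each running task's time left from its progress rate, and speculatively re-run the slow task that is expected to finish last. This is a simplified sketch based on the published idea, not the JobInProgress code; it ignores the check on whether the requesting node is itself slow, and the names `slow_task_pct` and `speculative_cap` are hypothetical stand-ins for the thresholds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningTask:
    task_id: str
    progress: float   # progress score in [0, 1]
    elapsed_s: float  # seconds since the task started (assumed > 0)


def progress_rate(t: RunningTask) -> float:
    return t.progress / t.elapsed_s


def estimated_time_left(t: RunningTask) -> float:
    # LATE-style estimate: remaining work divided by the observed progress rate.
    rate = progress_rate(t)
    return float("inf") if rate == 0 else (1.0 - t.progress) / rate


def pick_speculative_task(
    tasks: List[RunningTask],
    slow_task_pct: float = 0.25,   # hypothetical slow-task percentile threshold
    running_speculative: int = 0,
    speculative_cap: int = 10,     # hypothetical cap on concurrent speculative tasks
) -> Optional[RunningTask]:
    """Return the task that would be speculatively re-executed, or None."""
    if not tasks or running_speculative >= speculative_cap:
        return None
    # Only tasks whose progress rate is below the chosen percentile are candidates.
    rates = sorted(progress_rate(t) for t in tasks)
    idx = min(int(len(rates) * slow_task_pct), len(rates) - 1)
    threshold = rates[idx]
    slow = [t for t in tasks if progress_rate(t) < threshold]
    if not slow:
        return None
    # Among the slow tasks, pick the one expected to finish furthest in the future.
    return max(slow, key=estimated_time_left)
```

The point of ranking by estimated time left (rather than by raw progress) is that a task which started late but is moving quickly is not needlessly duplicated.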

 b) Is the fair scheduling pseudocode from the paper "Job Scheduling for
 Multi-User MapReduce Clusters", as described in Appendix A, still the same
 in version 0.20.204?

I believe so. That version of the fair scheduler is actually very old (more than 
2 years old) because branch 0.20 lags far behind, but a much more modern version 
is available in 0.20.205. I really recommend using 0.20.205 for a better fair 
scheduler. It contains preemption, delay scheduling, etc., and has been in 
production use for quite a long time in Cloudera's Distribution for Hadoop 
(CDH), which backported the changes to a 0.20-based version early on.
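
To make the fair-sharing idea concrete, here is a simplified illustration of ordering pools when a slot frees up: pools running fewer tasks than their minimum share go first (most under-served first), and the rest are ordered by running tasks per unit of weight. It is an assumption-laden sketch, not the Appendix A pseudocode or the 0.20.205 code; `Pool` and its fields are hypothetical, and it leaves out job ordering within pools, delay scheduling and preemption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Pool:
    name: str
    running: int       # tasks currently running in the pool
    min_share: int     # guaranteed slots; 0 means no guarantee
    weight: float = 1.0  # assumed > 0


def pool_sort_key(p: Pool) -> Tuple[int, float, str]:
    if p.running < p.min_share:
        # Pools below their min share come first, most under-served first.
        return (0, p.running / p.min_share, p.name)
    # Everyone else: fewest running tasks per unit of weight first.
    return (1, p.running / p.weight, p.name)


def next_pool(pools: List[Pool]) -> Optional[Pool]:
    """Pick the pool that should receive the next free slot."""
    return min(pools, key=pool_sort_key) if pools else None
```

Dividing by weight means a pool with weight 2 is treated as "fairly served" only once it runs twice as many tasks as a weight-1 pool.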

 c) Are delay scheduling and task preemption, as mentioned in your paper,
 available in version 0.20.204? I've checked the fair scheduler parameters but I
 did not find such options.

They're in 0.20.205 or CDH (see above). For details on how they work you may 
also want to check out the fair scheduler design doc in 
src/contrib/fairscheduler/designdoc.
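
The basic idea of delay scheduling can be sketched like this: when a node has a free slot, jobs are visited in fair-share order, and a job that has no task with input data on that node is skipped; only after it has been skipped `max_skips` times may it launch a non-local task. This is a simplified skip-count illustration, not the fair scheduler's implementation (real schedulers typically also distinguish rack-local from off-rack tasks); `Job`, `assign_task` and `max_skips` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Job:
    name: str
    running: int
    # Unlaunched task id -> nodes that hold that task's input data.
    pending: Dict[str, Set[str]] = field(default_factory=dict)
    skip_count: int = 0


def launch(job: Job, task: str) -> None:
    del job.pending[task]
    job.running += 1


def assign_task(jobs: List[Job], node: str, max_skips: int) -> Optional[Tuple[str, str, bool]]:
    """Fill one free slot on `node`; return (job, task, is_local) or None."""
    # Fair-sharing order: jobs with the fewest running tasks go first.
    for job in sorted(jobs, key=lambda j: j.running):
        local = [t for t, hosts in job.pending.items() if node in hosts]
        if local:
            launch(job, local[0])
            job.skip_count = 0  # a local launch resets the wait
            return (job.name, local[0], True)
        if job.pending:
            if job.skip_count >= max_skips:
                # Waited long enough: accept a non-local task.
                task = next(iter(job.pending))
                launch(job, task)
                return (job.name, task, False)
            # Skip this job for now and let the next job try the slot.
            job.skip_count += 1
    return None
```

The trade-off is a small, bounded wait per job in exchange for many more tasks reading their input from the local disk.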

Matei

 On 20 December 2011 18:03, Prashant Kommireddi prash1...@gmail.com wrote:
 On Dec 20, 2011, at 8:51 AM, Merto Mertek masmer...@gmail.com wrote:



Configure hadoop scheduler

2011-12-20 Thread Merto Mertek
Hi,

I am having problems changing the default Hadoop scheduler (I assume that
the default scheduler is a FIFO scheduler).

I am following the guide located in the hadoop/docs directory, however I am not
able to get it working. The scheduling administration link returns an HTTP 404
error (http://localhost:50030/scheduler). In the UI under scheduling
information I can see only one queue, named default. The mapred-site.xml file
is being picked up, because when I change the JobTracker port I can see the
daemon running on the changed port. I added the variable $HADOOP_CONFIG_DIR
to .bashrc, however that did not solve the problem. I also tried rebuilding
Hadoop, manually placing the fair scheduler jar in hadoop/lib and changing the
Hadoop classpath in hadoop-env.sh to point to the lib folder, but without
success. The only scheduler information in the JobTracker log is the
following:

Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
 limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)


I have been working on this for several days and am running out of ideas. How
can I fix it, and where can I check the currently active scheduler
parameters?

Config files:
mapred-site.xml http://pastebin.com/HmDfWqE1
allocation.xml http://pastebin.com/Uexq7uHV
Tried versions: 0.20.203 and 204

Thank you


Re: Configure hadoop scheduler

2011-12-20 Thread Matei Zaharia
Are you trying to use the capacity scheduler or the fair scheduler? Your 
mapred-site.xml says to use the capacity scheduler but then points to a fair 
scheduler allocation file. Take a look at 
http://hadoop.apache.org/common/docs/r0.20.204.0/fair_scheduler.html for 
setting up the fair scheduler or 
http://hadoop.apache.org/common/docs/r0.20.204.0/capacity_scheduler.html for 
the capacity scheduler.
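
As a quick way to spot this kind of mismatch, the following sketch parses mapred-site.xml and reports whether a scheduler is selected at all and whether the capacity scheduler is combined with a fair scheduler allocation file. It uses the property and class names from the 0.20.204 guides (`mapred.jobtracker.taskScheduler`, `mapred.fairscheduler.allocation.file`); the pastebin contents are not reproduced here, so the test input is hypothetical, and the script only inspects one file, not the configuration the running JobTracker actually loaded.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Property and class names as used in the 0.20.x scheduler guides.
SCHEDULER_KEY = "mapred.jobtracker.taskScheduler"
ALLOCATION_KEY = "mapred.fairscheduler.allocation.file"
FAIR = "org.apache.hadoop.mapred.FairScheduler"
CAPACITY = "org.apache.hadoop.mapred.CapacityTaskScheduler"


def read_properties(xml_text: str) -> Dict[str, str]:
    """Map property name -> value for a Hadoop *-site.xml document."""
    props = {}
    for prop in ET.fromstring(xml_text).findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            # A later duplicate entry overwrites an earlier one in this dict.
            props[name] = (prop.findtext("value") or "").strip()
    return props


def scheduler_problems(xml_text: str) -> List[str]:
    """List likely scheduler misconfigurations in mapred-site.xml content."""
    props = read_properties(xml_text)
    scheduler = props.get(SCHEDULER_KEY)
    problems = []
    if scheduler is None:
        problems.append(f"{SCHEDULER_KEY} is not set, so the default FIFO scheduler is used")
    elif scheduler == CAPACITY and ALLOCATION_KEY in props:
        problems.append("capacity scheduler selected, but a fair scheduler allocation file is configured")
    return problems
```

Running it on the real file (for example `scheduler_problems(open("mapred-site.xml").read())`) gives a first check before looking at the JobTracker itself.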

It may also be good to remove the <final> settings in mapred-site.xml. I'm not 
sure whether they can affect these settings, but they're certainly not necessary 
for the scheduler settings.
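
A property marked `<final>true</final>` cannot be overridden by configuration resources loaded later (for example a job's own configuration). To list which entries in a site file are marked this way before removing the markers, a small sketch; `final_properties` is a hypothetical helper, not part of Hadoop:

```python
import xml.etree.ElementTree as ET
from typing import List


def final_properties(xml_text: str) -> List[str]:
    """Names of properties whose <final> text is 'true' in a *-site.xml document."""
    names = []
    for prop in ET.fromstring(xml_text).findall("property"):
        if (prop.findtext("final") or "").strip() == "true":
            names.append((prop.findtext("name") or "").strip())
    return names
```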

Matei

On Dec 20, 2011, at 11:51 AM, Merto Mertek wrote:




Re: Configure hadoop scheduler

2011-12-20 Thread Prashant Kommireddi
I am guessing you are trying to use the FairScheduler but you have
specified CapacityScheduler in your configuration. You need to set
mapred.jobtracker.taskScheduler to org.apache.hadoop.mapred.FairScheduler.
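
For the 0.20.20x releases, the fair scheduler guide selects the scheduler with the property `mapred.jobtracker.taskScheduler` set to `org.apache.hadoop.mapred.FairScheduler` (the fair scheduler jar also has to be on the JobTracker's classpath). A minimal sketch of making that change programmatically, assuming the file is small enough to rewrite with Python's standard ElementTree; `set_property` is a hypothetical helper, not a Hadoop tool:

```python
import xml.etree.ElementTree as ET


def set_property(xml_text: str, name: str, value: str) -> str:
    """Return xml_text with property `name` set to `value` (added if missing)."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

ElementTree discards comments and the XML declaration when it re-serializes, so review the output (or simply edit the file by hand) before replacing the real mapred-site.xml; the JobTracker has to be restarted to pick up the change.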

On Dec 20, 2011, at 8:51 AM, Merto Mertek masmer...@gmail.com wrote:
