Fair Scheduler Problem

2012-03-06 Thread hao.wang
Hi, All,
I encountered a problem using Cloudera Hadoop 0.20.2-cdh3u1: when I use
the Fair Scheduler, it seems that the scheduler does not support preemption.
Can anybody tell me whether preemption is supported in this version?
This is my configuration:
mapred-site.xml:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/usr/lib/hadoop-0.20/conf/fair-scheduler.xml</value>
</property>
<property>
  <name>mapred.fairscheduler.preemption</name>
  <value>true</value>
</property>
<property>
  <name>mapred.fairscheduler.preemption.only.log</name>
  <value>true</value>
</property>
<property>
  <name>mapred.fairscheduler.preemption.interval</name>
  <value>15000</value>
</property>
<property>
  <name>mapred.fairscheduler.weightadjuster</name>
  <value>org.apache.hadoop.mapred.NewJobWeightBooster</value>
</property>
<property>
  <name>mapred.fairscheduler.sizebasedweight</name>
  <value>true</value>
</property>
fair-scheduler.xml:

<allocations>
  <pool name="root">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <maxMaps>200</maxMaps>
    <maxReduces>80</maxReduces>
    <maxRunningJobs>100</maxRunningJobs>
    <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
    <weight>1.0</weight>
  </pool>
  <pool name="hadoop">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <maxMaps>80</maxMaps>
    <maxReduces>80</maxReduces>
    <maxRunningJobs>5</maxRunningJobs>
    <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
    <weight>1.0</weight>
  </pool>
  <user name="user1">
    <maxRunningJobs>10</maxRunningJobs>
  </user>
  <poolMaxJobsDefault>20</poolMaxJobsDefault>
  <userMaxJobsDefault>10</userMaxJobsDefault>
  <defaultMinSharePreemptionTimeout>30</defaultMinSharePreemptionTimeout>
  <fairSharePreemptionTimeout>30</fairSharePreemptionTimeout>
</allocations>

regards,

2012-03-07 



hao.wang 


Re: Re: Fair Scheduler Problem

2012-03-06 Thread hao.wang
Hi, thanks for your reply!
I have solved this problem by setting mapred.fairscheduler.preemption.only.log
to false. Preemption now works!
But I don't know why mapred.fairscheduler.preemption.only.log cannot be set to
true. Is it a bug?
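
For reference, here is a minimal sketch of the two preemption properties as
they ended up for me (assuming the allocation file and the other
mapred-site.xml settings quoted below stay unchanged):

<!-- Preemption enabled for real: with only.log set to false the scheduler
     actually kills tasks, instead of just logging what it would kill. -->
<property>
  <name>mapred.fairscheduler.preemption</name>
  <value>true</value>
</property>
<property>
  <name>mapred.fairscheduler.preemption.only.log</name>
  <value>false</value>
</property>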

regards,

2012-03-07 



hao.wang 



From: Harsh J 
Sent: 2012-03-07 14:14:05 
To: common-user 
Cc: 
Subject: Re: Fair Scheduler Problem 
 
Hello Hao,
It's best to submit CDH user queries to
https://groups.google.com/a/cloudera.org/group/cdh-user/topics
(cdh-u...@cloudera.org), where the majority of the CDH user community
resides.
How do you determine that preemption did not/does not work? Preemption
between pools occurs if a pool's minShare isn't satisfied within
preemption-timeout seconds. In this case, it will preempt tasks from
other pools.
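To make that concrete with values from the allocation file you posted
(illustrative only, not new settings): pool "hadoop" declares a minShare of
10 map and 5 reduce slots, with a 30-second timeout, so once the pool has
been below that share for 30 seconds, tasks in other pools become candidates
for killing.

<pool name="hadoop">
  <minMaps>10</minMaps>
  <minReduces>5</minReduces>
  <!-- time under minShare before preemption may kick in -->
  <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
</pool>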
Your settings look alright at a high level. Does your log not carry
any preemption messages? What was your pools' share scenario when you
tried to observe whether it works?
On Wed, Mar 7, 2012 at 8:35 AM, hao.wang hao.w...@ipinyou.com wrote:
 Hi, All,
 I encountered a problem using Cloudera Hadoop 0.20.2-cdh3u1: when I use
 the Fair Scheduler, it seems that the scheduler does not support preemption.
 Can anybody tell me whether preemption is supported in this version?
 This is my configuration:
 mapred-site.xml:

 <property>
   <name>mapred.jobtracker.taskScheduler</name>
   <value>org.apache.hadoop.mapred.FairScheduler</value>
 </property>
 <property>
   <name>mapred.fairscheduler.allocation.file</name>
   <value>/usr/lib/hadoop-0.20/conf/fair-scheduler.xml</value>
 </property>
 <property>
   <name>mapred.fairscheduler.preemption</name>
   <value>true</value>
 </property>
 <property>
   <name>mapred.fairscheduler.preemption.only.log</name>
   <value>true</value>
 </property>
 <property>
   <name>mapred.fairscheduler.preemption.interval</name>
   <value>15000</value>
 </property>
 <property>
   <name>mapred.fairscheduler.weightadjuster</name>
   <value>org.apache.hadoop.mapred.NewJobWeightBooster</value>
 </property>
 <property>
   <name>mapred.fairscheduler.sizebasedweight</name>
   <value>true</value>
 </property>
 fair-scheduler.xml:

 <allocations>
   <pool name="root">
     <minMaps>10</minMaps>
     <minReduces>5</minReduces>
     <maxMaps>200</maxMaps>
     <maxReduces>80</maxReduces>
     <maxRunningJobs>100</maxRunningJobs>
     <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
     <weight>1.0</weight>
   </pool>
   <pool name="hadoop">
     <minMaps>10</minMaps>
     <minReduces>5</minReduces>
     <maxMaps>80</maxMaps>
     <maxReduces>80</maxReduces>
     <maxRunningJobs>5</maxRunningJobs>
     <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
     <weight>1.0</weight>
   </pool>
   <user name="user1">
     <maxRunningJobs>10</maxRunningJobs>
   </user>
   <poolMaxJobsDefault>20</poolMaxJobsDefault>
   <userMaxJobsDefault>10</userMaxJobsDefault>
   <defaultMinSharePreemptionTimeout>30</defaultMinSharePreemptionTimeout>
   <fairSharePreemptionTimeout>30</fairSharePreemptionTimeout>
 </allocations>

 regards,

 2012-03-07



 hao.wang
-- 
Harsh J


Re: Re: how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum

2012-01-10 Thread hao.wang
Hi,
Thanks for your help; your suggestion is very useful.
I have another question: should the sum of map and reduce slots equal
the total number of cores?

regards!


2012-01-10 



hao.wang 



From: Harsh J 
Sent: 2012-01-10 16:44:07 
To: common-user 
Cc: 
Subject: Re: how to set mapred.tasktracker.map.tasks.maximum and 
mapred.tasktracker.reduce.tasks.maximum 
 
Hello Hao,
I'm sorry if I confused you. By CPUs I meant the CPUs visible to your OS
(/proc/cpuinfo), so yes, the total number of cores.
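If it helps, you can count them on a node like this (standard Linux, nothing
Hadoop-specific; this counts logical processors, so hyperthreads are included
if enabled):

  grep -c ^processor /proc/cpuinfo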
On 10-Jan-2012, at 12:39 PM, hao.wang wrote:
 Hi,

 Thanks for your reply!
 If I follow your suggestion as written, it may not apply to our Hadoop
 cluster, because each server just contains 2 CPUs.
 So I think maybe you mean the number of cores, not the number of CPUs,
 in each server?
 I am looking forward to your reply.
 
 regards!
 
 
 2012-01-10 
 
 
 
 hao.wang 
 
 
 
 From: Harsh J 
 Sent: 2012-01-10 11:33:38 
 To: common-user 
 Cc: 
 Subject: Re: how to set mapred.tasktracker.map.tasks.maximum and 
 mapred.tasktracker.reduce.tasks.maximum 
 
 Hello again,
 Try a 4:3 ratio between maps and reduces against the total # of available CPUs
 per node (minus one or two, for DN and HBase if you run those). Then tweak it
 as you go (more map-only loads or more map-reduce loads, that depends on your
 usage, and you can tweak the ratio accordingly over time -- changing those
 props does not need a JobTracker restart, just a TaskTracker restart).
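 As a purely illustrative sketch for nodes like yours (2 * 12 = 24 cores; the
 numbers are hypothetical and assume roughly three cores left for the
 TaskTracker/DataNode daemons, with the remaining ~21 slots split 4:3), in
 mapred-site.xml on each TaskTracker:

 <!-- Hypothetical slot counts for a 24-core node: ~21 task slots in a
      4:3 map:reduce ratio, leaving headroom for the daemons. -->
 <property>
   <name>mapred.tasktracker.map.tasks.maximum</name>
   <value>12</value>
 </property>
 <property>
   <name>mapred.tasktracker.reduce.tasks.maximum</name>
   <value>9</value>
 </property>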
 On 10-Jan-2012, at 8:17 AM, hao.wang wrote:
 Hi,
   Thanks for your reply!
   I had already read those pages; can you give me some more specific
 suggestions about how to choose the values of
 mapred.tasktracker.map.tasks.maximum and
 mapred.tasktracker.reduce.tasks.maximum according to our cluster
 configuration, if possible?
 
 regards!
 
 
 2012-01-10 
 
 
 
 hao.wang 
 
 
 
 From: Harsh J 
 Sent: 2012-01-09 23:19:21 
 To: common-user 
 Cc: 
 Subject: Re: how to set mapred.tasktracker.map.tasks.maximum and 
 mapred.tasktracker.reduce.tasks.maximum 
 
 Hi,
 Please read 
 http://hadoop.apache.org/common/docs/current/single_node_setup.html to learn 
 how to configure Hadoop using the various *-site.xml configuration files, 
 and then follow 
 http://hadoop.apache.org/common/docs/current/cluster_setup.html to achieve 
 optimal configs for your cluster.
 On 09-Jan-2012, at 5:50 PM, hao.wang wrote:
 Hi, all,
  Our hadoop cluster has 22 nodes: one namenode, one jobtracker, and
 20 datanodes.
  Each node has 2 * 12 cores with 32 GB RAM.
  Could anyone tell me how to configure the following parameters:
  mapred.tasktracker.map.tasks.maximum
  mapred.tasktracker.reduce.tasks.maximum
 
 regards!
 2012-01-09 
 
 
 
 hao.wang 


how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum

2012-01-09 Thread hao.wang
Hi, all,
Our hadoop cluster has 22 nodes: one namenode, one jobtracker, and
20 datanodes.
Each node has 2 * 12 cores with 32 GB RAM.
Could anyone tell me how to configure the following parameters:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum

regards!
2012-01-09 



hao.wang 


Re: Re: how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum

2012-01-09 Thread hao.wang
Hi,
Thanks for your reply!
I had already read those pages; can you give me some more specific
suggestions about how to choose the values of
mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum according to our cluster
configuration, if possible?

regards!


2012-01-10 



hao.wang 



From: Harsh J 
Sent: 2012-01-09 23:19:21 
To: common-user 
Cc: 
Subject: Re: how to set mapred.tasktracker.map.tasks.maximum and 
mapred.tasktracker.reduce.tasks.maximum 
 
Hi,
Please read http://hadoop.apache.org/common/docs/current/single_node_setup.html 
to learn how to configure Hadoop using the various *-site.xml configuration 
files, and then follow 
http://hadoop.apache.org/common/docs/current/cluster_setup.html to achieve 
optimal configs for your cluster.
On 09-Jan-2012, at 5:50 PM, hao.wang wrote:
 Hi, all,
 Our hadoop cluster has 22 nodes: one namenode, one jobtracker, and
 20 datanodes.
 Each node has 2 * 12 cores with 32 GB RAM.
 Could anyone tell me how to configure the following parameters:
 mapred.tasktracker.map.tasks.maximum
 mapred.tasktracker.reduce.tasks.maximum
 
 regards!
 2012-01-09 
 
 
 
 hao.wang 


Re: Re: how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum

2012-01-09 Thread hao.wang
Hi,

Thanks for your reply!
If I follow your suggestion as written, it may not apply to our Hadoop
cluster, because each server just contains 2 CPUs.
So I think maybe you mean the number of cores, not the number of CPUs,
in each server?
I am looking forward to your reply.

regards!


2012-01-10 



hao.wang 



From: Harsh J 
Sent: 2012-01-10 11:33:38 
To: common-user 
Cc: 
Subject: Re: how to set mapred.tasktracker.map.tasks.maximum and 
mapred.tasktracker.reduce.tasks.maximum 
 
Hello again,
Try a 4:3 ratio between maps and reduces against the total # of available CPUs
per node (minus one or two, for DN and HBase if you run those). Then tweak it
as you go (more map-only loads or more map-reduce loads, that depends on your
usage, and you can tweak the ratio accordingly over time -- changing those
props does not need a JobTracker restart, just a TaskTracker restart).
On 10-Jan-2012, at 8:17 AM, hao.wang wrote:
 Hi,
 Thanks for your reply!
 I had already read those pages; can you give me some more specific
 suggestions about how to choose the values of
 mapred.tasktracker.map.tasks.maximum and
 mapred.tasktracker.reduce.tasks.maximum according to our cluster
 configuration, if possible?
 
 regards!
 
 
 2012-01-10 
 
 
 
 hao.wang 
 
 
 
 From: Harsh J 
 Sent: 2012-01-09 23:19:21 
 To: common-user 
 Cc: 
 Subject: Re: how to set mapred.tasktracker.map.tasks.maximum and 
 mapred.tasktracker.reduce.tasks.maximum 
 
 Hi,
 Please read 
 http://hadoop.apache.org/common/docs/current/single_node_setup.html to learn 
 how to configure Hadoop using the various *-site.xml configuration files, and 
 then follow http://hadoop.apache.org/common/docs/current/cluster_setup.html 
 to achieve optimal configs for your cluster.
 On 09-Jan-2012, at 5:50 PM, hao.wang wrote:
 Hi, all,
   Our hadoop cluster has 22 nodes: one namenode, one jobtracker, and
 20 datanodes.
   Each node has 2 * 12 cores with 32 GB RAM.
   Could anyone tell me how to configure the following parameters:
   mapred.tasktracker.map.tasks.maximum
   mapred.tasktracker.reduce.tasks.maximum
 
 regards!
 2012-01-09 
 
 
 
 hao.wang 


block size

2011-09-20 Thread hao.wang
Hi All:
   I have lots of small files stored in HDFS. My HDFS block size is 128 MB,
and each file is significantly smaller than that. I want to know whether
each small file still occupies 128 MB in HDFS.

regards
2011-09-21 



hao.wang 


Re: Re: block size

2011-09-20 Thread hao.wang
Hi, Joey:
Thanks for your help!


2011-09-21 



hao.wang 



From: Joey Echeverria 
Sent: 2011-09-21 10:10:54 
To: common-user 
Cc: 
Subject: Re: block size 
 
HDFS blocks are stored as files in the underlying filesystem of your
datanodes. Those files do not take a fixed amount of space, so if you
store 10 MB in a file and you have 128 MB blocks, you still only use
10 MB (times 3 with default replication).
However, the namenode does incur additional overhead by having to
track a larger number of small files. So, if you can merge files, it's
best practice to do so.
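One way to see this for yourself (a sketch; the file name and path here are
made up): upload a small file, then ask fsck to print its blocks.

  # Upload a ~10 MB file, then inspect its block report.
  hadoop fs -put small-10mb.log /tmp/small-10mb.log
  hadoop fsck /tmp/small-10mb.log -files -blocks
  # The report lists a single block whose len is about 10 MB (the bytes
  # actually stored), even though the configured block size is 128 MB.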
-Joey
On Tue, Sep 20, 2011 at 9:54 PM, hao.wang hao.w...@ipinyou.com wrote:
 Hi All:
   I have lots of small files stored in HDFS. My HDFS block size is 128 MB,
 and each file is significantly smaller than that. I want to know whether
 each small file still occupies 128 MB in HDFS.

 regards
 2011-09-21



 hao.wang

-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434