Fair Scheduler Problem
Hi, all,

I encountered a problem using Cloudera Hadoop 0.20.2-cdh3u1. When I use the Fair Scheduler, it does not seem to support preemption. Can anybody tell me whether preemption is supported in this version?

This is my configuration:

mapred-site.xml:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/usr/lib/hadoop-0.20/conf/fair-scheduler.xml</value>
</property>
<property>
  <name>mapred.fairscheduler.preemption</name>
  <value>true</value>
</property>
<property>
  <name>mapred.fairscheduler.preemption.only.log</name>
  <value>true</value>
</property>
<property>
  <name>mapred.fairscheduler.preemption.interval</name>
  <value>15000</value>
</property>
<property>
  <name>mapred.fairscheduler.weightadjuster</name>
  <value>org.apache.hadoop.mapred.NewJobWeightBooster</value>
</property>
<property>
  <name>mapred.fairscheduler.sizebasedweight</name>
  <value>true</value>
</property>

fair-scheduler.xml:

<allocations>
  <pool name="root">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <maxMaps>200</maxMaps>
    <maxReduces>80</maxReduces>
    <maxRunningJobs>100</maxRunningJobs>
    <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
    <weight>1.0</weight>
  </pool>
  <pool name="hadoop">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <maxMaps>80</maxMaps>
    <maxReduces>80</maxReduces>
    <maxRunningJobs>5</maxRunningJobs>
    <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
    <weight>1.0</weight>
  </pool>
  <user name="user1">
    <maxRunningJobs>10</maxRunningJobs>
  </user>
  <poolMaxJobsDefault>20</poolMaxJobsDefault>
  <userMaxJobsDefault>10</userMaxJobsDefault>
  <defaultMinSharePreemptionTimeout>30</defaultMinSharePreemptionTimeout>
  <fairSharePreemptionTimeout>30</fairSharePreemptionTimeout>
</allocations>

regards,
2012-03-07
hao.wang
Re: Re: Fair Scheduler Problem
Hi,

Thanks for your reply! I have solved this problem by setting mapred.fairscheduler.preemption.only.log to false. Preemption works now! But I don't understand why preemption does not work when mapred.fairscheduler.preemption.only.log is set to true. Is it a bug?

regards,
2012-03-07
hao.wang

From: Harsh J
Sent: 2012-03-07 14:14:05
To: common-user
Cc:
Subject: Re: Fair Scheduler Problem

Hello Hao,

It's best to submit CDH user queries to https://groups.google.com/a/cloudera.org/group/cdh-user/topics (cdh-u...@cloudera.org), where the majority of the CDH user community resides.

How do you determine that preemption did not/does not work? Preemption between pools occurs if a pool's minShare isn't satisfied within the preemption-timeout seconds; in that case, the scheduler will preempt tasks from other pools. Your settings look all right at a high level. Does your log not carry any preemption entries? What was your pools' share scenario when you tried to observe whether it works?

On Wed, Mar 7, 2012 at 8:35 AM, hao.wang hao.w...@ipinyou.com wrote:
Hi, all,
I encountered a problem using Cloudera Hadoop 0.20.2-cdh3u1. When I use the Fair Scheduler, it does not seem to support preemption. Can anybody tell me whether preemption is supported in this version?
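For readers hitting the same issue: as far as I can tell from the fair scheduler documentation for this release, mapred.fairscheduler.preemption.only.log is a dry-run switch, i.e. when it is true the scheduler only logs the preemptions it would perform without killing any tasks, so the behavior above is expected rather than a bug. A minimal sketch of the working setting described in this thread (property names taken from the configuration posted earlier):

```xml
<!-- mapred-site.xml: enable actual preemption -->
<property>
  <name>mapred.fairscheduler.preemption</name>
  <value>true</value>
</property>
<property>
  <!-- true = dry-run, only log what would be preempted;
       false = actually preempt tasks from other pools -->
  <name>mapred.fairscheduler.preemption.only.log</name>
  <value>false</value>
</property>
```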
-- Harsh J
Re: Re: how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum
Hi,

Thanks for your help; your suggestion is very useful. I have another question: should the sum of map and reduce slots equal the total number of cores?

regards!
2012-01-10
hao.wang

From: Harsh J
Sent: 2012-01-10 16:44:07
To: common-user
Cc:
Subject: Re: how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum

Hello Hao,

I am sorry if I confused you. By CPUs I meant the CPUs visible to your OS (/proc/cpuinfo), so yes, the total number of cores.

On 10-Jan-2012, at 12:39 PM, hao.wang wrote:
Hi,
Thanks for your reply! According to your suggestion, maybe I can't apply it to our Hadoop cluster, because each server in our cluster contains just 2 CPUs. So I think maybe you mean the number of cores, not the number of CPUs, in each server? I look forward to your reply.

regards!
2012-01-10
hao.wang

From: Harsh J
Sent: 2012-01-10 11:33:38
To: common-user
Cc:
Subject: Re: how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum

Hello again,

Try a 4:3 ratio between maps and reduces, against the total number of available CPUs per node (minus one or two, for the DN and HBase if you run those). Then tweak it as you go (more map-only loads or more map-reduce loads depends on your usage, and you can adjust the ratio accordingly over time -- changing those properties does not need a JobTracker restart, just a TaskTracker restart).

On 10-Jan-2012, at 8:17 AM, hao.wang wrote:
Hi,
Thanks for your reply! I had already read those pages before. Can you give me some more specific suggestions about how to choose the values of mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum according to our cluster configuration, if possible?

regards!
2012-01-10
hao.wang

From: Harsh J
Sent: 2012-01-09 23:19:21
To: common-user
Cc:
Subject: Re: how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum

Hi,

Please read http://hadoop.apache.org/common/docs/current/single_node_setup.html to learn how to configure Hadoop using the various *-site.xml configuration files, and then follow http://hadoop.apache.org/common/docs/current/cluster_setup.html to achieve optimal configs for your cluster.

On 09-Jan-2012, at 5:50 PM, hao.wang wrote:
Hi, all,
Our Hadoop cluster has 22 nodes: one namenode, one jobtracker, and 20 datanodes. Each node has 2 * 12 cores with 32 GB RAM. Does anyone know how to configure the following parameters:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum

regards!
2012-01-09
hao.wang
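As an illustration only (the right numbers depend on your workload): applying the suggested 4:3 map-to-reduce ratio to these nodes' 24 cores (2 * 12), and reserving a few cores for the DataNode and TaskTracker daemons, the per-node settings in mapred-site.xml might look like:

```xml
<!-- Illustrative values for a 24-core node: 12 + 9 = 21 task slots
     in a 4:3 map:reduce ratio, leaving ~3 cores for the DataNode,
     TaskTracker, and OS. Tune against your actual job mix. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>12</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>9</value>
</property>
```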
how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum
Hi, all,

Our Hadoop cluster has 22 nodes: one namenode, one jobtracker, and 20 datanodes. Each node has 2 * 12 cores with 32 GB RAM. Does anyone know how to configure the following parameters:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum

regards!
2012-01-09
hao.wang
Re: Re: how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum
Hi,

Thanks for your reply! I had already read those pages before. Can you give me some more specific suggestions about how to choose the values of mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum according to our cluster configuration, if possible?

regards!
2012-01-10
hao.wang

From: Harsh J
Sent: 2012-01-09 23:19:21
To: common-user
Cc:
Subject: Re: how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum

Hi,

Please read http://hadoop.apache.org/common/docs/current/single_node_setup.html to learn how to configure Hadoop using the various *-site.xml configuration files, and then follow http://hadoop.apache.org/common/docs/current/cluster_setup.html to achieve optimal configs for your cluster.

On 09-Jan-2012, at 5:50 PM, hao.wang wrote:
Hi, all,
Our Hadoop cluster has 22 nodes: one namenode, one jobtracker, and 20 datanodes. Each node has 2 * 12 cores with 32 GB RAM. Does anyone know how to configure the following parameters:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum

regards!
2012-01-09
hao.wang
Re: Re: how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum
Hi,

Thanks for your reply! According to your suggestion, maybe I can't apply it to our Hadoop cluster, because each server in our cluster contains just 2 CPUs. So I think maybe you mean the number of cores, not the number of CPUs, in each server? I look forward to your reply.

regards!
2012-01-10
hao.wang

From: Harsh J
Sent: 2012-01-10 11:33:38
To: common-user
Cc:
Subject: Re: how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum

Hello again,

Try a 4:3 ratio between maps and reduces, against the total number of available CPUs per node (minus one or two, for the DN and HBase if you run those). Then tweak it as you go (more map-only loads or more map-reduce loads depends on your usage, and you can adjust the ratio accordingly over time -- changing those properties does not need a JobTracker restart, just a TaskTracker restart).

On 10-Jan-2012, at 8:17 AM, hao.wang wrote:
Hi,
Thanks for your reply! I had already read those pages before. Can you give me some more specific suggestions about how to choose the values of mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum according to our cluster configuration, if possible?

regards!
2012-01-10
hao.wang

From: Harsh J
Sent: 2012-01-09 23:19:21
To: common-user
Cc:
Subject: Re: how to set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum

Hi,

Please read http://hadoop.apache.org/common/docs/current/single_node_setup.html to learn how to configure Hadoop using the various *-site.xml configuration files, and then follow http://hadoop.apache.org/common/docs/current/cluster_setup.html to achieve optimal configs for your cluster.

On 09-Jan-2012, at 5:50 PM, hao.wang wrote:
Hi, all,
Our Hadoop cluster has 22 nodes: one namenode, one jobtracker, and 20 datanodes. Each node has 2 * 12 cores with 32 GB RAM. Does anyone know how to configure the following parameters:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum

regards!
2012-01-09
hao.wang
block size
Hi, all:

I have lots of small files stored in HDFS. My HDFS block size is 128 MB, and each file is significantly smaller than that. I want to know: does each small file still consume 128 MB in HDFS?

regards
2011-09-21
hao.wang
Re: Re: block size
Hi, Joey:

Thanks for your help!

2011-09-21
hao.wang

From: Joey Echeverria
Sent: 2011-09-21 10:10:54
To: common-user
Cc:
Subject: Re: block size

HDFS blocks are stored as files in the underlying filesystem of your datanodes. Those files do not take a fixed amount of space, so if you store 10 MB in a file and you have 128 MB blocks, you still only use 10 MB (times 3 with default replication). However, the namenode does incur additional overhead by having to track a larger number of small files. So, if you can merge files, it's best practice to do so.

-Joey

On Tue, Sep 20, 2011 at 9:54 PM, hao.wang hao.w...@ipinyou.com wrote:
Hi, all:
I have lots of small files stored in HDFS. My HDFS block size is 128 MB, and each file is significantly smaller than that. I want to know: does each small file still consume 128 MB in HDFS?

regards
2011-09-21
hao.wang

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
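Joey's answer can be sketched numerically. A minimal back-of-the-envelope helper (the factor of 3 is HDFS's default replication; the block size only caps how large a single block can grow, it is not a minimum allocation, so the last block of a file is not padded):

```python
import math

def hdfs_disk_usage(file_size_mb, block_size_mb=128, replication=3):
    """Approximate the physical disk usage of one file in HDFS.

    Blocks are stored as ordinary files on the datanodes, so a block
    holding 10 MB of data occupies roughly 10 MB, not the full 128 MB.
    Returns (number_of_blocks, total_disk_usage_in_mb).
    """
    num_blocks = max(1, math.ceil(file_size_mb / block_size_mb))
    disk_mb = file_size_mb * replication  # last block is not padded
    return num_blocks, disk_mb

# A 10 MB file with a 128 MB block size: one block, ~30 MB on disk
# across the cluster -- not 128 MB * 3.
blocks, disk = hdfs_disk_usage(10)
```

The remaining cost of many small files is namenode metadata: each file and block is tracked in namenode memory, which is why merging small files is recommended even though they waste no datanode disk space.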