RE: Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.

2012-03-12 Thread WangRamon

Hi Satish, I'm not sure about this, but it's a dual quad-core CPU with Hyper-Threading, so 14 mappers and 14 reducers per node should be OK, right? I find that each task finishes very quickly, in no more than 20 seconds; could this be the root cause of the problem?

Thanks,
Ramon
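
A rough back-of-the-envelope sketch of how short tasks can leave slots idle. It assumes that in Hadoop 0.20 a TaskTracker is assigned roughly one new reduce task per heartbeat, and that heartbeats arrive about every 3 seconds; both numbers are assumptions, not measurements from this thread:

```python
# Steady-state estimate: if a TaskTracker starts ~1 reduce per heartbeat
# and each reduce runs ~20 s, then only task_duration / heartbeat_interval
# reduces overlap on that node at any moment, regardless of the slot count.
heartbeat_interval_s = 3.0   # assumed heartbeat period (Hadoop 0.20 default-ish)
task_duration_s = 20.0       # observed in this thread: tasks finish in ~20 s
nodes = 3

concurrent_per_node = task_duration_s / heartbeat_interval_s
cluster_concurrent = concurrent_per_node * nodes
print(f"{concurrent_per_node:.1f} per node, {cluster_concurrent:.0f} cluster-wide")
```

Under those assumptions the cluster-wide concurrency stays well below the 42-slot capacity, which is at least in the same ballpark as the ~12 running reducers reported below.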
 From: satish.se...@hcl.com
To: mapreduce-user@hadoop.apache.org; ramon_w...@hotmail.com
Date: Mon, 12 Mar 2012 09:00:49 +0530
Subject: RE: Why most of the free reduce slots are NOT used for my Hadoop Jobs? 
Thanks.

Just guessing, but this may have something to do with the number of cores/CPUs. I noticed the same for the number of map tasks spawned: how many are created depends on the number of input files, and how many run concurrently depends on the number of cores/CPUs.

Thanks


From: WangRamon [ramon_w...@hotmail.com]
Sent: Saturday, March 10, 2012 5:35 PM
To: mapreduce-user@hadoop.apache.org
Subject: RE: Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.

Joey, here is the information:

 

Cluster Summary (Heap Size is 481.88 MB/1.74 GB)

Maps: 0 | Reduces: 6 | Total Submissions: 11 | Nodes: 3 | Map Task Capacity: 42 | Reduce Task Capacity: 42 | Avg. Tasks/Node: 28.00 | Blacklisted Nodes: 0

Cheers,
Ramon



 




Subject: Re: Why most of the free reduce slots are NOT used for my Hadoop Jobs? 
Thanks.

From: j...@cloudera.com

Date: Sat, 10 Mar 2012 07:00:26 -0500

To: mapreduce-user@hadoop.apache.org



What does the jobtracker web page say is the total reduce capacity?



-Joey

On Mar 10, 2012, at 5:39, WangRamon wrote:

Hi All

 

I'm using Hadoop-0.20-append; the cluster contains 3 nodes, and each node has 14 map and 14 reduce slots. Here is the configuration:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>14</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>14</value>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>73</value>
</property>


When I submit 5 jobs simultaneously (the input data for each job is small for this test, about 2~5 MB), I assumed the jobs would use as many slots as possible. Each job did create 73 reduce tasks as configured above, so there are 5 * 73 reduce tasks in total, but most of them are in the pending state; only about 12 are running, which is very small compared to the total reduce capacity of 42 slots for the 3-node cluster.


 

What is interesting is that it is always about 12 of them running; I tried a few times.

 

So I thought it might be because of the scheduler. I changed it to the Fair Scheduler and created 3 pools; the configuration is as below:

 





 

<allocations>
  <pool name="pool-a">
    <minMaps>14</minMaps>
    <minReduces>14</minReduces>
    <weight>1.0</weight>
  </pool>
  <pool name="pool-b">
    <minMaps>14</minMaps>
    <minReduces>14</minReduces>
    <weight>1.0</weight>
  </pool>
  <pool name="pool-c">
    <minMaps>14</minMaps>
    <minReduces>14</minReduces>
    <weight>1.0</weight>
  </pool>
</allocations>


Then I submitted the 5 jobs simultaneously to these pools randomly again. I can see the jobs were assigned to different pools, but it's still the same problem: only about 12 of the reduce tasks across the pools are running. Here is the output I copied from the Fair Scheduler monitor GUI:

 

pool-a 2 14 14 0 9

pool-b 0 14 14 0 0 

pool-c 2 14 14 0 3 

 

pool-a and pool-c have a total of 12 reduce tasks running, but I still have at least about 11 free reduce slots available in my cluster.
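
For reference, with three pools of equal weight the Fair Scheduler's ideal share of the reduce capacity works out to 14 slots per pool. This is a sketch of the share arithmetic only; the real scheduler also takes per-pool minimums and actual demand into account:

```python
# Equal-weight fair shares of the 42 reduce slots across the three pools.
total_reduce_slots = 42
weights = {"pool-a": 1.0, "pool-b": 1.0, "pool-c": 1.0}
total_weight = sum(weights.values())
fair_shares = {pool: total_reduce_slots * w / total_weight
               for pool, w in weights.items()}
print(fair_shares)  # {'pool-a': 14.0, 'pool-b': 14.0, 'pool-c': 14.0}
```

So the 12 running reducers fall short of even a single pool's fair share.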

 

So can anyone please give me some suggestions as to why NOT all my REDUCE SLOTS are being used? Thanks in advance.


 

Cheers 

Ramon


RE: Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.

2012-03-10 Thread WangRamon


Re: Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.

2012-03-10 Thread Joey Echeverria
What does the jobtracker web page say is the total reduce capacity?

-Joey





Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.

2012-03-10 Thread WangRamon



