Re: dual core configuration

2008-10-08 Thread Taeho Kang
First of all, mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum are both set to 2 in the
hadoop-default.xml file. That file is read before hadoop-site.xml, so any
property that isn't set in hadoop-site.xml falls back to the value in
hadoop-default.xml.
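To override a default, add the property to hadoop-site.xml. As a minimal
sketch (the value of 4 here is only an illustration; pick whatever fits your
hardware):

<configuration>
  <!-- This value overrides the default of 2 in hadoop-default.xml -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>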
As for the question of why only one core is utilized: I think it really
depends on the process scheduling of the underlying OS. Two tasks (two JVM
subprocesses spawned by the tasktracker) won't always run on separate
cores, since other processes also need CPU time.

By the way, what tools did you use to find out which tasks (or processes)
use which cores?

/Taeho


On Wed, Oct 8, 2008 at 1:01 PM, Alex Loddengaard
[EMAIL PROTECTED] wrote:

 Taeho, I was going to suggest this change as well, but it's documented that
 mapred.tasktracker.map.tasks.maximum defaults to 2.  Can you explain why
 Elia sees only one core utilized when this config option is set to 2?

 Here is the documentation I'm referring to:
 http://hadoop.apache.org/core/docs/r0.18.1/cluster_setup.html

 Alex

 On Tue, Oct 7, 2008 at 8:27 PM, Taeho Kang [EMAIL PROTECTED] wrote:

  You can have your node (tasktracker) running more than one task
  simultaneously.
  You may set the mapred.tasktracker.map.tasks.maximum and
  mapred.tasktracker.reduce.tasks.maximum properties in the hadoop-site.xml
  file. You should change hadoop-site.xml on all your slave nodes depending
  on how many cores each slave has. For example, you don't really want to
  have 8 tasks running at once on a 2-core machine.
 
  /Taeho
 
  On Wed, Oct 8, 2008 at 5:53 AM, Elia Mazzawi
  [EMAIL PROTECTED] wrote:
 
   hello,
  
   I have some dual core nodes, and I've noticed Hadoop is only running 1
   instance, and so is only using 1 of the CPUs on each node.
   Is there a configuration option to tell it to run more than one instance?
   Or do I need to turn each machine into 2 nodes?
  
   Thanks.
  
 



Re: dual core configuration

2008-10-08 Thread Alex Loddengaard
Elia, perhaps you can try changing mapred.tasktracker.map.tasks.maximum
and mapred.tasktracker.reduce.tasks.maximum to 4 in hadoop-site.xml in
hopes of getting better utilization.  It's strange to me that having these
both set to 2 only utilizes a single core, because I would imagine that any
modern OS scheduler would do a good job of core utilization.

Just a thought.

Alex

On Wed, Oct 8, 2008 at 12:52 AM, Taeho Kang [EMAIL PROTECTED] wrote:

 First of all, mapred.tasktracker.map.tasks.maximum and
 mapred.tasktracker.reduce.tasks.maximum are both set to 2 in the
 hadoop-default.xml file. That file is read before hadoop-site.xml, so any
 property that isn't set in hadoop-site.xml falls back to the value in
 hadoop-default.xml.
 As for the question of why only one core is utilized: I think it really
 depends on the process scheduling of the underlying OS. Two tasks (two JVM
 subprocesses spawned by the tasktracker) won't always run on separate
 cores, since other processes also need CPU time.

 By the way, what tools did you use to find out which tasks (or processes)
 use which cores?

 /Taeho


 On Wed, Oct 8, 2008 at 1:01 PM, Alex Loddengaard
 [EMAIL PROTECTED] wrote:

  Taeho, I was going to suggest this change as well, but it's documented
  that mapred.tasktracker.map.tasks.maximum defaults to 2.  Can you explain
  why Elia sees only one core utilized when this config option is set to 2?
 
  Here is the documentation I'm referring to:
  http://hadoop.apache.org/core/docs/r0.18.1/cluster_setup.html
 
  Alex
 
  On Tue, Oct 7, 2008 at 8:27 PM, Taeho Kang [EMAIL PROTECTED] wrote:
 
   You can have your node (tasktracker) running more than one task
   simultaneously.
   You may set the mapred.tasktracker.map.tasks.maximum and
   mapred.tasktracker.reduce.tasks.maximum properties in the hadoop-site.xml
   file. You should change hadoop-site.xml on all your slave nodes depending
   on how many cores each slave has. For example, you don't really want to
   have 8 tasks running at once on a 2-core machine.
  
   /Taeho
  
   On Wed, Oct 8, 2008 at 5:53 AM, Elia Mazzawi
   [EMAIL PROTECTED] wrote:
  
    hello,

    I have some dual core nodes, and I've noticed Hadoop is only running 1
    instance, and so is only using 1 of the CPUs on each node.
    Is there a configuration option to tell it to run more than one instance?
    Or do I need to turn each machine into 2 nodes?

    Thanks.
   
  
 



Re: dual core configuration

2008-10-08 Thread Elia Mazzawi

False alarm guys, thanks for the replies.
I do have 2 set as the task maximum, and it is utilizing 2 cores
according to top.
I must have caught it in between tasks or during the reduce phase, since I
had only 1 reducer per node going at the time.
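(In case it helps anyone else checking this: pressing '1' inside top toggles
the per-CPU breakdown shown below, and on Linux, ps -o pid,psr -C java shows
which core each java process last ran on.)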


hadoop-default.xml:
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>

output from top:

top - 12:54:50 up 48 days, 16:19,  1 user,  load average: 2.60, 1.55, 0.66
Tasks:  80 total,   3 running,  77 sleeping,   0 stopped,   0 zombie
Cpu0  : 98.1%us,  1.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu1  : 95.8%us,  2.9%sy,  0.0%ni,  0.0%id,  1.3%wa,  0.0%hi,  0.0%si,  0.0%st

Mem:   1035160k total,  1019608k used,    15552k free,     1808k buffers
Swap:  2031608k total,      372k used,  2031236k free,   293612k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2469 root      25   0  410m 161m  10m R 44.5 15.9   0:40.40 java
 2446 root      25   0  411m 161m  11m R 43.2 16.0   0:45.88 java
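(The two java processes above are presumably the two concurrently running
task JVMs spawned by the tasktracker.)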



Alex Loddengaard wrote:

Elia, perhaps you can try changing mapred.tasktracker.map.tasks.maximum
and mapred.tasktracker.reduce.tasks.maximum to 4 in hadoop-site.xml in
hopes of getting better utilization.  It's strange to me that having these
both set to 2 only utilizes a single core, because I would imagine that any
modern OS scheduler would do a good job of core utilization.

Just a thought.

Alex



dual core configuration

2008-10-07 Thread Elia Mazzawi

hello,

I have some dual core nodes, and I've noticed Hadoop is only running 1
instance, and so is only using 1 of the CPUs on each node.

Is there a configuration option to tell it to run more than one instance?
Or do I need to turn each machine into 2 nodes?

Thanks.


Re: dual core configuration

2008-10-07 Thread Taeho Kang
You can have your node (tasktracker) running more than one task
simultaneously.
You may set the mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum properties in the hadoop-site.xml
file. You should change hadoop-site.xml on all your slave nodes depending
on how many cores each slave has. For example, you don't really want to
have 8 tasks running at once on a 2-core machine.
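As a rough sketch for a dual-core slave (these entries go inside the
<configuration> element of hadoop-site.xml; tune the values to your own
hardware and job mix):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>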

/Taeho

On Wed, Oct 8, 2008 at 5:53 AM, Elia Mazzawi
 [EMAIL PROTECTED] wrote:

 hello,

 I have some dual core nodes, and I've noticed Hadoop is only running 1
 instance, and so is only using 1 of the CPUs on each node.
 Is there a configuration option to tell it to run more than one instance?
 Or do I need to turn each machine into 2 nodes?

 Thanks.



Re: dual core configuration

2008-10-07 Thread Alex Loddengaard
Taeho, I was going to suggest this change as well, but it's documented that
mapred.tasktracker.map.tasks.maximum defaults to 2.  Can you explain why
Elia sees only one core utilized when this config option is set to 2?

Here is the documentation I'm referring to:
http://hadoop.apache.org/core/docs/r0.18.1/cluster_setup.html

Alex

On Tue, Oct 7, 2008 at 8:27 PM, Taeho Kang [EMAIL PROTECTED] wrote:

 You can have your node (tasktracker) running more than one task
 simultaneously.
 You may set the mapred.tasktracker.map.tasks.maximum and
 mapred.tasktracker.reduce.tasks.maximum properties in the hadoop-site.xml
 file. You should change hadoop-site.xml on all your slave nodes depending
 on how many cores each slave has. For example, you don't really want to
 have 8 tasks running at once on a 2-core machine.

 /Taeho

 On Wed, Oct 8, 2008 at 5:53 AM, Elia Mazzawi
  [EMAIL PROTECTED] wrote:

  hello,
 
  I have some dual core nodes, and I've noticed Hadoop is only running 1
  instance, and so is only using 1 of the CPUs on each node.
  Is there a configuration option to tell it to run more than one instance?
  Or do I need to turn each machine into 2 nodes?
 
  Thanks.