Re: specifying number of nodes for job

2008-09-09 Thread Owen O'Malley
On Mon, Sep 8, 2008 at 4:26 PM, Sandy [EMAIL PROTECTED] wrote:

 In all seriousness though, why is this not possible? Is there something
 about the MapReduce model of parallel computation that I am not
 understanding? Or is this more of an arbitrary implementation choice made by
 the Hadoop framework? If so, I am curious why this is the case. What are
 the
 benefits?


It is possible to do with changes to Hadoop. There was a jira filed for it,
but I don't think anyone has worked on it (HADOOP-2573). For Map/Reduce it
is a design goal that the number of tasks, not the number of nodes, is the
important metric: you want a job to be able to run on a cluster of any size.
For scalability testing, you could just remove task trackers...
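A minimal sketch of that last idea, with purely hypothetical node names and
file names (the stop command is from the Hadoop-0.18-era daemon scripts):
keep only the first K entries of the slaves list for a scaled-down run, then
stop the TaskTracker on each excluded machine.

```shell
# Hypothetical sketch: scale a cluster down to K workers for a test run.
# Node names and file names are illustrative, not taken from the thread.
printf 'node1\nnode2\nnode3\nnode4\n' > slaves.all   # stand-in for conf/slaves
K=2
head -n "$K" slaves.all > slaves.active              # nodes that stay in the run
tail -n +"$((K + 1))" slaves.all > slaves.excluded   # nodes to drop
cat slaves.excluded
# On each excluded node you would then run something like:
#   bin/hadoop-daemon.sh stop tasktracker
```

Rerunning the job while varying K from 1 to n would give a per-node-count
scalability curve, at the cost of restarting daemons between runs.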

-- Owen


Re: specifying number of nodes for job

2008-09-08 Thread hmarti2
yeah. snickerdoodle. really.


 I see.. so if I have a cluster with n nodes, there is no way for me to
 have
 it spawn on just 2 of those nodes, or just one of those nodes? And
 furthermore, there is no way for me to have it spawn on just a subset of
 the
 processors? Or am I misunderstanding?

 Also, when you say specify the number of tasks for each node are you
 referring to specifying the number of mappers and reducers I can spawn on
 each node?

 -SM

 On Sun, Sep 7, 2008 at 8:29 PM, Mafish Liu [EMAIL PROTECTED] wrote:

 On Mon, Sep 8, 2008 at 2:25 AM, Sandy [EMAIL PROTECTED] wrote:

  Hi,
 
  This may be a silly question, but I'm strangely having trouble finding
 an
  answer for it (perhaps I'm looking in the wrong places?).
 
  Suppose I have a cluster with n nodes each with m processors.
 
  I wish to test the performance of, say,  the wordcount program on k
  processors, where k is varied from k = 1 ... nm.


 You can  specify the number of tasks for each node in your
 hadoop-site.xml
 file.
 So you can get k varied as k = n, 2*n, ..., m*n instead of k = 1...nm.


  How would I do this? I'm having trouble finding the proper command
 line
  option in the commands manual (
  http://hadoop.apache.org/core/docs/current/commands_manual.html)
 
 
 
  Thank you very much for your time.
 
  -SM
 



 --
 [EMAIL PROTECTED]
 Institute of Computing Technology, Chinese Academy of Sciences, Beijing.






Re: specifying number of nodes for job

2008-09-08 Thread Sandy
-smiles- It's not nice to poke fun at people's e-mail aliases... and
snickerdoodles are delicious cookies.

In all seriousness though, why is this not possible? Is there something
about the MapReduce model of parallel computation that I am not
understanding? Or is this more of an arbitrary implementation choice made by
the Hadoop framework? If so, I am curious why this is the case. What are the
benefits?

What I'm talking about is not uncommon for scalability studies. Is being
able to specify the number of processors considered a desirable feature by
developers?

Just curious, of course.

Regards,

-SM

On Mon, Sep 8, 2008 at 3:36 PM, [EMAIL PROTECTED] wrote:

 yeah. snickerdoodle. really.


  I see.. so if I have a cluster with n nodes, there is no way for me to
  have
  it spawn on just 2 of those nodes, or just one of those nodes? And
  furthermore, there is no way for me to have it spawn on just a subset of
  the
  processors? Or am I misunderstanding?
 
  Also, when you say specify the number of tasks for each node are you
  referring to specifying the number of mappers and reducers I can spawn on
  each node?
 
  -SM
 
  On Sun, Sep 7, 2008 at 8:29 PM, Mafish Liu [EMAIL PROTECTED] wrote:
 
  On Mon, Sep 8, 2008 at 2:25 AM, Sandy [EMAIL PROTECTED]
 wrote:
 
   Hi,
  
   This may be a silly question, but I'm strangely having trouble finding
  an
   answer for it (perhaps I'm looking in the wrong places?).
  
   Suppose I have a cluster with n nodes each with m processors.
  
   I wish to test the performance of, say,  the wordcount program on k
   processors, where k is varied from k = 1 ... nm.
 
 
  You can  specify the number of tasks for each node in your
  hadoop-site.xml
  file.
  So you can get k varied as k = n, 2*n, ..., m*n instead of k = 1...nm.
 
 
   How would I do this? I'm having trouble finding the proper command
  line
   option in the commands manual (
   http://hadoop.apache.org/core/docs/current/commands_manual.html)
  
  
  
   Thank you very much for your time.
  
   -SM
  
 
 
 
  --
  [EMAIL PROTECTED]
  Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
 
 





Re: specifying number of nodes for job

2008-09-07 Thread Mafish Liu
On Mon, Sep 8, 2008 at 2:25 AM, Sandy [EMAIL PROTECTED] wrote:

 Hi,

 This may be a silly question, but I'm strangely having trouble finding an
 answer for it (perhaps I'm looking in the wrong places?).

 Suppose I have a cluster with n nodes each with m processors.

 I wish to test the performance of, say,  the wordcount program on k
 processors, where k is varied from k = 1 ... nm.


You can  specify the number of tasks for each node in your hadoop-site.xml
file.
So you can get k varied as k = n, 2*n, ..., m*n instead of k = 1...nm.
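Concretely, in the Hadoop of that era the per-node task slots were set in
hadoop-site.xml with properties along these lines (the values of 2 are
illustrative; the property names are the 0.18-era ones):

```xml
<!-- hadoop-site.xml: illustrative per-node task-slot settings -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value> <!-- map tasks run concurrently on each TaskTracker -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value> <!-- reduce tasks run concurrently on each TaskTracker -->
</property>
```

With n nodes and this maximum varied from 1 to m, the cluster-wide
concurrency k steps through n, 2*n, ..., m*n rather than hitting every
value in 1...nm.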


 How would I do this? I'm having trouble finding the proper command line
 option in the commands manual (
 http://hadoop.apache.org/core/docs/current/commands_manual.html)



 Thank you very much for your time.

 -SM




-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.