Re: specifying number of nodes for job

2008-09-10 Thread Sandy
Thank you very much for your response :)
-Suzanne

On Tue, Sep 9, 2008 at 12:13 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:

> It is possible to do with changes to Hadoop. There was a JIRA filed for it
> (HADOOP-2573), but I don't think anyone has worked on it. For Map/Reduce it
> is a design goal that the number of tasks, not nodes, is the important
> metric. You want a job to be able to run with any given cluster size. For
> scalability testing, you could just remove task trackers...
>
> -- Owen
>


Re: specifying number of nodes for job

2008-09-09 Thread Owen O'Malley
On Mon, Sep 8, 2008 at 4:26 PM, Sandy <[EMAIL PROTECTED]> wrote:

> In all seriousness though, why is this not possible? Is there something
> about the MapReduce model of parallel computation that I am not
> understanding? Or is this more of an arbitrary implementation choice made by
> the Hadoop framework? If so, I am curious why this is the case. What are
> the benefits?


It is possible to do with changes to Hadoop. There was a JIRA filed for it
(HADOOP-2573), but I don't think anyone has worked on it. For Map/Reduce it
is a design goal that the number of tasks, not nodes, is the important
metric. You want a job to be able to run with any given cluster size. For
scalability testing, you could just remove task trackers...

-- Owen
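Owen's suggestion (removing TaskTrackers) and Mafish's (changing the per-node task limit) can be combined: with p active TaskTrackers, each allowing t concurrent tasks, the cluster offers at most p * t task slots. The Python sketch below enumerates the (p, t) pairs that give exactly k slots on a cluster of n nodes with m processors each. It is a back-of-the-envelope aid resting on assumptions not stated in the thread: every active node uses the same limit t, one slot stands in for one processor, and the job has at least k tasks to fill those slots (as Owen notes, Hadoop reasons about tasks, not nodes). The cluster size in the usage lines is hypothetical.

```python
from typing import List, Tuple


def slot_configs(k: int, n: int, m: int) -> List[Tuple[int, int]]:
    """All (p, t) with 1 <= p <= n active nodes and 1 <= t <= m task slots
    per node such that p * t == k (assumes every active node uses the same t)."""
    return [
        (p, k // p)
        for p in range(1, n + 1)
        if k % p == 0 and k // p <= m
    ]


def unreachable(n: int, m: int) -> List[int]:
    """Slot counts k in 1..n*m that no uniform (p, t) pair can produce."""
    return [k for k in range(1, n * m + 1) if not slot_configs(k, n, m)]


if __name__ == "__main__":
    n, m = 4, 8  # hypothetical cluster: 4 nodes with 8 processors each
    for k in (1, 6, 12, 32):
        print(f"k={k}: (nodes, slots per node) options {slot_configs(k, n, m)}")
    print("k values with no uniform setting:", unreachable(n, m))
```

Under these assumptions some values of k cannot be reached at all: for the hypothetical 4-node, 8-processor cluster, `unreachable(4, 8)` returns 11, 13, 17, 19, 22, 23, 25, 26, 27, 29, 30 and 31, so a sweep over k = 1 ... nm would have to skip those points.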


Re: specifying number of nodes for job

2008-09-08 Thread Sandy
-smiles- It's not nice to poke fun at people's e-mail aliases... and
snickerdoodles are delicious cookies.

In all seriousness though, why is this not possible? Is there something
about the MapReduce model of parallel computation that I am not
understanding? Or is this more of an arbitrary implementation choice made by
the Hadoop framework? If so, I am curious why this is the case. What are the
benefits?

What I'm talking about is not uncommon for scalability studies. Is being
able to specify the number of processors considered a desirable feature by
developers?

Just curious, of course.

Regards,

-SM

On Mon, Sep 8, 2008 at 3:36 PM, <[EMAIL PROTECTED]> wrote:

> yeah. snickerdoodle. really.


Re: specifying number of nodes for job

2008-09-08 Thread hmarti2
yeah. snickerdoodle. really.


> I see.. so if I have a cluster with n nodes, there is no way for me to have
> it spawn on just 2 of those nodes, or just one of those nodes? And
> furthermore, there is no way for me to have it spawn on just a subset of the
> processors? Or am I misunderstanding?
>
> Also, when you say "specify the number of tasks for each node" are you
> referring to specifying the number of mappers and reducers I can spawn on
> each node?
>
> -SM


Re: specifying number of nodes for job

2008-09-08 Thread Sandy
I see.. so if I have a cluster with n nodes, there is no way for me to have
it spawn on just 2 of those nodes, or just one of those nodes? And
furthermore, there is no way for me to have it spawn on just a subset of the
processors? Or am I misunderstanding?

Also, when you say "specify the number of tasks for each node" are you
referring to specifying the number of mappers and reducers I can spawn on
each node?

-SM

On Sun, Sep 7, 2008 at 8:29 PM, Mafish Liu <[EMAIL PROTECTED]> wrote:

> You can specify the number of tasks for each node in your hadoop-site.xml
> file, so you can vary k over k = n, 2n, ..., mn instead of k = 1 ... nm.
>
> --
> [EMAIL PROTECTED]
> Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
>


Re: specifying number of nodes for job

2008-09-07 Thread Mafish Liu
On Mon, Sep 8, 2008 at 2:25 AM, Sandy <[EMAIL PROTECTED]> wrote:

> Hi,
>
> This may be a silly question, but I'm strangely having trouble finding an
> answer for it (perhaps I'm looking in the wrong places?).
>
> Suppose I have a cluster with n nodes each with m processors.
>
> I wish to test the performance of, say, the wordcount program on k
> processors, where k is varied from k = 1 ... nm.


You can specify the number of tasks for each node in your hadoop-site.xml
file, so you can vary k over k = n, 2n, ..., mn instead of k = 1 ... nm.


> How would I do this? I'm having trouble finding the proper command line
> option in the commands manual (
> http://hadoop.apache.org/core/docs/current/commands_manual.html)
>
>
>
> Thank you very much for your time.
>
> -SM
>



-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
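Mafish's suggestion can be made concrete. In Hadoop releases of this era (around 0.18), the per-node limits are, to the best of our knowledge, the properties `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`, which would also answer Sandy's follow-up: map and reduce slots are set separately. Treat those names as an assumption and check `hadoop-default.xml` for your release. The hypothetical helper below renders the two limits as a `hadoop-site.xml` fragment and lists the cluster-wide slot counts reachable by changing only the per-node limit on all n nodes, i.e. n, 2n, ..., mn.

```python
from typing import List

# Assumed (Hadoop 0.18-era) property names; verify against hadoop-default.xml.
MAP_SLOTS_PROPERTY = "mapred.tasktracker.map.tasks.maximum"
REDUCE_SLOTS_PROPERTY = "mapred.tasktracker.reduce.tasks.maximum"


def slot_config(map_slots: int, reduce_slots: int) -> str:
    """Render per-TaskTracker slot limits as a hadoop-site.xml fragment."""
    if map_slots < 1 or reduce_slots < 1:
        raise ValueError("slot limits must be at least 1")
    lines = ["<configuration>"]
    for name, value in ((MAP_SLOTS_PROPERTY, map_slots),
                        (REDUCE_SLOTS_PROPERTY, reduce_slots)):
        lines += [
            "  <property>",
            f"    <name>{name}</name>",
            f"    <value>{value}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines)


def cluster_slot_counts(n: int, m: int) -> List[int]:
    """Total slots when all n nodes share the same per-node limit t = 1..m."""
    return [n * t for t in range(1, m + 1)]


if __name__ == "__main__":
    print(slot_config(map_slots=2, reduce_slots=1))
    print(cluster_slot_counts(n=4, m=8))  # hypothetical 4-node, 8-CPU cluster
```

The fragment only sets upper bounds per TaskTracker; how many tasks actually run concurrently still depends on how many map and reduce tasks the job has.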


specifying number of nodes for job

2008-09-07 Thread Sandy
Hi,

This may be a silly question, but I'm strangely having trouble finding an
answer for it (perhaps I'm looking in the wrong places?).

Suppose I have a cluster with n nodes each with m processors.

I wish to test the performance of, say, the wordcount program on k
processors, where k is varied from k = 1 ... nm.

How would I do this? I'm having trouble finding the proper command line
option in the commands manual (
http://hadoop.apache.org/core/docs/current/commands_manual.html)



Thank you very much for your time.

-SM