Re: specifying number of nodes for job
Thank you very much for your response :)

-Suzanne

On Tue, Sep 9, 2008 at 12:13 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> It is possible to do this with changes to Hadoop. There was a JIRA filed
> for it (HADOOP-2573), but I don't think anyone has worked on it.
Re: specifying number of nodes for job
On Mon, Sep 8, 2008 at 4:26 PM, Sandy <[EMAIL PROTECTED]> wrote:
> In all seriousness though, why is this not possible? Is there something
> about the MapReduce model of parallel computation that I am not
> understanding? Or is this more of an arbitrary implementation choice made
> by the Hadoop framework? If so, I am curious why this is the case. What
> are the benefits?

It is possible to do this with changes to Hadoop. There was a JIRA filed for
it (HADOOP-2573), but I don't think anyone has worked on it. For Map/Reduce,
it is a design goal that the number of tasks, not the number of nodes, is the
important metric: you want a job to be able to run on a cluster of any size.
For scalability testing, you could just remove task trackers from the
cluster...

-- Owen
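Combining Owen's suggestion (run fewer task trackers) with the per-node task limit Mafish mentions widens the set of totals a scalability sweep can reach. Below is a minimal sketch, assuming t of the n task trackers are running and every one of them is configured with the same s slots, so the cluster offers k = t × s concurrent slots with 1 ≤ t ≤ n and 1 ≤ s ≤ m. The function names are hypothetical; nothing here talks to Hadoop, it only does the arithmetic.

```python
def reachable_slot_counts(n_nodes: int, max_per_node: int) -> dict[int, tuple[int, int]]:
    """Map each reachable total k = t * s to one (trackers, slots_per_tracker) pair.

    Assumes t of the n task trackers are running (1 <= t <= n) and each is
    configured with the same slot limit s (1 <= s <= m). Looping over s in the
    outer loop records, for each k, the configuration with the fewest slots per
    node, i.e. the least per-node contention.
    """
    configs: dict[int, tuple[int, int]] = {}
    for slots in range(1, max_per_node + 1):
        for trackers in range(1, n_nodes + 1):
            configs.setdefault(trackers * slots, (trackers, slots))
    return dict(sorted(configs.items()))


def unreachable(n_nodes: int, max_per_node: int) -> list[int]:
    """Totals in 1..n*m that no uniform (trackers, slots) configuration produces."""
    reachable = reachable_slot_counts(n_nodes, max_per_node)
    return [k for k in range(1, n_nodes * max_per_node + 1) if k not in reachable]


if __name__ == "__main__":
    plans = reachable_slot_counts(4, 8)
    print(plans[6])  # (3, 2): three task trackers, two slots each
    print(unreachable(4, 8))
```

For n = 4 and m = 8, every k from 1 to 10 is reachable, but totals such as 11, 13 and 30 are not, since none of them can be written as t × s with t ≤ 4 and s ≤ 8. A single task tracker with s slots covers every k ≤ m, which handles the low end of a k = 1 ... nm sweep.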
Re: specifying number of nodes for job
-smiles- It's not nice to poke fun at people's e-mail aliases... and snickerdoodles are delicious cookies.

In all seriousness though, why is this not possible? Is there something about the MapReduce model of parallel computation that I am not understanding? Or is this more of an arbitrary implementation choice made by the Hadoop framework? If so, I am curious why this is the case. What are the benefits?

What I'm talking about is not uncommon in scalability studies. Do developers consider the ability to specify the number of processors a desirable feature? Just curious, of course.

Regards,
-SM

On Mon, Sep 8, 2008 at 3:36 PM, <[EMAIL PROTECTED]> wrote:
> yeah. snickerdoodle. really.
Re: specifying number of nodes for job
yeah. snickerdoodle. really.

> I see. So if I have a cluster with n nodes, there is no way for me to
> have it spawn on just 2 of those nodes, or just one of those nodes?
Re: specifying number of nodes for job
I see. So if I have a cluster with n nodes, there is no way for me to have it spawn on just 2 of those nodes, or just one of those nodes? And furthermore, there is no way for me to have it spawn on just a subset of the processors? Or am I misunderstanding?

Also, when you say "specify the number of tasks for each node", are you referring to specifying the number of mappers and reducers I can spawn on each node?

-SM

On Sun, Sep 7, 2008 at 8:29 PM, Mafish Liu <[EMAIL PROTECTED]> wrote:
> You can specify the number of tasks for each node in your hadoop-site.xml
> file.
Re: specifying number of nodes for job
On Mon, Sep 8, 2008 at 2:25 AM, Sandy <[EMAIL PROTECTED]> wrote:
> Suppose I have a cluster with n nodes each with m processors.
>
> I wish to test the performance of, say, the wordcount program on k
> processors, where k is varied from k = 1 ... nm.

You can specify the number of tasks for each node in your hadoop-site.xml
file, so you can vary k as k = n, 2n, ..., nm instead of k = 1 ... nm.

--
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
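A minimal sketch of the per-node setting, assuming the property names used by Hadoop releases of that era: mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, which cap map and reduce slots separately on each task tracker. It also assumes a homogeneous cluster where every node gets the same hadoop-site.xml; in that era these limits were generally read when a task tracker started, so changing them meant restarting the task trackers. The helpers `site_xml` and `map_slot_totals` are hypothetical. The sketch renders the configuration body and lists the cluster-wide map slot totals this approach reaches.

```python
import xml.etree.ElementTree as ET

# Assumed property names from Hadoop releases of that era; each one caps how many
# tasks of that kind a single task tracker runs concurrently.
MAP_SLOTS = "mapred.tasktracker.map.tasks.maximum"
REDUCE_SLOTS = "mapred.tasktracker.reduce.tasks.maximum"


def site_xml(map_slots: int, reduce_slots: int) -> str:
    """Render a minimal hadoop-site.xml body with per-node slot limits (hypothetical helper)."""
    conf = ET.Element("configuration")
    for name, value in ((MAP_SLOTS, map_slots), (REDUCE_SLOTS, reduce_slots)):
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(conf, encoding="unicode")


def map_slot_totals(n_nodes: int, max_per_node: int) -> list[int]:
    """Cluster-wide map slots when all n nodes share one per-node limit s = 1..m.

    This gives k = n, 2n, ..., m*n: only multiples of n are reachable.
    """
    return [s * n_nodes for s in range(1, max_per_node + 1)]


if __name__ == "__main__":
    print(site_xml(map_slots=2, reduce_slots=1))
    print(map_slot_totals(n_nodes=4, max_per_node=8))
```

For n = 4 nodes with m = 8 processors each, `map_slot_totals(4, 8)` returns [4, 8, 12, 16, 20, 24, 28, 32]. These totals are upper bounds on concurrency; how many map tasks actually run at once also depends on how many input splits the job has.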
specifying number of nodes for job
Hi,

This may be a silly question, but I'm strangely having trouble finding an answer for it (perhaps I'm looking in the wrong places?).

Suppose I have a cluster with n nodes each with m processors.

I wish to test the performance of, say, the wordcount program on k processors, where k is varied from k = 1 ... nm.

How would I do this? I'm having trouble finding the proper command-line option in the commands manual (http://hadoop.apache.org/core/docs/current/commands_manual.html).

Thank you very much for your time.

-SM