Re: specifying number of nodes for job
On Mon, Sep 8, 2008 at 4:26 PM, Sandy [EMAIL PROTECTED] wrote:
> In all seriousness though, why is this not possible? Is there something
> about the MapReduce model of parallel computation that I am not
> understanding? Or is this more of an arbitrary implementation choice made
> by the Hadoop framework? If so, I am curious why this is the case. What
> are the benefits?

It is possible to do with changes to Hadoop. There was a jira filed for it, but I don't think anyone has worked on it (HADOOP-2573).

For Map/Reduce, it is a design goal that the number of tasks, not nodes, is the important metric. You want a job to be able to run with any given cluster size. For scalability testing, you could just remove task trackers...

-- Owen
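[For reference, one way to "remove task trackers" for a scalability run is to stop the TaskTracker daemon on the nodes you want excluded. A sketch, assuming a 0.18-era deployment with the standard $HADOOP_HOME/bin scripts; adjust paths to your install:]

```
# Run on each worker node you want to exclude from the test:
$HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker

# Alternatively, from the master: list only the k nodes you want
# in conf/slaves, then restart the MapReduce daemons so only those
# TaskTrackers come back up:
$HADOOP_HOME/bin/stop-mapred.sh
$HADOOP_HOME/bin/start-mapred.sh
```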
Re: specifying number of nodes for job
yeah. snickerdoodle. really.

I see.. so if I have a cluster with n nodes, there is no way for me to have it spawn on just 2 of those nodes, or just one of those nodes? And furthermore, there is no way for me to have it spawn on just a subset of the processors? Or am I misunderstanding?

Also, when you say "specify the number of tasks for each node", are you referring to specifying the number of mappers and reducers I can spawn on each node?

-SM

On Sun, Sep 7, 2008 at 8:29 PM, Mafish Liu [EMAIL PROTECTED] wrote:
> On Mon, Sep 8, 2008 at 2:25 AM, Sandy [EMAIL PROTECTED] wrote:
>> Hi,
>>
>> This may be a silly question, but I'm strangely having trouble finding
>> an answer for it (perhaps I'm looking in the wrong places?).
>>
>> Suppose I have a cluster with n nodes, each with m processors. I wish to
>> test the performance of, say, the wordcount program on k processors,
>> where k is varied from k = 1 ... nm.
>
> You can specify the number of tasks for each node in your hadoop-site.xml
> file. So you can get k varied as k = n, 2*n, ..., m*n instead of
> k = 1 ... nm.
>
>> How would I do this? I'm having trouble finding the proper command line
>> option in the commands manual
>> (http://hadoop.apache.org/core/docs/current/commands_manual.html)
>>
>> Thank you very much for your time.
>>
>> -SM
>
> --
> [EMAIL PROTECTED]
> Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
Re: specifying number of nodes for job
-smiles- It's not nice to poke fun at people's e-mail aliases... and snickerdoodles are delicious cookies.

In all seriousness though, why is this not possible? Is there something about the MapReduce model of parallel computation that I am not understanding? Or is this more of an arbitrary implementation choice made by the Hadoop framework? If so, I am curious why this is the case. What are the benefits?

What I'm talking about is not uncommon for scalability studies. Is being able to specify the number of processors considered a desirable feature by developers? Just curious, of course.

Regards,
-SM

On Mon, Sep 8, 2008 at 3:36 PM, [EMAIL PROTECTED] wrote:
> yeah. snickerdoodle. really.
>
> I see.. so if I have a cluster with n nodes, there is no way for me to
> have it spawn on just 2 of those nodes, or just one of those nodes? And
> furthermore, there is no way for me to have it spawn on just a subset of
> the processors? Or am I misunderstanding?
>
> Also, when you say "specify the number of tasks for each node", are you
> referring to specifying the number of mappers and reducers I can spawn on
> each node?
>
> -SM
>
> On Sun, Sep 7, 2008 at 8:29 PM, Mafish Liu [EMAIL PROTECTED] wrote:
>> On Mon, Sep 8, 2008 at 2:25 AM, Sandy [EMAIL PROTECTED] wrote:
>>> Hi,
>>>
>>> This may be a silly question, but I'm strangely having trouble finding
>>> an answer for it (perhaps I'm looking in the wrong places?).
>>>
>>> Suppose I have a cluster with n nodes, each with m processors. I wish
>>> to test the performance of, say, the wordcount program on k processors,
>>> where k is varied from k = 1 ... nm.
>>
>> You can specify the number of tasks for each node in your
>> hadoop-site.xml file. So you can get k varied as k = n, 2*n, ..., m*n
>> instead of k = 1 ... nm.
>>
>>> How would I do this? I'm having trouble finding the proper command line
>>> option in the commands manual
>>> (http://hadoop.apache.org/core/docs/current/commands_manual.html)
>>>
>>> Thank you very much for your time.
>>>
>>> -SM
>>
>> --
>> [EMAIL PROTECTED]
>> Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
Re: specifying number of nodes for job
On Mon, Sep 8, 2008 at 2:25 AM, Sandy [EMAIL PROTECTED] wrote:
> Hi,
>
> This may be a silly question, but I'm strangely having trouble finding an
> answer for it (perhaps I'm looking in the wrong places?).
>
> Suppose I have a cluster with n nodes, each with m processors. I wish to
> test the performance of, say, the wordcount program on k processors,
> where k is varied from k = 1 ... nm.

You can specify the number of tasks for each node in your hadoop-site.xml file. So you can get k varied as k = n, 2*n, ..., m*n instead of k = 1 ... nm.

> How would I do this? I'm having trouble finding the proper command line
> option in the commands manual
> (http://hadoop.apache.org/core/docs/current/commands_manual.html)
>
> Thank you very much for your time.
>
> -SM

--
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
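[For reference, the per-node task caps Mafish describes are the TaskTracker slot properties in hadoop-site.xml. A minimal sketch, assuming 0.18-era property names; the values are just examples (set them to however many of each node's m processors you want used):]

```
<!-- hadoop-site.xml: cap concurrent map/reduce tasks per TaskTracker -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```

With these set to t per node on an n-node cluster, at most t*n map tasks run concurrently, which is why k moves in multiples of n rather than by single processors.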