Re: setNumTasks
If you want to control the number of input splits at a fine granularity, you could use NLineInputFormat and tune the number of lines per split. To choose that number, you first need to know how many lines your input data has; for instance, hadoop fs -text /input/dir/* | wc -l will give you a count. Let's call it N.

If you have K nodes and each node has C cores, you could run roughly K*C map tasks at the same time. Assume further that each mapper should process about 2 splits (so that mappers which finish early can pick up more work). The optimal number of lines per split for NLineInputFormat is then around N/(2*K*C), which should give you a good load balance across the job.

Keep in mind that NLineInputFormat usually takes longer to initialize than other input formats, and that it splits purely by line count: it is unaware of how long each line is. So in sequence data analysis, if some lines are significantly longer than others, the mappers assigned the longer lines will be much slower than those assigned shorter ones. Randomly mixing short and long lines before splitting is therefore preferable.

Shi

On 3/22/2012 10:01 AM, Bejoy Ks wrote:
> Hi Mohit
>
> The number of map tasks is determined by the number of input splits and the input format used by your MR job, so setting this value won't let you control it. AFAIK it only takes effect if the value of mapred.map.tasks is greater than the number of tasks the job calculates from the splits and the input format.
>
> Regards
> Bejoy KS
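Shi's sizing rule and the advice to mix line lengths can be sketched in plain Java. This is a minimal sketch, not Hadoop code: the class SplitSizing and its methods are hypothetical names, one map slot per core is assumed, and Shi's "2 splits per mapper" is passed in as splitsPerSlot. With the old (mapred) API, the computed value is what you would put in the mapred.line.input.format.linespermap property read by NLineInputFormat.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper illustrating the NLineInputFormat sizing advice; not part of Hadoop.
final class SplitSizing {

    // Lines per split ~= N / (splitsPerSlot * K * C). Rounded up so the job
    // never produces more than splitsPerSlot * K * C splits.
    static long linesPerSplit(long totalLines, int nodes, int coresPerNode, int splitsPerSlot) {
        if (totalLines <= 0 || nodes <= 0 || coresPerNode <= 0 || splitsPerSlot <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        long totalSplits = (long) splitsPerSlot * nodes * coresPerNode;
        return (totalLines + totalSplits - 1) / totalSplits; // ceiling division
    }

    // Shuffles a copy of the lines (seeded, so runs are repeatable) so that long
    // and short lines end up spread across the splits.
    static List<String> shuffled(List<String> lines, long seed) {
        List<String> copy = new ArrayList<>(lines);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    // Groups consecutive lines into splits of n lines each, modelling how
    // NLineInputFormat splits a single file purely by line count.
    static List<List<String>> chunk(List<String> lines, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            splits.add(lines.subList(i, Math.min(lines.size(), i + n)));
        }
        return splits;
    }

    // Characters per split: a rough proxy for how long each mapper will run.
    static List<Integer> charsPerSplit(List<List<String>> splits) {
        List<Integer> loads = new ArrayList<>();
        for (List<String> split : splits) {
            int chars = 0;
            for (String line : split) {
                chars += line.length();
            }
            loads.add(chars);
        }
        return loads;
    }

    public static void main(String[] args) {
        // Hypothetical input: 20 long lines followed by 20 short ones.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 20; i++) lines.add("A".repeat(1000));
        for (int i = 0; i < 20; i++) lines.add("C".repeat(10));

        // K = 2 nodes, C = 2 cores, 2 splits per core -> 8 splits of 5 lines.
        int n = (int) linesPerSplit(lines.size(), 2, 2, 2);
        System.out.println("in order: " + charsPerSplit(chunk(lines, n)));
        System.out.println("shuffled: " + charsPerSplit(chunk(shuffled(lines, 42L), n)));
    }
}
```

For the ordered input, main prints [5000, 5000, 5000, 5000, 50, 50, 50, 50]: four mappers get all the long lines. The shuffled loads depend on the seed. Rounding up rather than down keeps the split count at or below 2*K*C, and character count is only a rough stand-in for mapper run time.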
Re: setNumTasks
Hi Mohit

The number of map tasks is determined by the number of input splits and the input format used by your MR job, so setting this value won't let you control it. AFAIK it only takes effect if the value of mapred.map.tasks is greater than the number of tasks the job calculates from the splits and the input format.

Regards
Bejoy KS

On Thu, Mar 22, 2012 at 8:28 PM, Mohit Anchlia wrote:
> Sorry, I meant setNumMapTasks. What is mapred.map.tasks for? It's confusing as to what its purpose is. I tried setting it for my job, but I still see more map tasks running than mapred.map.tasks.
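Bejoy's point can be made concrete with a simplified model of the split rule in the old-API FileInputFormat of the Hadoop 1.x line. There, mapred.map.tasks is only a hint that sets a goal size (total input divided by the hint), and the actual split size is max(minSize, min(goalSize, blockSize)), where minSize comes from mapred.min.split.size. A smaller hint therefore cannot push the split size above the block size, while a larger hint can shrink splits and add map tasks. This is a plain-Java sketch, not Hadoop's code: it assumes a single splittable, non-empty file, and the class name SplitCountModel is hypothetical.

```java
// Hypothetical, simplified model of old-API FileInputFormat split counting
// for ONE splittable, non-empty file. Not Hadoop code.
final class SplitCountModel {
    // The last split may be up to 10% larger instead of leaving a tiny extra split.
    private static final double SPLIT_SLOP = 1.1;

    static long splitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    static int countSplits(long fileLength, long blockSize, long minSize, int mapTasksHint) {
        if (fileLength <= 0 || blockSize <= 0 || minSize <= 0) {
            throw new IllegalArgumentException("lengths and sizes must be positive");
        }
        // mapred.map.tasks only feeds the goal size; it is not a hard task count.
        long goalSize = fileLength / Math.max(1, mapTasksHint);
        long size = splitSize(goalSize, minSize, blockSize);
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / size > SPLIT_SLOP) {
            splits++;
            remaining -= size;
        }
        if (remaining != 0) {
            splits++; // the tail becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long file = 1024 * mib;  // hypothetical 1 GiB input file
        long block = 64 * mib;   // hypothetical 64 MiB HDFS block size
        for (int hint : new int[] {1, 10, 16, 32}) {
            System.out.println("mapred.map.tasks=" + hint + " -> "
                    + countSplits(file, block, 1, hint) + " splits");
        }
    }
}
```

In this model, main reports 16 splits for hints 1, 10 and 16, and 32 splits for a hint of 32. Getting fewer maps than blocks would need a larger minimum split size, not a smaller hint.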
Re: setNumTasks
Sorry, I meant setNumMapTasks. What is mapred.map.tasks for? It's confusing as to what its purpose is. I tried setting it for my job, but I still see more map tasks running than mapred.map.tasks.

On Thu, Mar 22, 2012 at 7:53 AM, Harsh J wrote:
> There isn't such an API as "setNumTasks". There is, however, "setNumReduceTasks", which sets "mapred.reduce.tasks".
>
> Does this answer your question?
Re: setNumTasks
There isn't such an API as "setNumTasks". There is, however, "setNumReduceTasks", which sets "mapred.reduce.tasks".

Does this answer your question?

On Thu, Mar 22, 2012 at 8:21 PM, Mohit Anchlia wrote:
> Could someone please help me answer this question?

--
Harsh J
Re: setNumTasks
Could someone please help me answer this question?

On Wed, Mar 14, 2012 at 8:06 AM, Mohit Anchlia wrote:
> What is the corresponding system property for setNumTasks? Can it be used explicitly as a system property, like "mapred.tasks."?