Actually, I'm not using any reducer at all; the output of the mappers is 
collected and handled by the main program after the job ends.
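A map-only job like this can be sketched as follows, assuming Hadoop's old `mapred` API (current at the time of this thread); the driver class name is hypothetical, but setting the reducer count to 0 is what makes Hadoop skip the shuffle/reduce phase entirely:

```java
import org.apache.hadoop.mapred.JobConf;

// Hypothetical driver setup for a map-only job (illustrative class name).
JobConf conf = new JobConf(ForestBuilderDriver.class);
conf.setNumReduceTasks(0); // map-only: no reducer, no shuffle phase
conf.setNumMapTasks(10);   // a hint only; actual count depends on input splits
```

With zero reducers, each mapper's output is written directly to the output directory, which the main program can then read back and aggregate itself.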

Running the job with 10 map tasks on a 10-instance (c1.medium) cluster takes 
0h 11m 39s 209ms; speculative execution is on, so 12 map tasks were launched.

Running the same job with 5x10 map tasks takes 0h 11m 54s 962ms; 59 map tasks 
were launched.

And running the same job again with 5x10 map tasks and the job parameter 
mapred.job.reuse.jvm.num.tasks=-1 (no limit on how many tasks to run per JVM) 
takes 0h 11m 57s 115ms.
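For reference, that parameter can be passed on the command line with Hadoop's generic `-D` option; the jar and class names below are illustrative placeholders, not the actual Mahout entry point:

```shell
# Hypothetical invocation: -D sets mapred.job.reuse.jvm.num.tasks=-1 so a
# task JVM is reused for an unlimited number of tasks instead of being
# forked fresh for each one.
hadoop jar mahout-job.jar org.example.BuildForest \
    -D mapred.job.reuse.jvm.num.tasks=-1 \
    input/ output/
```

Since each map task here runs for minutes, JVM startup cost is negligible, which is consistent with reuse making essentially no difference in the timings above.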

--- On Sat, 7/18/09, Ted Dunning <ted.dunn...@gmail.com> wrote:

> From: Ted Dunning <ted.dunn...@gmail.com>
> Subject: Re: [jira] Commented: (MAHOUT-140) In-memory mapreduce Random Forests
> To: mahout-dev@lucene.apache.org
> Date: Saturday, 18 July 2009, 20:36
> This is interesting.
> 
> Is the reduce trivial here? (If so, then shuffling
> isn't the problem, and
> you may have demonstrated this with your no-output
> version)
> 
> What happens if you increase the number of maps to 5x the
> number of nodes?
> 
> 
> 
> On Sat, Jul 18, 2009 at 11:11 AM, Deneche A. Hakim (JIRA)
> <j...@apache.org>wrote:
> 
> > It looks like building a single tree in a sequential
> manner is 2x faster
> > than building the same tree with the cluster! I
> don't have a lot of
> > experience with clusters; is it normal? Maybe 10
> instances is just too
> > small to get a good speedup, or maybe there is a bug
> hiding somewhere (I
> > can hear it walking in the code when the moon...)
> >
> 
> 
> 
> -- 
> Ted Dunning, CTO
> DeepDyve
> 


