On Sun, Jul 22, 2012 at 12:01:01PM -0700, Ted Dunning wrote:
> I don't believe that there are any commons math algorithms that would
> benefit from execution in a Hadoop map-reduce style.  The issue is that
> iterative algorithms are essentially incompatible with the very large
> startup costs of map-reduce programs under Hadoop.
> 
> Some algorithms can be recast to make use of an all-reduce operator which
> can be implemented in a map-only job.  EM algorithms often have this
> structure.
> 
> Otherwise, massive algorithmic change is usually necessary.  For instance,
> partial SVD can be done using a fixed and small number of map-reduce
> operations by using stochastic projection.
> 
> Threaded execution, on the other hand, can be very, very helpful for a
> number of math algorithms and thread management inside commons math is a
> very reasonable option in those cases.  This would provide a performance
> boost with very little complexity for the user of math.  Managing these
> threads is really pretty simple as well.

I agree. So let's make a list of the algorithms that would certainly
benefit from parallelization, and for which the parallelization would be
pretty simple (the devilish details notwithstanding...).

Suggestions, in order of simplicity, welcome.
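
As a starting point, here is a minimal sketch of the simplest case I can
think of: a chunked sum where the caller supplies the ExecutorService, so
the library code neither creates nor owns any thread (which might also
address Phil's concern below). The class name "ChunkedSum" and the method
"sum" are hypothetical and not part of Commons Math; only the
java.util.concurrent calls are real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical helper for illustration only; not part of Commons Math. */
final class ChunkedSum {
    private ChunkedSum() {}

    /**
     * Sums {@code data} by splitting it into at most {@code chunks} pieces,
     * each of which is submitted to the executor supplied by the caller.
     */
    static double sum(final double[] data, int chunks, ExecutorService executor) {
        if (chunks < 1) {
            throw new IllegalArgumentException("chunks must be >= 1");
        }
        // Ceiling division, so that every element falls into some chunk.
        final int chunkSize = (data.length + chunks - 1) / chunks;
        final List<Callable<Double>> tasks = new ArrayList<>();
        for (int start = 0; start < data.length; start += chunkSize) {
            final int from = start;
            final int to = Math.min(data.length, start + chunkSize);
            tasks.add(() -> {
                double partial = 0;
                for (int i = from; i < to; i++) {
                    partial += data[i];
                }
                return partial;
            });
        }
        try {
            // invokeAll blocks until every task has completed.
            double total = 0;
            for (Future<Double> f : executor.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while summing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk failed", e.getCause());
        }
    }
}
```

The caller would create the pool (e.g. with Executors.newFixedThreadPool)
or pass in a container-managed one, and remains responsible for shutting it
down. Note that the partial sums are combined in a different order than in
a serial loop, so the result may differ in the last bits.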


Gilles

> On Sun, Jul 22, 2012 at 9:27 AM, Phil Steitz <phil.ste...@gmail.com> wrote:
> 
> > On 7/21/12 6:17 AM, Gilles Sadowski wrote:
> > > Hi.
> > >
> > > My previous post (with subject "Synchronisation") made me think (again)
> > > > that it might be useful to start considering how to take advantage of
> > > > multi-threading in Commons Math.
> > > > Indeed, it seems that some parts of the library might end up not being
> > > > used anymore because their performance simply cannot match competing
> > > > implementations that do benefit from parallelization. [The recent example
> > > > that comes to mind is the FFT.]
> >
> > This is an interesting question.  I am also -1 on adding
> > dependencies, but it would be a good idea to look at how others have
> > solved the problem of how to support parallel execution by multiple
> > threads without managing threads directly.  Lots of [math]
> > algorithms could be parallelized.  The question is how to
> > effectively coordinate the work without owning or creating the
> > workers.  I would be -0 to any suggestion that involved [math]
> > itself spawning threads, since that 0) creates management headaches,
> > 1) may violate some container contracts, and 2) forces execution
> > threads to be in the same process.  I think it is worth thinking
> > about how we might support parallel execution by externally managed
> > workers.  An obvious thing to look at is how to break our
> > parallelizable algorithms into pieces that could be executed in
> > Hadoop Map/Reduce jobs.  Step 0) is the breaking up part.  Then step
> > 1) might be either some examples added to the user guide or custom
> > Pig functions (or examples of how to code them).
> >
> > Phil
> > >
> > >
> > > Best regards,
> > > Gilles
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > For additional commands, e-mail: dev-h...@commons.apache.org
> > >
> > >
