I don't believe that there are any Commons Math algorithms that would
benefit from execution in a Hadoop map-reduce style.  The issue is that
iterative algorithms are essentially incompatible with the very large
per-job startup costs of map-reduce programs under Hadoop.

Some algorithms can be recast to use an all-reduce operator, which can be
implemented in a map-only job.  EM algorithms often have this structure.
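
To make that concrete, here is a rough single-process sketch of the
per-iteration shape, with k-means standing in for the E/M steps.  The shard
lists and the summation loop stand in for the data splits and the all-reduce
primitive that a map-only job would provide; none of the names below are an
existing Hadoop or Mahout API.

import java.util.List;

public class KMeansAllReduceSketch {

  // Per-shard "E-step": assign each point to its nearest centroid and
  // accumulate per-cluster sums and counts (the sufficient statistics).
  static double[][] partialStats(List<double[]> shard, double[][] centroids) {
    int k = centroids.length, d = centroids[0].length;
    double[][] stats = new double[k][d + 1];          // [sums..., count]
    for (double[] x : shard) {
      int best = 0;
      double bestDist = Double.MAX_VALUE;
      for (int c = 0; c < k; c++) {
        double dist = 0;
        for (int i = 0; i < d; i++) {
          double diff = x[i] - centroids[c][i];
          dist += diff * diff;
        }
        if (dist < bestDist) { bestDist = dist; best = c; }
      }
      for (int i = 0; i < d; i++) stats[best][i] += x[i];
      stats[best][d] += 1;
    }
    return stats;
  }

  // One iteration: partial statistics per shard, all-reduce (here a plain
  // in-memory sum), then the same "M-step" everywhere: centroid = sum/count.
  static double[][] iterate(List<List<double[]>> shards, double[][] centroids) {
    int k = centroids.length, d = centroids[0].length;
    double[][] total = new double[k][d + 1];
    for (List<double[]> shard : shards) {             // all-reduce stand-in
      double[][] p = partialStats(shard, centroids);
      for (int c = 0; c < k; c++) {
        for (int i = 0; i <= d; i++) {
          total[c][i] += p[c][i];
        }
      }
    }
    double[][] next = new double[k][d];
    for (int c = 0; c < k; c++) {
      for (int i = 0; i < d; i++) {
        next[c][i] = total[c][d] > 0 ? total[c][i] / total[c][d]
                                     : centroids[c][i];
      }
    }
    return next;
  }
}

The point is that only partialStats ever touches the data, so each iteration
is one map-only pass plus an all-reduce over a few kilobytes of statistics.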

Otherwise, massive algorithmic change is usually necessary.  For instance,
partial SVD can be done in a small, fixed number of map-reduce passes by
using stochastic projection.
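
For concreteness, a rough single-process sketch of that decomposition using
the [math] linear algebra classes (math3 package names assumed).  In the
map-reduce version, the two multiplications by A are the two passes over the
data; everything else is small enough to do locally.

import java.util.Random;
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.QRDecomposition;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.SingularValueDecomposition;

public class RandomizedSvdSketch {

  // Rank-k partial SVD of A by random projection (no oversampling or power
  // iterations, to keep the sketch short).
  static SingularValueDecomposition partialSvd(RealMatrix a, int k, long seed) {
    int m = a.getRowDimension();
    int n = a.getColumnDimension();
    Random rnd = new Random(seed);

    // 1) Random test matrix Omega (n x k); Y = A * Omega is the first pass over A.
    double[][] omega = new double[n][k];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < k; j++) {
        omega[i][j] = rnd.nextGaussian();
      }
    }
    RealMatrix y = a.multiply(new Array2DRowRealMatrix(omega, false));

    // 2) Orthonormal basis Q for the range of Y (small, purely local work).
    RealMatrix q = new QRDecomposition(y).getQ().getSubMatrix(0, m - 1, 0, k - 1);

    // 3) B = Q^T * A is the second pass over A; B is only k x n.
    RealMatrix b = q.transpose().multiply(a);

    // 4) Exact SVD of the small matrix B; Q * svd.getU() then recovers the
    //    approximate left singular vectors of A.
    return new SingularValueDecomposition(b);
  }
}

In practice you would oversample (k plus a few extra columns) and perhaps do
a power iteration or two, but the shape of the computation is the same.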

Threaded execution, on the other hand, can be very helpful for a number of
math algorithms, and thread management inside Commons Math is a very
reasonable option in those cases.  It would provide a performance boost with
very little added complexity for the user of [math], and managing those
threads is really pretty simple as well.
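
As a rough illustration of how little is involved (this is just a sketch,
not an existing [math] API): a dot product split across a small fixed pool.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDotSketch {

  static double dot(final double[] x, final double[] y, int nThreads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(nThreads);
    try {
      int chunk = (x.length + nThreads - 1) / nThreads;
      List<Callable<Double>> tasks = new ArrayList<Callable<Double>>();
      for (int t = 0; t < nThreads; t++) {
        final int from = t * chunk;
        final int to = Math.min(x.length, from + chunk);
        tasks.add(new Callable<Double>() {            // one slice per task
          public Double call() {
            double s = 0;
            for (int i = from; i < to; i++) s += x[i] * y[i];
            return s;
          }
        });
      }
      double sum = 0;
      for (Future<Double> f : pool.invokeAll(tasks)) sum += f.get();
      return sum;
    } finally {
      pool.shutdown();                                // nothing leaks to the caller
    }
  }
}

The same method could just as easily take the ExecutorService as an argument
supplied by the caller, which would give the same speedup without [math]
owning any threads at all.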



On Sun, Jul 22, 2012 at 9:27 AM, Phil Steitz <phil.ste...@gmail.com> wrote:

> On 7/21/12 6:17 AM, Gilles Sadowski wrote:
> > Hi.
> >
> > My previous post (with subject "Synchronisation") made me think (again)
> that
> > it might be useful to start considering how to take advantage of
> > multi-threading in Commons Math.
> > Indeed, it seems that some parts of the library might end up not being
> used
> > anymore because their performance simply cannot match competing
> > implementations that do benefit from parallelization. [The recent example
> > that comes to mind is the FFT.]
>
> This is an interesting question.  I am also -1 on adding
> dependencies, but it would be a good idea to look at how others have
> solved the problem of how to support parallel execution by multiple
> threads without managing threads directly.  Lots of [math]
> algorithms could be parallelized.  The question is how to
> effectively coordinate the work without owning or creating the
> workers.  I would be -0 to any suggestion that involved [math]
> itself spawning threads, since that 0) creates management headaches,
> 1) may violate some container contracts, and 2) forces execution
> threads to be in the same process.  I think it is worth thinking
> about how we might support parallel execution by externally managed
> workers.  An obvious thing to look at is how to break our
> parallelizable algorithms into pieces that could be executed in
> Hadoop Map/Reduce jobs.  Step 0) is the breaking up part.  Then step
> 1) might be either some examples added to the user guide or custom
> Pig functions (or examples of how to code them).
>
> Phil
> >
> >
> > Best regards,
> > Gilles
> >