I don't believe that any Commons Math algorithms would benefit from execution in a Hadoop map-reduce style. The issue is that iterative algorithms are essentially incompatible with the very large startup costs of map-reduce programs under Hadoop, since each iteration typically runs as a separate job and pays that cost again.
Some algorithms can be recast to make use of an all-reduce operator, which can be implemented in a map-only job. EM algorithms often have this structure. Otherwise, massive algorithmic change is usually necessary. For instance, partial SVD can be done using a small, fixed number of map-reduce operations by using stochastic projection.

Threaded execution, on the other hand, can be very, very helpful for a number of math algorithms, and thread management inside Commons Math is a very reasonable option in those cases. This would provide a performance boost with very little complexity for the user of Commons Math. Managing these threads is really pretty simple as well.

On Sun, Jul 22, 2012 at 9:27 AM, Phil Steitz <phil.ste...@gmail.com> wrote:

> On 7/21/12 6:17 AM, Gilles Sadowski wrote:
> > Hi.
> >
> > My previous post (with subject "Synchronisation") made me think (again) that
> > it might be useful to start considering how to take advantage of
> > multi-threading in Commons Math.
> > Indeed, it seems that some parts of the library might end up not being used
> > anymore because their performance simply cannot match competing
> > implementations that do benefit from parallelization. [The recent example
> > that comes to mind is the FFT.]
>
> This is an interesting question. I am also -1 on adding
> dependencies, but it would be a good idea to look at how others have
> solved the problem of how to support parallel execution by multiple
> threads without managing threads directly. Lots of [math]
> algorithms could be parallelized. The question is how to
> effectively coordinate the work without owning or creating the
> workers. I would be -0 to any suggestion that involved [math]
> itself spawning threads, since that 0) creates management headaches
> 1) may violate some container contracts and 2) forces execution
> threads to be in the same process. I think it is worth thinking
> about how we might support parallel execution by externally managed
> workers. An obvious thing to look at is how to break our
> parallelizable algorithms into pieces that could be executed in
> Hadoop Map/Reduce jobs. Step 0) is the breaking up part. Then step
> 1) might be either some examples added to the user guide or custom
> Pig functions (or examples of how to code them).
>
> Phil
>
> > Best regards,
> > Gilles
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
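
To make the threading discussion a bit more concrete, here is a minimal sketch of chunked parallel execution on a thread pool, assuming Java 8 or later. The names `ChunkedSum` and `sumOfSquares` are hypothetical and are not part of Commons Math. The sketch takes a caller-supplied `ExecutorService`, which is one possible way to let a library split the work without creating the worker threads itself; a library-managed variant would simply create the pool internally instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical helper for illustration only; not a Commons Math class. */
final class ChunkedSum {
    private ChunkedSum() {}

    /**
     * Sums the squares of the entries of data by splitting the array into
     * at most `chunks` contiguous pieces and running each piece as a task
     * on the caller-supplied pool.
     */
    static double sumOfSquares(final double[] data, ExecutorService pool, int chunks) {
        if (chunks < 1) {
            throw new IllegalArgumentException("chunks must be >= 1");
        }
        // Ceiling division so that every element lands in some chunk.
        int chunkSize = (data.length + chunks - 1) / chunks;
        List<Callable<Double>> tasks = new ArrayList<>();
        for (int start = 0; start < data.length; start += chunkSize) {
            final int from = start;
            final int to = Math.min(data.length, start + chunkSize);
            tasks.add(() -> {
                double partial = 0.0;
                for (int i = from; i < to; i++) {
                    partial += data[i] * data[i];
                }
                return partial;
            });
        }
        double total = 0.0;
        try {
            // invokeAll blocks until every task has finished.
            for (Future<Double> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker task failed", e.getCause());
        }
        return total;
    }
}
```

A caller would create a pool, for example with `Executors.newFixedThreadPool(4)`, pass it to `ChunkedSum.sumOfSquares(data, pool, 8)`, and shut the pool down when done. Because the partial sums are combined in a fixed order, the result does not depend on thread scheduling.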