On Sun, Jul 22, 2012 at 12:01:01PM -0700, Ted Dunning wrote:
> I don't believe that there are any commons math algorithms that would
> benefit from execution in a Hadoop map-reduce style. The issue is that
> iterative algorithms are essentially incompatible with the very large
> startup costs of map-reduce programs under Hadoop.
>
> Some algorithms can be recast to make use of an all-reduce operator which
> can be implemented in a map-only job. EM algorithms often have this
> structure.
>
> Otherwise, massive algorithmic change is usually necessary. For instance,
> partial SVD can be done using a fixed and small number of map-reduce
> operations by using stochastic projection.
>
> Threaded execution, on the other hand, can be very, very helpful for a
> number of math algorithms and thread management inside commons math is a
> very reasonable option in those cases. This would provide a performance
> boost with very little complexity for the user of math. Managing these
> threads is really pretty simple as well.
I agree.
I.e. let's make a list of the algorithms that would certainly benefit from
parallelization, and for which the parallelization would be pretty simple
(the devilish details notwithstanding...).

Suggestions, in order of simplicity, welcome.

Gilles

>
> On Sun, Jul 22, 2012 at 9:27 AM, Phil Steitz <phil.ste...@gmail.com> wrote:
>
> > On 7/21/12 6:17 AM, Gilles Sadowski wrote:
> > > Hi.
> > >
> > > My previous post (with subject "Synchronisation") made me think (again) that
> > > it might be useful to start considering how to take advantage of
> > > multi-threading in Commons Math.
> > > Indeed, it seems that some parts of the library might end up not being used
> > > anymore because their performance simply cannot match competing
> > > implementations that do benefit from parallelization. [The recent example
> > > that comes to mind is the FFT.]
> >
> > This is an interesting question. I am also -1 on adding
> > dependencies, but it would be a good idea to look at how others have
> > solved the problem of how to support parallel execution by multiple
> > threads without managing threads directly. Lots of [math]
> > algorithms could be parallelized. The question is how to
> > effectively coordinate the work without owning or creating the
> > workers. I would be -0 to any suggestion that involved [math]
> > itself spawning threads, since that 0) creates management headaches
> > 1) may violate some container contracts and 2) forces execution
> > threads to be in the same process. I think it is worth thinking
> > about how we might support parallel execution by externally managed
> > workers. An obvious thing to look at is how to break our
> > parallelizable algorithms into pieces that could be executed in
> > Hadoop Map/Reduce jobs. Step 0) is the breaking up part. Then step
> > 1) might be either some examples added to the user guide or custom
> > Pig functions (or examples of how to code them).
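
To make the "externally managed workers" idea concrete, here is a minimal
sketch, not a proposal for an actual API. It assumes the caller hands the
library a java.util.concurrent.ExecutorService. The class ChunkedSum and its
methods split() and sum() are hypothetical names invented for illustration;
nothing like them exists in Commons Math. The "breaking up" step is split(),
and sum() only submits the pieces, so the library code never creates or
shuts down a thread itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper for illustration only; not part of Commons Math. */
public final class ChunkedSum {
    private ChunkedSum() {}

    /** "Breaking up" step: one independent task per contiguous slice of the array. */
    static List<Callable<Double>> split(final double[] data, int chunks) {
        if (chunks < 1) {
            throw new IllegalArgumentException("chunks must be >= 1");
        }
        List<Callable<Double>> tasks = new ArrayList<>();
        int n = data.length;
        int size = Math.max(1, (n + chunks - 1) / chunks); // ceiling division
        for (int start = 0; start < n; start += size) {
            final int from = start;
            final int to = Math.min(n, start + size);
            tasks.add(() -> {
                double partial = 0;
                for (int i = from; i < to; i++) {
                    partial += data[i];
                }
                return partial;
            });
        }
        return tasks;
    }

    /** Runs the pieces on caller-owned workers; no thread is created or stopped here. */
    public static double sum(double[] data, int chunks, ExecutorService workers)
            throws InterruptedException, ExecutionException {
        double total = 0;
        // invokeAll blocks until every task has completed.
        for (Future<Double> partial : workers.invokeAll(split(data, chunks))) {
            total += partial.get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1; // 1, 2, ..., 1000
        }
        ExecutorService pool = Executors.newFixedThreadPool(4); // owned by the caller
        try {
            System.out.println(sum(data, 8, pool)); // prints 500500.0
        } finally {
            pool.shutdown(); // the caller, not the library, ends the workers
        }
    }
}
```

The tasks returned by split() are plain Callables, so a container could just
as well run them on its own managed pool. Whether the same pieces map
usefully onto Map/Reduce jobs is a separate question that this sketch does
not address.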
> >
> > Phil
> >
> > >
> > > Best regards,
> > > Gilles
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > For additional commands, e-mail: dev-h...@commons.apache.org