Thanks for the insight, Dmitriy. I will look further into Spark to understand how it handles parallelization and distributed processing.
> On May 2, 2016, at 12:39 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> by probabilistic algorithms i mostly mean inference involving monte carlo
> type mechanisms (Gibbs sampling LDA which i think might still be part of
> our MR collection might be an example, as well as its faster counterpart,
> variational Bayes inference).
>
> the parallelization strategies are just standard spark mechanisms (in
> case of spark), mostly using their standard hash samplers (which in
> math speak are uniform multinomial samplers really).
>
> On Mon, May 2, 2016 at 9:25 AM, Khurrum Nasim <khurrum.na...@useitc.com>
> wrote:
>
>> Hey Dimitri -
>>
>> Yes I meant probabilistic algorithms. If mahout doesn’t use probabilistic
>> algos then how does it accomplish a degree of optimal parallelization?
>> Wouldn’t you need randomization to spread out the processing of tasks?
>>
>>> On May 2, 2016, at 12:13 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>
>>> yes mahout has stochastic svd and pca which are described at length in the
>>> samsara book. The book examples in Andrew Palumbo's github also contain an
>>> example of computing k-means|| sketch.
>>>
>>> if you mean _probabilistic_ algorithms, although i have done some things
>>> outside the public domain, nothing has been contributed.
>>>
>>> You are very welcome to try something if you don't have big constraints on
>>> oss contribution.
>>>
>>> -d
>>>
>>> On Mon, May 2, 2016 at 7:49 AM, Khurrum Nasim <khurrum.na...@useitc.com>
>>> wrote:
>>>
>>>> Hey All,
>>>>
>>>> I’d like to know if Mahout uses any randomized algorithms. I’m thinking
>>>> it probably does. Can somebody point me to the packages that utilize
>>>> randomized algos?
>>>>
>>>> Thanks,
>>>>
>>>> Khurrum
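
For anyone following the thread, here is a rough sketch of the "hash sampler" idea mentioned above: assigning each key to a partition by hashing it behaves, for well-mixed hashes, like drawing the partition uniformly at random, so the per-partition counts look like a uniform multinomial sample. This is plain Python and not Spark or Mahout code; the names `partition_for` and `partition_counts` and the use of SHA-256 are my own assumptions for illustration. As far as I understand, Spark's `HashPartitioner` does the analogous thing with the key's `hashCode` taken modulo the number of partitions.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in [0, num_partitions) by hashing it.

    SHA-256 is an illustrative stand-in chosen because it is deterministic
    across runs; it is not the hash function Spark uses.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo
    # the partition count. The result is always non-negative.
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_counts(keys, num_partitions: int) -> Counter:
    """Count how many keys land in each partition."""
    return Counter(partition_for(k, num_partitions) for k in keys)


if __name__ == "__main__":
    # Hypothetical keys; any collection of distinct strings would do.
    keys = [f"doc-{i}" for i in range(100_000)]
    counts = partition_counts(keys, 8)
    for p in sorted(counts):
        print(f"partition {p}: {counts[p]} keys")
```

The point is that no explicit random number generator is needed to spread work evenly: the hash does the spreading, and the same key always lands in the same partition, which is what makes shuffles and joins by key possible.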