Thanks for the insight, Dmitriy. I will look further into Spark to understand
how it handles parallelization and distributed processing.


> On May 2, 2016, at 12:39 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> 
> by probabilistic algorithms I mostly mean inference involving Monte Carlo
> type mechanisms. Gibbs sampling LDA, which I think might still be part of
> our MR collection, might be an example, as well as its faster counterpart,
> variational Bayes inference.
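
A rough sketch of the Monte Carlo step underlying such samplers (illustrative only, not Mahout code): a Gibbs sampler for LDA repeatedly redraws a token's topic index from an unnormalized discrete (multinomial) distribution, which boils down to something like this:

```java
import java.util.Random;

// Illustrative sketch (not Mahout code): drawing an index from an
// unnormalized discrete (multinomial) distribution, the core operation
// inside a Gibbs sampling step.
public class MultinomialDraw {
    static int sample(double[] weights, Random rng) {
        double total = 0.0;
        for (double w : weights) total += w;
        // Pick a point uniformly in [0, total) and walk the weights
        // until the cumulative mass passes it.
        double u = rng.nextDouble() * total;
        for (int i = 0; i < weights.length - 1; i++) {
            if (u < weights[i]) return i;
            u -= weights[i];
        }
        return weights.length - 1;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] weights = {1.0, 3.0};
        int[] counts = new int[2];
        for (int n = 0; n < 10000; n++) counts[sample(weights, rng)]++;
        // Expect roughly a 1:3 split between the two outcomes.
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```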
> 
> the parallelization strategies are just standard Spark mechanisms (in the
> case of Spark), mostly using their standard hash samplers (which, in math
> speak, are really uniform multinomial samplers).
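
For illustration (a sketch of the idea only, not actual Spark internals): hash partitioning sends each key to `hashCode mod numPartitions`, so over many keys the assignment behaves like drawing from a uniform multinomial distribution over partitions:

```java
// Illustrative sketch: Spark-style hash partitioning. Each key lands in
// partition hashCode() mod numPartitions, so a large key set is spread
// roughly uniformly, like draws from a uniform multinomial sampler.
public class HashPartitionSketch {
    static int partitionFor(Object key, int numPartitions) {
        int raw = key.hashCode() % numPartitions;
        // Java's % can be negative; shift into [0, numPartitions).
        return raw < 0 ? raw + numPartitions : raw;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        int[] counts = new int[numPartitions];
        // Integer.hashCode is the value itself, so 100000 sequential keys
        // split exactly evenly here; real string keys split approximately so.
        for (int i = 0; i < 100000; i++) counts[partitionFor(i, numPartitions)]++;
        for (int p = 0; p < numPartitions; p++)
            System.out.println("partition " + p + ": " + counts[p] + " keys");
    }
}
```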
> 
> On Mon, May 2, 2016 at 9:25 AM, Khurrum Nasim <khurrum.na...@useitc.com>
> wrote:
> 
>> Hey Dmitriy -
>> 
>> Yes, I meant probabilistic algorithms. If Mahout doesn’t use probabilistic
>> algos, then how does it accomplish a degree of optimal parallelization?
>> Wouldn’t you need randomization to spread out the processing of tasks?
>> 
>>> On May 2, 2016, at 12:13 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>> 
>>> yes, Mahout has stochastic SVD and PCA, which are described at length in
>>> the Samsara book. The book examples in Andrew Palumbo's GitHub also
>>> contain an example of computing a k-means|| sketch.
>>> 
>>> if you mean _probabilistic_ algorithms: although I have done some things
>>> outside the public domain, nothing has been contributed.
>>> 
>>> You are very welcome to try something if you don't have big constraints
>>> on OSS contribution.
>>> 
>>> -d
>>> 
>>> On Mon, May 2, 2016 at 7:49 AM, Khurrum Nasim <khurrum.na...@useitc.com>
>>> wrote:
>>> 
>>>> Hey All,
>>>> 
>>>> I’d like to know if Mahout uses any randomized algorithms. I’m thinking
>>>> it probably does. Can somebody point me to the packages that utilize
>>>> randomized algos?
>>>> 
>>>> Thanks,
>>>> 
>>>> Khurrum
>>>> 
>>>> 
>> 
>> 
