To start this off, I figure we should spend some time understanding the current
implementations and theory before we dig into implementing this in Mahout:


1) https://bugra.github.io/work/notes/2014-04-19/alternating-least-squares-method-for-collaborative-filtering/



2) https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala




3) https://github.com/apache/mahout/blob/master/math-scala/src/main/scala/org/apache/mahout/math/decompositions/ALS.scala


4) https://datasciencemadesimpler.wordpress.com/tag/alternating-least-squares/




Jim, I would suggest we spend some time researching and digging into these
resources and circle back next week to get this off the ground. Let me know if
you want to meet offline as well. I would recommend that the next step be a
design proposal to the dev list on how the implementation will fit into the
current Samsara algorithms. What do you think?

Regards

________________________________
From: Jim Jagielski <j...@jagunet.com>
Sent: Friday, February 17, 2017 8:18 AM
To: dev@mahout.apache.org
Subject: Re: Contributing an algorithm for samsara

Sounds good to me. +1

> On Feb 17, 2017, at 11:15 AM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>
> Jim,
> What do you say we start with ALS and then tackle glm?
>
>
> Sent from my iPhone
>
>> On Feb 17, 2017, at 6:56 AM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
>>
>> Jim is right, and I would take it one further and say, it would be best to
>> implement GLMs https://en.wikipedia.org/wiki/Generalized_linear_model ,
>> from there a Logistic regression is a trivial extension.
>>
>> Buyer beware- GLMs will be a bit of work- doable, but that would be jumping
>> in neck first for both Jim and Saikat...
>>
>> MAHOUT-1928 and MAHOUT-1929
>>
>> https://issues.apache.org/jira/browse/MAHOUT-1925?jql=project%20%3D%20MAHOUT%20AND%20component%20%3D%20Algorithms%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
>>
>> ^^ currently open JIRAs around Algorithms- you'll see Logistic and GLMs are
>> in there.
>>
>> If you have an algorithm you are particularly intimate with, or explicitly
>> need/want- feel free to open a JIRA and assign to yourself.
>>
>> There is also a case to be made for implementing the ALS...
>>
>> 1) It's a much better 'beginner' project.
>> 2) Mahout has some world class Recommenders, a toy ALS implementation might
>> help us think through how the other recommenders (e.g. CCO) will 'fit' into
>> the framework. E.g. ALS being the toy-prototype recommender that helps us
>> think through building out that section of the framework.
>>
>>
>>
>> Trevor Grant
>> Data Scientist
>> https://github.com/rawkintrevo
>> http://stackexchange.com/users/3002022/rawkintrevo
>> http://trevorgrant.org
>>
>> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>>
>>
>>> On Fri, Feb 17, 2017 at 7:59 AM, Jim Jagielski <j...@jagunet.com> wrote:
>>>
>>> My own thoughts are that logistic regression seems a more "generalized"
>>> and hence more useful algo to be factored in... At least in the
>>> use cases that I've been toying with.
>>>
>>> So I'd like to help out with that if wanted...
>>>
>>>> On Feb 9, 2017, at 3:59 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>>>>
>>>> Trevor et al,
>>>>
>>>> I'd like to contribute an algorithm or two in Samsara using Spark, as I
>>>> would like to compare and contrast Mahout with R Server for a data
>>>> science pipeline / machine learning repo that I'm working on. Looking at
>>>> the list of algorithms
>>>> (https://mahout.apache.org/users/basics/algorithms.html), is there an
>>>> algorithm for Spark that would be beneficial for the community? My use
>>>> cases would typically be around clustering or real-time machine learning
>>>> for building recommendations on the fly. The algorithms I see that could
>>>> potentially be useful are: 1) Matrix Factorization with ALS 2) Logistic
>>>> regression with SVD.
>>>>
>>>>
>>>> Any thoughts/guidance or recommendations would be very helpful.
>>>> Thanks in advance.
>>>
>>>
