To start this off, I figure we should spend some time understanding the current implementations and theory before we dig deep into implementing this in Mahout:
1) https://bugra.github.io/work/notes/2014-04-19/alternating-least-squares-method-for-collaborative-filtering/ (Alternating Least Squares formulation for recommender systems)
2) https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala (Spark MLlib ALS.scala)
3) https://github.com/apache/mahout/blob/master/math-scala/src/main/scala/org/apache/mahout/math/decompositions/ALS.scala (Mahout ALS.scala)
4) https://datasciencemadesimpler.wordpress.com/tag/alternating-least-squares/ (Alternating Least Squares, Data Science Made Simpler)

Jim, I would suggest we spend some time researching and digging into these resources and circle back next week to get this off the ground. Let me know if you want to meet offline as well. I would recommend that the next step be a design proposal to the dev list describing how the implementation will fit into the current Samsara algorithms. What do you think?

Regards

________________________________
From: Jim Jagielski <j...@jagunet.com>
Sent: Friday, February 17, 2017 8:18 AM
To: dev@mahout.apache.org
Subject: Re: Contributing an algorithm for samsara

Sounds good to me. +1

> On Feb 17, 2017, at 11:15 AM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>
> Jim,
> What do you say we start with ALS and then tackle glm?
>
>
> Sent from my iPhone
>
>> On Feb 17, 2017, at 6:56 AM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
>>
>> Jim is right, and I would take it one further and say, it would be best to
>> implement GLMs https://en.wikipedia.org/wiki/Generalized_linear_model ,
>> from there a Logistic regression is a trivial extension.
>>
>> Buyer beware- GLMs will be a bit of work- doable, but that would be jumping
>> in neck first for both Jim and Saikat...
>>
>> MAHOUT-1928 and MAHOUT-1929
>>
>> https://issues.apache.org/jira/browse/MAHOUT-1925?jql=project%20%3D%20MAHOUT%20AND%20component%20%3D%20Algorithms%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
>>
>> ^^ currently open JIRAs around Algorithms- you'll see Logistic and GLMs are
>> in there.
>>
>> If you have an algorithm you are particularly intimate with, or explicitly
>> need/want- feel free to open a JIRA and assign to yourself.
>>
>> There is also a case to be made for implementing the ALS...
>>
>> 1) It's a much better 'beginner' project.
>> 2) Mahout has some world class Recommenders, a toy ALS implementation might
>> help us think through how the other recommenders (e.g. CCO) will 'fit' into
>> the framework. E.g. ALS being the toy-prototype recommender that helps us
>> think through building out that section of the framework.
>>
>>
>>
>> Trevor Grant
>> Data Scientist
>> https://github.com/rawkintrevo
>> http://stackexchange.com/users/3002022/rawkintrevo
>> http://trevorgrant.org
>>
>> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>>
>>
>>> On Fri, Feb 17, 2017 at 7:59 AM, Jim Jagielski <j...@jagunet.com> wrote:
>>>
>>> My own thoughts are that logistic regression seems a more "generalized"
>>> and hence more useful algo to be factored in... At least in the
>>> use cases that I've been toying with.
>>>
>>> So I'd like to help out with that if wanted...
>>>
>>>> On Feb 9, 2017, at 3:59 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>>>>
>>>> Trevor et al,
>>>>
>>>> I'd like to contribute an algorithm or two in samsara using spark as I
>>>> would like to do a compare and contrast with mahout with R server for a
>>>> data science pipeline, machine learning repo that I'm working on, in
>>>> looking at the list of algorithms
>>>> (https://mahout.apache.org/users/basics/algorithms.html) is there an
>>>> algorithm for spark that would be beneficial for the community, my use
>>>> cases would typically be around clustering or real time machine learning
>>>> for building recommendations on the fly. The algorithms I see that could
>>>> potentially be useful are: 1) Matrix Factorization with ALS 2) Logistic
>>>> regression with SVD.
>>>>
>>>> Any thoughts/guidance or recommendations would be very helpful.
>>>> Thanks in advance.