+1 to GLMs
Sent from my Verizon Wireless 4G LTE smartphone

-------- Original message --------
From: Trevor Grant <trevor.d.gr...@gmail.com>
Date: 02/17/2017 6:56 AM (GMT-08:00)
To: dev@mahout.apache.org
Subject: Re: Contributing an algorithm for samsara

Jim is right, and I would take it one step further and say it would be best to implement GLMs
https://en.wikipedia.org/wiki/Generalized_linear_model
From there, logistic regression is a trivial extension.

Buyer beware: GLMs will be a bit of work. Doable, but that would be jumping in neck first for both Jim and Saikat...

MAHOUT-1928 and MAHOUT-1929

https://issues.apache.org/jira/browse/MAHOUT-1925?jql=project%20%3D%20MAHOUT%20AND%20component%20%3D%20Algorithms%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC

^^ currently open JIRAs around Algorithms. You'll see Logistic and GLMs are in there.

If you have an algorithm you are particularly intimate with, or explicitly need/want, feel free to open a JIRA and assign it to yourself.

There is also a case to be made for implementing the ALS...

1) It's a much better 'beginner' project.
2) Mahout has some world class Recommenders; a toy ALS implementation might help us think through how the other recommenders (e.g. CCO) will 'fit' into the framework. E.g. ALS being the toy-prototype recommender that helps us think through building out that section of the framework.

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*

On Fri, Feb 17, 2017 at 7:59 AM, Jim Jagielski <j...@jagunet.com> wrote:

> My own thoughts are that logistic regression seems a more "generalized"
> and hence more useful algo to be factored in... At least in the
> use cases that I've been toying with.
>
> So I'd like to help out with that if wanted...
>
> On Feb 9, 2017, at 3:59 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
> >
> > Trevor et al,
> >
> > I'd like to contribute an algorithm or two in samsara using spark, as I
> > would like to do a compare and contrast of mahout with R server for a
> > data science pipeline / machine learning repo that I'm working on.
> > Looking at the list of algorithms
> > (https://mahout.apache.org/users/basics/algorithms.html), is there an
> > algorithm for spark that would be beneficial for the community? My use
> > cases would typically be around clustering or real time machine learning
> > for building recommendations on the fly. The algorithms I see that could
> > potentially be useful are:
> > 1) Matrix Factorization with ALS
> > 2) Logistic regression with SVD
> >
> > Apache Mahout: Scalable machine learning and data mining
> > <https://mahout.apache.org/users/basics/algorithms.html>
> > mahout.apache.org
> > Mahout 0.12.0 Features by Engine: Single Machine, MapReduce, Spark, H2O,
> > Flink; Mahout Math-Scala Core Library and Scala DSL
> >
> > Any thoughts/guidance or recommendations would be very helpful.
> > Thanks in advance.