Right, sorry, I was looking at Weighted Matrix Factorization. I meant to say that I added "Matrix Factorization with ALS on Implicit Feedback" as "in progress".
> Date: Mon, 25 Aug 2014 15:27:39 -0700
> Subject: Re: Features by engine page
> From: [email protected]
> To: [email protected]
>
> On Mon, Aug 25, 2014 at 3:23 PM, Andrew Palumbo <[email protected]> wrote:
>
> > Thanks Dmitriy,
> >
> > I've added in SSVD, PCA, QR and Weighted ALS.
>
> I think it is called "regularized ALS".
>
> > To keep it simple, I'll leave them under Spark for right now (and add
> > "in development" for H2O), since they're in and passing tests.
> >
> > Should I add:
>
> no
>
> > GP-EI
> > BFGS
> >
> > as "in development"?
> >
> > bigram co-occurrence (would this be collocations?)
> >
> > as "in development" for Spark?
> >
> > > Date: Mon, 25 Aug 2014 14:40:57 -0700
> > > Subject: Re: Features by engine page
> > > From: [email protected]
> > > To: [email protected]
> > >
> > > Yes, SSVD and stochastic PCA, as well as thin QR, are re-cast in Mahout algebra (meaning they are engine-independent, not just Spark).
> > >
> > > So is regularized ALS (albeit perhaps somewhat naively, which affects performance).
> > >
> > > I also had a quasi-algebraic implicit-feedback ALS (which is in fact the implicit feedback paper and ALS-WR in the same bottle), but closed the issue due to lack of reviews and interest.
> > >
> > > Internally I also have a framework for doing hyperparameter searches, and right now I am closing in on GP-EI, which will probably benefit from some additions that choose estimates by reducing uncertainty (attempts to escape the local minimum projected by Snoek's GP-EI algorithm itself). I hope I can open it one day. This work is also interesting in that it establishes a probabilistic framework in Mahout (distributions & Gaussian processes). The GP stuff can also be used to evaluate things like RLFM, I think.
> > >
> > > I also have a framework for line-search type things, including on big datasets, per Nocedal and Wright, including BFGS; those are probably also candidates for contribution. Or not, depending on the moods of my new boss.
> > >
> > > Among other interesting things done with the DSL that may be considered for contribution, I also have implementations of bigram co-occurrence (both directed and undirected) made in the DSL, but they are also quasi-algebraic, I think (meaning there are Spark-specific parts). This (I think) would also include a truthful implementation of the bigram problem from the Surprise & Coincidence paper (currently implemented in Mahout MR), but it would also estimate undirected co-occurrences (as a frequent-itemsets problem solver/replacement). Again, hopefully it may be contributed, but I'm not sure I'll pursue that if there's a lack of interest in my company. It's hard to go against the wind, in a way.
> > >
> > > By far the most often missing piece is data prep, of course, but I think I can eventually contribute a couple of tutorials on how to do vectorization using the Spark SQL stuff.
> > >
> > > -d
> > >
> > > On Mon, Aug 25, 2014 at 2:19 PM, Pat Ferrel <[email protected]> wrote:
> > >
> > > > Spark RSJ, MAHOUT-1604 is in development.
> > > >
> > > > I thought SSVD with PCA was working on Spark.
> > > >
> > > > On Aug 25, 2014, at 2:15 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > > >
> > > > this is super-cool to hear.
> > > >
> > > > On Mon, Aug 25, 2014 at 1:53 PM, Till Rohrmann <[email protected]> wrote:
> > > >
> > > > > Hi Andrew,
> > > > >
> > > > > I like the overview of the different algorithms. The Flink bindings are still under development. We hope to finish them in the next couple of weeks.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Till
> > > > >
> > > > > On Mon, Aug 25, 2014 at 9:17 PM, Andrew Palumbo <[email protected]> wrote:
> > > > >
> > > > > > I created a "Features by Engine" table from the Mahout "List of Algorithms" page, which I'd like to add to the Mahout site once it looks good:
> > > > > >
> > > > > > https://andrewpalumbo.github.io/algorithms_by_engine
> > > > > >
> > > > > > I just copied over the current page and added in some of the stuff that I know is complete/in the works. I wasn't sure about some of the collaborative filtering stuff.
> > > > > >
> > > > > > Maybe the whole thing needs to be organized differently? A separate, totally abstract section for algorithms that will be sitting in math-scala, and then a section for each engine's implementation?
> > > > > >
> > > > > > Also, I know that there's been some work done on Flink bindings, but I don't see a specific Jira. Should I put Flink down as "In development"?
> > > > > >
> > > > > > Any thoughts are appreciated.
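
For context on the "Surprise & Coincidence" bigram problem mentioned in the thread: Dunning's paper scores a word pair with the log-likelihood ratio (G^2) of a 2x2 contingency table of counts. The following is a minimal Python sketch of that scoring under stated assumptions; it is not Mahout's MR collocation job or the DSL implementation described above, the function names `llr` and `bigram_llr_scores` are hypothetical, and it handles only directed (ordered) adjacent pairs, not the undirected co-occurrence case.

```python
import math
from collections import Counter


def x_log_x(x):
    # x * ln(x), using the convention 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy of raw counts: N ln N - sum(k ln k)
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 table (hypothetical helper).

    k11: pairs (A, B); k12: pairs (A, not B);
    k21: pairs (not A, B); k22: pairs with neither.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off
    return max(0.0, 2.0 * (row + col - mat))


def bigram_llr_scores(tokens):
    """Yield ((a, b), score) for each distinct ordered adjacent pair (hypothetical helper)."""
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)
    second_counts = Counter(b for _, b in pairs)
    for (a, b), k11 in pair_counts.items():
        k12 = first_counts[a] - k11   # a followed by something other than b
        k21 = second_counts[b] - k11  # b preceded by something other than a
        k22 = n - k11 - k12 - k21     # pairs involving neither a first nor b second
        yield (a, b), llr(k11, k12, k21, k22)


if __name__ == "__main__":
    text = "new york is big and new york is old"
    ranked = sorted(bigram_llr_scores(text.split()), key=lambda p: -p[1])
    for pair, score in ranked[:3]:
        print(pair, round(score, 3))
```

The entropy form is algebraically the same as 2 * sum(O * ln(O / E)) over the four cells, but it avoids computing expected counts explicitly and handles zero cells through the 0 * ln(0) = 0 convention.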
