Yes, SSVD and stochastic PCA, as well as thin QR, are re-cast in Mahout algebra (meaning they are engine-independent, not just Spark). So is regularized ALS (albeit perhaps somewhat naive, which affects performance). I also had a quasi-algebraic implicit feedback ALS (which is in fact the implicit feedback paper and ALS-WR in the same bottle), but I closed the issue due to lack of reviews and interest.

Internally I also have a framework for doing hyperparameter searches, and right now I am closing in on GP-EI, which will probably benefit from some additions that choose estimates by reducing uncertainty (attempts to get out of a local minimum projected by Snoek's GP-EI algorithm itself). I hope I can open it one day. This work is obviously also interesting in that it establishes a probabilistic framework in Mahout (distributions and Gaussian processes). The GP stuff can also be used to evaluate things like RLFM, I think.

I also have a framework for line-search type of things, including on big datasets, per Nocedal and Wright, including BFGS. Those are probably also candidates for contribution. Or not, depending on the moods of my new boss.

Of other interesting things that are done with the DSL and may be considered for contribution, I also have implementations of bigram co-occurrence (both directed and undirected). They are made in the DSL but are also quasi-algebraic, I think (meaning there are Spark-specific parts). This would (I think) also include a truthful implementation of the bigram problem from the Surprise & Coincidence paper (currently implemented in Mahout MR), but it would also estimate undirected co-occurrences (as a frequent itemsets problem solver/replacement). Again, I am hopeful it may be contributed, but I am not sure I'll pursue that if there's a lack of interest in my company. It's hard to go against the wind, in a way.

By far the most often missing piece is data prep, of course, but I think I can eventually contribute a couple of tutorials on how to do vectorization using Spark SQL stuff.
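
For anyone not familiar with the bigram problem from the Surprise & Coincidence paper mentioned above, the score it is built on is the log-likelihood ratio over a 2x2 contingency table of co-occurrence counts. A minimal Python sketch follows, assuming the usual unnormalized-entropy formulation; the function names are made up for illustration, and this is neither the Mahout MR code nor the DSL implementation described above.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    # k11: A and B together, k12: A without B,
    # k21: B without A,      k22: neither A nor B.
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb floating-point round-off.
    return max(0.0, 2.0 * (row_entropy + col_entropy - matrix_entropy))
```

A directed bigram count would fill the table with "A followed by B" counts, while an undirected one would count A and B in either order; the scoring function itself stays the same.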

-d

On Mon, Aug 25, 2014 at 2:19 PM, Pat Ferrel <[email protected]> wrote:

> Spark RSJ, MAHOUT-1604 is in development
>
> I thought SSVD with PCA was working on Spark.
>
>
> On Aug 25, 2014, at 2:15 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> this is super-cool to hear.
>
>
> On Mon, Aug 25, 2014 at 1:53 PM, Till Rohrmann <[email protected]>
> wrote:
>
> > Hi Andrew,
> >
> > I like the overview of the different algorithms. The Flink bindings are
> > still under development. We hope to finish them in the next couple of
> > weeks.
> >
> > Best regards,
> >
> > Till
> >
> >
> > On Mon, Aug 25, 2014 at 9:17 PM, Andrew Palumbo <[email protected]>
> > wrote:
> >
> >> I created a "Features by Engine" table from the Mahout "List of
> >> Algorithms" page which I'd like to add to the Mahout site once it looks
> >> good:
> >>
> >> https://andrewpalumbo.github.io/algorithms_by_engine
> >>
> >> I just copied over the current page, and added in some of the stuff that
> >> I know is complete/in the works. I wasn't sure about some of the
> >> Collaborative filtering stuff.
> >>
> >> Maybe the whole thing needs to be organized differently? A separate
> >> totally abstract section for algorithms that will be sitting in
> >> math-scala and then a section for each engine's implementation?
> >>
> >> Also I know that there's been some work done on Flink bindings, but I
> >> don't see a specific Jira. Should I put Flink down as "In development"?
> >>
> >> Any thoughts are appreciated.
