On Mon, Aug 25, 2014 at 3:23 PM, Andrew Palumbo <[email protected]> wrote:
> Thanks Dmitriy,
>
> I've added in SSVD, PCA, QR and Weighted ALS.

I think it is called "regularized ALS"

> To keep it simple, I'll leave them under Spark for right now (and add
> "in development" for h2o) since they're in and passing tests.
>
> Should I add:

no

> GP-EI
> BFGS
>
> as "in development"
>
> bigram co-occurrence (would this be collocations?)
>
> as "in development" for spark?
>
>
> > Date: Mon, 25 Aug 2014 14:40:57 -0700
> > Subject: Re: Features by engine page
> > From: [email protected]
> > To: [email protected]
> >
> > yes SSVD and stochastic PCA as well as thin QR are re-cast in Mahout
> > algebra (meaning they are engine-independent, not just spark).
> >
> > So is regularized ALS (albeit perhaps somewhat naive and thus affecting
> > performance).
> >
> > I also had quasi algebraic implicit feedback ALS (which is in fact
> > implicit feedback paper and ALS-WR in the same bottle) but closed the
> > issue due to lack of reviews and interest.
> >
> > Internally I also have framework for doing hyper parameter searches and
> > right now am closing on GP-EI which will probably benefit from some
> > additions doing estimates chosen by reducing uncertainty (attempts to get
> > out of local minimum projected by GP-EI Snoek's algorithm itself). I
> > hope I could open it one day. This work is obviously also interesting in
> > that it establishes probabilistic framework in Mahout (distributions &
> > gaussian process). GP stuff can be also used to evaluate things like RLFM
> > I think.
> >
> > I also have framework to do line search type of things, including big
> > datasets, per Nocedal and Wright, including BFGS, those are probably also
> > candidates for contribution. Or not, depending on the moods of my new
> > boss.
> >
> > Of other interesting things that are done with DSL and may be considered
> > for contribution, I also have implementations for bigram co-occurrence
> > (both directed and undirected) made in the DSL but it is also
> > quasi-algebraic I think (meaning there are Spark-specific parts). This
> > (I think) would also include truthful implementation of Surprise &
> > Coincidence's paper bigram problem (currently implemented in Mahout MR)
> > but also would estimate undirected co-occurrences (as a frequent itemsets
> > problem solver/replacement). Again, hopefully it may be contributed, but
> > not sure if I'll pursue that if there's lack of interest in my company.
> > It's hard to go against the wind, in a way.
> >
> > By far the most often missing piece is data prep of course, but I think I
> > can eventually contribute a couple tutorials of how to do vectorization
> > using SparkQL stuff.
> >
> > -d
> >
> >
> > On Mon, Aug 25, 2014 at 2:19 PM, Pat Ferrel <[email protected]> wrote:
> >
> > > Spark RSJ, MAHOUT-1604 is in development
> > >
> > > I thought SSVD with PCA was working on Spark.
> > >
> > >
> > > On Aug 25, 2014, at 2:15 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > >
> > > this is super-cool to hear.
> > >
> > >
> > > On Mon, Aug 25, 2014 at 1:53 PM, Till Rohrmann <[email protected]>
> > > wrote:
> > >
> > > > Hi Andrew,
> > > >
> > > > I like the overview of the different algorithms. The Flink bindings
> > > > are still under development. We hope to finish them in the next
> > > > couple of weeks.
> > > >
> > > > Best regards,
> > > >
> > > > Till
> > > >
> > > >
> > > > On Mon, Aug 25, 2014 at 9:17 PM, Andrew Palumbo <[email protected]>
> > > > wrote:
> > > >
> > > >> I created a "Features by Engine" table from the Mahout "List of
> > > >> Algorithms" page which I'd like to add to the Mahout site once it
> > > >> looks good:
> > > >>
> > > >> https://andrewpalumbo.github.io/algorithms_by_engine
> > > >>
> > > >> I just copied over the current page, and added in some of the stuff
> > > >> that I know is complete/in the works. I wasn't sure about some of the
> > > >> Collaborative filtering stuff.
> > > >>
> > > >> Maybe the whole thing needs to be organized differently? A separate
> > > >> totally abstract section for algorithms that will be sitting in
> > > >> math-scala and then a section for each engine's implementation?
> > > >>
> > > >> Also I know that there's been some work done on Flink bindings, but I
> > > >> don't see a specific Jira. Should I put Flink down as "In
> > > >> development"?
> > > >>
> > > >> Any thoughts are appreciated.
