Right, sorry, I was looking at Weighted Matrix Factorization. I meant that I added
"Matrix Factorization with ALS on Implicit Feedback" as "in progress".

> Date: Mon, 25 Aug 2014 15:27:39 -0700
> Subject: Re: Features by engine page
> From: [email protected]
> To: [email protected]
> 
> On Mon, Aug 25, 2014 at 3:23 PM, Andrew Palumbo <[email protected]> wrote:
> 
> > Thanks Dmitriy,
> >
> > I've added in SSVD, PCA, QR and Weighted ALS.
> 
> 
> I think it is called "regularized ALS"
> 
> 
> > To keep it simple, I'll leave them under Spark for right now (and add
> > "in development" for h2o), since they're in and passing tests.
> >
> > Should I add:
> >
> 
> no
> 
> >
> > GP-EI
> > BFGS
> >
> > as "in development"
> >
> > bigram co-occurrence (would this be collocations?)
> >
> > as "in development" for spark?
> >
> >
> >
> >
> > > Date: Mon, 25 Aug 2014 14:40:57 -0700
> > > Subject: Re: Features by engine page
> > > From: [email protected]
> > > To: [email protected]
> > >
> > > yes SSVD and stochastic PCA as well as thin QR are re-cast in Mahout
> > > algebra (meaning they are engine-independent, not just spark).
> > >
> > > So is regularized ALS (albeit perhaps somewhat naive and thus affecting
> > > performance).
> > >
> > > I also had a quasi-algebraic implicit feedback ALS (which is in fact the
> > > implicit feedback paper and ALS-WR in the same bottle) but closed the
> > > issue due to lack of reviews and interest.
> > >
> > > Internally I also have a framework for doing hyperparameter searches, and
> > > right now I am closing in on GP-EI, which will probably benefit from some
> > > additions that choose estimates by reducing uncertainty (attempts to get
> > > out of the local minimum projected by Snoek's GP-EI algorithm itself). I
> > > hope I can open it one day. This work is obviously also interesting in
> > > that it establishes a probabilistic framework in Mahout (distributions and
> > > Gaussian processes). The GP stuff can also be used to evaluate things like
> > > RLFM, I think.
> > >
> > > I also have a framework to do line-search type of things, including on big
> > > datasets, per Nocedal and Wright, including BFGS; those are probably also
> > > candidates for contribution. Or not, depending on the moods of my new
> > > boss.
> > >
> > > Of other interesting things that are done with the DSL and may be
> > > considered for contribution, I also have implementations of bigram
> > > co-occurrence (both directed and undirected), though they are also
> > > quasi-algebraic, I think (meaning there are Spark-specific parts). This
> > > would (I think) also include a truthful implementation of the bigram
> > > problem from the Surprise & Coincidence paper (currently implemented in
> > > Mahout MR), but would also estimate undirected co-occurrences (as a
> > > frequent itemsets problem solver/replacement). Again, hopefully it may be
> > > contributed, but I'm not sure I'll pursue that if there's a lack of
> > > interest in my company. It's hard to go against the wind, in a way.
> > >
> > > By far the most often missing piece is data prep, of course, but I think I
> > > can eventually contribute a couple of tutorials on how to do vectorization
> > > using SparkQL stuff.
> > >
> > >
> > >
> > > -d
> > >
> > >
> > >
> > >
> > > On Mon, Aug 25, 2014 at 2:19 PM, Pat Ferrel <[email protected]> wrote:
> > >
> > > > Spark RSJ, MAHOUT-1604 is in development
> > > >
> > > > I thought SSVD with PCA was working on Spark.
> > > >
> > > >
> > > > On Aug 25, 2014, at 2:15 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > > >
> > > > this is super-cool to hear.
> > > >
> > > >
> > > > On Mon, Aug 25, 2014 at 1:53 PM, Till Rohrmann <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi Andrew,
> > > > >
> > > > > > I like the overview of the different algorithms. The Flink bindings
> > > > > > are still under development. We hope to finish them in the next
> > > > > > couple of weeks.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Till
> > > > >
> > > > >
> > > > > On Mon, Aug 25, 2014 at 9:17 PM, Andrew Palumbo <[email protected]>
> > > > > wrote:
> > > > >
> > > > > >> I created a "Features by Engine" table from the Mahout "List of
> > > > > >> Algorithms" page, which I'd like to add to the Mahout site once it
> > > > > >> looks good:
> > > > >>
> > > > >> https://andrewpalumbo.github.io/algorithms_by_engine
> > > > >>
> > > > > >> I just copied over the current page and added in some of the stuff
> > > > > >> that I know is complete or in the works.  I wasn't sure about some
> > > > > >> of the Collaborative filtering stuff.
> > > > >>
> > > > > >> Maybe the whole thing needs to be organized differently?  A separate,
> > > > > >> totally abstract section for algorithms that will be sitting in
> > > > > >> math-scala, and then a section for each engine's implementation?
> > > > >>
> > > > > >> Also, I know that there's been some work done on Flink bindings, but
> > > > > >> I don't see a specific Jira.  Should I put Flink down as "In
> > > > > >> development"?
> > > > >>
> > > > >> Any thoughts are appreciated.
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > >
> > > >
> >
> >
                                          
