Re: Features by engine page

2014-08-26 Thread Dmitriy Lyubimov
meant to be normal of course


On Tue, Aug 26, 2014 at 10:39 AM, Dmitriy Lyubimov 
wrote:

> scala, if you want) to write something like `new
> MultivariateUniformDistribution(mu,sigma).sample()`, so i really just dsl-
>


Re: Features by engine page

2014-08-26 Thread Dmitriy Lyubimov
On distributions, I did not find anything multivariate that is based on
Mahout Matrix. Hopefully I just did not look well enough. Everything
univariate seems to be pretty spotty. Aside from that, I need Scala traits,
plus I find it extremely inelegant (un-Scala, if you want) to write something
like `new MultivariateUniformDistribution(mu,sigma).sample()`, so I really
just DSL-bridged for the most part. There are enough third-party choices not
to bother with filling the gaps.

On step-recorded evolutionary search: after my literature search on the
topic, it does not look like even a distant third-best choice, in particular
under big data training settings.

First, I did not find head-to-head comparisons of it with any of the top
choices. It is not included in the AMPLab survey of top search choices. GP-EI
is Netflix's choice, for example. So there is very little convincing data to
go on to begin with. Given the lack of such comparisons, the next best
thing is to copy what others do here.

Second, under big data settings, every data point (training) is precious.
In Spark specifically, unlike MR, since we want to retain as much data in
RAM as possible and avoid spills, the best performance is usually achieved by
running trainings sequentially behind a semaphore rather than throwing a
whole bunch of them out at once. This is especially true where companies are
extremely anemic in provisioning the needed hardware, for whatever reason. In
that sense, exploration algorithms that are capable of making better
inference after each new data point, and that arrive at a reasonably
performing model in ~20..30 sequential trainings, are far preferable to
those that require a whole bunch of trainings to happen before they can begin
to figure out the next centroid of trials. I am not even sure whether
step-recorded search was ever tried outside SGD, where data points are
abundant albeit incomplete.
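
To make the "better inference after each new data point" argument concrete,
here is a minimal sketch of the expected-improvement (EI) criterion that
GP-EI style searches use to pick the next trial. It assumes a Gaussian process
has already been fit to the trainings so far and yields a posterior mean and
standard deviation for each candidate; the function names and the candidate
tuple layout are hypothetical, and this is not the framework discussed in this
thread.

```python
import math


def expected_improvement(mu, sigma, best_so_far):
    """EI for a minimization problem at one candidate point.

    mu, sigma: assumed GP posterior mean and standard deviation of the
    validation loss at the candidate; best_so_far: lowest loss observed.
    """
    if sigma <= 0.0:
        # No uncertainty left: improvement is deterministic.
        return max(best_so_far - mu, 0.0)
    z = (best_so_far - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best_so_far - mu) * cdf + sigma * pdf


def pick_next(candidates, best_so_far):
    """Return the parameters of the candidate with the highest EI.

    candidates: list of (params, mu, sigma) tuples (hypothetical layout).
    """
    best = max(candidates,
               key=lambda c: expected_improvement(c[1], c[2], best_so_far))
    return best[0]
```

Each finished training updates the posterior, so a single new data point
changes the choice of the next trial; no batch of trainings has to complete
first.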



On Tue, Aug 26, 2014 at 8:32 AM, Ted Dunning  wrote:

> On Mon, Aug 25, 2014 at 2:40 PM, Dmitriy Lyubimov 
> wrote:
>
> > This work is obviously also interesting in that it
> > establishes probabilistic framework in Mahout (distributions & gaussian
> > process).
> >
>
> We already have that.
>
> (distributions not GP)
>
> Note that we also have an implementation of recorded step evolutionary
> programming that works really well for hyper-parameter search.  I don't
> like the way that the API turned out (too hard to understand).
>


Re: Features by engine page

2014-08-26 Thread Ted Dunning
On Mon, Aug 25, 2014 at 2:40 PM, Dmitriy Lyubimov  wrote:

> This work is obviously also interesting in that it
> establishes probabilistic framework in Mahout (distributions & gaussian
> process).
>

We already have that.

(distributions not GP)

Note that we also have an implementation of recorded step evolutionary
programming that works really well for hyper-parameter search.  I don't
like the way that the API turned out (too hard to understand).


RE: Features by engine page

2014-08-25 Thread Andrew Palumbo


Right, sorry, I was looking at Weighted Matrix Factorization. I meant that I
added "Matrix Factorization with ALS on Implicit Feedback" as "in progress".

> Date: Mon, 25 Aug 2014 15:27:39 -0700
> Subject: Re: Features by engine page
> From: dlie...@gmail.com
> To: dev@mahout.apache.org
> 
> On Mon, Aug 25, 2014 at 3:23 PM, Andrew Palumbo  wrote:
> 
> > Thanks Dmitriy,
> >
> > I've added in SSVD, PCA, QR and Weighted ALS.
> 
> 
> I think it is called "regularized ALS"
> 
> 
> > To keep it simple,  I'll leave them under Spark for right now. (and add
> > "in development" for h2o) since they're in and passing tests.
> >
> > Should I add:
> >
> 
> no
> 
> >
> > GP-EI
> > BFGS
> >
> > as "in development"
> >
> > bigram co-occurrence (would this be collocations?)
> >
> > as "in development" for spark?
> >

Re: Features by engine page

2014-08-25 Thread Dmitriy Lyubimov
On Mon, Aug 25, 2014 at 3:23 PM, Andrew Palumbo  wrote:

> Thanks Dmitriy,
>
> I've added in SSVD, PCA, QR and Weighted ALS.


I think it is called "regularized ALS"


> To keep it simple,  I'll leave them under Spark for right now. (and add
> "in development" for h2o) since they're in and passing tests.
>
> Should I add:
>

no

>
> GP-EI
> BFGS
>
> as "in development"
>
> bigram co-occurrence (would this be collocations?)
>
> as "in development" for spark?


RE: Features by engine page

2014-08-25 Thread Andrew Palumbo
Thanks Dmitriy,

I've added SSVD, PCA, QR and Weighted ALS. To keep it simple, I'll leave
them under Spark for right now (and add "in development" for H2O), since
they're in and passing tests.

Should I add:

GP-EI
BFGS 
 
as "in development"

bigram co-occurrence (would this be collocations?)

as "in development" for spark?  





Re: Features by engine page

2014-08-25 Thread Dmitriy Lyubimov
Yes, SSVD and stochastic PCA, as well as thin QR, are recast in Mahout
algebra (meaning they are engine-independent, not just Spark).

So is regularized ALS (albeit the implementation is perhaps somewhat naive,
which affects performance).

I also had a quasi-algebraic implicit feedback ALS (which is in fact the
implicit feedback paper and ALS-WR in the same bottle), but closed the issue
due to lack of reviews and interest.

Internally I also have a framework for doing hyperparameter searches, and
right now I am closing in on GP-EI, which will probably benefit from some
additions that choose estimates by reducing uncertainty (attempts to get
out of a local minimum projected by Snoek's GP-EI algorithm itself). I hope I
can open it one day. This work is obviously also interesting in that it
establishes a probabilistic framework in Mahout (distributions & Gaussian
process). The GP stuff can also be used to evaluate things like RLFM, I think.

I also have a framework to do line search type of things, including on big
datasets, per Nocedal and Wright, including BFGS; those are probably also
candidates for contribution. Or not, depending on the moods of my new boss.
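
For readers unfamiliar with the term, a minimal sketch of a backtracking line
search with the Armijo sufficient-decrease condition, the simplest variant
described in Nocedal and Wright, is below. It is an illustration only, not the
framework mentioned above: the function name, the parameter defaults and the
plain-list vector representation are all assumptions.

```python
def backtracking_line_search(f, grad_fx, x, direction,
                             alpha=1.0, shrink=0.5, c1=1e-4, max_steps=50):
    """Return a step size along `direction` satisfying the Armijo condition.

    f: objective taking a list of floats; grad_fx: gradient of f at x;
    direction: a descent direction (for example the negative gradient, or a
    BFGS direction). All defaults are illustrative assumptions.
    """
    fx = f(x)
    # Directional derivative of f at x along the search direction.
    slope = sum(g * d for g, d in zip(grad_fx, direction))
    for _ in range(max_steps):
        candidate = [xi + alpha * di for xi, di in zip(x, direction)]
        if f(candidate) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= shrink  # step too long: shrink and retry
    return alpha
```

A quasi-Newton method such as BFGS would call a routine like this once per
iteration, usually with the stronger Wolfe conditions rather than Armijo
alone.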

Among other interesting things that are done with the DSL and may be
considered for contribution, I also have implementations of bigram
co-occurrence (both directed and undirected) made in the DSL, but they are
also quasi-algebraic, I think (meaning there are Spark-specific parts). This
(I think) would also include a truthful implementation of the bigram problem
from the Surprise & Coincidence paper (currently implemented in Mahout MR),
but would also estimate undirected co-occurrences (as a frequent itemsets
problem solver/replacement). Again, I am hopeful it may be contributed, but
not sure I'll pursue that if there's a lack of interest in my company. It's
hard to go against the wind, in a way.
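
As a reminder of what the bigram problem in that paper computes, here is a
minimal sketch of the log-likelihood ratio (G^2) score for a 2x2 contingency
table of bigram counts. The function names and the count layout are
assumptions chosen for illustration; this is neither the DSL implementation
described above nor a copy of the Mahout MR code.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 score for a 2x2 table of counts.

    k11: A followed by B; k12: A followed by something other than B;
    k21: something other than A followed by B; k22: neither.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))
```

A score near zero means the two words co-occur about as often as independence
would predict; large scores flag candidate collocations.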

By far the most often missing piece is data prep, of course, but I think I
can eventually contribute a couple of tutorials on how to do vectorization
using SparkQL stuff.



-d






RE: Features by engine page

2014-08-25 Thread Andrew Palumbo
Thanks Pat,

I added Row Similarity. Should we keep that under "Miscellaneous"?

I'll add everything in the Decomposition suite under Spark for now.



Re: Features by engine page

2014-08-25 Thread Pat Ferrel
Spark RSJ (MAHOUT-1604) is in development.

I thought SSVD with PCA was working on Spark.





Re: Features by engine page

2014-08-25 Thread Dmitriy Lyubimov
this is super-cool to hear.


On Mon, Aug 25, 2014 at 1:53 PM, Till Rohrmann  wrote:

> Hi Andrew,
>
> I like the overview of the different algorithms. The Flink bindings are
> still under development. We hope to finish them in the next couple of
> weeks.
>
> Best regards,
>
> Till


RE: Features by engine page

2014-08-25 Thread Andrew Palumbo
Thank you Till,

I will add Flink in as "In Development"

Andy 


Re: Features by engine page

2014-08-25 Thread Till Rohrmann
Hi Andrew,

I like the overview of the different algorithms. The Flink bindings are
still under development. We hope to finish them in the next couple of weeks.

Best regards,

Till




Features by engine page

2014-08-25 Thread Andrew Palumbo
I created a "Features by Engine" table from the Mahout "List of Algorithms" 
page which I'd like to add to the Mahout site once it looks good:

https://andrewpalumbo.github.io/algorithms_by_engine

I just copied over the current page, and added in some of the stuff that I know
is complete/in the works.  I wasn't sure about some of the Collaborative
filtering stuff.

Maybe the whole thing needs to be organized differently?  A separate, totally
abstract section for algorithms that will be sitting in math-scala and then a
section for each engine's implementation?

Also I know that there's been some work done on Flink bindings, but I don't see
a specific Jira.  Should I put Flink down as "In development"?

Any thoughts are appreciated.