Yes. I wasn't questioning the part about algorithms. I think several of the
other points are each probably several times the amount of work that has
been put into this project over the past year, so I'm wondering whether
this is close to realistic as a to-do list for 1.0 of this project.

As a design brief for a new project, for sure. Spark or similar is kind of
half of these things already and could use work on adding things like model
import/export. This is hijacking your point, but I wanted to agree with the
ideas and wonder out loud whether a lot of this effort belongs elsewhere
in the Apache tent, and whether the goal here should look more like polish
up and maintain.
On Feb 28, 2014 1:16 AM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:

> Well, Mahout has had (kinda sorta awful) classifiers and clustering from
> day one.  It isn't like the only goal is recommendations.
>
> The non-MR, non-Hadoop comments are really more user centric requirements
> than implementations.  It is important that users be able to start without
> a cluster and move relatively transparently into a fully scaled solution.
>
> Moreover, the Hadoop-tied map-reduce implementations that we have had up to
> now have been disastrously complex.  We really need something better.
>
>
>
>
> On Thu, Feb 27, 2014 at 5:11 PM, Sean Owen <sro...@gmail.com> wrote:
>
> > This sounds good, but sounds like a whole different project or projects.
> > For example where does R appear from, what non-MR implementations, etc,
> > what is the no Hadoop implementation?
> > On Feb 28, 2014 12:38 AM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
> >
> > > > I would like to start a conversation about where we want Mahout to be
> > > > for 1.0.  Let's suspend for the moment the question of how to achieve
> > > > the goals.  Instead, let's converge on what we really would like to
> > > > have happen and after that, let's talk about means that will get us
> > > > there.
> > >
> > > Here are some goals that I think would be good in the area of numerics,
> > > classifiers and clustering:
> > >
> > > - runs with or without Hadoop
> > >
> > > - runs with or without map-reduce
> > >
> > > - includes (at least), regularized generalized linear models, k-means,
> > > random forest, distributed random forest, distributed neural networks
> > >
> > > - reasonably competitive speed against other implementations including
> > > > GraphLab, MLlib and R.
> > >
> > > - interactive model building
> > >
> > > - models can be exported as code or data
> > >
> > > - simple programming model
> > >
> > > - programmable via Java or R
> > >
> > > - runs clustered or not
> > >
> > >
> > > What does everybody think?
> > >
> >
>
