This doesn’t seem to be a vision statement. I was +1 to a simple consensus 
statement.

The vision is up to you. 

We have an interactive shell that scales to huge datasets without resorting to 
massive subsampling, one that lets you work with the exact data your black-box 
algos run on. Every data tool has an interactive mode except Mahout, and now it 
does too. Virtually every complex transform, as well as basic linear algebra, 
works on massive datasets. The interactivity will allow people to do things 
with Mahout they could never do before.

We also have the building blocks to make the fastest, most flexible, 
cutting-edge collaborative filtering+metadata recommenders in the world. 
Honestly, I don’t see anything like this elsewhere. We will also be able to fit 
into virtually any workflow and directly consume data produced in those systems 
with no intermediate scrubbing. This has never happened before in Mahout and I 
don’t see it in MLlib either. Even the interactive shell will benefit from this.

Other feature champions will be able to add to this list.

Seems like the vision comes from feature champions. I may not use Mahout in the 
same way you do but I rely on your code. Maybe I serve a different user type 
than you. I don’t see a problem with that, do you? 

On May 6, 2014, at 2:32 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

Pat et al.,

The whole problem with the originally suggested consensus statement is that it
reads as "we are building MLLib for Spark (oh wait, there's already such a
thing)" and then "we are building MLLib for 0xdata" and then perhaps for
something else, which could not be farther from the true philosophy of what
has been done. If not that, then at best it reads as "we don't know what it is
we are building, but we are including some Spark dependencies now". So it is
either misleading or vague, not sure which is worse.

If a collection of separate, backend-specific MLLibs is the new consensus,
I can't say I share it. In fact, my only motivation for doing anything
within this project was to fix everything that (per my perhaps lopsided
perception) is less than ideal with the approach of building ML projects as
backend-specific collections of black-box trainers and solvers, and to bring
an ideology similar to Julia and R to JVM-based big data ML.

If users are to love us, somehow I think it will not be because we ported
yet another flavor of K-means to Spark.

At this point it seems a little premature to talk about an existing
consensus.

On Tue, May 6, 2014 at 12:41 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> +1
> 
> I personally won’t spend a lot of time generalizing right now.
> Contributors can help with that if they want or make suggestions.
> 
> On May 6, 2014, at 9:23 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> 
> As a bit of commentary, it is clear that what the committers are working on
> is Spark
> 

Mahout committers, with very rare exceptions, are not working on Spark.
Spark committers and contributors are working on Spark.
