Practically speaking, to guide short-term goals, we do need to start with a narrower, coherent remit and expand later. Starting as a Java-based, Hadoop-based library for developers, focusing on collaborative filtering, clustering, categorization, and a few other things sounds just right.
It would be bad to think, well, we're about anything machine-learning-related at all, and take a couple of steps in 10 different directions, rather than start by thoroughly exploring a couple. But nobody is saying that, it seems.

Let's start by being a great library as described above. To that end I do want to push on...

1) Unifying our Hadoop integration -- well, once Hadoop sorts itself out again. 0.20.0 doesn't really 'work', it seems.
2) Unifying the code base -- see message about the common and utils package for instance.

If we do stuff like this we really are going to arrive at a useful, polished, coherent product soon.

Sean

On Sat, Sep 5, 2009 at 4:30 PM, Grant Ingersoll<gsing...@apache.org> wrote:
> I don't think we necessarily need to be distributed or Hadoop based, but
> those are what we led with so far and it's a good start. The nice thing is
> the stuff works just fine in standalone mode, too. First and foremost, we
> are a machine learning project with a commercial friendly license and a
> solid community aiming to build fast, production ready libraries. Java,
> Hadoop and distributed are important, but secondary in my mind. There will
> certainly be some algorithms that we can't implement in Hadoop. See