On Sep 4, 2009, at 1:07 PM, Ted Dunning wrote:
These are good questions to ask. I don't know that we are ready to
answer them, but I do think that we have pieces of the answers.
So far, there are three or four general themes that seem to be of real
interest/value:
a) Taste/collaborative filtering/cooccurrence analysis
b) facilitation of conventional machine learning by large-scale
aggregation using Hadoop (so far, this is largely cooccurrence counting)
c) standard and basic machine learning tasks, like clustering and simple
classifiers, running on large-scale data
d) stuff
I'd add a few non-technical things I find useful:
e) A non-viral license
f) A community that supports it (i.e., it is not abandoned) and a place
to get answers about practical problems.
I've been frustrated more than once by the lack of (e) and (f) on some
other projects. I'm not saying we have completely solved (f) yet (we
could use a bit more diversity in the people answering, though that is
starting to take hold, too), but I do firmly believe Apache is one of
the best places to build a community.
-Grant