On Mar 20, 2008, at 9:15 AM, Grant Ingersoll wrote:
On Mar 19, 2008, at 9:56 PM, Karl Wettin wrote:
Grant Ingersoll wrote:
Now that we have some code in place for clustering, I think it
would be cool to put together some examples/demos of real world
problems. Things like clustering text (perhaps we can use the
Wikipedia download or the Reuters download that Lucene's
contrib/benchmark uses) or clustering other pieces of data.
We could set up a demo area of code and use Lucene's analysis code
to create document vectors.
Ideas, thoughts, or volunteers?
Should a demo be clear enough that people who have never heard of
machine learning can understand what's going on? Or should it
mainly show how to use the API? Or is it something that is just
built to show it working on a large data set?
I think it is more about working with the APIs, at least for now.
In the longer run, an intro to ML would be cool, but there is a lot
available on that already. I don't think it should be that large, as I
don't think we can really show scale.
Clarifying: I mean I don't know that we can really show scale in a
simple demo. The goal would be something that someone can take and
scale, sure, but scaling requires infrastructure, etc.
Just something that shows how to get the source, set it up to run
against a test set of data, and somehow see the results, even if it
is trivial command-line stuff.
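
As a rough illustration of the kind of trivial command-line demo described
above, here is a minimal Java sketch that turns a few documents into
term-frequency vectors and prints them. It is only a sketch under stated
assumptions: plain lowercasing and splitting on non-alphanumeric characters
stands in for Lucene's analysis code, and the class name TermVectorDemo and
its methods are hypothetical, not part of Lucene or Mahout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; not part of Lucene or Mahout.
class TermVectorDemo {

    // Assumption: lowercasing and splitting on non-letter/non-digit
    // characters stands in for a real Lucene Analyzer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Builds a sparse term-frequency vector (term -> count).
    // A TreeMap keeps the terms sorted so the printed output is stable.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new TreeMap<>();
        for (String token : tokenize(text)) {
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        // Each command-line argument is treated as one document;
        // fall back to two tiny sample documents if none are given.
        String[] docs = args.length > 0
                ? args
                : new String[] {"The cat saw the other cat.", "Dogs chase cats."};
        for (int i = 0; i < docs.length; i++) {
            System.out.println("doc " + i + ": " + termFrequencies(docs[i]));
        }
    }
}
```

The resulting maps are the sort of document vectors a clustering demo could
consume; a real demo would presumably swap tokenize for Lucene's analyzers
and read documents from a data set rather than from the command line.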