I've been using canopy clustering to cluster Apache log time slices by URL frequency. Typical results show several big clusters containing the "business as usual" access patterns and then several small clusters with the unusual patterns. It's a little difficult to interpret beyond that, but still intriguing. Since everybody has such logs, it might make a useful demo application that people could run over their own data.
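
For anyone who wants to try the idea on their own logs, here is a minimal sketch in plain Python. It is not the project's canopy code. The hourly slicing, the Common Log Format regex, the Euclidean distance over relative URL frequencies, and the names `slice_vectors`, `distance`, and `canopy` are all assumptions made for illustration, and the thresholds t1 and t2 would need tuning per site.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: Common Log Format, e.g.
# host - - [17/Mar/2008:08:00:01 -0700] "GET /index.html HTTP/1.1" 200 512
LOG_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):(\d{2}):\d{2}:\d{2} [^\]]*\] "\S+ (\S+)')


def slice_vectors(lines):
    """Map each hourly slice (day, hour) to a Counter of URL hit counts."""
    slices = defaultdict(Counter)
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            day, hour, url = m.groups()
            slices[(day, hour)][url] += 1
    return dict(slices)


def distance(a, b):
    """Euclidean distance between two relative URL frequency vectors."""
    total_a, total_b = sum(a.values()), sum(b.values())
    urls = set(a) | set(b)
    # Counter returns 0 for URLs that a slice never saw.
    return math.sqrt(sum((a[u] / total_a - b[u] / total_b) ** 2 for u in urls))


def canopy(points, t1, t2):
    """Group slices into canopies; t1 is the loose threshold, t2 the tight one."""
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)  # slice keys still eligible to become centers
    canopies = []
    while remaining:
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for key in remaining:
            d = distance(points[center], points[key])
            if d < t1:
                members.append(key)          # loosely close: joins this canopy
            if d >= t2:
                still_remaining.append(key)  # not tightly close: stays a candidate
        remaining = still_remaining
        canopies.append((center, members))
    return canopies
```

With something like `canopy(slice_vectors(open("access.log")), t1=0.5, t2=0.2)`, the large canopies would correspond to the usual traffic mix and the small ones to the odd hours. The file name and threshold values here are placeholders.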
Jeff

> -----Original Message-----
> From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 17, 2008 8:41 AM
> To: [email protected]
> Subject: Demos/Tutorials
>
> Now that we have some code in place for clustering, I think it would
> be cool to put together some examples/demos of real world problems.
> Things like clustering text (perhaps we can use the wikipedia download
> or the reuters download that Lucene contrib/benchmark uses) or
> clustering other pieces of data.
>
> We could setup a demo area of code and use Lucene's analysis code to
> create document vectors.
>
> Ideas and/or thoughts or volunteers?
>
> Cheers,
> Grant
