Lol, not too surprising considering the source. Here's how I got there:
- ClusterClassifier holds a "List<Cluster> models;" field as its only
state just like VectorModelClassifier does
- Started with ModelSerializerTest since you suggested being compatible
with ModelSerializer
- This tests OnlineLogisticRegression, CrossFoldLearner and
AdaptiveLogisticRegression
- The first two are also subclasses of AbstractVectorClassifier just
like ClusterClassifier
- The tests pass OLR and CFL learners to train(OnlineLearner) so it made
sense for a CC to be an OL too
- The new CC.train(...) methods map to "models.get(actual).observe()",
i.e. Cluster.observe(V) on the cluster selected by the actual index
- CC.close() maps to cluster.computeParameters() for each model, which
computes the posterior cluster parameters
- Now the CC is ready for another iteration or to classify, etc. (see the
sketch just below)
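Here's roughly the skeleton, to make those mappings concrete. It's a
minimal sketch, not the actual patch: the OnlineLearner and
AbstractVectorClassifier signatures are Mahout's, but the VectorWritable
wrapping, the classifyScalar() choice and the stubbed Writable methods
are my assumptions:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.io.Writable;
import org.apache.mahout.classifier.AbstractVectorClassifier;
import org.apache.mahout.classifier.OnlineLearner;
import org.apache.mahout.clustering.Cluster;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class ClusterClassifier extends AbstractVectorClassifier
    implements OnlineLearner, Writable {

  private List<Cluster> models; // the only state, as in VectorModelClassifier

  public ClusterClassifier(List<Cluster> models) {
    this.models = models;
  }

  @Override
  public int numCategories() {
    return models.size();
  }

  @Override
  public Vector classify(Vector instance) {
    // one pdf per cluster, normalized to sum to 1; the caller takes the
    // index of the largest element to find the most probable cluster
    Vector pdfs = new DenseVector(models.size());
    for (int i = 0; i < models.size(); i++) {
      pdfs.set(i, models.get(i).pdf(new VectorWritable(instance)));
    }
    return pdfs.divide(pdfs.zSum());
  }

  @Override
  public double classifyScalar(Vector instance) {
    return classify(instance).maxValue(); // one plausible choice
  }

  @Override
  public void train(int actual, Vector instance) {
    // "actual" selects which cluster observes this point
    models.get(actual).observe(new VectorWritable(instance));
  }

  @Override
  public void train(long trackingKey, int actual, Vector instance) {
    train(actual, instance);
  }

  @Override
  public void train(long trackingKey, String groupKey, int actual,
      Vector instance) {
    train(actual, instance);
  }

  @Override
  public void close() {
    // compute the posterior parameters of every cluster; afterwards the
    // CC is ready for another iteration or to classify
    for (Cluster cluster : models) {
      cluster.computeParameters();
    }
  }

  @Override
  public void write(DataOutput out) throws IOException {
    // stub: write models.size() plus each cluster's class and state
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // stub: reconstruct the List<Cluster>
  }
}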
So the cluster iteration process starts with a prior List<Cluster>, which
is used to construct the ClusterClassifier. Then, in each iteration, each
point is passed to CC.classify() and the index of the maximum-probability
element in the returned Vector is used to train() the CC. Since each
DistanceMeasureCluster contains its appropriate DistanceMeasure, the
cluster with the maximum pdf() is the closest. This is just what kmeans
already does, only less efficiently (kmeans uses the minimum distance
directly, but pdf() = e^-distance, so the closest cluster has the largest
pdf()).
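In code, a full iteration pass boils down to this (cluster() and its
arguments are placeholders of mine, not existing Mahout API):

static ClusterClassifier cluster(List<Cluster> prior,
    Iterable<Vector> points, int maxIterations) {
  ClusterClassifier classifier = new ClusterClassifier(prior);
  for (int iter = 0; iter < maxIterations; iter++) {
    for (Vector point : points) {
      // pdf() decreases with distance, so the largest pdf marks the
      // closest cluster
      int closest = classifier.classify(point).maxValueIndex();
      classifier.train(closest, point);
    }
    classifier.close(); // compute posteriors, ready for the next pass
  }
  return classifier;
}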
Finally, instead of passing a List<Cluster> into the KMeansClusterer, I
can just carry around a CC which wraps it. Instead of serializing a
List<Cluster> at the end of each iteration, I can just serialize the CC.
At the beginning of the next iteration, I just deserialize it and go.
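Something like this, assuming plain HDFS streams and a no-arg constructor
for the Writable contract (both assumptions of mine; the actual driver
plumbing may differ):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

static void writeClassifier(FileSystem fs, Path path, ClusterClassifier cc)
    throws IOException {
  // end of an iteration: serialize the whole CC, not a List<Cluster>
  FSDataOutputStream out = fs.create(path);
  cc.write(out);
  out.close();
}

static ClusterClassifier readClassifier(FileSystem fs, Path path)
    throws IOException {
  // beginning of the next iteration: deserialize it and go
  FSDataInputStream in = fs.open(path);
  ClusterClassifier cc = new ClusterClassifier(); // no-arg ctor assumed
  cc.readFields(in);
  in.close();
  return cc;
}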
It was so easy it surely must be wrong :)
On 4/13/11 7:54 PM, Ted Dunning wrote:
> On Wed, Apr 13, 2011 at 6:24 PM, Jeff Eastman <[email protected]> wrote:
>> I've been able to prototype a ClusterClassifier which, like
>> VectorModelClassifier, extends AbstractVectorClassifier but which also
>> implements OnlineLearner and Writable.
>
> Implementing OnlineLearner is a surprise here.
>
> Have to think about it since the learning doesn't have a target variable.
>
>> ... If this could be completed it would seem to allow kmeans, fuzzyk,
>> dirichlet and maybe even meanshift cluster classifiers to be used with SGD.
>
> Very cool.
>
>> ... The challenge would be to use AVC.classify() in the various clusterers
>> or to extract initial centers for kmeans & fuzzyk. Dirichlet might be
>> adaptable more directly since its models only have to produce the pi vector
>> of pdfs.
>
> Yes. Dirichlet is the one where this makes sense.