If you look at the ClusterClassifier and ClusterIterator classes, you can see there is not much code in either and they read pretty well. ClusterClassifier is an AbstractVectorClassifier that uses the pre-existing Cluster classes as its models. It should be usable just like the other classifiers, since it supports the classify methods. It is also an OnlineLearner (like AbstractOnlineLogisticRegression), so it supports the train methods too. Where the normal classification lifecycle is train, then classify, the iterator does classify, then train. I thought the symmetry was pretty slick.
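The classify-then-train cycle described in this email can be sketched in a few dozen lines. The Java below is a minimal, in-memory illustration that assumes plain double[] vectors. Every name in it (ClassifyTrainSketch, CentroidModel, ToyClusterClassifier, Policy, KMEANS, FUZZY, iterate) is hypothetical and is not Mahout's API. The method names only echo the ones mentioned here (classify, train, observe(), computeParameters(), close()), the 1 / (1 + distance) pdf is an assumption, and FUZZY is a simplification of real fuzzy k-means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the classify-then-train cycle; these are NOT Mahout classes.
class ClassifyTrainSketch {
  // A cluster model that scores points and accumulates posterior statistics.
  interface Model {
    double pdf(double[] x);                  // unnormalized likelihood of x
    void observe(double[] x, double weight); // accumulate weighted statistics
    void computeParameters();                // statistics -> new parameters, then reset
  }

  // k-means style model; pdf = 1 / (1 + Euclidean distance) is an assumption.
  static final class CentroidModel implements Model {
    double[] center;
    private double s0;     // total weight observed
    private double[] s1;   // weighted sum of observed points

    CentroidModel(double[] center) {
      this.center = center.clone();
      this.s1 = new double[center.length];
    }
    public double pdf(double[] x) {
      double sq = 0;
      for (int i = 0; i < x.length; i++) { double d = x[i] - center[i]; sq += d * d; }
      return 1.0 / (1.0 + Math.sqrt(sq));
    }
    public void observe(double[] x, double weight) {
      s0 += weight;
      for (int i = 0; i < x.length; i++) s1[i] += weight * x[i];
    }
    public void computeParameters() {
      if (s0 > 0) {
        for (int i = 0; i < center.length; i++) center[i] = s1[i] / s0;
      }
      s0 = 0;
      s1 = new double[center.length];
    }
  }

  // A classifier whose models are clusters: classify, train, then close.
  static final class ToyClusterClassifier {
    final List<Model> models;
    ToyClusterClassifier(List<Model> models) { this.models = models; }

    // Normalized membership probabilities under the current (prior) parameters.
    double[] classify(double[] x) {
      double[] p = new double[models.size()];
      double total = 0;
      for (int k = 0; k < p.length; k++) { p[k] = models.get(k).pdf(x); total += p[k]; }
      for (int k = 0; k < p.length; k++) p[k] /= total;
      return p;
    }
    // Online-learner style training: each model observes x with its weight.
    void train(double[] x, double[] weights) {
      for (int k = 0; k < weights.length; k++) {
        if (weights[k] > 0) models.get(k).observe(x, weights[k]);
      }
    }
    // Posterior parameters become the priors for the next iteration.
    void close() {
      for (Model m : models) m.computeParameters();
    }
  }

  // Stand-in for a clustering policy: probabilities -> training weights.
  interface Policy { double[] select(double[] probabilities); }

  // k-means: all weight on the most probable cluster (a single nonzero entry).
  static final Policy KMEANS = p -> {
    int best = 0;
    for (int k = 1; k < p.length; k++) if (p[k] > p[best]) best = k;
    double[] w = new double[p.length];
    w[best] = 1.0;
    return w;
  };
  // Fuzzy-style: keep every membership (multiple nonzero entries). Simplified;
  // real fuzzy k-means derives memberships using a fuzziness exponent.
  static final Policy FUZZY = p -> p.clone();

  static void iterate(List<double[]> data, ToyClusterClassifier c, Policy policy, int iterations) {
    for (int it = 0; it < iterations; it++) {
      for (double[] x : data) {
        double[] probs = c.classify(x);   // classify with the prior parameters
        c.train(x, policy.select(probs)); // then train with the policy's weights
      }
      c.close();                          // compute posterior parameters
    }
  }

  public static void main(String[] args) {
    List<double[]> data = Arrays.asList(
        new double[] {0, 0}, new double[] {0, 1}, new double[] {10, 10}, new double[] {10, 11});
    List<Model> models = new ArrayList<>();
    models.add(new CentroidModel(new double[] {1, 1}));
    models.add(new CentroidModel(new double[] {9, 9}));
    iterate(data, new ToyClusterClassifier(models), KMEANS, 3);
    for (Model m : models) System.out.println(Arrays.toString(((CentroidModel) m).center));
  }
}
```

Running main should print [0.0, 0.5] and [10.0, 10.5]. Because observe() only accumulates statistics in this sketch, every point in a pass is classified against the same prior parameters, and the centroids move only when close() is called.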
The ClusterIterator uses a ClusterClassifier to classify each data vector using the prior values of the model parameters it contains. It then uses a ClusteringPolicy to turn those classification results into a normalized weight vector (a single nonzero entry for kmeans and dirichlet; multiple entries for fuzzyk). Then it trains the classifier using these weights. Training uses the pre-existing observe() method on the contained models to accumulate the posterior statistics counters. Upon close(), the classifier computes each of its models' posterior parameters (computeParameters()) and is ready for another iteration. You can also run the kmeans, fuzzyk and dirichlet display examples to see that they do what they claim.

Hope this helps,
Jeff

-----Original Message-----
From: Lance Norskog [mailto:[email protected]]
Sent: Tuesday, May 17, 2011 10:01 AM
To: [email protected]
Subject: Re: List your changes for Mahout 0.5

Could someone who understands it create a diagram of how the parts fit together?

On Tue, May 17, 2011 at 8:57 AM, Jeff Eastman <[email protected]> wrote:
> Cool. I thought it was pretty slick how the pre-existing parts all fit
> together. Perhaps even elegant.
>
> -----Original Message-----
> From: Ted Dunning [mailto:[email protected]]
> Sent: Tuesday, May 17, 2011 8:50 AM
> To: [email protected]
> Subject: Re: List your changes for Mahout 0.5
>
> I would say that we mention it.
>
> On Tue, May 17, 2011 at 8:44 AM, Jeff Eastman <[email protected]> wrote:
>
>> I don't know. It does a credible (but still sequential) job of clustering
>> kmeans, fuzzyk and dirichlet. It is clearly still experimental. You guys be
>> the judge...
>>
>> -----Original Message-----
>> From: Ted Dunning [mailto:[email protected]]
>> Sent: Tuesday, May 17, 2011 8:39 AM
>> To: [email protected]
>> Subject: Re: List your changes for Mahout 0.5
>>
>> Is it done enough to claim?
>>
>> On Tue, May 17, 2011 at 8:29 AM, Jeff Eastman <[email protected]> wrote:
>>
>> > Perhaps premature, but would you want to include the
>> > clustering-classification convergence e.g. ClusterClassifier?
>> >
>> > -----Original Message-----
>> > From: Sean Owen [mailto:[email protected]]
>> > Sent: Tuesday, May 17, 2011 7:51 AM
>> > To: Mahout Dev List
>> > Subject: List your changes for Mahout 0.5
>> >
>> > • Improved Lanczos solver
>> > • Stochastic Singular Value Decomposition implementation
>> > • Incremental SVD implementation
>> > • Alternating Least Squares with Weighted Regularization
>> > collaborative filtering implementation, both distributed and
>> > non-distributed
>> > • SVDRecommender enhancements
>> > • Better control over candidate item selection in item-based
>> > recommenders
>> > • Significant removal of deprecated or dead code
>> > • Many bug fixes, refactorings and other small improvements
>> >
>> > What else?
>> >

-- 
Lance Norskog
[email protected]
