[ https://issues.apache.org/jira/browse/MAHOUT-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paritosh Ranjan reassigned MAHOUT-930:
--------------------------------------
Assignee: Paritosh Ranjan
> Refactor Vector Classification out of Clustering - Make Classification abstract
> -------------------------------------------------------------------------------
>
> Key: MAHOUT-930
> URL: https://issues.apache.org/jira/browse/MAHOUT-930
> Project: Mahout
> Issue Type: Improvement
> Components: Classification, Clustering
> Affects Versions: 0.6
> Reporter: Paritosh Ranjan
> Assignee: Paritosh Ranjan
> Fix For: 0.7
>
>
> Right now, each clustering algorithm has its own runClustering (-cp)
> implementation, which produces clusteredPoints. The current design lacks:
> 1) Extensibility - there is no place to plug in new features, such as
> outlier removal, during classification.
> 2) Uniformity in design - new algorithms have no pattern to follow.
> 3) Abstraction - clusterData should only be concerned with classifying
> vectors, i.e. assigning vectors to clusters. Currently it lacks this
> abstraction: it should not care about how the classification is done. That
> should be the job of a separate entity, which can also provide features such
> as outlier removal.
> The new implementation factors out and implements an independent entity that
> performs the classification step separately from the various clustering
> implementations. The new design starts with ClusterClassifier,
> ClusteringPolicy and ClusterIterator, experimental versions of which are
> already committed. The committed version appears to work for all of the
> iterative clustering algorithms.
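To make the separation concrete, here is a minimal Java sketch, under stated assumptions, of a driver that only routes vectors and delegates both "how to score a vector" and "how to turn scores into an assignment" to separate objects. The names `PointClassifier`, `AssignmentPolicy`, `ClusteredPointsDriver` and `assignAll` are hypothetical illustrations, not Mahout's actual `ClusterClassifier`/`ClusteringPolicy` API, and vectors are plain `double[]` arrays for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: scores a vector against every cluster (one entry per cluster).
interface PointClassifier {
  double[] classify(double[] vector);
}

// Hypothetical: turns per-cluster scores into per-cluster weights.
interface AssignmentPolicy {
  double[] select(double[] probabilities);
}

final class ClusteredPointsDriver {
  private ClusteredPointsDriver() {}

  // Produces one weight vector per input point. The driver never inspects how
  // the classifier scores points or how the policy picks weights, so a new
  // algorithm (or an outlier filter) only needs new implementations of the
  // two interfaces, not a new driver.
  static List<double[]> assignAll(List<double[]> points,
                                  PointClassifier classifier,
                                  AssignmentPolicy policy) {
    List<double[]> result = new ArrayList<>();
    for (double[] point : points) {
      result.add(policy.select(classifier.classify(point)));
    }
    return result;
  }
}
```

Because both collaborators are single-method interfaces, they can be supplied as lambdas in tests, which keeps the driver itself free of any algorithm-specific code.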
> The ClusterClassifier provides the probability of a vector belonging to each
> of the available clusters. These probabilities are converted into weights by
> ClusteringPolicy implementations, one for each clustering algorithm. This is
> where an outlier removal implementation can be plugged in. In the future,
> different ClusteringPolicy implementations can be provided (configured) for
> different types of classification.
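The probabilities-to-weights step, and how outlier removal could plug into it, can be sketched as follows. All class names are hypothetical. `MaxProbabilityPolicy` gives a k-means-style hard assignment, and `OutlierFilteringPolicy` wraps any other policy as a decorator. The outlier rule (a point whose best probability falls below a cutoff gets zero weight everywhere) is an assumption chosen for illustration; the issue does not specify how outliers are detected.

```java
import java.util.Arrays;

// Hypothetical interface: maps per-cluster probabilities to per-cluster weights.
interface WeightingPolicy {
  double[] weights(double[] probabilities);
}

// k-means-style hard assignment: weight 1 for the most probable cluster, 0 elsewhere.
final class MaxProbabilityPolicy implements WeightingPolicy {
  @Override
  public double[] weights(double[] probabilities) {
    double[] w = new double[probabilities.length];
    if (probabilities.length == 0) {
      return w;
    }
    int best = 0;
    for (int i = 1; i < probabilities.length; i++) {
      if (probabilities[i] > probabilities[best]) {
        best = i;
      }
    }
    w[best] = 1.0;
    return w;
  }
}

// Outlier removal plugged in as a decorator (assumed rule): if no cluster is
// at least `minProbability` likely, the point gets zero weight for every
// cluster; otherwise the wrapped policy decides.
final class OutlierFilteringPolicy implements WeightingPolicy {
  private final WeightingPolicy delegate;
  private final double minProbability;

  OutlierFilteringPolicy(WeightingPolicy delegate, double minProbability) {
    this.delegate = delegate;
    this.minProbability = minProbability;
  }

  @Override
  public double[] weights(double[] probabilities) {
    double max = Arrays.stream(probabilities).max().orElse(0.0);
    if (max < minProbability) {
      return new double[probabilities.length];
    }
    return delegate.weights(probabilities);
  }
}
```

The decorator shape is one way to get the extensibility the issue asks for: outlier handling is added by wrapping an existing policy rather than by editing each algorithm's policy.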
> A ClusteringPolicy can be initialized with a ClusterConfig object. A
> ClusterConfig holds the parameters of the clustering algorithm, which the
> policy uses when classifying vectors into clusters.
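A small sketch of a config-initialized policy, assuming the config carries two illustrative parameters: whether to use hard (k-means-like) or soft (fuzzy-like) assignment, and an outlier cutoff. `AssignmentConfig` and `ConfiguredAssignmentPolicy` are hypothetical names standing in for the ClusterConfig/ClusteringPolicy pairing described above; the real parameters would depend on each algorithm.

```java
// Hypothetical config object: carries the algorithm parameters a policy needs,
// so the policy itself holds no hard-coded settings.
final class AssignmentConfig {
  final boolean hardAssignment;   // true: k-means-like, false: fuzzy-like
  final double outlierThreshold;  // best probability below this => outlier

  AssignmentConfig(boolean hardAssignment, double outlierThreshold) {
    if (outlierThreshold < 0.0 || outlierThreshold > 1.0) {
      throw new IllegalArgumentException("outlierThreshold must be in [0, 1]");
    }
    this.hardAssignment = hardAssignment;
    this.outlierThreshold = outlierThreshold;
  }
}

// Hypothetical policy whose behaviour is fixed by the config it is built from.
final class ConfiguredAssignmentPolicy {
  private final AssignmentConfig config;

  ConfiguredAssignmentPolicy(AssignmentConfig config) {
    this.config = config;
  }

  // Converts per-cluster probabilities into per-cluster weights.
  double[] weights(double[] probabilities) {
    double[] w = new double[probabilities.length];
    if (probabilities.length == 0) {
      return w;
    }
    int best = 0;
    for (int i = 1; i < probabilities.length; i++) {
      if (probabilities[i] > probabilities[best]) {
        best = i;
      }
    }
    if (probabilities[best] < config.outlierThreshold) {
      return w; // all zeros: the point is treated as an outlier
    }
    if (config.hardAssignment) {
      w[best] = 1.0;
      return w;
    }
    return probabilities.clone(); // soft: probabilities used as weights
  }
}
```

Keeping the parameters in a separate object means the same policy class can be reconfigured (for example, read from job arguments) without code changes.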
> The ClusterClassifier can also train the existing classifiers (clusters) on
> the input. This is where clustering and classification converge.
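"Training a cluster on the input" can be sketched for the centroid-based case: each cluster accumulates weighted observations during a pass, and at the end of the pass its center moves to the weighted mean. `CentroidModel`, `train` and `close` are hypothetical names, and the weighted-mean update is an assumed rule that fits k-means/fuzzy k-means; other algorithms would train their models differently.

```java
import java.util.Arrays;

// Hypothetical trainable model: one center per cluster, updated from weighted
// observations using a weighted-mean rule.
final class CentroidModel {
  private final double[][] centers;
  private final double[][] sums;
  private final double[] totals;

  CentroidModel(double[][] initialCenters) {
    int k = initialCenters.length;
    int dim = initialCenters[0].length;
    centers = new double[k][];
    for (int i = 0; i < k; i++) {
      centers[i] = initialCenters[i].clone();
    }
    sums = new double[k][dim];
    totals = new double[k];
  }

  // Accumulates one input vector into `cluster` with the given weight.
  void train(int cluster, double[] x, double weight) {
    if (weight == 0.0) {
      return;
    }
    for (int d = 0; d < x.length; d++) {
      sums[cluster][d] += weight * x[d];
    }
    totals[cluster] += weight;
  }

  // End of a pass: move each center to the weighted mean of what it observed,
  // keep centers that received no weight, and reset the accumulators.
  void close() {
    for (int i = 0; i < centers.length; i++) {
      if (totals[i] > 0.0) {
        for (int d = 0; d < centers[i].length; d++) {
          centers[i][d] = sums[i][d] / totals[i];
        }
      }
      Arrays.fill(sums[i], 0.0);
      totals[i] = 0.0;
    }
  }

  double[] center(int i) {
    return centers[i].clone();
  }
}
```

The weights fed to `train` are exactly what a policy produces, which is why the classifier and the clustering update can share one model.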
> For now, execution is handled by a ClusterIterator, which runs a clustering
> policy over the input and classifies the vectors into clusters. It can train
> the classifiers at the same time: it runs for a given number of iterations,
> and each iteration should improve the quality of the classifiers.
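The classify, select, train loop can be sketched as a toy k-means. `ToyClusterIterator` and its methods are hypothetical, not Mahout's ClusterIterator. The scoring rule (1 / (1 + Euclidean distance), normalized to sum to 1) is an assumption standing in for ClusterClassifier's probabilities, and a hard "most probable cluster" policy supplies the weights.

```java
import java.util.List;

// Hypothetical iterative driver: each pass classifies every point against the
// current centers, picks weights with a hard policy, trains the centers on the
// weighted points, and starts the next pass with the updated centers.
final class ToyClusterIterator {
  private ToyClusterIterator() {}

  static double[][] iterate(List<double[]> points, double[][] initialCenters, int iterations) {
    int k = initialCenters.length;
    int dim = initialCenters[0].length;
    double[][] centers = new double[k][];
    for (int c = 0; c < k; c++) {
      centers[c] = initialCenters[c].clone();
    }
    for (int it = 0; it < iterations; it++) {
      double[][] sums = new double[k][dim];
      double[] totals = new double[k];
      for (double[] p : points) {
        double[] probabilities = classify(p, centers); // 1. classify
        int best = argMax(probabilities);              // 2. policy: weight 1 on best cluster
        for (int d = 0; d < dim; d++) {                // 3. train: accumulate
          sums[best][d] += p[d];
        }
        totals[best] += 1.0;
      }
      // End of pass: each center moves to the mean of its assigned points;
      // a center with no points keeps its previous position.
      for (int c = 0; c < k; c++) {
        if (totals[c] > 0.0) {
          for (int d = 0; d < dim; d++) {
            centers[c][d] = sums[c][d] / totals[c];
          }
        }
      }
    }
    return centers;
  }

  // Assumed scoring rule: 1 / (1 + Euclidean distance), normalized to sum to 1.
  static double[] classify(double[] p, double[][] centers) {
    double[] scores = new double[centers.length];
    double total = 0.0;
    for (int c = 0; c < centers.length; c++) {
      double sq = 0.0;
      for (int d = 0; d < p.length; d++) {
        double diff = p[d] - centers[c][d];
        sq += diff * diff;
      }
      scores[c] = 1.0 / (1.0 + Math.sqrt(sq));
      total += scores[c];
    }
    for (int c = 0; c < scores.length; c++) {
      scores[c] /= total;
    }
    return scores;
  }

  static int argMax(double[] values) {
    int best = 0;
    for (int i = 1; i < values.length; i++) {
      if (values[i] > values[best]) {
        best = i;
      }
    }
    return best;
  }
}
```

For the points 0, 1, 9 and 10 with initial centers 2 and 7, the first pass assigns {0, 1} and {9, 10}, moving the centers to 0.5 and 9.5; later passes leave them there.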
--
This message is automatically generated by JIRA.