[ https://issues.apache.org/jira/browse/MAHOUT-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176423#comment-13176423 ]
Paritosh Ranjan commented on MAHOUT-931:
----------------------------------------

Ok, I will start working in the following order then. I have a few more questions, which I have written inline.

- 929: implement a new post processor that does only classification, as required by the various clusterPoints steps.
  The new post processor for clusterPoints() would use the Cluster Classifier to identify which vector belongs to which cluster, at least for K-means, Canopy and Dirichlet (i.e. similar policies exist for them). I need to create a mapreduce version and a sequential version of it. Am I correct? And is the current ClusterIterator meant for the buildClusters phase, since it also trains alongside classification?

- 930: modify the existing drivers to use this post processor rather than their current, custom implementations.
  Currently, buildClusters and clusteredPoints run in the same method call for each vector. The new implementation would let buildClusters run for all input vectors first. Only after buildClusters has completely finished would a new call to clusterPoints start (for all input vectors, using the new post processor).

- 931: modify the post processor to support pluggable outlier removal.
  Would this be a probability-threshold-based implementation?

> Implement a pluggable outlier removal capability for cluster classifiers
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-931
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-931
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>             Fix For: 0.7
>
>         Attachments: MAHOUT-931
>
>
> A pluggable outlier removal capability while classifying the clusters is
> needed. The classification and outlier removal implementations, both should
> be completely separate entities for better abstraction.
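
To make the separation asked for in the issue concrete, here is a minimal sketch of how classification and outlier removal could be kept as separate pieces. It assumes the probability-threshold reading raised as a question in the comment, which has not been confirmed. Every name in it (OutlierPolicy, ThresholdOutlierPolicy, OutlierRemovalSketch, assign) is hypothetical and is not part of the Mahout API; the per-cluster probabilities are assumed to come from whatever classifier is in use.

```java
import java.util.OptionalInt;

/** Hypothetical plug-in point: decides whether a classified point is an outlier. */
interface OutlierPolicy {
    boolean isOutlier(double bestProbability);
}

/** Hypothetical policy: a point is an outlier if its best probability is below a threshold. */
final class ThresholdOutlierPolicy implements OutlierPolicy {
    private final double threshold;

    ThresholdOutlierPolicy(double threshold) {
        this.threshold = threshold;
    }

    @Override
    public boolean isOutlier(double bestProbability) {
        return bestProbability < threshold;
    }
}

final class OutlierRemovalSketch {

    /** Classification only: returns the index of the most probable cluster. */
    static int mostLikelyCluster(double[] probabilities) {
        if (probabilities.length == 0) {
            throw new IllegalArgumentException("need at least one cluster probability");
        }
        int best = 0;
        for (int i = 1; i < probabilities.length; i++) {
            if (probabilities[i] > probabilities[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Classifies first, then lets the pluggable policy decide whether to drop the point. */
    static OptionalInt assign(double[] probabilities, OutlierPolicy policy) {
        int best = mostLikelyCluster(probabilities);
        return policy.isOutlier(probabilities[best])
                ? OptionalInt.empty()
                : OptionalInt.of(best);
    }

    public static void main(String[] args) {
        OutlierPolicy policy = new ThresholdOutlierPolicy(0.5);
        // Best probability 0.7 clears the threshold, so the point goes to cluster 1.
        System.out.println(assign(new double[] {0.1, 0.7, 0.2}, policy));
        // Best probability 0.4 is below the threshold, so the point is dropped.
        System.out.println(assign(new double[] {0.4, 0.3, 0.3}, policy));
    }
}
```

Because the policy is a single-method interface, the classification step does not need to know which rule is plugged in. Passing `p -> false`, for example, would keep every point, which corresponds to running the post processor with no outlier removal at all.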