[ https://issues.apache.org/jira/browse/MAHOUT-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176707#comment-13176707 ]
Jeff Eastman commented on MAHOUT-931:
-------------------------------------

- 929: Yes, use the existing ClusterClassifier to write sequential and mapreduce versions of a post processor to do vector classification. You should not need the ClusterIterator, as that is used for the buildClusters phase.
- 930: No, buildClusters runs to completion on all vectors before clusterPoints is called on them. Currently, it is not possible to run clusterPoints without first running buildClusters. With the post processor, they will be completely independent jobs (the existing CLI drivers may still bundle them for compatibility).
- 931: Yes, a probability-based threshold would work with the current ClusterClassifier API. A distance-based threshold (like Canopy T1 pruning) would need a different mechanism.

> Implement a pluggable outlier removal capability for cluster classifiers
> ------------------------------------------------------------------------
>
> Key: MAHOUT-931
> URL: https://issues.apache.org/jira/browse/MAHOUT-931
> Project: Mahout
> Issue Type: Improvement
> Components: Classification, Clustering
> Affects Versions: 0.6
> Reporter: Paritosh Ranjan
> Fix For: 0.7
>
> Attachments: MAHOUT-931
>
>
> A pluggable outlier removal capability is needed while classifying the
> clusters. The classification and outlier removal implementations should
> be completely separate entities for better abstraction.
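
To make the probability-based threshold from the 931 answer concrete, here is a minimal sketch of how outlier removal could be kept separate from classification. It is illustrative only and does not use Mahout's actual API: OutlierPolicy, ProbabilityThresholdPolicy, and ClassificationPostProcessor are hypothetical names, and the sketch assumes the classifier has already produced one probability per cluster for each vector.

```java
/** Hypothetical plug-in point: decides whether a vector should be dropped as an outlier. */
interface OutlierPolicy {
    boolean isOutlier(double[] clusterProbabilities);
}

/** Probability-based policy: a vector is an outlier if no cluster reaches the threshold. */
final class ProbabilityThresholdPolicy implements OutlierPolicy {
    private final double threshold;

    ProbabilityThresholdPolicy(double threshold) {
        if (threshold < 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be in [0, 1]");
        }
        this.threshold = threshold;
    }

    @Override
    public boolean isOutlier(double[] clusterProbabilities) {
        double max = 0.0;
        for (double p : clusterProbabilities) {
            max = Math.max(max, p);
        }
        return max < threshold;
    }
}

/** Hypothetical post processor: classification is independent of the outlier policy. */
final class ClassificationPostProcessor {
    static final int OUTLIER = -1;

    private final OutlierPolicy policy;

    ClassificationPostProcessor(OutlierPolicy policy) {
        this.policy = policy;
    }

    /** Returns the index of the most likely cluster, or OUTLIER if the policy rejects the vector. */
    int assign(double[] clusterProbabilities) {
        if (clusterProbabilities.length == 0) {
            throw new IllegalArgumentException("need at least one cluster probability");
        }
        if (policy.isOutlier(clusterProbabilities)) {
            return OUTLIER;
        }
        int best = 0;
        for (int i = 1; i < clusterProbabilities.length; i++) {
            if (clusterProbabilities[i] > clusterProbabilities[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

Because the post processor only sees the OutlierPolicy interface, a different rule could be plugged in without touching the classification code. A distance-based rule would need the vector and the cluster centers as inputs rather than probabilities alone, which is why it would need a different mechanism than this interface offers.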