[ 
https://issues.apache.org/jira/browse/MAHOUT-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176707#comment-13176707
 ] 

Jeff Eastman commented on MAHOUT-931:
-------------------------------------

- 929: Yes, use the existing ClusterClassifier to write sequential and 
mapreduce versions of a post processor to do vector classification. You should 
not need the ClusterIterator, as that is used for the buildClusters phase.
- 930: No, buildClusters runs to completion on all vectors before clusterPoints 
is called on them. Currently, it is not possible to run clusterPoints without 
first running buildClusters. With the post processor, they will be completely 
independent jobs (the existing CLI drivers may still bundle them for 
compatibility).
- 931: Yes, a probability-based threshold would work with the current 
ClusterClassifier API. A distance-based threshold (like Canopy T1 pruning) 
would need a different mechanism.
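
To illustrate the probability-based option from the 931 answer, here is a 
minimal sketch in plain Java. It assumes the classifier yields one membership 
probability per cluster for each vector; it does not call any Mahout class. 
The names OutlierFilterSketch, OutlierPolicy, probabilityThreshold and assign 
are hypothetical and exist only for this example.

```java
/** Hypothetical sketch of a pluggable outlier policy; not part of Mahout. */
final class OutlierFilterSketch {

    /** Pluggable policy: decides whether a point is an outlier
     *  from its per-cluster membership probabilities. */
    interface OutlierPolicy {
        boolean isOutlier(double[] clusterProbabilities);
    }

    /** Policy that flags a point when its best probability is below the threshold. */
    static OutlierPolicy probabilityThreshold(final double threshold) {
        return probs -> {
            int best = argMax(probs);
            return best < 0 || probs[best] < threshold;
        };
    }

    /** Index of the largest value, or -1 for an empty array. */
    static int argMax(double[] values) {
        int best = -1;
        for (int i = 0; i < values.length; i++) {
            if (best < 0 || values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Returns the most likely cluster index, or -1 if the policy rejects the point. */
    static int assign(double[] clusterProbabilities, OutlierPolicy policy) {
        if (policy.isOutlier(clusterProbabilities)) {
            return -1;
        }
        return argMax(clusterProbabilities);
    }
}
```

Keeping the policy behind its own interface means the classification step 
only produces probabilities, and a different rule can be swapped in without 
touching it. A distance-based rule would need the raw distances instead, which 
this probability-only interface does not carry.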
                
> Implement a pluggable outlier removal capability for cluster classifiers
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-931
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-931
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>             Fix For: 0.7
>
>         Attachments: MAHOUT-931
>
>
> A pluggable outlier removal capability is needed when classifying vectors 
> into clusters. The classification and outlier removal implementations should 
> be completely separate entities for better abstraction. 
