[ https://issues.apache.org/jira/browse/MAHOUT-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176707#comment-13176707 ]
Jeff Eastman commented on MAHOUT-931:
-------------------------------------
- 929: Yes, use the existing ClusterClassifier to write sequential and
mapreduce versions of a post processor that does vector classification. You
should not need the ClusterIterator, as that is used for the buildClusters phase.
- 930: No, buildClusters runs to completion on all vectors before clusterPoints
is called on them. Currently, it is not possible to run clusterPoints
without first running buildClusters. With the post processor, they will be
completely independent jobs (the existing CLI drivers may still bundle them for
compatibility).
- 931: Yes, a probability-based threshold would work with the current
ClusterClassifier API. A distance-based threshold (like Canopy T1 pruning)
would need a different mechanism.
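To illustrate the 931 point, here is a minimal sketch of a probability-based
threshold kept separate from classification. It assumes the classifier has
already produced one probability per cluster for a vector; a plain double[]
stands in for that output. OutlierPolicy, ProbabilityThresholdPolicy and
assign are hypothetical names for illustration, not part of Mahout.

```java
class OutlierFilterSketch {

    /** Hypothetical plug-in point: decides whether a classified vector is rejected. */
    interface OutlierPolicy {
        boolean isOutlier(double[] clusterProbabilities);
    }

    /** Rejects a vector when even its most likely cluster falls below the threshold. */
    static final class ProbabilityThresholdPolicy implements OutlierPolicy {
        private final double threshold;

        ProbabilityThresholdPolicy(double threshold) {
            this.threshold = threshold;
        }

        @Override
        public boolean isOutlier(double[] clusterProbabilities) {
            return clusterProbabilities[argMax(clusterProbabilities)] < threshold;
        }
    }

    /** Index of the largest entry; the first one wins on ties. */
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    /**
     * Classification step that knows nothing about how outliers are defined.
     * Returns the index of the most likely cluster, or -1 if the policy rejects the vector.
     */
    static int assign(double[] clusterProbabilities, OutlierPolicy policy) {
        return policy.isOutlier(clusterProbabilities) ? -1 : argMax(clusterProbabilities);
    }

    public static void main(String[] args) {
        OutlierPolicy policy = new ProbabilityThresholdPolicy(0.5);
        // Stand-in classifier outputs: one probability per cluster.
        System.out.println(assign(new double[] {0.7, 0.2, 0.1}, policy));
        System.out.println(assign(new double[] {0.4, 0.35, 0.25}, policy));
    }
}
```

Because assign only sees the OutlierPolicy interface, a different rule could be
swapped in without touching the classification code. A distance-based rule
would not fit this shape, since it needs the vector and the cluster centers
rather than the probabilities alone.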
> Implement a pluggable outlier removal capability for cluster classifiers
> ------------------------------------------------------------------------
>
> Key: MAHOUT-931
> URL: https://issues.apache.org/jira/browse/MAHOUT-931
> Project: Mahout
> Issue Type: Improvement
> Components: Classification, Clustering
> Affects Versions: 0.6
> Reporter: Paritosh Ranjan
> Fix For: 0.7
>
> Attachments: MAHOUT-931
>
>
> A pluggable outlier removal capability is needed while classifying the
> clusters. The classification and outlier removal implementations should
> be completely separate entities for better abstraction.