[
https://issues.apache.org/jira/browse/MAHOUT-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176227#comment-13176227
]
Paritosh Ranjan commented on MAHOUT-931:
----------------------------------------
I am a bit confused.
Are we planning to get rid of the way clustering is currently done, which is
algorithm-specific (for example, the code in CanopyClusterer)?
Will the new clustering strategy be "only" what is implemented in
ClusterClassifier, i.e. calculating the probabilities of a vector belonging to
the different models (clusters) and choosing the model with the highest
probability?
If yes, then implementing a clustering policy for each clustering algorithm
is all that is needed. For outlier removal, just a threshold probability
would be needed: vectors whose highest probability falls below that threshold
won't be clustered. Am I correct?
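As a rough illustration of the idea above (a hypothetical sketch, not the
actual ClusterClassifier code; the class name, the method name, and the -1
outlier marker are all assumptions made up for this example):

```java
/** Hypothetical sketch only; this is not Mahout's ClusterClassifier API. */
class ThresholdClusterAssigner {

    /** Assumed marker returned when a vector is treated as an outlier. */
    static final int OUTLIER = -1;

    /**
     * Returns the index of the model (cluster) with the highest probability,
     * or OUTLIER when even the best probability is below the threshold.
     *
     * @param probabilities probability of the vector belonging to each model
     * @param threshold     minimum probability required to cluster the vector
     */
    static int assign(double[] probabilities, double threshold) {
        int best = OUTLIER;
        double bestProbability = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < probabilities.length; i++) {
            if (probabilities[i] > bestProbability) {
                bestProbability = probabilities[i];
                best = i;
            }
        }
        // No models at all, or the best model is still too unlikely: leave
        // the vector unclustered.
        if (best == OUTLIER || bestProbability < threshold) {
            return OUTLIER;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] confident = {0.1, 0.7, 0.2};
        double[] ambiguous = {0.4, 0.35, 0.25};
        System.out.println(assign(confident, 0.5));
        System.out.println(assign(ambiguous, 0.5));
    }
}
```

Here the probabilities would come from whatever the clustering policy of the
given algorithm computes, and the threshold check is the only outlier logic.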
Until now, I have been assuming that the clustering code just needs to be
refactored out (without changing the implementation). If that is the case,
then I think I have been proceeding in the right direction (in terms of
design).
However, I suspect that we are not in sync about the implementation approach.
I think you want to change the clustering implementation into a cluster
classification implementation with outlier removal (and completely get rid of
the algorithm-specific implementation, which makes sense).
So, it would be really helpful if you could clarify this.
> Implement a pluggable outlier removal capability for cluster classifiers
> ------------------------------------------------------------------------
>
> Key: MAHOUT-931
> URL: https://issues.apache.org/jira/browse/MAHOUT-931
> Project: Mahout
> Issue Type: Improvement
> Components: Classification, Clustering
> Affects Versions: 0.6
> Reporter: Paritosh Ranjan
> Fix For: 0.7
>
> Attachments: MAHOUT-931
>
>
> A pluggable outlier removal capability while classifying the clusters is
> needed. The classification and outlier removal implementations, both should
> be completely separate entities for better abstraction.