[ 
https://issues.apache.org/jira/browse/MAHOUT-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204936#comment-13204936
 ] 

Jeff Eastman commented on MAHOUT-929:
-------------------------------------

The sequential version looks good, but the patch lacks tests of the MR 
implementation, or at least of the mapper. 

What I get from reading the code is that all points with a pdf > 
clusterClassificationThreshold will be clustered (the rest are ignored as 
outliers) and that the most likely cluster will be chosen for each vector. To 
replace the current FuzzyK and Dirichlet capabilities, it will also need a 
second classification threshold, so that a vector can be assigned to more than 
one cluster, as the current implementations allow.
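
To make that reading concrete, here is a minimal sketch of the two behaviours. 
It is not the code from the patch: the class name, the method names and the 
second threshold parameter are all hypothetical, and the pdf values are assumed 
to have been computed already, one per cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from the MAHOUT-929 patch.
final class ThresholdClassificationSketch {

  /**
   * Single classification: returns the index of the most likely cluster,
   * or -1 when no pdf exceeds the threshold (the vector is treated as an outlier).
   */
  static int classifyMostLikely(double[] pdfs, double clusterClassificationThreshold) {
    int best = -1;
    // Start from the threshold so only a strictly greater pdf can win.
    double bestPdf = clusterClassificationThreshold;
    for (int i = 0; i < pdfs.length; i++) {
      if (pdfs[i] > bestPdf) {
        bestPdf = pdfs[i];
        best = i;
      }
    }
    return best;
  }

  /**
   * Multiple classification (assumed semantics): returns every cluster whose pdf
   * exceeds a second threshold. An empty list means the vector is an outlier.
   */
  static List<Integer> classifyAll(double[] pdfs, double multiClassificationThreshold) {
    List<Integer> clusters = new ArrayList<Integer>();
    for (int i = 0; i < pdfs.length; i++) {
      if (pdfs[i] > multiClassificationThreshold) {
        clusters.add(i);
      }
    }
    return clusters;
  }

  public static void main(String[] args) {
    double[] pdfs = {0.10, 0.55, 0.35};
    System.out.println(classifyMostLikely(pdfs, 0.3)); // 1
    System.out.println(classifyMostLikely(pdfs, 0.6)); // -1 (outlier)
    System.out.println(classifyAll(pdfs, 0.3));        // [1, 2]
  }
}
```

Whether the second threshold should be independent of 
clusterClassificationThreshold, as assumed here, is a design question for the 
patch.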

As this code is not used yet, it could be committed as-is if you are 
comfortable with that, but it would still be a WIP. How would you like to 
proceed?



                
> Refactor Clustering (Vector Classification) into a Separate Postprocess with 
> Outlier Pruning
> --------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-929
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-929
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering
>    Affects Versions: 0.6
>            Reporter: Jeff Eastman
>             Fix For: 0.7
>
>         Attachments: Mahout-929, Mahout-929, Mahout-929
>
>
> The current clustering drivers have a -cp option to produce a clusteredPoints 
> directory containing the input vectors classified by the final clusters 
> produced by the algorithm. These options are redundantly implemented in those 
> drivers.
> - Factor out & implement an independent post processor to perform the 
> classification step independently of the various clustering implementations.
> - Implement a pluggable outlier removal capability for this classifier. 
> - Consider building off of the ClusterClassifier & ClusterIterator ideas.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
