[ https://issues.apache.org/jira/browse/MAHOUT-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176423#comment-13176423 ]

Paritosh Ranjan commented on MAHOUT-931:
----------------------------------------

OK, I will start working in the following order, then. I have a few more 
questions, which I have written inline.

 - 929 implement a new post processor that does only classification as required 
by the various clusterPoints steps.

The new post processor for clusterPoints() would use the Cluster Classifier to 
identify which vector belongs to which cluster, at least for K-means, Canopy, 
and Dirichlet (i.e., similar policies exist for them). I need to create both a 
MapReduce and a sequential version of it. Am I correct?
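To make the question concrete, here is a minimal sequential sketch of a classification-only clusterPoints step, written under stated assumptions: it does not use Mahout's ClusterClassifier API. The class name ClusterPointsSketch, representing clusters as plain double[] centers, and scoring with a softmax over negative squared distances are all hypothetical choices for illustration. Each vector is simply assigned to its highest-scoring cluster; a MapReduce version would presumably run the same per-vector logic inside a mapper.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sequential clusterPoints post-processor (NOT Mahout's API).
 * Assumption: clusters are plain centers, and the "classifier" scores each
 * cluster with a softmax over negative squared Euclidean distances.
 */
class ClusterPointsSketch {

    /** Returns one probability-like score per cluster; the scores sum to 1. */
    static double[] classify(double[] point, List<double[]> centers) {
        double[] scores = new double[centers.size()];
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            scores[i] = -squaredDistance(point, centers.get(i));
            max = Math.max(max, scores[i]);
        }
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            scores[i] = Math.exp(scores[i] - max); // shift by max for numerical stability
            sum += scores[i];
        }
        for (int i = 0; i < scores.length; i++) {
            scores[i] /= sum;
        }
        return scores;
    }

    /** Classification only: assigns every vector to its most probable cluster index. */
    static List<Integer> clusterPoints(List<double[]> points, List<double[]> centers) {
        List<Integer> assignments = new ArrayList<>();
        for (double[] p : points) {
            assignments.add(argMax(classify(p, centers)));
        }
        return assignments;
    }

    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    static double squaredDistance(double[] a, double[] b) {
        double d = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            d += diff * diff;
        }
        return d;
    }

    public static void main(String[] args) {
        List<double[]> centers = List.of(new double[] {0, 0}, new double[] {10, 10});
        List<double[]> points = List.of(new double[] {1, 1}, new double[] {9, 8});
        System.out.println(clusterPoints(points, centers)); // prints [0, 1]
    }
}
```

The cluster models are read-only here, which is the point of separating this step from training.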

Is the current ClusterIterator meant for the buildClusters phase, since it also 
trains the clusters alongside classifying?

 - 930 modify the existing drivers to use this post processor rather than their 
current, custom implementations.

Currently, the buildClusters and clusteredPoints steps run in the same method 
call for each vector. The new implementation would let buildClusters run over 
all input vectors first. Only after buildClusters has completely finished would 
a new call start for clusterPoints (for all input vectors, using the new post 
processor).
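The proposed ordering can be sketched as two separate passes, assuming plain k-means (Lloyd iterations) as the training algorithm. This is not Mahout's KMeansDriver: the class name TwoPhaseSketch, seeding the centers from the first k points, and the maxIterations parameter are hypothetical. Phase 1 trains the centers over all vectors until they stop moving; phase 2 then classifies every vector against the final, fixed centers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical two-phase driver (NOT Mahout's KMeansDriver): buildClusters
 * finishes completely before clusterPoints classifies any vector.
 */
class TwoPhaseSketch {

    /** Phase 1: k-means (Lloyd) training; centers are seeded from the first k points. */
    static List<double[]> buildClusters(List<double[]> points, int k, int maxIterations) {
        List<double[]> centers = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            centers.add(points.get(i).clone()); // copy so the input is never mutated
        }
        int dim = points.get(0).length;
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(p, centers);
                counts[c]++;
                for (int d = 0; d < dim; d++) {
                    sums[c][d] += p[d];
                }
            }
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                for (int d = 0; d < dim; d++) {
                    double updated = sums[c][d] / counts[c];
                    changed |= updated != centers.get(c)[d];
                    centers.get(c)[d] = updated;
                }
            }
            if (!changed) {
                break; // converged: no center moved in this iteration
            }
        }
        return centers;
    }

    /** Phase 2: classification only; the trained centers are not modified. */
    static int[] clusterPoints(List<double[]> points, List<double[]> centers) {
        int[] assignments = new int[points.size()];
        for (int i = 0; i < points.size(); i++) {
            assignments[i] = nearest(points.get(i), centers);
        }
        return assignments;
    }

    /** Index of the center with the smallest squared Euclidean distance. */
    static int nearest(double[] p, List<double[]> centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.size(); c++) {
            double dist = 0.0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centers.get(c)[d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> points = List.of(
            new double[] {0, 0}, new double[] {10, 10},
            new double[] {1, 0}, new double[] {9, 10});
        List<double[]> centers = buildClusters(points, 2, 10); // phase 1 runs to completion
        int[] labels = clusterPoints(points, centers);       // only then phase 2
        System.out.println(Arrays.toString(labels));         // prints [0, 1, 0, 1]
    }
}
```

Keeping the two phases apart means the classification pass can be swapped or reused across algorithms without touching the training loop.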

 - 931 modify the post processor to support pluggable outlier removal.

Would this be a probability-threshold-based implementation?
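One way a probability-threshold policy could stay fully separate from classification is sketched below. Everything here is hypothetical and not Mahout's API: the OutlierPolicy interface, the ProbabilityThresholdPolicy class, the -1 marker for removed points, and the unnormalized Gaussian score exp(-d^2 / 2) used as an example classifier. The classifier only produces per-cluster scores; the plugged-in policy alone decides whether a point is dropped as an outlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch (NOT Mahout's API): classification and outlier removal kept separate. */
class OutlierRemovalSketch {

    /** Pluggable policy: returns true if a point should be removed, given its cluster scores. */
    interface OutlierPolicy {
        boolean isOutlier(double[] clusterScores);
    }

    /** Removes a point whose best (non-negative) cluster score is below a fixed threshold. */
    static class ProbabilityThresholdPolicy implements OutlierPolicy {
        private final double threshold;

        ProbabilityThresholdPolicy(double threshold) {
            this.threshold = threshold;
        }

        @Override
        public boolean isOutlier(double[] clusterScores) {
            double best = 0.0; // assumes scores are non-negative
            for (double s : clusterScores) {
                best = Math.max(best, s);
            }
            return best < threshold;
        }
    }

    /** Classifies each point, then asks the policy; returns a cluster index, or -1 if removed. */
    static List<Integer> clusterPoints(List<double[]> points,
                                       Function<double[], double[]> classifier,
                                       OutlierPolicy policy) {
        List<Integer> result = new ArrayList<>();
        for (double[] p : points) {
            double[] scores = classifier.apply(p);
            if (policy.isOutlier(scores)) {
                result.add(-1);
                continue;
            }
            int best = 0;
            for (int i = 1; i < scores.length; i++) {
                if (scores[i] > scores[best]) {
                    best = i;
                }
            }
            result.add(best);
        }
        return result;
    }

    /** Example classifier: unnormalized Gaussian score exp(-d^2 / 2) for each center. */
    static Function<double[], double[]> gaussianClassifier(List<double[]> centers) {
        return p -> {
            double[] scores = new double[centers.size()];
            for (int c = 0; c < centers.size(); c++) {
                double d2 = 0.0;
                for (int i = 0; i < p.length; i++) {
                    double diff = p[i] - centers.get(c)[i];
                    d2 += diff * diff;
                }
                scores[c] = Math.exp(-d2 / 2.0);
            }
            return scores;
        };
    }

    public static void main(String[] args) {
        List<double[]> centers = List.of(new double[] {0, 0}, new double[] {10, 10});
        List<double[]> points = List.of(
            new double[] {0.5, 0}, new double[] {10, 9.5}, new double[] {5, 5});
        OutlierPolicy policy = new ProbabilityThresholdPolicy(0.01);
        // (5, 5) is far from both centers, so its best score falls below 0.01
        System.out.println(clusterPoints(points, gaussianClassifier(centers), policy)); // prints [0, 1, -1]
    }
}
```

Because the policy only sees the score vector, a different outlier rule (for example, a distance cutoff) could be plugged in without changing the classifier.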
                
> Implement a pluggable outlier removal capability for cluster classifiers
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-931
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-931
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>             Fix For: 0.7
>
>         Attachments: MAHOUT-931
>
>
> A pluggable outlier removal capability is needed while classifying the 
> clusters. The classification and outlier removal implementations should both 
> be completely separate entities, for better abstraction. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
