[ 
https://issues.apache.org/jira/browse/MAHOUT-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143694#comment-13143694
 ] 

Paritosh Ranjan commented on MAHOUT-843:
----------------------------------------

I don't see much of a "comparison" between a Java approach and a CLI. I see 
these two as separate means to perform the same task, and I think Mahout 
provides both ways to accomplish most tasks. So, to me, this question is like 
"Why is KMeansDriver better than the CLI for running KMeans?", and the answer 
depends on how the user wants to use it.

So, I don't see a reason to question the Java API. It helps the user 
accomplish top down clustering, with different clustering algorithms, without 
getting into its intricacies.

I also don't think that creating a CLI would be the hardest part of the 
feature. I can create a CLI that takes the top-level and bottom-level 
parameters for TopDownClustering together. Many of the parameters needed to 
run clustering are common to most of the algorithms, so it is not going to be 
that complicated. But does creating this CLI and writing the JUnit tests 
complete the feature?

Writing a CLI which does clustering, post processing, and clustering again 
makes sense, as it helps reduce the number of parameters. But, still, why 
argue against the Java API?
                
> Top Down Clustering
> -------------------
>
>                 Key: MAHOUT-843
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-843
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>              Labels: clustering, patch
>             Fix For: 0.6
>
>         Attachments: MAHOUT-843-patch, Top-Down-Clustering-patch
>
>
> Top Down Clustering works in multiple steps. The first step is to find 
> comparatively bigger clusters. The second step is to cluster the bigger 
> chunks into meaningful clusters. This can improve performance when 
> clustering large amounts of data. It also removes the dependency on 
> providing input clusters/numbers to the clustering algorithm.
> "Big" is a relative term, as is the smaller "meaningful". So, the size of 
> these "bigger" and "smaller/meaningful" clusters will be controlled by the 
> user.
> The user can also select which clustering algorithm to use at the top level 
> and which to use at the bottom level. Initially, this can be done for only 
> one or a few clustering algorithms; later, an option can be provided to use 
> all the algorithms that suit the case.
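
The two-step flow in the issue description can be sketched in plain Java 
roughly as follows. This is only an illustration under stated assumptions: it 
is not the attached patch and does not use any Mahout API. The class name 
TopDownClusteringSketch, the methods clusterByThreshold and topDown, and the 
topThreshold/bottomThreshold parameters are all hypothetical, and a simple 
distance-threshold rule stands in for whichever real clustering algorithm 
would run at each level.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TopDownClusteringSketch {

    // Hypothetical stand-in for a real clustering algorithm: a point joins the
    // first cluster whose seed (its first member) lies within `threshold`;
    // otherwise it starts a new cluster.
    static List<List<double[]>> clusterByThreshold(List<double[]> points, double threshold) {
        List<List<double[]>> clusters = new ArrayList<>();
        for (double[] p : points) {
            List<double[]> home = null;
            for (List<double[]> c : clusters) {
                if (distance(c.get(0), p) <= threshold) {
                    home = c;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(p);
        }
        return clusters;
    }

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Step 1: find coarse clusters using the (assumed) topThreshold.
    // Step 2: re-cluster each coarse cluster using the (assumed) bottomThreshold.
    static List<List<List<double[]>>> topDown(List<double[]> points,
                                              double topThreshold,
                                              double bottomThreshold) {
        List<List<List<double[]>>> result = new ArrayList<>();
        for (List<double[]> big : clusterByThreshold(points, topThreshold)) {
            result.add(clusterByThreshold(big, bottomThreshold));
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
            new double[] {0, 0}, new double[] {0.5, 0}, new double[] {3, 0},
            new double[] {100, 100}, new double[] {100.5, 100}, new double[] {103, 100});
        List<List<List<double[]>>> tree = topDown(points, 10.0, 1.0);
        for (int i = 0; i < tree.size(); i++) {
            System.out.println("top cluster " + i + ": " + tree.get(i).size() + " sub-clusters");
        }
    }
}
```

The two thresholds play the role of the user-controlled "bigger" and 
"smaller/meaningful" cluster sizes. In a real implementation each level would 
presumably delegate to a different clustering algorithm rather than share one 
helper.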


        
