[jira] [Commented] (MAHOUT-843) Top Down Clustering

Paritosh Ranjan (Commented) (JIRA) Mon, 17 Oct 2011 11:25:32 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129062#comment-13129062
 ]


Paritosh Ranjan commented on MAHOUT-843:
----------------------------------------

I did not know that. I will do it from the parent directory from now on. 
Thanks for letting me know.

Yes, the code does not work yet because I still need to group the points 
belonging to the same cluster into their respective directories, and give each 
cluster directory as the input to the bottom level clustering. I am working on 
that part and will upload a working patch soon. 
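
The grouping step described above could look roughly like the sketch below. 
It is written in Python rather than Mahout's Java, and it is not the patch 
itself: the (cluster_id, point) input format, the "cluster-<id>" directory 
names and the points.txt file name are all assumptions made for illustration.

```python
import os
from collections import defaultdict


def group_points_by_cluster(assignments, output_root):
    """Write each point into a directory named after its top level cluster.

    assignments: iterable of (cluster_id, point) pairs (hypothetical format).
    Returns a dict mapping cluster_id to the directory holding its points.
    """
    # Collect the points of every top level cluster, keeping input order.
    grouped = defaultdict(list)
    for cluster_id, point in assignments:
        grouped[cluster_id].append(point)

    cluster_dirs = {}
    for cluster_id, points in grouped.items():
        # One directory per cluster; the naming scheme is an assumption.
        cluster_dir = os.path.join(output_root, "cluster-%s" % cluster_id)
        os.makedirs(cluster_dir, exist_ok=True)
        with open(os.path.join(cluster_dir, "points.txt"), "w") as handle:
            for point in points:
                handle.write(" ".join(str(value) for value in point) + "\n")
        cluster_dirs[cluster_id] = cluster_dir
    return cluster_dirs
```

Each directory returned here would then be passed as the input of one bottom 
level clustering run.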

One option is to let the user run the two levels manually:

/bin/mahout toplevelclustering <cluster-config>
/bin/mahout bottomlevelclustering <cluster-config>

Then we get rid of the duplicate-looking arguments. The only difference 
would be the input directory of the bottom level clustering, which can be 
derived from the input based on whether it is a top level or a bottom level 
clustering (as the output directory of the top level clustering will be 
controlled by the code).
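
A minimal sketch of how the input could be derived from the level, again in 
Python and only as an illustration. The function name, the "top"/"bottom" 
level strings and the "top-level-output" directory name are hypothetical; 
the only idea taken from the comment is that the code, not the user, decides 
where the top level output goes.

```python
import os

# Hypothetical fixed location for the top level output, chosen by the code.
TOP_LEVEL_OUTPUT = "top-level-output"


def resolve_input_dirs(level, user_input, work_dir):
    """Return the input directories for the given clustering level.

    level: "top" or "bottom" (assumed names).
    user_input: the input directory supplied by the user.
    work_dir: directory under which the top level output is written.
    """
    if level == "top":
        # The top level clusters the user's data directly.
        return [user_input]
    if level == "bottom":
        # The bottom level runs once per cluster directory produced above.
        top_output = os.path.join(work_dir, TOP_LEVEL_OUTPUT)
        return sorted(
            os.path.join(top_output, name)
            for name in os.listdir(top_output)
            if os.path.isdir(os.path.join(top_output, name))
        )
    raise ValueError("unknown clustering level: %r" % (level,))
```

With this arrangement both commands can share the same configuration, since 
the bottom level never needs its input spelled out separately.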
                
> Top Down Clustering
> -------------------
>
>                 Key: MAHOUT-843
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-843
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>              Labels: clustering, patch
>             Fix For: 0.6
>
>         Attachments: Top-Down-Clustering-patch
>
>
> Top Down Clustering works in multiple steps. The first step is to find 
> comparatively bigger clusters. The second step is to cluster the bigger 
> chunks into meaningful clusters. This can improve performance while 
> clustering large amounts of data. It also removes the dependency on 
> providing input clusters/numbers to the clustering algorithm.
> "Big" is a relative term, as is the smaller "meaningful" one. So the size 
> of these "bigger" and "smaller/meaningful" clusters will be controlled by 
> the user.
> Which clustering algorithm to use at the top level and which to use at the 
> bottom level can also be selected by the user. Initially, this can be done 
> for only one or a few clustering algorithms; later, an option can be 
> provided to use all the algorithms (that suit the case). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
