[
https://issues.apache.org/jira/browse/MAHOUT-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paritosh Ranjan updated MAHOUT-843:
-----------------------------------
Attachment: Top-Down-Clustering-patch
I am trying to implement top-down clustering, a concept I read about in the
book Mahout in Action. This patch is only meant to gather feedback on the
approach.
I think that this Top Down Clustering should be flexible, and the user should be
able to use different clustering algorithms for the first and second levels of
clustering, with parameters that suit the user.
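To make the "different algorithm per level" idea concrete, here is a minimal
sketch in plain Java. It is not taken from the attached patch and does not use
any Mahout class: the Clusterer interface, the toy gapClusterer and the topDown
method are all hypothetical names chosen for illustration, and the points are
one-dimensional to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

public class TopDownSketch {

    /** Hypothetical abstraction: any algorithm that splits points into groups. */
    interface Clusterer {
        List<List<double[]>> cluster(List<double[]> points);
    }

    /**
     * Toy clusterer (an assumption, not a Mahout algorithm): sorts points by
     * their first coordinate and starts a new group whenever the gap between
     * neighbours exceeds the threshold.
     */
    static Clusterer gapClusterer(double threshold) {
        return points -> {
            List<double[]> sorted = new ArrayList<>(points);
            sorted.sort((a, b) -> Double.compare(a[0], b[0]));
            List<List<double[]>> groups = new ArrayList<>();
            List<double[]> current = new ArrayList<>();
            for (double[] p : sorted) {
                boolean gapTooBig = !current.isEmpty()
                        && p[0] - current.get(current.size() - 1)[0] > threshold;
                if (gapTooBig) {
                    groups.add(current);
                    current = new ArrayList<>();
                }
                current.add(p);
            }
            if (!current.isEmpty()) {
                groups.add(current);
            }
            return groups;
        };
    }

    /** Runs the first-level clusterer, then the second-level one on each big cluster. */
    static List<List<double[]>> topDown(List<double[]> points,
                                        Clusterer firstLevel,
                                        Clusterer secondLevel) {
        List<List<double[]>> result = new ArrayList<>();
        for (List<double[]> bigCluster : firstLevel.cluster(points)) {
            result.addAll(secondLevel.cluster(bigCluster));
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        for (double x : new double[] {1, 2, 5, 6, 100, 101, 105, 106}) {
            points.add(new double[] {x});
        }
        // A coarse threshold for the first level, a fine one for the second.
        List<List<double[]>> clusters =
                topDown(points, gapClusterer(50), gapClusterer(2));
        System.out.println(clusters.size()); // prints 4
    }
}
```

Here both levels happen to use the same toy algorithm with different
parameters; any two implementations of the hypothetical interface could be
combined in the same way.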
The patch demonstrates the idea. What remains to be coded is arranging the
clustered output of the first-level clustering algorithm in a directory
structure and providing each directory (clustered points) to the second-level
clustering algorithm.
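The remaining step could look roughly like the sketch below. It is only an
illustration under several assumptions: it writes to the local filesystem with
java.nio rather than through Hadoop, the "cluster-<id>" directory layout and
the "points.txt" file name are invented here, and points are represented as
plain text lines instead of Mahout vectors.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClusterDirectorySketch {

    /**
     * Writes the points of each first-level cluster to
     * root/cluster-<id>/points.txt (one point per line) and returns the
     * created directories in ascending cluster-id order. Layout and file
     * name are hypothetical.
     */
    static List<Path> writeClusterDirectories(Path root,
                                              Map<Integer, List<String>> pointsByCluster) {
        List<Path> dirs = new ArrayList<>();
        try {
            for (Map.Entry<Integer, List<String>> e
                    : new TreeMap<>(pointsByCluster).entrySet()) {
                Path dir = Files.createDirectories(root.resolve("cluster-" + e.getKey()));
                Files.write(dir.resolve("points.txt"), e.getValue());
                dirs.add(dir);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return dirs;
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, List<String>> firstLevelOutput = new HashMap<>();
        firstLevelOutput.put(0, Arrays.asList("1.0,2.0", "1.5,2.5"));
        firstLevelOutput.put(1, Arrays.asList("9.0,9.5"));

        Path root = Files.createTempDirectory("top-down-clustering");
        for (Path dir : writeClusterDirectories(root, firstLevelOutput)) {
            // Each directory would be handed to the second-level algorithm as its input.
            System.out.println(dir + " -> " + Files.readAllLines(dir.resolve("points.txt")));
        }
    }
}
```

Keeping one directory per first-level cluster means the second-level run needs
nothing beyond an input path, which fits the goal of letting the user choose
that algorithm freely.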
Please don't consider this patch as the final patch. I am submitting this for
feedback, and would welcome suggestions to improve it.
> Top Down Clustering
> -------------------
>
> Key: MAHOUT-843
> URL: https://issues.apache.org/jira/browse/MAHOUT-843
> Project: Mahout
> Issue Type: New Feature
> Components: Clustering
> Affects Versions: 0.6
> Reporter: Paritosh Ranjan
> Labels: clustering, patch
> Fix For: 0.6
>
> Attachments: Top-Down-Clustering-patch
>
>
> Top Down Clustering works in multiple steps. The first step is to find
> comparatively bigger clusters. The second step is to cluster the bigger
> chunks into meaningful clusters. This can improve performance when clustering
> a large amount of data. It also removes the dependency on providing input
> clusters/numbers to the clustering algorithm.
> "Big" is a relative term, and so is the smaller "meaningful". So, the size of
> these "bigger" and "smaller/meaningful" clusters will be controlled by the
> user.
> The user can also select which clustering algorithm to use at the top level
> and which to use at the bottom level. Initially, this can be done for only one
> or a few clustering algorithms; later, an option can be provided to use all
> the algorithms (that suit the case).