[ https://issues.apache.org/jira/browse/MAHOUT-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Eastman resolved MAHOUT-887.
---------------------------------
Resolution: Invalid
Assignee: Jeff Eastman
In general, top-down clustering begins with all points assigned to a single
cluster and then iteratively uses some algorithm to split them. Bottom-up
clustering starts with one cluster per point and then iteratively uses some
algorithm to merge them. Both approaches face scalability challenges because of
all the bookkeeping they require, and they really break down if probabilistic
cluster assignment (e.g., fuzzy k-means or Dirichlet clustering) is needed.
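To make the bookkeeping concrete, here is a minimal Python sketch of bottom-up (agglomerative) clustering. It is not Mahout code; it assumes 1-D points, centroid distance, and a hypothetical merge `threshold`. Every cluster keeps the ids of all points merged into it, and every point belongs to exactly one cluster, which is why this style of merging does not carry over naturally to probabilistic assignments.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    # Ids of every original point that has been merged into this cluster.
    member_ids: list
    # Running sum of member values, used to compute the centroid.
    total: float

    @property
    def centroid(self) -> float:
        return self.total / len(self.member_ids)


def bottom_up(points, threshold):
    """Hypothetical agglomerative clustering of 1-D points.

    Starts with one cluster per point and repeatedly merges the closest
    pair of clusters while their centroids are within `threshold`.
    """
    clusters = [Cluster([i], float(p)) for i, p in enumerate(points)]
    while len(clusters) > 1:
        # Find the closest pair of clusters by centroid distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = abs(clusters[a].centroid - clusters[b].centroid)
                if best is None or d < best[0]:
                    best = (d, a, b)
        dist, a, b = best
        if dist > threshold:
            break  # no remaining pair is close enough to merge
        # The merged cluster retains the ids of both inputs (the bookkeeping).
        merged = Cluster(clusters[a].member_ids + clusters[b].member_ids,
                         clusters[a].total + clusters[b].total)
        del clusters[b]  # b > a, so index a is still valid afterwards
        clusters[a] = merged
    return clusters
```

For example, `bottom_up([1.0, 1.2, 5.0, 5.1, 9.0], threshold=0.5)` leaves three clusters with member ids `[0, 1]`, `[2, 3]` and `[4]`. The all-pairs search is quadratic per merge, which hints at why this approach scales poorly.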
These scalability issues came up in earlier discussions of MSC (mean shift
canopy clustering); you can search the mail archive and JIRAs for MSC to find
them. They involve the requirement to use a single reducer (at least for the
last iteration) and the growth of each cluster as it retains the ids of all the
clusters that have merged into it.
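A toy, in-memory sketch of the single-reducer problem, assuming the final merge pass has to compare every cluster with every other one. The hypothetical mapper emits every cluster under one shared key, so the shuffle yields exactly one group, and that one reducer must hold every retained point id. This only imitates map, shuffle and reduce; it does not use the Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical cluster record: (cluster_id, ids of all points merged into it).
Cluster = Tuple[str, List[int]]

SINGLE_KEY = "all-clusters"  # hypothetical shared key


def map_phase(clusters: Iterable[Cluster]):
    # Every cluster goes to the same key so one reducer can compare all pairs.
    for cluster in clusters:
        yield SINGLE_KEY, cluster


def shuffle(pairs) -> Dict[str, List[Cluster]]:
    # Group values by key, as the framework would before the reduce phase.
    groups: Dict[str, List[Cluster]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reducer_load(groups: Dict[str, List[Cluster]]) -> Dict[str, int]:
    # Number of retained point ids each reducer receives.
    return {key: sum(len(ids) for _, ids in values)
            for key, values in groups.items()}
```

However few clusters remain, the lone reducer still receives one retained id per input point, so its load grows with the size of the data set.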
MAHOUT-843 aims to support heterogeneous, top-down, hierarchical clustering, in
which the user chooses the algorithm at every level and each algorithm may
itself be iterative. That is a bit different from the homogeneous, top-down
clustering I described above. Because clustering algorithms cannot be used to
merge clusters, there is no way to use them to build heterogeneous, bottom-up
clusterers, which would be the opposite of MAHOUT-843.
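A minimal sketch of the heterogeneous, top-down idea, assuming 1-D points and two hypothetical splitter functions (they are not Mahout APIs, and this is not how MAHOUT-843 is implemented). The caller passes one splitting algorithm per level, and every cluster produced at one level is split again by the next level's algorithm.

```python
from typing import Callable, List

Splitter = Callable[[List[float]], List[List[float]]]


def split_at_mean(points: List[float]) -> List[List[float]]:
    """Hypothetical splitter: partition points around their mean."""
    mean = sum(points) / len(points)
    low = [p for p in points if p < mean]
    high = [p for p in points if p >= mean]
    return [part for part in (low, high) if part]


def split_at_largest_gap(points: List[float]) -> List[List[float]]:
    """Hypothetical splitter: cut the sorted points at their widest gap."""
    s = sorted(points)
    if len(s) < 2:
        return [s]
    gaps = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return [s[:cut], s[cut:]]


def top_down(points: List[float], levels: List[Splitter]) -> dict:
    """Build a tree node {"points": ..., "children": [...]}, applying
    levels[0] at this level and levels[1:] to each resulting cluster."""
    node = {"points": points, "children": []}
    if levels and len(points) > 1:
        splitter = levels[0]
        node["children"] = [top_down(part, levels[1:])
                            for part in splitter(points)]
    return node
```

With `top_down([1.0, 1.2, 5.0, 5.1, 9.0], [split_at_largest_gap, split_at_mean])`, the first level separates `[9.0]` from the rest and the second level splits `[1.0, 1.2, 5.0, 5.1]` into `[1.0, 1.2]` and `[5.0, 5.1]`. Each splitter only ever divides a cluster; nothing in this shape can merge clusters, which mirrors the point that clustering algorithms cannot drive a bottom-up counterpart.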
I agree this issue can be closed.
> Bottom Up Clustering
> --------------------
>
> Key: MAHOUT-887
> URL: https://issues.apache.org/jira/browse/MAHOUT-887
> Project: Mahout
> Issue Type: New Feature
> Components: Clustering
> Affects Versions: 0.6
> Environment: Linux Windows
> Reporter: Paritosh Ranjan
> Assignee: Jeff Eastman
> Labels: features
> Fix For: 0.6
>
>
> Bottom-up clustering is achieved by starting with small clusters or single
> points and then recursively merging clusters that are closer than a
> specified control constraint.