Cool.

On Nov 1, 2011, at 11:46 PM, Paritosh Ranjan wrote:
> Hi Grant,
>
> I have been working on Top Down Clustering.
> https://issues.apache.org/jira/browse/MAHOUT-843
>
> In this, the top level clustering algorithm ( for eg. Canopy ) can run with
> big t1,t2 values. And then any other clustering algorithm (selected by user)
> is executed on clusters produced by top level clustering.
>
> I have been able to configure top level and bottom level clustering with some
> of the clustering algorithms available.
>
> I will be submitting the patch sometime in this week. Using it, we will be
> able to run Canopy Clustering ( or other clustering algorithms first ) to
> extract bigger clusters first and then apply other fine grained clustering
> algorithms on the clusters extracted.
>
> I think this will help in achieving what is needed.
>
> Thanks and Regards,
> Paritosh
>
> On 02-11-2011 09:01, Grant Ingersoll wrote:
>> In reviewing clustering for upcoming training, I'm wondering about something
>> w/ Canopy clustering that we claim, but wanted to check here first. In the
>> lectures, etc. I've seen on it, the idea is to run Canopy first and then
>> some other more expensive algorithm, such as k-means, etc. with the idea
>> that items further away than T2 are not even considered when scoring a
>> centroid in the more complex clustering approach. However, I think I'm
>> missing where in the code this actually happens. We do have code that
>> allows K-Means to use the Canopy centroids as initial centroids for k-means,
>> but the other material seemed to imply more aggressive pruning was possible
>> since points outside of T2 would not even need to be considered. Otherwise,
>> it doesn't seem like we are saving anything by doing Canopy first other than
>> we likely have a better set of starting centroids. I haven't thought about
>> how this would be implemented.
>>
>> Then again, it's late and I'm tired.
>>
>> -Grant
>

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
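
For reference, the pruning idea discussed in the quoted message might look roughly like the sketch below. This is not Mahout code, and every name in it (`build_canopies`, `assign_pruned`) is hypothetical. It assumes Euclidean distance, t1 > t2, one centroid per canopy (the canopy center), and it prunes on shared canopy membership (the t1 radius) rather than on t2 directly, which is only one possible reading of the idea.

```python
import math


def build_canopies(points, t1, t2):
    """Greedy canopy pass; assumes t1 > t2. Returns [(center, member_indices)]."""
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = points[remaining[0]]
        # Every point within t1 of the center joins this canopy (overlap allowed).
        members = [i for i, p in enumerate(points) if math.dist(p, center) < t1]
        canopies.append((center, members))
        # Points within t2 of the center can no longer seed a new canopy.
        remaining = [i for i in remaining if math.dist(points[i], center) >= t2]
    return canopies


def assign_pruned(points, canopies):
    """One assignment step that only scores centroids sharing a canopy with the point."""
    membership = [[] for _ in points]
    for canopy_id, (_, members) in enumerate(canopies):
        for i in members:
            membership[i].append(canopy_id)

    assignments = []
    distance_computations = 0
    for i, p in enumerate(points):
        candidates = membership[i]  # never empty: each point lies in some canopy
        distance_computations += len(candidates)
        best = min(candidates, key=lambda c: math.dist(p, canopies[c][0]))
        assignments.append(best)
    return assignments, distance_computations


points = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5)]
canopies = build_canopies(points, t1=3.0, t2=1.0)
assignments, distance_computations = assign_pruned(points, canopies)
```

With two well-separated groups as above, each point is scored only against the centroid of its own canopy, whereas an unpruned assignment step would compare every point against every centroid. Using canopy centers only as initial centroids gives no such saving, which is the gap the quoted message points out.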
