Cool.

On Nov 1, 2011, at 11:46 PM, Paritosh Ranjan wrote:

> Hi Grant,
> 
> I have been working on Top Down Clustering. 
> https://issues.apache.org/jira/browse/MAHOUT-843
> 
> In this, the top-level clustering algorithm (e.g. Canopy) runs with large
> T1/T2 values. Then any other clustering algorithm (selected by the user) is
> executed on the clusters produced by the top-level clustering.
> 
> I have been able to configure top-level and bottom-level clustering with some
> of the available clustering algorithms.
> 
> I will submit the patch sometime this week. With it, we will be able to run
> Canopy clustering (or another clustering algorithm) first to extract bigger
> clusters, and then apply finer-grained clustering algorithms to the extracted
> clusters.
> 
> I think this will help in achieving what is needed.
> 
> Thanks and Regards,
> Paritosh
> 
> On 02-11-2011 09:01, Grant Ingersoll wrote:
>> In reviewing clustering for upcoming training, I'm wondering about something
>> we claim about Canopy clustering, but wanted to check here first. In the
>> lectures, etc. I've seen on it, the idea is to run Canopy first and then
>> some other, more expensive algorithm such as k-means, with the idea that
>> items further away than T2 are not even considered when scoring a
>> centroid in the more complex clustering approach. However, I think I'm
>> missing where in the code this actually happens. We do have code that
>> allows k-means to use the Canopy centroids as its initial centroids, but
>> the other material seemed to imply more aggressive pruning was possible,
>> since points outside of T2 would not even need to be considered. Otherwise,
>> it doesn't seem like we are saving anything by doing Canopy first, other
>> than likely having a better set of starting centroids. I haven't thought
>> about how this would be implemented.
>> 
>> Then again, it's late and I'm tired.
>> 
>> -Grant
>> 
> 
> 

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com

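The top-down approach Paritosh describes can be sketched in plain Python. It runs a coarse Canopy pass with large T1/T2 values, then runs a finer algorithm separately on each coarse cluster. This is a minimal illustration under stated assumptions. It is not the MAHOUT-843 patch or any Mahout API, and all function names are hypothetical. Points are tuples compared by Euclidean distance. Each point is assigned to its nearest canopy center so the coarse clusters are disjoint. The fine-grained algorithm is assumed to be k-means with deterministic farthest-first seeding.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(pts):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def canopy_centers(points, t1, t2):
    """Canopy pass: return a list of (center, members) pairs."""
    assert t1 > t2, "Canopy requires T1 > T2"
    candidates = list(points)
    canopies = []
    while candidates:
        center = candidates.pop(0)
        # Loose threshold T1: remaining points that join this canopy (canopies may overlap).
        members = [center] + [p for p in candidates if dist(p, center) < t1]
        # Tight threshold T2: points this close can no longer start a new canopy.
        candidates = [p for p in candidates if dist(p, center) >= t2]
        canopies.append((center, members))
    return canopies


def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[j].append(p)
        # Empty clusters keep their previous centroid.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def top_down(points, t1, t2, k, iters=10):
    """Coarse Canopy pass, then k-means run separately inside each coarse cluster."""
    centers = [c for c, _ in canopy_centers(points, t1, t2)]
    # Assumption: each point goes to its nearest canopy center, giving disjoint groups.
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        groups[nearest].append(p)
    fine = []
    for members in groups:
        if members:
            fine.extend(kmeans(members, min(k, len(members)), iters))
    return fine
```

With eight points forming two far-apart groups, `top_down(points, 30, 20, 2)` finds two canopies and then splits each one into two fine clusters. The expensive k-means step therefore only ever sees the points of one canopy at a time.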

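Grant's question concerns a different saving: using the canopies to prune which centroids a point is scored against. One possible reading of that idea is sketched below as a hypothetical illustration, not as anything in Mahout's code. K-means is seeded with the canopy centers. Each point is then compared only against centroids whose seeding canopy center lies within a cutoff of it. The email phrases that cutoff as T2; the sketch simply takes it as a parameter. The function counts distance evaluations to show what the pruning saves. Several behaviors are simplifications assumed by this sketch:

- A point outside every cutoff falls back to its nearest canopy.
- Each centroid stays tied to the canopy it was seeded from, even as it moves.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(pts):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def pruned_kmeans(points, canopy_centers, cutoff, iters=10):
    """k-means seeded with canopy centers, scoring each point only against
    centroids whose canopy center lies within `cutoff` of it.
    Returns (centroids, number of point-to-centroid scoring evaluations)."""
    # One up-front pass against the canopy centers. The canopy idea assumes a
    # cheap metric here; this sketch just reuses Euclidean distance.
    allowed = []
    for p in points:
        ids = [i for i, c in enumerate(canopy_centers) if dist(p, c) < cutoff]
        if not ids:
            # Assumption: a point outside every cutoff falls back to its nearest canopy.
            ids = [min(range(len(canopy_centers)),
                       key=lambda i: dist(p, canopy_centers[i]))]
        allowed.append(ids)
    centroids = list(canopy_centers)
    evaluations = 0
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p, ids in zip(points, allowed):
            # Only the centroids tied to this point's canopies are scored.
            evaluations += len(ids)
            best = min(ids, key=lambda i: dist(p, centroids[i]))
            clusters[best].append(p)
        # Empty clusters keep their previous centroid.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, evaluations
```

Consider two well-separated groups of three points with canopy centers at (0, 0) and (10, 10) and a cutoff of 5. Each point is scored against one centroid per iteration instead of two. Ten iterations therefore cost 60 scoring evaluations rather than the 120 an unpruned k-means would need.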
