On Jan 2, 2010, at 2:15 AM, Shashikant Kore wrote:

> On Thu, Dec 31, 2009 at 10:40 PM, Grant Ingersoll <[email protected]> wrote:
>> 
>> The other thing I'm interested in is people's real world feedback on using 
>> clustering to solve their text related problems.
>> For instance, what type of feature reduction did you do (stopword removal, 
>> stemming, etc.)?  What algorithms worked for you?
>> What didn't work?  Any and all insight is welcome and I don't particularly 
>> care if it is Mahout specific (for instance, part of
>> the chapter is about search result clustering using Carrot2 and so Mahout 
>> isn't applicable)
>> 
> 
> 
> Using vector normalization like L2 norm helped quite a bit. 

As I recall, it is important that the choice of norm aligns with the choice of 
distance measure, as well as with the data source (see 
http://www.lucidimagination.com/search/document/34ffc2a83a71a055/centroid_calculations_with_sparse_vectors
 and 
http://www.lucidimagination.com/search/document/34ffc2a83a71a055/centroid_calculations_with_sparse_vectors#3d8310376b6cdf6b).
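To illustrate one way a norm and a distance measure can line up: once vectors 
are scaled to unit L2 length, squared Euclidean distance equals exactly twice 
the cosine distance, so the two measures rank neighbors identically. The sketch 
below is plain Python, not Mahout code; the helper names and the two toy 
documents are made up for illustration, and it assumes non-zero vectors.

```python
import math

def l2_normalize(vec):
    """Scale a sparse vector (dict of term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        return dict(vec)  # leave an all-zero vector unchanged
    return {t: w / norm for t, w in vec.items()}

def dot(a, b):
    """Dot product of two sparse vectors, iterating over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def cosine_distance(a, b):
    """1 - cos(a, b); assumes neither vector is all zeros."""
    return 1.0 - dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

def squared_euclidean(a, b):
    """Squared Euclidean distance over the union of both vectors' terms."""
    terms = set(a) | set(b)
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms)

# Hypothetical term-weight vectors for two documents.
doc_a = {"mahout": 3.0, "cluster": 1.0}
doc_b = {"mahout": 1.0, "cluster": 2.0, "carrot2": 1.0}

unit_a = l2_normalize(doc_a)
unit_b = l2_normalize(doc_b)

# After L2 normalization the two quantities below agree up to rounding.
print(squared_euclidean(unit_a, unit_b))
print(2.0 * cosine_distance(doc_a, doc_b))
```

Without the normalization step the Euclidean figure also reflects document 
length, which is one reason the pairing of norm and distance measure matters.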
