Methods for Naming Clusters

Paul Ingles Wed, 05 Aug 2009 12:39:46 -0700

Hi,

As I've mentioned in the past, I'm working on clustering documents (albeit relatively small ones). The cluster mechanism I've ended up with has produced some pretty good results (at least for what I need to be able to do). However, what I'd like to be able to do is find a way to automate the naming of these groups.

For example, if each document has a 6 or 7 word title, I'd like to produce names that are somewhat logically ordered and share terms across the majority of the titles. By logically ordered I mean that they make grammatical sense; this can probably be inferred from word-order frequency within the cluster, since most documents in a cluster should have well-formed titles.

So far, I'm using a kind of hacked-together longest common substring method:


* Sort the titles within the cluster
* Compare every string against every other string, producing an LCS value for each pair
* Use the most common LCS
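
The steps above could be sketched roughly as follows. This is only an illustration, not the actual code in use: it assumes the comparison is done on whole words rather than characters (so a name never ends mid-word), that titles are lower-cased before comparing, and that ties are broken in favour of the longer run. The function names longest_common_word_run and name_cluster are made up for the example.

```python
from collections import Counter
from itertools import combinations


def longest_common_word_run(a, b):
    """Return the longest contiguous run of words shared by word lists a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous word of both titles.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return tuple(a[best_end - best_len:best_end])


def name_cluster(titles):
    """Pick the most frequent pairwise LCS as the cluster name (None if no overlap)."""
    # Sorting keeps the result independent of the input order.
    # Assumption: case-insensitive, whitespace-tokenised titles.
    words = [t.lower().split() for t in sorted(titles)]
    counts = Counter()
    for a, b in combinations(words, 2):
        run = longest_common_word_run(a, b)
        if run:
            counts[run] += 1
    if not counts:
        return None
    # Assumption: prefer the most frequent run, then the longer one on ties.
    best = max(counts, key=lambda r: (counts[r], len(r)))
    return " ".join(best)


if __name__ == "__main__":
    cluster = [
        "cheap flights to paris",
        "book cheap flights to rome",
        "cheap flights to london today",
    ]
    print(name_cluster(cluster))
```

Comparing every pair is quadratic in the number of titles, which is fine for small clusters but would need sampling or a different approach for large ones.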

As this is all relatively new ground for me, I was wondering whether there were any better methods I could be using?


Thanks,
Paul
