I hacked the code to attach labels to the vectors. Then I modified KMeans to output the document label, cluster ID, and distance from the cluster for each document. Another utility takes this output and maps each label back to the actual text file from which the vector was created. Then I manually spot-check random documents in each cluster.
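The spot-check step could look roughly like this. It is a Python sketch, not the modified KMeans code: the tab-separated record layout (label, cluster ID, distance) and the function names `load_assignments` and `sample_for_review` are assumptions made up for illustration.

```python
import csv
import io
import random
from collections import defaultdict


def load_assignments(lines):
    """Group 'label<TAB>cluster_id<TAB>distance' records by cluster ID.

    The record layout is hypothetical; adapt it to whatever the
    clustering job actually writes out.
    """
    clusters = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        label, cluster_id, distance = row
        clusters[cluster_id].append((label, float(distance)))
    return clusters


def sample_for_review(clusters, per_cluster=3, seed=0):
    """Pick up to per_cluster random documents from each cluster."""
    rng = random.Random(seed)  # fixed seed so a review can be repeated
    picks = {}
    for cluster_id, members in clusters.items():
        chosen = rng.sample(members, min(per_cluster, len(members)))
        # Sort by distance so the closest documents are read first.
        picks[cluster_id] = sorted(chosen, key=lambda m: m[1])
    return picks


if __name__ == "__main__":
    demo = io.StringIO(
        "doc1.txt\t0\t0.12\n"
        "doc2.txt\t0\t0.40\n"
        "doc3.txt\t1\t0.05\n"
    )
    picks = sample_for_review(load_assignments(demo), per_cluster=2)
    for cluster_id, docs in picks.items():
        for label, distance in docs:
            print(cluster_id, label, distance)
```

Here the labels are already file names, so the sampled files can be opened and read directly; with opaque labels, a separate label-to-file lookup would be needed first.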
Ugly, but at least I know clustering is "working." The "top" terms of the cluster may give some idea about the documents in the cluster.

--shashi

On Wed, Jun 17, 2009 at 3:05 AM, Grant Ingersoll<[email protected]> wrote:
> What tools/approaches are people using to validate their clustering output?
> Are there utilities that we should be implementing that would make this
> easier for users?
>
> -Grant
>
