Stefan Groschupf wrote:
Hi,
How is document clustering different/related to text categorization?
Clustering: the algorithm tries to find its own categories and puts the documents that match into them. You group together all documents that are at minimal distance from each other.
Would I be correct to say that you have to define a "distance threshold"
parameter in order to define when to build a new category for a certain
group?
Depends on the type of clustering algorithm. Some clustering algorithms take the number of clusters as a parameter (in this case the algorithm may be run several times with different values, to determine the best value). Other types of algorithms, such as hierarchical agglomerative clustering algorithms, work more as you suggest.
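
To make the two styles concrete, here is a rough sketch in Python (not taken from any particular library). Everything in it is an assumption made for illustration: the toy 2-D points standing in for document vectors, the Euclidean distance, single-link merging, the farthest-first seeding for k-means, and the function names agglomerative and kmeans. The first function takes a distance threshold and the number of clusters falls out of it; the second takes the number of clusters k up front.

```python
import math

def dist(a, b):
    # Euclidean distance; a real document clusterer would more likely use
    # cosine distance on term vectors (assumption for this toy example).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points, threshold):
    """Single-link agglomerative clustering: keep merging the two closest
    clusters until the closest pair is farther apart than `threshold`."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # nothing left that is close enough to merge
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

def kmeans(points, k, iterations=20):
    """k-means with the number of clusters given as a parameter.
    Seeds are chosen farthest-first so the result is deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group (keep it if empty).
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups

# Three visually obvious groups of toy points.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (21, 0)]

# Threshold style: the cluster count depends on the threshold chosen.
print(len(agglomerative(points, threshold=2.0)))   # 3
print(len(agglomerative(points, threshold=0.5)))   # 8, nothing is close enough

# k style: the caller fixes the count and may try several values of k.
for k in (2, 3, 4):
    print(k, sorted(len(g) for g in kmeans(points, k)))
```

With the threshold version you never say how many categories you want, only how far apart two groups may be and still be merged. With the k version you would typically run it for several values of k and compare the results with some quality measure of your choosing.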
Regards,
Joshua O'Madadhain
[EMAIL PROTECTED]   Per Obscurius...   www.ics.uci.edu/~jmadden
Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.