"Chia C Chong" <[EMAIL PROTECTED]> writes:
>Hi!
>I am new in this area..I wonder which clustering algorithm is the most
>frequently used and maybe the most robust??

This question has many levels. They range from the decision between agglomerative and "seed based" methods, through the choice of an appropriate measure of distance or similarity, to the decision on a method for forming clusters. I will try to answer the question of choosing one of the classic agglomeration methods for agglomerative hierarchical cluster analysis. The choice may depend on what you are trying to achieve.

If you want to detect outliers, single linkage is the method of choice. Observations that are joined very late, and at rather high levels of dissimilarity, are potential candidates for further inspection (probable outliers). The disadvantage of single linkage is that two groups can be "joined" at an early stage if there is a single observation that forms a "bridge" between them.

The complete linkage method tends to form small homogeneous clusters at an early stage. However, because the distance between two clusters is defined as the distance between their most dissimilar members, clusters that are in fact quite similar can stay separate until quite a late stage of the agglomeration process.

Ward's method stresses the demand for homogeneity within a cluster, but it will probably not be your tool of choice if you are interested in detecting structures in your data that go beyond mere "within-cluster homogeneity".

Average linkage will be computationally expensive, which may or may not matter depending on the size of your data set, but it avoids some disadvantages of the other methods, again depending on what you are trying to achieve.

Maybe the most important point to make about cluster analysis was made by Fowlkes et al.
(1987, Variable selection in clustering and other contexts): "In the murky area of cluster analysis, where there is so little guiding theory, informal graphical approaches which can be used in a highly interactive manner are not only very useful but perhaps even essential for getting the job done."

There is no silver bullet for detecting clusters. The important thing is to look at your results in connection with your data. A useful technique is to use a graphical display of your data to visualize and evaluate different approaches to detecting clusters.

Kurt
-- 
| Kurt Watzka | [EMAIL PROTECTED]
=================================================================
Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/
=================================================================
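The single, complete and average linkage methods discussed above differ only in how the dissimilarity between two clusters is computed from the pairwise dissimilarities of their members. What follows is a minimal pure-Python sketch of naive agglomeration. It assumes Euclidean distance and a small made-up data set: two groups on a line, one "bridge" point between them, and one outlier. The function names `agglomerate` and `join_height` and the `POINTS` data are hypothetical, chosen only for illustration. Ward's method is left out because its criterion (the increase in within-cluster sum of squares) is not a simple summary of pairwise distances. The sketch recomputes every cluster distance at every step, so it is far too slow for real data sets and is only meant to make the two effects visible.

```python
import math
from itertools import combinations

# Hypothetical toy data: group A (0-2), group B (3-5), a bridge (6), an outlier (7).
POINTS = [
    (0.0, 0.0), (0.5, 0.0), (1.0, 0.0),   # group A
    (4.0, 0.0), (4.5, 0.0), (5.0, 0.0),   # group B
    (2.5, 0.0),                           # "bridge" observation between A and B
    (2.5, 10.0),                          # outlier far away from everything
]


def cluster_distance(a, b, points, method):
    """Dissimilarity between clusters a and b (tuples of point indices)."""
    ds = [math.dist(points[i], points[j]) for i in a for j in b]
    if method == "single":
        return min(ds)               # nearest pair of members
    if method == "complete":
        return max(ds)               # most dissimilar pair of members
    if method == "average":
        return sum(ds) / len(ds)     # mean over all member pairs
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(points, method="single"):
    """Naive agglomerative clustering.

    Returns the merge history as a list of (cluster_a, cluster_b, height).
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], points, method),
        )
        height = cluster_distance(a, b, points, method)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, height))
    return merges


def join_height(merges, i, j):
    """Height at which observations i and j first end up in the same cluster."""
    for a, b, h in merges:
        if (i in a and j in b) or (i in b and j in a):
            return h
    return None


if __name__ == "__main__":
    single = agglomerate(POINTS, "single")
    complete = agglomerate(POINTS, "complete")
    # Groups A and B merge via the bridge much earlier under single linkage.
    print(join_height(single, 0, 5))    # 1.5
    print(join_height(complete, 0, 5))  # 5.0
    # The outlier is the last observation to be joined, at a high level.
    print(single[-1])                   # ((7,), (0, 1, 2, 3, 4, 5, 6), 10.0)
```

With single linkage, groups A and B become one cluster at height 1.5 because each is only 1.5 away from the bridge point, although their closest members are 3.0 apart. Complete linkage keeps them apart until height 5.0. Under both methods the outlier is merged last, which is the kind of late, high-level join worth inspecting. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be the usual choice.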