I am doing cluster analysis [hclust(Dist, method="average")] on data that potentially contains redundant objects. As expected, the inclusion of redundant objects affects the clustering result: the data a1 = a2 = a3, b, c, d, e1 = e2 is likely to cluster differently from the same data without the redundancy, i.e., a1, b, c, d, e1. This is apparent when the outcome is visualized as a dendrogram.
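To make the effect concrete, here is a minimal sketch with made-up one-dimensional coordinates (the values and the names pts, x_dup and x_uniq are invented for illustration; they are not my real data). With average linkage the three copies of a1 pull the a/b cluster away from c, so c joins d instead:

```r
# Hypothetical 1-D coordinates; a1, a2, a3 are exact duplicates.
pts <- c(a1 = 0, a2 = 0, a3 = 0, b = 2, c = 4.1, d = 7.4)
x_dup  <- matrix(pts, ncol = 1, dimnames = list(names(pts), NULL))
x_uniq <- x_dup[!duplicated(x_dup), , drop = FALSE]  # keeps a1, b, c, d

hc_dup  <- hclust(dist(x_dup),  method = "average")
hc_uniq <- hclust(dist(x_uniq), method = "average")

# Cut both trees into two clusters and check whether c and d end up together.
cut_dup  <- cutree(hc_dup,  k = 2)
cut_uniq <- cutree(hc_uniq, k = 2)
same_dup  <- cut_dup[["c"]]  == cut_dup[["d"]]   # with the redundant objects
same_uniq <- cut_uniq[["c"]] == cut_uniq[["d"]]  # without them
```

In this toy case the two flags should differ, which is the kind of change I see in my dendrograms.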
Now, it seems that the clustering result for which the redundancy has been eliminated is more robust for the present assignment than that of the redundant data. Naturally, there is no problem in the elimination: just exclude the redundant objects from Dist. However, it would be very convenient to be able to include the redundant objects in the *dendrogram* by attaching them as 0-level branches to the subtrees, i.e.:

1.0........-------........
0.5....___|__...._|_......
0.0.._|_..|..|..|.._|_....
....|.|.|.|..|..|.|...|...
...a1a2a3.b..c..d.e1.e2...

instead of

1.0........-------........
0.5....___|__...._|_......
0.0...|...|..|..|...|.....
......a1..b..c..d..e1.....

The question: Can this be accomplished in the *dendrogram plot* by manipulating the resulting hclust data structure or by some other means, and if yes, how?

Jopi Harri

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
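One possible way to manipulate the hclust structure, offered only as a sketch under assumptions: cluster the non-redundant objects, then build a new "hclust" object whose first merges join the redundant copies at height 0 and whose remaining merges replay the reduced tree. The helper expand_hclust() and the example data are hypothetical (not part of R or of my real data); the sketch assumes that "redundant" means a distance of exactly 0 and that at least two distinct objects remain.

```r
# Made-up data: a1 = a2 = a3 and e1 = e2 are exact duplicates.
x <- rbind(a1 = c(0, 0), a2 = c(0, 0), a3 = c(0, 0), b = c(1, 0),
           c = c(3, 1), d = c(6, 5), e1 = c(7, 5), e2 = c(7, 5))
Dist <- dist(x)

# Group objects lying at distance 0 (single linkage cut at height 0),
# then renumber the groups by first appearance.
grp0 <- cutree(hclust(Dist, method = "single"), h = 0)
reps <- which(!duplicated(grp0))          # one representative per group
grp  <- match(grp0, grp0[reps])           # group index of every object

# Cluster the representatives only.
hc_u <- hclust(as.dist(as.matrix(Dist)[reps, reps]), method = "average")

# Hypothetical helper: re-attach the redundant objects as 0-height branches.
expand_hclust <- function(hc_u, grp, labels = NULL) {
  n <- length(grp)
  k <- max(grp)
  merge  <- matrix(0L, n - 1L, 2L)
  height <- numeric(n - 1L)
  node   <- integer(k)   # current node of each group: -leaf or +merge row
  row    <- 0L
  for (g in seq_len(k)) {
    m <- which(grp == g)
    node[g] <- -m[1L]
    for (j in m[-1L]) {  # chain the copies together at height 0
      row <- row + 1L
      merge[row, ] <- c(-j, node[g])
      node[g] <- row
    }
  }
  off <- row             # number of zero-height merges (n - k)
  for (j in seq_len(k - 1L)) {  # replay the merges of the reduced tree
    old <- hc_u$merge[j, ]
    merge[off + j, ] <- ifelse(old < 0L, node[abs(old)], old + off)
    height[off + j]  <- hc_u$height[j]
  }
  # Leaf order: each representative is replaced by all members of its group.
  ord <- unlist(lapply(hc_u$order, function(g) which(grp == g)))
  structure(list(merge = merge, height = height, order = ord,
                 labels = labels, call = hc_u$call, method = hc_u$method,
                 dist.method = hc_u$dist.method),
            class = "hclust")
}

hc_full <- expand_hclust(hc_u, grp, labels = attr(Dist, "Labels"))
```

If this works as intended, plot(hc_full, hang = -1) should draw all eight leaves, with a1, a2, a3 and e1, e2 joined at height 0 and the upper part of the tree identical to plot(hc_u, hang = -1).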