Hello Hadley, Tormod and everyone else.

I just published a post on my blog, giving the code and presenting an
example of its use (on the Iris data set):
http://www.r-statistics.com/2010/06/clustergram-a-graph-for-visualizing-cluster-analyses-r-code/
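
For readers who want a feel for what the plot encodes before opening the
post, here is a minimal sketch on the Iris data. It is not the code from the
post: it only computes, for each k, where the k-means cluster centers fall on
the first principal component (the y-positions in a clustergram), plus one
color per observation according to its rank on that component. heat.colors()
is swapped in for heat_hcl() so that only base R is needed, and the names
loadings.1 and center.positions, as well as the range 2:5, are choices made
for this illustration.

```r
# Sketch only: base R, no plotting.
# heat.colors() stands in for colorspace::heat_hcl() (an assumption made
# here to avoid the extra dependency).
set.seed(250)
Data <- as.matrix(iris[, 1:4])

# Projection of every observation onto the first principal component
loadings.1 <- princomp(Data)$loadings[, 1]
PCA.1 <- as.vector(Data %*% loadings.1)

# One color per observation: the lowest PC1 score gets the first color
COL <- heat.colors(nrow(Data))[rank(PCA.1, ties.method = "first")]

# For each k, a cluster's y-position is its center projected onto PC1
k.range <- 2:5
center.positions <- lapply(k.range, function(k) {
  cl <- kmeans(Data, centers = k, iter.max = 50)
  as.vector(cl$centers %*% loadings.1)
})
names(center.positions) <- k.range

print(center.positions)
```

Because the projection is linear, every projected center is the mean of the
projected members of its cluster, so all y-positions stay inside
range(PCA.1), which is why that range works as the ylim of the plot.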

I welcome any comments (pitfalls, suggestions or ideas) regarding this
method of visualizing cluster analysis, in the hope that all of us can learn
from each other's knowledge.

And again, thank you Hadley for offering your advice.

Best,
Tal



----------------Contact Details:----------------
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
------------------------------------------------




On Tue, Jun 15, 2010 at 5:04 PM, Tal Galili <tal.gal...@gmail.com> wrote:

> Hi Hadley,
>
> I wrapped the code into a function.
> I made it so all the lines would always start from the cluster mean.
> And I tried to give more meaning to the colors by coloring each line
> according to the rank of that observation's score on the first principal
> component.
>
> What do you think?
>
> Tal
>
>
>
>
> # -------------------------------
>
>
> clustergram <- function(Data, k.range = 2:10,
>                         clustering.function = kmeans,
>                         line.width = .004, add.center.points = TRUE)
> {
>   require(colorspace)  # for heat_hcl(); must be loaded before it is used
>
>   Data <- as.matrix(Data)  # %*% below needs a numeric matrix
>   n <- nrow(Data)
>
>   # loadings of the first principal component, computed once and reused
>   loadings.1 <- princomp(Data)$loadings[, 1]
>   PCA.1 <- Data %*% loadings.1  # first principal component of our data
>
>   # line colors: rank() (not order()) gives observation i the color that
>   # matches the position of its PC1 score
>   COL <- heat_hcl(n)[rank(PCA.1, ties.method = "first")]
>
>   line.width <- rep(line.width, n)
>   Y <- NULL  # Y matrix
>   X <- NULL  # X matrix
>
>   plot(0, 0, col = "white", xlim = range(k.range), ylim = range(PCA.1),
>        xlab = "Number of clusters (k)",
>        ylab = "Mean of the first principal component by clusters",
>        main = "Clustergram of first principal component mean by k-mean clusters")
>   axis(side = 1, at = k.range)
>   abline(v = k.range, col = "grey")
>
>   centers.points <- list()
>
>   for (k in k.range)
>   {
>     cl <- clustering.function(Data, k)
>     clusters.vec <- cl$cluster
>     # project the cluster centers onto the first principal component
>     the.centers <- as.vector(cl$centers %*% loadings.1)
>
>     # cumulative offset within each cluster, in the original row order,
>     # so the lines of a bundle do not overlap
>     noise <- ave(line.width, clusters.vec, FUN = cumsum)
>     y <- the.centers[clusters.vec] + noise
>     Y <- cbind(Y, y)
>     x <- rep(k, length(y))
>     X <- cbind(X, x)
>
>     # index by name so that skipped values of k leave no NULL entries
>     centers.points[[as.character(k)]] <- data.frame(y = the.centers, x = rep(k, k))
>   }
>
>   matlines(t(X), t(Y), pch = 19, col = COL, lty = 1, lwd = 1.5)
>
>   if (add.center.points)
>   {
>     # add the cluster centers as red points
>     for (xx in centers.points)
>       points(xx$x, xx$y, pch = 19, col = "red", cex = 1.3)
>   }
>
>   invisible(NULL)
> }
>
>
> set.seed(250)
> Data <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
>            matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
> clustergram(Data, k.range = 2:8, line.width = .004, add.center.points = TRUE)
>
>
>
> On Tue, Jun 15, 2010 at 4:46 PM, Hadley Wickham <had...@rice.edu> wrote:
>
>> > The glitches are the cases where you would have a bundle of lines
>> > belonging to a specific cluster, but had spaces between them (because
>> > the place of one of the lines was saved for another line that in the
>> > meantime moved to another cluster).
>>
>> I think that display looked just fine!
>>
>> > I just came up with a solution for how to resolve this (After
>> > showering, it tends to help my thinking...) - it is attached at the
>> > bottom of this e-mail.
>> > I will later cleanup the code a bit and publish it.
>>
>> I'd also suggest reordering the lines within each cluster mean so that
>> (e.g.) all the lines going from 1a to 2a are all in the same position
>> (i.e. at the top of the bundle of lines, not interspersed throughout).
>>
>> And again, think about using the colour for something useful, maybe
>> the value of the variable that you're averaging over to get the y
>> position.
>>
>> Hadley
>>
>> --
>> Assistant Professor / Dobelman Family Junior Chair
>> Department of Statistics / Rice University
>> http://had.co.nz/
>>
>
>
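
Hadley's suggestion, quoted above, to reorder the lines within each cluster
so that lines arriving from the same previous cluster sit together could be
sketched roughly as follows. This is only an illustration under assumptions:
bundle.offsets is a hypothetical helper that is not part of the posted
function, and the two membership vectors are toy data.

```r
# Hypothetical helper (not part of the posted clustergram function):
# vertical offsets within each current cluster, stacked so that lines
# coming from the same previous cluster are adjacent in the bundle.
bundle.offsets <- function(cur, prev, line.width = 0.004) {
  ord <- order(cur, prev)  # group by current cluster, then by previous one
  # running position 1, 2, ... inside each current cluster, in that order
  pos <- ave(rep(1, length(cur)), cur[ord], FUN = cumsum)
  offsets <- numeric(length(cur))
  offsets[ord] <- pos * line.width  # map back to the original row order
  offsets
}

prev <- c(1, 2, 1, 2, 1, 2)  # toy cluster memberships at k - 1
cur  <- c(1, 1, 1, 2, 2, 2)  # toy cluster memberships at k
offsets <- bundle.offsets(cur, prev)
print(offsets)
```

In the posted function these offsets would take the place of the noise term
for every k after the first, with prev being the cluster vector from the
previous iteration of the loop.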


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
