Hi Chrysanthi,

Chrysanthi A. wrote:

Thanks a lot! What exactly is the sweep function doing? Also, is it possible to use the mean of only a group of the row values instead of the mean of the whole row? So the values in the matrix (heat map) used in the comparison are z-scores and not the gene expression intensities, right?

I was trying to give a subtle hint below, but maybe I should be a bit more blunt. One of the coolest things about R is that it is free, and there are these sweet listservs where people give advice and help for free as well.

HOWEVER, there is still a price to pay, and that is with your time. All of these functions have help pages that the developers spent time writing, and the code is there for you to peruse. Because of this, there is some expectation that you would have done so prior to asking questions. Now I have read the help page for sweep, and quite frankly it is a bit confusing. The term 'sweep' is used without definition, so if one doesn't know what that means the help page is less than helpful. But it doesn't take much time or effort to empirically see what it does:

> a <- matrix(rnorm(25), ncol=5)
> a
           [,1]       [,2]       [,3]        [,4]        [,5]
[1,]  0.6841637 -1.0590185 -0.1719887 -0.01916011 -1.61936817
[2,]  0.5707217  1.4790968  1.6736991 -0.72158518  1.22467334
[3,]  0.4440499 -0.3382888 -0.1504191  0.32140022  1.83780859
[4,] -0.6659568  3.0573678 -1.5709904 -1.35618488 -0.01717017
[5,] -0.3182206  2.2777597 -0.2325356 -0.02001414  1.77440090
> rm <- rowMeans(a)
> rm
[1] -0.4370743  0.8453211  0.4229102 -0.1105869  0.6962780
> sweep(a, 1, rm, "-")
            [,1]       [,2]       [,3]       [,4]        [,5]
[1,]  1.12123808 -0.6219441  0.2650857  0.4179142 -1.18229384
[2,] -0.27459943  0.6337756  0.8283779 -1.5669063  0.37935220
[3,]  0.02113977 -0.7611990 -0.5733293 -0.1015100  1.41489842
[4,] -0.55536988  3.1679546 -1.4604035 -1.2455980  0.09341672
[5,] -1.01449866  1.5814817 -0.9288137 -0.7162922  1.07812286
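
To make the behaviour explicit, here is a minimal sketch (the seed and the variable names are arbitrary choices for illustration, not anything from heatmap.2 itself). It checks that sweeping the row means out with "-" matches plain subtraction with recycling, and that following it with a second sweep using "/" and the row SDs gives the same result as scale() applied to the transposed matrix:

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
a <- matrix(rnorm(25), ncol = 5)
rmeans <- rowMeans(a)                # one mean per row

## sweep(a, 1, stats, FUN) applies FUN between each row of 'a' and the
## matching element of 'stats' (MARGIN = 1 means rows, 2 means columns)
centred <- sweep(a, 1, rmeans, "-")

## same thing by hand: recycling runs down the columns, so element [i, j]
## is paired with rmeans[i]
by_hand <- a - rmeans

## a second sweep divides each row by its SD, giving row z-scores
sds <- apply(centred, 1, sd)
z <- sweep(centred, 1, sds, "/")

## scale() works on columns, so transpose, scale, and transpose back
z_scale <- t(scale(t(a)))

all.equal(centred, by_hand)
all.equal(z, z_scale, check.attributes = FALSE)
```

Both comparisons should return TRUE.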

For your second question:

?heatmap.2
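
The scale argument of heatmap.2() only offers "none", "row" and "column", so centering on a group of columns would have to be done before the call. A rough sketch, assuming a hypothetical reference group made of the first three columns (the grp index, the seed and the variable names are all made up for illustration):

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
a <- matrix(rnorm(25), ncol = 5)

grp <- 1:3                           # hypothetical: columns in the reference group
gm <- rowMeans(a[, grp])             # per-row mean over the group only
gs <- apply(a[, grp], 1, sd)         # per-row SD over the group only

## centre and scale every value in a row by that row's group statistics
z <- sweep(sweep(a, 1, gm, "-"), 1, gs, "/")

## the pre-scaled matrix could then be plotted without further scaling, e.g.
## heatmap.2(z, scale = "none")
```

Within the group columns each row of z then has mean 0 and SD 1; the remaining columns are expressed relative to the group.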



Also, as I understand from the code, heatmap is using the distfun function for the clustering. Can I use Pearson correlation for the clustering? My main aim in using the heatmap is to examine the expression levels of the marker genes and to confirm that they are clearly differentially expressed in the two subtypes of the disease that I am examining.

No, heatmap.2() is not using a function called distfun for the clustering. There isn't a function by that name in either gplots or base R. If you look at the help page, you can see that distfun is an argument to heatmap.2(), and its default is the dist() function.

You can use Pearson correlation, but in my experience it takes some work. Again, if you read the help page, you can see that the Rowv and Colv arguments can be one of TRUE, FALSE, NULL, or a dendrogram. So if you want to use Pearson correlation, you should supply heatmap.2() with dendrograms produced using that correlation. So an example:

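library(gplots)  # provides heatmap.2()
a <- matrix(rnorm(50), ncol=5)
## cluster on 1 - Pearson correlation; cor() works on columns, hence t(a) for rows
rowv <- as.dendrogram(hclust(as.dist(1-cor(t(a)))))
colv <- as.dendrogram(hclust(as.dist(1-cor(a))))
heatmap.2(a, scale="row", Rowv=rowv, Colv=colv)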
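
Since distfun is an argument, another route that may work is to pass a correlation-based distance function instead of ready-made dendrograms. This is only a sketch: pearson_dist is a made-up helper name, and it assumes heatmap.2() applies distfun to the matrix for the rows and to its transpose for the columns, as the help page describes.

```r
set.seed(42)                         # arbitrary seed, only for reproducibility
a <- matrix(rnorm(50), ncol = 5)

## hypothetical helper: 1 - Pearson correlation between the ROWS of x
pearson_dist <- function(x) as.dist(1 - cor(t(x)))

d_rows <- pearson_dist(a)            # distances between the 10 rows
d_cols <- pearson_dist(t(a))         # distances between the 5 columns

## the same dendrograms as in the example above, built from the helper
rowv <- as.dendrogram(hclust(d_rows))
colv <- as.dendrogram(hclust(d_cols))

## with gplots loaded, the helper could be handed over directly, e.g.
## heatmap.2(a, scale = "row", distfun = pearson_dist)
```

The values of 1 - cor lie between 0 (perfectly correlated) and 2 (perfectly anti-correlated), so anti-correlated genes end up far apart; use 1 - abs(cor) instead if they should cluster together.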

Best,

Jim




Many thanks,

Chrysanthi.


2009/7/8 James W. MacDonald <jmac...@med.umich.edu>

    Hi Chrysanthi,


    Chrysanthi A. wrote:

        Hi,

        I am analysing gene expression data using the heatmap.2 function in R
        and I was wondering what is the formula of the "raw z-score" bar which
        shows the colors for each pixel. According to that post:
        https://mailman.stat.ethz.ch/pipermail/r-help/2006-September/113598.html,
        it is the

        (actual value - mean of the group) / standard deviation.

        But, mean of which group? Mean of the gene vector? And actual value of
        that gene on a sample?  I would be grateful if you could give me some
        more details about it or even if there is a book/manual that I could
        address to..


    How about looking at the code?

       if (scale == "row") {
           retval$rowMeans <- rm <- rowMeans(x, na.rm = na.rm)
           x <- sweep(x, 1, rm)
           retval$rowSDs <- sx <- apply(x, 1, sd, na.rm = na.rm)
           x <- sweep(x, 1, sx, "/")
       }
       else if (scale == "column") {
           retval$colMeans <- rm <- colMeans(x, na.rm = na.rm)
           x <- sweep(x, 2, rm)
           retval$colSDs <- sx <- apply(x, 2, sd, na.rm = na.rm)
           x <- sweep(x, 2, sx, "/")
       }

    So the z-score is calculated on either the row or column (or the
    default of "none").

    I don't see how you can get something saying 'raw z-score'. I get
    either 'Row Z-Score' or 'Column Z-Score'. So assuming you meant Row
    Z-Score, then the rows are centered and scaled by subtracting the
    mean of the row from every value and then dividing the resulting
    values by the standard deviation of the row.

    Best,

    Jim



        Thanks a lot,

        Chrysanthi.


        ______________________________________________
        R-help@r-project.org mailing list
        https://stat.ethz.ch/mailman/listinfo/r-help
        PLEASE do read the posting guide
        http://www.R-project.org/posting-guide.html
        and provide commented, minimal, self-contained, reproducible code.


-- James W. MacDonald, M.S.
    Biostatistician
    Douglas Lab
    University of Michigan
    Department of Human Genetics
    5912 Buhl
    1241 E. Catherine St.
    Ann Arbor MI 48109-5618
    734-615-7826




