Re: [Bioc-sig-seq] "pooled" dispersion estimation in edgeR

Gordon K Smyth Mon, 18 Jul 2011 21:54:57 -0700

Hi Sean,

Sorry, the code I gave works as is with the devel version of edgeR. With the official release version you have to set:


  design <- matrix(1,ncol(y),1)

to get the same effect.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
sm...@wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth

On Mon, 18 Jul 2011, Sean Ruddy wrote:

Hi Gordon,

I wasn't able to get your suggestion to work. estimateGLMCommonDisp() seems
to want explicit values for the design. If I leave the design argument empty
I get the error,

Error in as.matrix(design) :
 argument "design" is missing, with no default

I have release 2.8 installed. My code is

y <- DGEList( countMat )
y$offset <- log( totals )
y <- estimateGLMCommonDisp( y , offset = y$offset )

Sorry if I'm missing something obvious.

Thanks,
Sean


On Fri, Jul 15, 2011 at 7:26 PM, Gordon K Smyth <sm...@wehi.edu.au> wrote:

Hi Sean,

On Fri, 15 Jul 2011, Sean Ruddy wrote:

 Hi Gordon,


Thanks for the response. One of my data sets has 8 conditions and no
replicates and so I wanted to emulate DESeq's way of pooling the samples and
also use an offset matrix. I was hoping to avoid doing it manually so that I
don't mess it up. I could do this all in edgeR and pool the samples but I'm
not sure how well this would work under edgeR vs. DESeq.


edgeR has a very flexible interface, so there was no need to explicitly
introduce a "pooled" method.  Instead, this sort of thing can be handled by
the usual functions in the usual way.  Suppose you have a data object y,
which includes an offset matrix:

  y$offset <- your matrix

Then you can estimate the "pooled" dispersion simply by:

  y <- estimateGLMCommonDisp(y)

The fact that you don't supply a design matrix means that the samples are
automatically treated as one group, i.e., pooled.  You can estimate
trended or tagwise dispersions in the same way.  Then

  fit <- glmFit(y,design)  etc

will do any analysis you want using dispersions estimated when the samples
were pooled.
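
A minimal sketch of the workflow described above, written so that it also
runs when the design must be given explicitly (as noted for the release
version). The count matrix, the offsets, and the names n_tags, n_samples,
offsets, pooled_design and condition are made up for illustration; only the
edgeR calls come from the thread.

```r
library(edgeR)

set.seed(42)
n_tags <- 500
n_samples <- 8  # eight conditions, no replicates, as in the thread

# Simulated stand-in for the real count matrix (hypothetical data)
counts <- matrix(rnbinom(n_tags * n_samples, mu = 100, size = 10),
                 nrow = n_tags, ncol = n_samples)

# Log-scale offsets, one per tag and sample (here simply log library sizes)
offsets <- matrix(log(colSums(counts)),
                  nrow = n_tags, ncol = n_samples, byrow = TRUE)

y <- DGEList(counts = counts)
y$offset <- offsets

# Intercept-only design: every sample is treated as one group ("pooled")
pooled_design <- matrix(1, ncol(y), 1)
y <- estimateGLMCommonDisp(y, pooled_design)

# Fit the model of interest using the pooled dispersion stored in y
condition <- factor(paste0("c", seq_len(n_samples)))
design <- model.matrix(~ condition)
fit <- glmFit(y, design)
```

With one sample per condition the second design is saturated, so the
dispersion can only come from the pooled step.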

I and the other edgeR authors are anxious to get feedback, so write again
if this doesn't turn out to be clear.

 I am curious though what sounds off to you in my previous email. I don't
feel entirely comfortable doing this manually but hopefully it's just
because I left out some details. I was trying to follow the DESeq method and
the only difference I saw was in the size factor calculations which I
changed for my own needs by using the offset values for each tag and sample.


Even if you could estimate the variances yourself, I don't see any manual
way that you could perform valid statistical tests, while correctly
accounting for the offsets.  The whole negative binomial methodology
requires genuine counts rather than adjusted counts.  So handling the
offsets needs to be built-in.
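
The point about adjusted counts can be illustrated with a short derivation.
This is a sketch in notation chosen here, not taken from the thread: y_{gi}
is the raw count for tag g in sample i, o_{gi} the offset, x_i the design
row, and \phi the dispersion.

```latex
% Negative binomial GLM with an offset on the raw counts
\log \mu_{gi} = o_{gi} + x_i^{\top}\beta_g ,
\qquad
\operatorname{Var}(y_{gi}) = \mu_{gi} + \phi\,\mu_{gi}^{2}

% Adjusted count: divide by the size factor s_{gi} = e^{o_{gi}}
\tilde{y}_{gi} = y_{gi}/s_{gi},
\qquad
m_{gi} = \operatorname{E}(\tilde{y}_{gi}) = \mu_{gi}/s_{gi}

% Variance of the adjusted count
\operatorname{Var}(\tilde{y}_{gi})
  = \frac{\mu_{gi} + \phi\,\mu_{gi}^{2}}{s_{gi}^{2}}
  = \frac{m_{gi}}{s_{gi}} + \phi\, m_{gi}^{2}
```

The variance of an adjusted count therefore depends on s_{gi} as well as on
its mean, so adjusted counts do not follow a single negative binomial
mean-variance relationship unless every s_{gi} equals 1.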

Best wishes
Gordon

 I appreciate the help!


Best,
Sean

On Fri, Jul 15, 2011 at 12:02 AM, Gordon K Smyth <sm...@wehi.edu.au>
wrote:

 Hi Sean,


I'm curious to know why not use edgeR, since edgeR does what you want and
DESeq doesn't?

I might be wrong, but the manual analysis that you describe doesn't sound
right.

Best wishes
Gordon

Date: Thu, 14 Jul 2011 12:54:49 -0700
From: Sean Ruddy <srudd...@gmail.com>
To: bioc-sig-sequencing@r-project.org
Subject: [Bioc-sig-seq] Supplying own variance functions and adjusted
      counts to a DESeq dataset

Hi,

I have an RNA-Seq count data set that requires separate offset values for
each tag and sample. DESeq does not appear to take a matrix of offset values
(unlike edgeR) in any of its functions, so I've carried out the analysis
manually, i.e., calculating a size factor for each tag of each sample,
adjusting the counts, then proceeding to calculate means and variances of
the adjusted counts, and finally fitting a curve for each condition to the
mean-var plot using locfit().

Essentially, I'd like to put these variance functions (or at least all
the predicted variances) and adjusted counts inside a DESeq object so that I
can take advantage of the other functions DESeq offers, tests, plots, etc...

Thanks for the help!

Sean



_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
