Re: [Bioc-sig-seq] "pooled" dispersion estimation in edgeR

Sean Ruddy Mon, 18 Jul 2011 22:01:33 -0700

Works! Thanks!

-Sean
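
For anyone finding this thread later, here is a minimal sketch of the approach discussed in the quoted messages below. It is not Gordon's exact code. The toy counts, the offsets and the names counts, offset_mat, pooled_design and cond_design are made up for illustration. The edgeR calls are skipped when the package is not installed, and argument defaults may differ between edgeR versions.

```r
# Toy data: 200 tags x 8 unreplicated samples (simulated, illustration only).
set.seed(1)
n_tags <- 200
n_samples <- 8
counts <- matrix(rnbinom(n_tags * n_samples, mu = 50, size = 10),
                 nrow = n_tags, ncol = n_samples)

# Hypothetical tag-by-sample offsets on the log scale. Here every tag in a
# sample shares the log library size; a real offset matrix would vary by tag.
offset_mat <- matrix(log(colSums(counts)), nrow = n_tags, ncol = n_samples,
                     byrow = TRUE)

# Intercept-only design: a single column of ones with one row per sample,
# so all samples are treated as one group ("pooled").
pooled_design <- matrix(1, ncol(counts), 1)

# Design for the actual comparison: one level per condition, no replicates.
cond <- factor(seq_len(n_samples))
cond_design <- model.matrix(~ cond)

if (requireNamespace("edgeR", quietly = TRUE)) {
  y <- edgeR::DGEList(counts = counts)
  y$offset <- offset_mat
  # Estimate the common dispersion with all samples pooled.
  y <- edgeR::estimateGLMCommonDisp(y, design = pooled_design)
  print(y$common.dispersion)
  # Fit the per-condition model using the pooled dispersion stored in y.
  fit <- edgeR::glmFit(y, cond_design)
}
```

The explicit pooled_design is the form needed for the release version mentioned below; with the devel version the design argument can reportedly be left out.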


On Mon, Jul 18, 2011 at 9:54 PM, Gordon K Smyth <sm...@wehi.edu.au> wrote:

> Hi Sean,
>
> Sorry, the code I gave works as is with the devel version of edgeR.  With the
> official release version you have to set:
>
>  design <- matrix(1,ncol(y),1)
>
> to get the same effect.
>
>
> Best wishes
> Gordon
>
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> sm...@wehi.edu.au
> http://www.wehi.edu.au
> http://www.statsci.org/smyth
>
> On Mon, 18 Jul 2011, Sean Ruddy wrote:
>
>> Hi Gordon,
>>
>> I wasn't able to get your suggestion to work. estimateGLMCommonDisp() seems
>> to want explicit values for the design. If I leave the design argument empty
>> I get the error,
>>
>> Error in as.matrix(design) :
>>  argument "design" is missing, with no default
>>
>> I have release 2.8 installed. My code is
>>
>> y <- DGEList( countMat )
>> y$offset <- log( totals )
>> y <- estimateGLMCommonDisp( y , offset = y$offset )
>>
>> Sorry if I'm missing something obvious.
>>
>> Thanks,
>> Sean
>>
>>
>> On Fri, Jul 15, 2011 at 7:26 PM, Gordon K Smyth <sm...@wehi.edu.au> wrote:
>>
>>> Hi Sean,
>>>
>>> On Fri, 15 Jul 2011, Sean Ruddy wrote:
>>>
>>>> Hi Gordon,
>>>>
>>>> Thanks for the response. One of my data sets has 8 conditions and no
>>>> replicates and so I wanted to emulate DESeq's way of pooling the samples and
>>>> also use an offset matrix. I was hoping to avoid doing it manually so that I
>>>> don't mess it up. I could do this all in edgeR and pool the samples but I'm
>>>> not sure how well this would work under edgeR vs. DESeq.
>>>>
>>>
>>> edgeR has a very flexible interface, so there was no need to explicitly
>>> introduce a "pooled" method.  Instead, this sort of thing can be handled by
>>> the usual functions in the usual way.  Suppose you have a data object y,
>>> which includes an offset matrix:
>>>
>>>  y$offset <- your matrix
>>>
>>> Then you can estimate the "pooled" dispersion simply by:
>>>
>>>  y <- estimateGLMCommonDisp(y)
>>>
>>> The fact that you don't supply a design matrix means that the samples are
>>> automatically treated as one group, i.e., pooled.  You can estimate
>>> trended or tagwise dispersions in the same way.  Then
>>>
>>>  fit <- glmFit(y,design)  etc
>>>
>>> will do any analysis you want using dispersions estimated when the samples
>>> were pooled.
>>>
>>> I and the other edgeR authors are anxious to get feedback, so write again
>>> if this doesn't turn out to be clear.
>>>
>>>> I am curious though what sounds off to you in my previous email. I don't
>>>> feel entirely comfortable doing this manually but hopefully it's just
>>>> because I left out some details. I was trying to follow the DESeq method and
>>>> the only difference I saw was in the size factor calculations which I
>>>> changed for my own needs by using the offset values for each tag and sample.
>>>>
>>>
>>> Even if you could estimate the variances yourself, I don't see any manual
>>> way that you could perform valid statistical tests, while correctly
>>> accounting for the offsets.  The whole negative binomial methodology
>>> requires genuine counts rather than adjusted counts.  So handling the
>>> offsets needs to be built-in.
>>>
>>> Best wishes
>>> Gordon
>>>
>>>> I appreciate the help!
>>>>
>>>> Best,
>>>> Sean
>>>>
>>>> On Fri, Jul 15, 2011 at 12:02 AM, Gordon K Smyth <sm...@wehi.edu.au> wrote:
>>>>
>>>>> Hi Sean,
>>>>>
>>>>> I'm curious to know why not use edgeR, since edgeR does what you want and
>>>>> DESeq doesn't?
>>>>>
>>>>> I might be wrong, but the manual analysis that you describe doesn't sound
>>>>> right.
>>>>>
>>>>> Best wishes
>>>>> Gordon
>>>>>
>>>>>> Date: Thu, 14 Jul 2011 12:54:49 -0700
>>>>>> From: Sean Ruddy <srudd...@gmail.com>
>>>>>> To: bioc-sig-sequencing@r-project.org
>>>>>> Subject: [Bioc-sig-seq] Supplying own variance functions and adjusted
>>>>>>      counts  to a DESeq dataset
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have an RNA-Seq count data set that requires separate offset values for
>>>>>> each tag and sample. DESeq does not appear to take a matrix of offset values
>>>>>> (unlike edgeR) in any of its functions so I've carried out the analysis
>>>>>> manually, i.e., calculating a size factor for each tag of each sample,
>>>>>> adjusting the counts, then proceeding to calculate means and variances of
>>>>>> the adjusted counts, and finally fitting a curve for each condition to the
>>>>>> mean-var plot using locfit().
>>>>>>
>>>>>> Essentially, I'd like to put these variance functions (or at least all
>>>>>> the predicted variances) and adjusted counts inside a DESeq object so that I
>>>>>> can take advantage of the other functions DESeq offers: tests, plots, etc.
>>>>>>
>>>>>> Thanks for the help!
>>>>>>
>>>>>> Sean
>>>>>>
>>>>>

_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
