Re: [Bioc-sig-seq] RNA-Seq - analysis of time series data?

Mark Robinson Thu, 25 Nov 2010 01:42:00 -0800

Hi Jakob.

Just another thought on this.  One alternative for your analysis may be to 
treat it as a 1-way setup [using d from d <- DGEList(counts=rd,group=group)] 
and use the non-GLM methods.  This means that you would be limited to pairwise 
comparisons, but that may suffice.


Cheers,
Mark

On 2010-11-23, at 9:43 PM, Mark Robinson wrote:

> Hi Jakob.
> 
> The error message is due to the fact that your design matrix is not full rank 
> (i.e. sum of columns 4-7 = sum of columns 2-3).  This is of course my fault, 
> since I suggested it.
> 
> That setup (previous post) would've worked for an experiment with 
> observations for every combination of factors, which you do not have 
> (controls are only measured at T=0, AP2/AP6 are not measured at T=0).  
> Non-full-rank design matrices can be difficult to interpret, so an 
> alternative option is this:
> 
> S <- factor(rep( c("Acon","AP6","AP2"), 
>               c(6,5+6+6+5,5+6+5+4)))
> T <- factor(rep( c("T00","T06","T12","T24","T48","T06","T12","T24","T48"), 
>               c(6,5,6,6,5,5,6,5,4)))
> 
> f <- factor( paste(S,T,sep=".") )
> design <- model.matrix( ~ f )
> [...]
> 
> As mentioned below, you may need to adjust the reference (here again, control 
> at T=0) for testing a particular contrast of interest.
> 
> Let us know how you go.
> 
> Cheers,
> Mark
> 
> On 2010-11-23, at 12:25 AM, Jakob Hedegaard wrote:
> 
>> Hi Mark,
>> 
>> Thanks.
>> 
>> I have implemented your suggestion for the design matrix when reading the 
>> data (see below), but encounter the following error when running 
>> estimateCRDisp:
>> 
>>> d <- estimateCRDisp(d,design)
>> Error in chol.default(crossprod(design, .vecmat(mu/(1 + dispersion * mu),  : 
>> the leading minor of order 7 is not positive definite
>> 
>> Applying the simple design:
>> design.time <- model.matrix(~ T, data=d$sample)
>> does not result in the error....
>> 
>> Appreciate your help!
>> 
>> Best regards,
>> Jakob
>> 
>> 
>> 
>> 
>> library(edgeR)
>> r <- read.delim(file.path(datadir, 
>> "RNAseq_AP-2-6-INF_liver_transcript_pivot.txt"))
>> dim(r)
>> #[1] 78728   205
>> rdata <- r[2:205]
>> #samp <- substr(colnames(r)[2:205],6,10)
>> #usamp <- unique(samp)
>> samp <- substr(colnames(r)[2:205],16,28)
>> usamp <- unique(samp)        
>> rd <- matrix(NA,nrow=nrow(r),ncol=length(usamp))
>> for (i in 1:length(usamp)){
>> rd[,i] <- rowSums(rdata[,grepl(usamp[i],samp)])
>> }
>> colnames(rd) <- usamp
>> rownames(rd) <- r[,1]
>> # export data
>> #write.table(rd,file="RNAseq_AP-2-6-INF_liver_vs_AugSsc9_pivot.txt",sep="\t",quote=F,dec=".",row.names=T,col.names=NA)
>> 
>> group <- substr(unique(substr(colnames(r)[2:205],16,28)),7,13)
>> #[1] "control" "control" "control" "control" "control" "control" "AP6.T06" 
>> "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T12"
>> #[13] "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T24" "AP6.T24" 
>> "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T48"
>> #[25] "AP6.T48" "AP6.T48" "AP6.T48" "AP6.T48" "AP2.T06" "AP2.T06" "AP2.T06" 
>> "AP2.T06" "AP2.T06" "AP2.T12" "AP2.T12" "AP2.T12"
>> #[37] "AP2.T12" "AP2.T12" "AP2.T12" "AP2.T24" "AP2.T24" "AP2.T24" "AP2.T24" 
>> "AP2.T24" "AP2.T48" "AP2.T48" "AP2.T48" "AP2.T48"
>> d <- DGEList(counts=rd,group=group)
>> d <- calcNormFactors(d)
>> d$samples$S <- factor(gsub("con","Acon",substr(d$samples$group,1,3)))
>> d$samples$T <- factor(gsub("rol","T00",(substr(d$samples$group,5,7))))
>> design <- model.matrix(~ S + T, data=d$sample)
>> # Analysis using Cox-Reid common dispersion
>> d <- estimateCRDisp(d,design)
>> 
>>> design
>>             (Intercept) SAP2 SAP6 TT06 TT12 TT24 TT48
>> AP_01_control           1    0    0    0    0    0    0
>> AP_02_control           1    0    0    0    0    0    0
>> AP_03_control           1    0    0    0    0    0    0
>> AP_28_control           1    0    0    0    0    0    0
>> AP_29_control           1    0    0    0    0    0    0
>> AP_30_control           1    0    0    0    0    0    0
>> AP_31_AP6.T06           1    0    1    1    0    0    0
>> AP_32_AP6.T06           1    0    1    1    0    0    0
>> AP_33_AP6.T06           1    0    1    1    0    0    0
>> AP_34_AP6.T06           1    0    1    1    0    0    0
>> AP_36_AP6.T06           1    0    1    1    0    0    0
>> AP_37_AP6.T12           1    0    1    0    1    0    0
>> AP_38_AP6.T12           1    0    1    0    1    0    0
>> AP_39_AP6.T12           1    0    1    0    1    0    0
>> AP_40_AP6.T12           1    0    1    0    1    0    0
>> AP_41_AP6.T12           1    0    1    0    1    0    0
>> AP_42_AP6.T12           1    0    1    0    1    0    0
>> AP_43_AP6.T24           1    0    1    0    0    1    0
>> AP_44_AP6.T24           1    0    1    0    0    1    0
>> AP_45_AP6.T24           1    0    1    0    0    1    0
>> AP_46_AP6.T24           1    0    1    0    0    1    0
>> AP_47_AP6.T24           1    0    1    0    0    1    0
>> AP_48_AP6.T24           1    0    1    0    0    1    0
>> AP_49_AP6.T48           1    0    1    0    0    0    1
>> AP_50_AP6.T48           1    0    1    0    0    0    1
>> AP_51_AP6.T48           1    0    1    0    0    0    1
>> AP_53_AP6.T48           1    0    1    0    0    0    1
>> AP_54_AP6.T48           1    0    1    0    0    0    1
>> AP_55_AP2.T06           1    1    0    1    0    0    0
>> AP_56_AP2.T06           1    1    0    1    0    0    0
>> AP_57_AP2.T06           1    1    0    1    0    0    0
>> AP_58_AP2.T06           1    1    0    1    0    0    0
>> AP_59_AP2.T06           1    1    0    1    0    0    0
>> AP_60_AP2.T12           1    1    0    0    1    0    0
>> AP_62_AP2.T12           1    1    0    0    1    0    0
>> AP_63_AP2.T12           1    1    0    0    1    0    0
>> AP_64_AP2.T12           1    1    0    0    1    0    0
>> AP_65_AP2.T12           1    1    0    0    1    0    0
>> AP_66_AP2.T12           1    1    0    0    1    0    0
>> AP_67_AP2.T24           1    1    0    0    0    1    0
>> AP_69_AP2.T24           1    1    0    0    0    1    0
>> AP_70_AP2.T24           1    1    0    0    0    1    0
>> AP_71_AP2.T24           1    1    0    0    0    1    0
>> AP_72_AP2.T24           1    1    0    0    0    1    0
>> AP_73_AP2.T48           1    1    0    0    0    0    1
>> AP_75_AP2.T48           1    1    0    0    0    0    1
>> AP_76_AP2.T48           1    1    0    0    0    0    1
>> AP_78_AP2.T48           1    1    0    0    0    0    1
>> attr(,"assign")
>> [1] 0 1 1 2 2 2 2
>> attr(,"contrasts")
>> attr(,"contrasts")$S
>> [1] "contr.treatment"
>> 
>> attr(,"contrasts")$T
>> [1] "contr.treatment"
>> 
>> 
>> -----Oprindelig meddelelse-----
>> Fra: Mark Robinson [mailto:mrobin...@wehi.edu.au] 
>> Sendt: 18. november 2010 05:51
>> Til: Jakob Hedegaard
>> Cc: bioc-sig-sequencing@r-project.org
>> Emne: Re: [Bioc-sig-seq] RNA-Seq - analysis of time series data?
>> 
>> Hi Jakob.
>> 
>> Basically, you can analyze your time course data with edgeR through glmFit() 
>> as you have already done, but by choosing an appropriate design matrix that 
>> will allow you to test your contrasts of interest.
>> 
>> For example, you might define a couple factors (please double check that 
>> this actually matches your $samples element of the DGEList object):
>> 
>> S <- factor(rep( c("Acon","AP6","AP2"), 
>>               c(6,5+6+6+5,5+6+5+4)))
>> T <- factor(rep( c("T00","T06","T12","T24","T48","T06","T12","T24","T48"), 
>>               c(6,5,6,6,5,5,6,5,4)))
>> 
>> ... and then a design matrix:
>> 
>> design <- model.matrix( ~ S + T )
>> 
>> So, this design matrix will look like this:
>> 
>> 
>>> design <- model.matrix( ~ S + T )
>>> design
>> (Intercept) SAP2 SAP6 TT06 TT12 TT24 TT48
>> 1            1    0    0    0    0    0    0
>> 2            1    0    0    0    0    0    0
>> 3            1    0    0    0    0    0    0
>> 4            1    0    0    0    0    0    0
>> 5            1    0    0    0    0    0    0
>> 6            1    0    0    0    0    0    0
>> 7            1    0    1    1    0    0    0
>> 8            1    0    1    1    0    0    0
>> 9            1    0    1    1    0    0    0
>> 10           1    0    1    1    0    0    0
>> 11           1    0    1    1    0    0    0
>> 12           1    0    1    0    1    0    0
>> 13           1    0    1    0    1    0    0
>> 14           1    0    1    0    1    0    0
>> 15           1    0    1    0    1    0    0
>> 16           1    0    1    0    1    0    0
>> 17           1    0    1    0    1    0    0
>> 18           1    0    1    0    0    1    0
>> 19           1    0    1    0    0    1    0
>> 20           1    0    1    0    0    1    0
>> 21           1    0    1    0    0    1    0
>> 22           1    0    1    0    0    1    0
>> 23           1    0    1    0    0    1    0
>> 24           1    0    1    0    0    0    1
>> 25           1    0    1    0    0    0    1
>> 26           1    0    1    0    0    0    1
>> 27           1    0    1    0    0    0    1
>> 28           1    0    1    0    0    0    1
>> 29           1    1    0    1    0    0    0
>> 30           1    1    0    1    0    0    0
>> 31           1    1    0    1    0    0    0
>> 32           1    1    0    1    0    0    0
>> 33           1    1    0    1    0    0    0
>> 34           1    1    0    0    1    0    0
>> 35           1    1    0    0    1    0    0
>> 36           1    1    0    0    1    0    0
>> 37           1    1    0    0    1    0    0
>> 38           1    1    0    0    1    0    0
>> 39           1    1    0    0    1    0    0
>> 40           1    1    0    0    0    1    0
>> 41           1    1    0    0    0    1    0
>> 42           1    1    0    0    0    1    0
>> 43           1    1    0    0    0    1    0
>> 44           1    1    0    0    0    1    0
>> 45           1    1    0    0    0    0    1
>> 46           1    1    0    0    0    0    1
>> 47           1    1    0    0    0    0    1
>> 48           1    1    0    0    0    0    1
>> attr(,"assign")
>> [1] 0 1 1 2 2 2 2
>> attr(,"contrasts")
>> attr(,"contrasts")$S
>> [1] "contr.treatment"
>> 
>> attr(,"contrasts")$T
>> [1] "contr.treatment"
>> 
>> 
>> Note that I've intentionally changed "con" to "Acon" in order to make it the 
>> control group ... so, the intercept term above represents the abundance 
>> parameter for the control/T00 samples.  Then, you can select your comparison 
>> of interest through the 'coef' argument of the glmLRT() function.  Note that 
>> you can select multiple columns.
>> 
>> By judicious choice of the design matrix, you should be able to fit any 
>> contrast of interest. Currently you will probably need to redefine the 
>> design matrix for a different reference sample to obtain different 
>> contrasts.  We will be adding a 'contrasts' argument to glmLRT() shortly, 
>> which will make it much easier to investigate different contrasts of 
>> interest.
>> 
>> Hope that helps.
>> 
>> Cheers,
>> Mark
>> 
>> 
>> On 2010-11-18, at 1:32 AM, Jakob Hedegaard wrote:
>> 
>>> Hi,
>>> 
>>> I wonder how to analyze a RNA-Seq dataset of a time-series experiment.
>>> 
>>> The dataset origin from a challenge experiment with 4-6 samples per time 
>>> point (T00,T06,T12,T24 and T48) from two series of challenge (with one of 
>>> two different serotypes). In total 48 individual samples and two factors, 
>>> time and serotype (see the table below for details)
>>> RNA-Seq profiles have been obtained using Illumina GAIIx with multiplexing.
>>> 
>>> I have used edgeR (the Cox-Reid and GLM method) to obtain the genes 
>>> significantly affected to each time point relative to time zero (e.g. 
>>> T12-T00) - thus initially ignoring the potential difference between the 
>>> serotypes.
>>> 
>>> But how can I obtain the genes significantly affected across the different 
>>> contrastes?
>>> 
>>> 
>>>      group lib.size norm.factors serotype time
>>> AP_01 control  3734226    1.0575860      con  T00
>>> AP_02 control  4528260    1.0581673      con  T00
>>> AP_03 control  3648163    1.0594602      con  T00
>>> AP_28 control  4671178    1.0430147      con  T00
>>> AP_29 control  3746020    1.0528085      con  T00
>>> AP_30 control  4471915    1.1277386      con  T00
>>> AP_31 AP6.T06  7384334    0.9187757      AP6  T06
>>> AP_32 AP6.T06  3649621    0.9035999      AP6  T06
>>> AP_33 AP6.T06  5644324    0.8929802      AP6  T06
>>> AP_34 AP6.T06  3791540    0.9396600      AP6  T06
>>> AP_36 AP6.T06  3922243    0.8524249      AP6  T06
>>> AP_37 AP6.T12  3113854    1.0491183      AP6  T12
>>> AP_38 AP6.T12  2153867    1.0996506      AP6  T12
>>> AP_39 AP6.T12  2979503    1.0905274      AP6  T12
>>> AP_40 AP6.T12  5375493    1.0420513      AP6  T12
>>> AP_41 AP6.T12  3769654    0.9094147      AP6  T12
>>> AP_42 AP6.T12  2621303    1.1272673      AP6  T12
>>> AP_43 AP6.T24  3537847    1.0037276      AP6  T24
>>> AP_44 AP6.T24  3660552    1.0757808      AP6  T24
>>> AP_45 AP6.T24  3284038    1.0701603      AP6  T24
>>> AP_46 AP6.T24  3250374    1.0230341      AP6  T24
>>> AP_47 AP6.T24  7208387    0.9535068      AP6  T24
>>> AP_48 AP6.T24  4169075    0.9730747      AP6  T24
>>> AP_49 AP6.T48  5989902    0.9825794      AP6  T48
>>> AP_50 AP6.T48  3529028    0.9471979      AP6  T48
>>> AP_51 AP6.T48  5104071    1.0772029      AP6  T48
>>> AP_53 AP6.T48  4823387    0.9710606      AP6  T48
>>> AP_54 AP6.T48  3788201    1.0976409      AP6  T48
>>> AP_55 AP2.T06  4919578    0.7872065      AP2  T06
>>> AP_56 AP2.T06  4580068    0.9533078      AP2  T06
>>> AP_57 AP2.T06  3908180    0.9193207      AP2  T06
>>> AP_58 AP2.T06  3466801    0.9887874      AP2  T06
>>> AP_59 AP2.T06  4267998    0.8769085      AP2  T06
>>> AP_60 AP2.T12  4533905    0.9058305      AP2  T12
>>> AP_62 AP2.T12  5906089    0.9352388      AP2  T12
>>> AP_63 AP2.T12  3676318    1.1260072      AP2  T12
>>> AP_64 AP2.T12  2206081    1.0579246      AP2  T12
>>> AP_65 AP2.T12  3955338    1.0221930      AP2  T12
>>> AP_66 AP2.T12  3775918    0.9300664      AP2  T12
>>> AP_67 AP2.T24  3853681    0.9659259      AP2  T24
>>> AP_69 AP2.T24  3761829    0.9592286      AP2  T24
>>> AP_70 AP2.T24  4263273    1.0742397      AP2  T24
>>> AP_71 AP2.T24  4736798    0.9864702      AP2  T24
>>> AP_72 AP2.T24  6387114    1.0462401      AP2  T24
>>> AP_73 AP2.T48  3389351    1.0303610      AP2  T48
>>> AP_75 AP2.T48  1489023    1.1863821      AP2  T48
>>> AP_76 AP2.T48  3588175    1.0250639      AP2  T48
>>> AP_78 AP2.T48  3848562    0.9855582      AP2  T48
>>> 
>>> 
>>> sessionInfo()
>>> R version 2.12.0 (2010-10-15)
>>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>> 
>>> locale:
>>> [1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252    
>>> LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C                    
>>> LC_TIME=Danish_Denmark.1252    
>>> 
>>> attached base packages:
>>> [1] grDevices datasets  splines   graphics  stats     tcltk     utils     
>>> methods   base     
>>> 
>>> other attached packages:
>>> [1] edgeR_1.8.2     svSocket_0.9-50 TinnR_1.0.3     R2HTML_2.2      
>>> Hmisc_3.8-3     survival_2.35-8
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] cluster_1.13.1  grid_2.12.0     lattice_0.19-13 limma_3.6.6     
>>> svMisc_0.9-60   tools_2.12.0  
>>> 
>>> 
>>> Best regards
>>> 
>>> Jakob Hedegaard
>>> Project scientist
>>> 
>>> AARHUS UNIVERSITY
>>> Faculty of Agricultural Sciences
>>> Dept. of Genetics and Biotechnology
>>> Blichers Allé 20, P.O. BOX 50
>>> DK-8830 Tjele
>>> Denmark
>>> 
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing@r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>> 
>> ------------------------------
>> Mark Robinson, PhD (Melb)
>> Epigenetics Laboratory, Garvan
>> Bioinformatics Division, WEHI
>> e: mrobin...@wehi.edu.au
>> e: m.robin...@garvan.org.au
>> p: +61 (0)3 9345 2628
>> f: +61 (0)3 9347 0852
>> ------------------------------
>> 
>> 
>> ______________________________________________________________________
>> The information in this email is confidential and inte...{{dropped:24}}
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: mrobin...@wehi.edu.au
e: m.robin...@garvan.org.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------


______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}

_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] RNA-Seq - analysis of time series data?

Reply via email to