Hi Jakob.
Just another thought on this. One alternative for your analysis may be to
treat it as a 1-way setup [using d from d <- DGEList(counts=rd,group=group)]
and use the non-GLM methods. This means that you would be limited to pairwise
comparisons, but that may suffice.
Cheers,
Mark
On 2010-11-23, at 9:43 PM, Mark Robinson wrote:
> Hi Jakob.
>
> The error message is due to the fact that your design matrix is not full rank
> (i.e. sum of columns 4-7 = sum of columns 2-3). This is of course my fault,
> since I suggested it.
>
> That setup (previous post) would've worked for an experiment with
> observations for every combination of factors, which you do not have
> (controls are only measured at T=0, AP2/AP6 are not measured at T=0).
> Non-full-rank design matrices can be difficult to interpret, so an
> alternative option is this:
>
> S <- factor(rep( c("Acon","AP6","AP2"),
> c(6,5+6+6+5,5+6+5+4)))
> T <- factor(rep( c("T00","T06","T12","T24","T48","T06","T12","T24","T48"),
> c(6,5,6,6,5,5,6,5,4)))
>
> f <- factor( paste(S,T,sep=".") )
> design <- model.matrix( ~ f )
> [...]
>
> As mentioned below, you may need to adjust the reference (here again, control
> at T=0) for testing a particular contrast of interest.
>
> Let us know how you go.
>
> Cheers,
> Mark
>
> On 2010-11-23, at 12:25 AM, Jakob Hedegaard wrote:
>
>> Hi Mark,
>>
>> Thanks.
>>
>> I have implemented your suggestion for the design matrix when reading the
>> data (see below), but encounter the following error when running
>> estimateCRDisp:
>>
>>> d <- estimateCRDisp(d,design)
>> Error in chol.default(crossprod(design, .vecmat(mu/(1 + dispersion * mu), :
>> the leading minor of order 7 is not positive definite
>>
>> Applying the simple design:
>> design.time <- model.matrix(~ T, data=d$sample)
>> does not result in the error....
>>
>> Appreciate your help!
>>
>> Best regards,
>> Jakob
>>
>>
>>
>>
>> library(edgeR)
>> r <- read.delim(file.path(datadir,
>> "RNAseq_AP-2-6-INF_liver_transcript_pivot.txt"))
>> dim(r)
>> #[1] 78728 205
>> rdata <- r[2:205]
>> #samp <- substr(colnames(r)[2:205],6,10)
>> #usamp <- unique(samp)
>> samp <- substr(colnames(r)[2:205],16,28)
>> usamp <- unique(samp)
>> rd <- matrix(NA,nrow=nrow(r),ncol=length(usamp))
>> for (i in 1:length(usamp)){
>> rd[,i] <- rowSums(rdata[,grepl(usamp[i],samp)])
>> }
>> colnames(rd) <- usamp
>> rownames(rd) <- r[,1]
>> # export data
>> #write.table(rd,file="RNAseq_AP-2-6-INF_liver_vs_AugSsc9_pivot.txt",sep="\t",quote=F,dec=".",row.names=T,col.names=NA)
>>
>> group <- substr(unique(substr(colnames(r)[2:205],16,28)),7,13)
>> #[1] "control" "control" "control" "control" "control" "control" "AP6.T06"
>> "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T12"
>> #[13] "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T24" "AP6.T24"
>> "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T48"
>> #[25] "AP6.T48" "AP6.T48" "AP6.T48" "AP6.T48" "AP2.T06" "AP2.T06" "AP2.T06"
>> "AP2.T06" "AP2.T06" "AP2.T12" "AP2.T12" "AP2.T12"
>> #[37] "AP2.T12" "AP2.T12" "AP2.T12" "AP2.T24" "AP2.T24" "AP2.T24" "AP2.T24"
>> "AP2.T24" "AP2.T48" "AP2.T48" "AP2.T48" "AP2.T48"
>> d <- DGEList(counts=rd,group=group)
>> d <- calcNormFactors(d)
>> d$samples$S <- factor(gsub("con","Acon",substr(d$samples$group,1,3)))
>> d$samples$T <- factor(gsub("rol","T00",(substr(d$samples$group,5,7))))
>> design <- model.matrix(~ S + T, data=d$sample)
>> # Analysis using Cox-Reid common dispersion
>> d <- estimateCRDisp(d,design)
>>
>>> design
>> (Intercept) SAP2 SAP6 TT06 TT12 TT24 TT48
>> AP_01_control 1 0 0 0 0 0 0
>> AP_02_control 1 0 0 0 0 0 0
>> AP_03_control 1 0 0 0 0 0 0
>> AP_28_control 1 0 0 0 0 0 0
>> AP_29_control 1 0 0 0 0 0 0
>> AP_30_control 1 0 0 0 0 0 0
>> AP_31_AP6.T06 1 0 1 1 0 0 0
>> AP_32_AP6.T06 1 0 1 1 0 0 0
>> AP_33_AP6.T06 1 0 1 1 0 0 0
>> AP_34_AP6.T06 1 0 1 1 0 0 0
>> AP_36_AP6.T06 1 0 1 1 0 0 0
>> AP_37_AP6.T12 1 0 1 0 1 0 0
>> AP_38_AP6.T12 1 0 1 0 1 0 0
>> AP_39_AP6.T12 1 0 1 0 1 0 0
>> AP_40_AP6.T12 1 0 1 0 1 0 0
>> AP_41_AP6.T12 1 0 1 0 1 0 0
>> AP_42_AP6.T12 1 0 1 0 1 0 0
>> AP_43_AP6.T24 1 0 1 0 0 1 0
>> AP_44_AP6.T24 1 0 1 0 0 1 0
>> AP_45_AP6.T24 1 0 1 0 0 1 0
>> AP_46_AP6.T24 1 0 1 0 0 1 0
>> AP_47_AP6.T24 1 0 1 0 0 1 0
>> AP_48_AP6.T24 1 0 1 0 0 1 0
>> AP_49_AP6.T48 1 0 1 0 0 0 1
>> AP_50_AP6.T48 1 0 1 0 0 0 1
>> AP_51_AP6.T48 1 0 1 0 0 0 1
>> AP_53_AP6.T48 1 0 1 0 0 0 1
>> AP_54_AP6.T48 1 0 1 0 0 0 1
>> AP_55_AP2.T06 1 1 0 1 0 0 0
>> AP_56_AP2.T06 1 1 0 1 0 0 0
>> AP_57_AP2.T06 1 1 0 1 0 0 0
>> AP_58_AP2.T06 1 1 0 1 0 0 0
>> AP_59_AP2.T06 1 1 0 1 0 0 0
>> AP_60_AP2.T12 1 1 0 0 1 0 0
>> AP_62_AP2.T12 1 1 0 0 1 0 0
>> AP_63_AP2.T12 1 1 0 0 1 0 0
>> AP_64_AP2.T12 1 1 0 0 1 0 0
>> AP_65_AP2.T12 1 1 0 0 1 0 0
>> AP_66_AP2.T12 1 1 0 0 1 0 0
>> AP_67_AP2.T24 1 1 0 0 0 1 0
>> AP_69_AP2.T24 1 1 0 0 0 1 0
>> AP_70_AP2.T24 1 1 0 0 0 1 0
>> AP_71_AP2.T24 1 1 0 0 0 1 0
>> AP_72_AP2.T24 1 1 0 0 0 1 0
>> AP_73_AP2.T48 1 1 0 0 0 0 1
>> AP_75_AP2.T48 1 1 0 0 0 0 1
>> AP_76_AP2.T48 1 1 0 0 0 0 1
>> AP_78_AP2.T48 1 1 0 0 0 0 1
>> attr(,"assign")
>> [1] 0 1 1 2 2 2 2
>> attr(,"contrasts")
>> attr(,"contrasts")$S
>> [1] "contr.treatment"
>>
>> attr(,"contrasts")$T
>> [1] "contr.treatment"
>>
>>
>> -----Oprindelig meddelelse-----
>> Fra: Mark Robinson [mailto:[email protected]]
>> Sendt: 18. november 2010 05:51
>> Til: Jakob Hedegaard
>> Cc: [email protected]
>> Emne: Re: [Bioc-sig-seq] RNA-Seq - analysis of time series data?
>>
>> Hi Jakob.
>>
>> Basically, you can analyze your time course data with edgeR through glmFit()
>> as you have already done, but by choosing an appropriate design matrix that
>> will allow you to test your contrasts of interest.
>>
>> For example, you might define a couple factors (please double check that
>> this actually matches your $samples element of the DGEList object):
>>
>> S <- factor(rep( c("Acon","AP6","AP2"),
>> c(6,5+6+6+5,5+6+5+4)))
>> T <- factor(rep( c("T00","T06","T12","T24","T48","T06","T12","T24","T48"),
>> c(6,5,6,6,5,5,6,5,4)))
>>
>> ... and then a design matrix:
>>
>> design <- model.matrix( ~ S + T )
>>
>> So, this design matrix will look like this:
>>
>>
>>> design <- model.matrix( ~ S + T )
>>> design
>> (Intercept) SAP2 SAP6 TT06 TT12 TT24 TT48
>> 1 1 0 0 0 0 0 0
>> 2 1 0 0 0 0 0 0
>> 3 1 0 0 0 0 0 0
>> 4 1 0 0 0 0 0 0
>> 5 1 0 0 0 0 0 0
>> 6 1 0 0 0 0 0 0
>> 7 1 0 1 1 0 0 0
>> 8 1 0 1 1 0 0 0
>> 9 1 0 1 1 0 0 0
>> 10 1 0 1 1 0 0 0
>> 11 1 0 1 1 0 0 0
>> 12 1 0 1 0 1 0 0
>> 13 1 0 1 0 1 0 0
>> 14 1 0 1 0 1 0 0
>> 15 1 0 1 0 1 0 0
>> 16 1 0 1 0 1 0 0
>> 17 1 0 1 0 1 0 0
>> 18 1 0 1 0 0 1 0
>> 19 1 0 1 0 0 1 0
>> 20 1 0 1 0 0 1 0
>> 21 1 0 1 0 0 1 0
>> 22 1 0 1 0 0 1 0
>> 23 1 0 1 0 0 1 0
>> 24 1 0 1 0 0 0 1
>> 25 1 0 1 0 0 0 1
>> 26 1 0 1 0 0 0 1
>> 27 1 0 1 0 0 0 1
>> 28 1 0 1 0 0 0 1
>> 29 1 1 0 1 0 0 0
>> 30 1 1 0 1 0 0 0
>> 31 1 1 0 1 0 0 0
>> 32 1 1 0 1 0 0 0
>> 33 1 1 0 1 0 0 0
>> 34 1 1 0 0 1 0 0
>> 35 1 1 0 0 1 0 0
>> 36 1 1 0 0 1 0 0
>> 37 1 1 0 0 1 0 0
>> 38 1 1 0 0 1 0 0
>> 39 1 1 0 0 1 0 0
>> 40 1 1 0 0 0 1 0
>> 41 1 1 0 0 0 1 0
>> 42 1 1 0 0 0 1 0
>> 43 1 1 0 0 0 1 0
>> 44 1 1 0 0 0 1 0
>> 45 1 1 0 0 0 0 1
>> 46 1 1 0 0 0 0 1
>> 47 1 1 0 0 0 0 1
>> 48 1 1 0 0 0 0 1
>> attr(,"assign")
>> [1] 0 1 1 2 2 2 2
>> attr(,"contrasts")
>> attr(,"contrasts")$S
>> [1] "contr.treatment"
>>
>> attr(,"contrasts")$T
>> [1] "contr.treatment"
>>
>>
>> Note that I've intentionally changed "con" to "Acon" in order to make it the
>> control group ... so, the intercept term above represents the abundance
>> parameter for the control/T00 samples. Then, you can select your comparison
>> of interest through the 'coef' argument of the glmLRT() function. Note that
>> you can select multiple columns.
>>
>> By judicious choice of the design matrix, you should be able to fit any
>> contrast of interest. Currently you will probably need to redefine the
>> design matrix for a different reference sample to obtain different
>> contrasts. We will be adding a 'contrasts' argument to glmLRT() shortly,
>> which will make it much easier to investigate different contrasts of
>> interest.
>>
>> Hope that helps.
>>
>> Cheers,
>> Mark
>>
>>
>> On 2010-11-18, at 1:32 AM, Jakob Hedegaard wrote:
>>
>>> Hi,
>>>
>>> I wonder how to analyze a RNA-Seq dataset of a time-series experiment.
>>>
>>> The dataset origin from a challenge experiment with 4-6 samples per time
>>> point (T00,T06,T12,T24 and T48) from two series of challenge (with one of
>>> two different serotypes). In total 48 individual samples and two factors,
>>> time and serotype (see the table below for details)
>>> RNA-Seq profiles have been obtained using Illumina GAIIx with multiplexing.
>>>
>>> I have used edgeR (the Cox-Reid and GLM method) to obtain the genes
>>> significantly affected to each time point relative to time zero (e.g.
>>> T12-T00) - thus initially ignoring the potential difference between the
>>> serotypes.
>>>
>>> But how can I obtain the genes significantly affected across the different
>>> contrastes?
>>>
>>>
>>> group lib.size norm.factors serotype time
>>> AP_01 control 3734226 1.0575860 con T00
>>> AP_02 control 4528260 1.0581673 con T00
>>> AP_03 control 3648163 1.0594602 con T00
>>> AP_28 control 4671178 1.0430147 con T00
>>> AP_29 control 3746020 1.0528085 con T00
>>> AP_30 control 4471915 1.1277386 con T00
>>> AP_31 AP6.T06 7384334 0.9187757 AP6 T06
>>> AP_32 AP6.T06 3649621 0.9035999 AP6 T06
>>> AP_33 AP6.T06 5644324 0.8929802 AP6 T06
>>> AP_34 AP6.T06 3791540 0.9396600 AP6 T06
>>> AP_36 AP6.T06 3922243 0.8524249 AP6 T06
>>> AP_37 AP6.T12 3113854 1.0491183 AP6 T12
>>> AP_38 AP6.T12 2153867 1.0996506 AP6 T12
>>> AP_39 AP6.T12 2979503 1.0905274 AP6 T12
>>> AP_40 AP6.T12 5375493 1.0420513 AP6 T12
>>> AP_41 AP6.T12 3769654 0.9094147 AP6 T12
>>> AP_42 AP6.T12 2621303 1.1272673 AP6 T12
>>> AP_43 AP6.T24 3537847 1.0037276 AP6 T24
>>> AP_44 AP6.T24 3660552 1.0757808 AP6 T24
>>> AP_45 AP6.T24 3284038 1.0701603 AP6 T24
>>> AP_46 AP6.T24 3250374 1.0230341 AP6 T24
>>> AP_47 AP6.T24 7208387 0.9535068 AP6 T24
>>> AP_48 AP6.T24 4169075 0.9730747 AP6 T24
>>> AP_49 AP6.T48 5989902 0.9825794 AP6 T48
>>> AP_50 AP6.T48 3529028 0.9471979 AP6 T48
>>> AP_51 AP6.T48 5104071 1.0772029 AP6 T48
>>> AP_53 AP6.T48 4823387 0.9710606 AP6 T48
>>> AP_54 AP6.T48 3788201 1.0976409 AP6 T48
>>> AP_55 AP2.T06 4919578 0.7872065 AP2 T06
>>> AP_56 AP2.T06 4580068 0.9533078 AP2 T06
>>> AP_57 AP2.T06 3908180 0.9193207 AP2 T06
>>> AP_58 AP2.T06 3466801 0.9887874 AP2 T06
>>> AP_59 AP2.T06 4267998 0.8769085 AP2 T06
>>> AP_60 AP2.T12 4533905 0.9058305 AP2 T12
>>> AP_62 AP2.T12 5906089 0.9352388 AP2 T12
>>> AP_63 AP2.T12 3676318 1.1260072 AP2 T12
>>> AP_64 AP2.T12 2206081 1.0579246 AP2 T12
>>> AP_65 AP2.T12 3955338 1.0221930 AP2 T12
>>> AP_66 AP2.T12 3775918 0.9300664 AP2 T12
>>> AP_67 AP2.T24 3853681 0.9659259 AP2 T24
>>> AP_69 AP2.T24 3761829 0.9592286 AP2 T24
>>> AP_70 AP2.T24 4263273 1.0742397 AP2 T24
>>> AP_71 AP2.T24 4736798 0.9864702 AP2 T24
>>> AP_72 AP2.T24 6387114 1.0462401 AP2 T24
>>> AP_73 AP2.T48 3389351 1.0303610 AP2 T48
>>> AP_75 AP2.T48 1489023 1.1863821 AP2 T48
>>> AP_76 AP2.T48 3588175 1.0250639 AP2 T48
>>> AP_78 AP2.T48 3848562 0.9855582 AP2 T48
>>>
>>>
>>> sessionInfo()
>>> R version 2.12.0 (2010-10-15)
>>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252
>>> LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
>>> LC_TIME=Danish_Denmark.1252
>>>
>>> attached base packages:
>>> [1] grDevices datasets splines graphics stats tcltk utils
>>> methods base
>>>
>>> other attached packages:
>>> [1] edgeR_1.8.2 svSocket_0.9-50 TinnR_1.0.3 R2HTML_2.2
>>> Hmisc_3.8-3 survival_2.35-8
>>>
>>> loaded via a namespace (and not attached):
>>> [1] cluster_1.13.1 grid_2.12.0 lattice_0.19-13 limma_3.6.6
>>> svMisc_0.9-60 tools_2.12.0
>>>
>>>
>>> Best regards
>>>
>>> Jakob Hedegaard
>>> Project scientist
>>>
>>> AARHUS UNIVERSITY
>>> Faculty of Agricultural Sciences
>>> Dept. of Genetics and Biotechnology
>>> Blichers Allé 20, P.O. BOX 50
>>> DK-8830 Tjele
>>> Denmark
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> [email protected]
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>> ------------------------------
>> Mark Robinson, PhD (Melb)
>> Epigenetics Laboratory, Garvan
>> Bioinformatics Division, WEHI
>> e: [email protected]
>> e: [email protected]
>> p: +61 (0)3 9345 2628
>> f: +61 (0)3 9347 0852
>> ------------------------------
>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and inte...{{dropped:24}}
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: [email protected]
e: [email protected]
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing