Hi Jakob. Just another thought on this. One alternative for your analysis may be to treat it as a 1-way setup [using d from d <- DGEList(counts=rd,group=group)] and use the non-GLM methods. This means that you would be limited to pairwise comparisons, but that may suffice.
Cheers, Mark On 2010-11-23, at 9:43 PM, Mark Robinson wrote: > Hi Jakob. > > The error message is due to the fact that your design matrix is not full rank > (i.e. sum of columns 4-7 = sum of columns 2-3). This is of course my fault, > since I suggested it. > > That setup (previous post) would've worked for an experiment with > observations for every combination of factors, which you do not have > (controls are only measured at T=0, AP2/AP6 are not measured at T=0). > Non-full-rank design matrices can be difficult to interpret, so an > alternative option is this: > > S <- factor(rep( c("Acon","AP6","AP2"), > c(6,5+6+6+5,5+6+5+4))) > T <- factor(rep( c("T00","T06","T12","T24","T48","T06","T12","T24","T48"), > c(6,5,6,6,5,5,6,5,4))) > > f <- factor( paste(S,T,sep=".") ) > design <- model.matrix( ~ f ) > [...] > > As mentioned below, you may need to adjust the reference (here again, control > at T=0) for testing a particular contrast of interest. > > Let us know how you go. > > Cheers, > Mark > > On 2010-11-23, at 12:25 AM, Jakob Hedegaard wrote: > >> Hi Mark, >> >> Thanks. >> >> I have implemented your suggestion for the design matrix when reading the >> data (see below), but encounter the following error when running >> estimateCRDisp: >> >>> d <- estimateCRDisp(d,design) >> Error in chol.default(crossprod(design, .vecmat(mu/(1 + dispersion * mu), : >> the leading minor of order 7 is not positive definite >> >> Applying the simple design: >> design.time <- model.matrix(~ T, data=d$sample) >> does not result in the error.... >> >> Appreciate your help! >> >> Best regards, >> Jakob >> >> >> >> >> library(edgeR) >> r <- read.delim(file.path(datadir, >> "RNAseq_AP-2-6-INF_liver_transcript_pivot.txt")) >> dim(r) >> #[1] 78728 205 >> rdata <- r[2:205] >> #samp <- substr(colnames(r)[2:205],6,10) >> #usamp <- unique(samp) >> samp <- substr(colnames(r)[2:205],16,28) >> usamp <- unique(samp) >> rd <- matrix(NA,nrow=nrow(r),ncol=length(usamp)) >> for (i in 1:length(usamp)){ >> rd[,i] <- rowSums(rdata[,grepl(usamp[i],samp)]) >> } >> colnames(rd) <- usamp >> rownames(rd) <- r[,1] >> # export data >> #write.table(rd,file="RNAseq_AP-2-6-INF_liver_vs_AugSsc9_pivot.txt",sep="\t",quote=F,dec=".",row.names=T,col.names=NA) >> >> group <- substr(unique(substr(colnames(r)[2:205],16,28)),7,13) >> #[1] "control" "control" "control" "control" "control" "control" "AP6.T06" >> "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T12" >> #[13] "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T24" "AP6.T24" >> "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T48" >> #[25] "AP6.T48" "AP6.T48" "AP6.T48" "AP6.T48" "AP2.T06" "AP2.T06" "AP2.T06" >> "AP2.T06" "AP2.T06" "AP2.T12" "AP2.T12" "AP2.T12" >> #[37] "AP2.T12" "AP2.T12" "AP2.T12" "AP2.T24" "AP2.T24" "AP2.T24" "AP2.T24" >> "AP2.T24" "AP2.T48" "AP2.T48" "AP2.T48" "AP2.T48" >> d <- DGEList(counts=rd,group=group) >> d <- calcNormFactors(d) >> d$samples$S <- factor(gsub("con","Acon",substr(d$samples$group,1,3))) >> d$samples$T <- factor(gsub("rol","T00",(substr(d$samples$group,5,7)))) >> design <- model.matrix(~ S + T, data=d$sample) >> # Analysis using Cox-Reid common dispersion >> d <- estimateCRDisp(d,design) >> >>> design >> (Intercept) SAP2 SAP6 TT06 TT12 TT24 TT48 >> AP_01_control 1 0 0 0 0 0 0 >> AP_02_control 1 0 0 0 0 0 0 >> AP_03_control 1 0 0 0 0 0 0 >> AP_28_control 1 0 0 0 0 0 0 >> AP_29_control 1 0 0 0 0 0 0 >> AP_30_control 1 0 0 0 0 0 0 >> AP_31_AP6.T06 1 0 1 1 0 0 0 >> AP_32_AP6.T06 1 0 1 1 0 0 0 >> AP_33_AP6.T06 1 0 1 1 0 0 0 >> AP_34_AP6.T06 1 0 1 1 0 0 0 >> AP_36_AP6.T06 1 0 1 1 0 0 0 >> AP_37_AP6.T12 1 0 1 0 1 0 0 >> AP_38_AP6.T12 1 0 1 0 1 0 0 >> AP_39_AP6.T12 1 0 1 0 1 0 0 >> AP_40_AP6.T12 1 0 1 0 1 0 0 >> AP_41_AP6.T12 1 0 1 0 1 0 0 >> AP_42_AP6.T12 1 0 1 0 1 0 0 >> AP_43_AP6.T24 1 0 1 0 0 1 0 >> AP_44_AP6.T24 1 0 1 0 0 1 0 >> AP_45_AP6.T24 1 0 1 0 0 1 0 >> AP_46_AP6.T24 1 0 1 0 0 1 0 >> AP_47_AP6.T24 1 0 1 0 0 1 0 >> AP_48_AP6.T24 1 0 1 0 0 1 0 >> AP_49_AP6.T48 1 0 1 0 0 0 1 >> AP_50_AP6.T48 1 0 1 0 0 0 1 >> AP_51_AP6.T48 1 0 1 0 0 0 1 >> AP_53_AP6.T48 1 0 1 0 0 0 1 >> AP_54_AP6.T48 1 0 1 0 0 0 1 >> AP_55_AP2.T06 1 1 0 1 0 0 0 >> AP_56_AP2.T06 1 1 0 1 0 0 0 >> AP_57_AP2.T06 1 1 0 1 0 0 0 >> AP_58_AP2.T06 1 1 0 1 0 0 0 >> AP_59_AP2.T06 1 1 0 1 0 0 0 >> AP_60_AP2.T12 1 1 0 0 1 0 0 >> AP_62_AP2.T12 1 1 0 0 1 0 0 >> AP_63_AP2.T12 1 1 0 0 1 0 0 >> AP_64_AP2.T12 1 1 0 0 1 0 0 >> AP_65_AP2.T12 1 1 0 0 1 0 0 >> AP_66_AP2.T12 1 1 0 0 1 0 0 >> AP_67_AP2.T24 1 1 0 0 0 1 0 >> AP_69_AP2.T24 1 1 0 0 0 1 0 >> AP_70_AP2.T24 1 1 0 0 0 1 0 >> AP_71_AP2.T24 1 1 0 0 0 1 0 >> AP_72_AP2.T24 1 1 0 0 0 1 0 >> AP_73_AP2.T48 1 1 0 0 0 0 1 >> AP_75_AP2.T48 1 1 0 0 0 0 1 >> AP_76_AP2.T48 1 1 0 0 0 0 1 >> AP_78_AP2.T48 1 1 0 0 0 0 1 >> attr(,"assign") >> [1] 0 1 1 2 2 2 2 >> attr(,"contrasts") >> attr(,"contrasts")$S >> [1] "contr.treatment" >> >> attr(,"contrasts")$T >> [1] "contr.treatment" >> >> >> -----Oprindelig meddelelse----- >> Fra: Mark Robinson [mailto:mrobin...@wehi.edu.au] >> Sendt: 18. november 2010 05:51 >> Til: Jakob Hedegaard >> Cc: bioc-sig-sequencing@r-project.org >> Emne: Re: [Bioc-sig-seq] RNA-Seq - analysis of time series data? >> >> Hi Jakob. >> >> Basically, you can analyze your time course data with edgeR through glmFit() >> as you have already done, but by choosing an appropriate design matrix that >> will allow you to test your contrasts of interest. >> >> For example, you might define a couple factors (please double check that >> this actually matches your $samples element of the DGEList object): >> >> S <- factor(rep( c("Acon","AP6","AP2"), >> c(6,5+6+6+5,5+6+5+4))) >> T <- factor(rep( c("T00","T06","T12","T24","T48","T06","T12","T24","T48"), >> c(6,5,6,6,5,5,6,5,4))) >> >> ... and then a design matrix: >> >> design <- model.matrix( ~ S + T ) >> >> So, this design matrix will look like this: >> >> >>> design <- model.matrix( ~ S + T ) >>> design >> (Intercept) SAP2 SAP6 TT06 TT12 TT24 TT48 >> 1 1 0 0 0 0 0 0 >> 2 1 0 0 0 0 0 0 >> 3 1 0 0 0 0 0 0 >> 4 1 0 0 0 0 0 0 >> 5 1 0 0 0 0 0 0 >> 6 1 0 0 0 0 0 0 >> 7 1 0 1 1 0 0 0 >> 8 1 0 1 1 0 0 0 >> 9 1 0 1 1 0 0 0 >> 10 1 0 1 1 0 0 0 >> 11 1 0 1 1 0 0 0 >> 12 1 0 1 0 1 0 0 >> 13 1 0 1 0 1 0 0 >> 14 1 0 1 0 1 0 0 >> 15 1 0 1 0 1 0 0 >> 16 1 0 1 0 1 0 0 >> 17 1 0 1 0 1 0 0 >> 18 1 0 1 0 0 1 0 >> 19 1 0 1 0 0 1 0 >> 20 1 0 1 0 0 1 0 >> 21 1 0 1 0 0 1 0 >> 22 1 0 1 0 0 1 0 >> 23 1 0 1 0 0 1 0 >> 24 1 0 1 0 0 0 1 >> 25 1 0 1 0 0 0 1 >> 26 1 0 1 0 0 0 1 >> 27 1 0 1 0 0 0 1 >> 28 1 0 1 0 0 0 1 >> 29 1 1 0 1 0 0 0 >> 30 1 1 0 1 0 0 0 >> 31 1 1 0 1 0 0 0 >> 32 1 1 0 1 0 0 0 >> 33 1 1 0 1 0 0 0 >> 34 1 1 0 0 1 0 0 >> 35 1 1 0 0 1 0 0 >> 36 1 1 0 0 1 0 0 >> 37 1 1 0 0 1 0 0 >> 38 1 1 0 0 1 0 0 >> 39 1 1 0 0 1 0 0 >> 40 1 1 0 0 0 1 0 >> 41 1 1 0 0 0 1 0 >> 42 1 1 0 0 0 1 0 >> 43 1 1 0 0 0 1 0 >> 44 1 1 0 0 0 1 0 >> 45 1 1 0 0 0 0 1 >> 46 1 1 0 0 0 0 1 >> 47 1 1 0 0 0 0 1 >> 48 1 1 0 0 0 0 1 >> attr(,"assign") >> [1] 0 1 1 2 2 2 2 >> attr(,"contrasts") >> attr(,"contrasts")$S >> [1] "contr.treatment" >> >> attr(,"contrasts")$T >> [1] "contr.treatment" >> >> >> Note that I've intentionally changed "con" to "Acon" in order to make it the >> control group ... so, the intercept term above represents the abundance >> parameter for the control/T00 samples. Then, you can select your comparison >> of interest through the 'coef' argument of the glmLRT() function. Note that >> you can select multiple columns. >> >> By judicious choice of the design matrix, you should be able to fit any >> contrast of interest. Currently you will probably need to redefine the >> design matrix for a different reference sample to obtain different >> contrasts. We will be adding a 'contrasts' argument to glmLRT() shortly, >> which will make it much easier to investigate different contrasts of >> interest. >> >> Hope that helps. >> >> Cheers, >> Mark >> >> >> On 2010-11-18, at 1:32 AM, Jakob Hedegaard wrote: >> >>> Hi, >>> >>> I wonder how to analyze a RNA-Seq dataset of a time-series experiment. >>> >>> The dataset origin from a challenge experiment with 4-6 samples per time >>> point (T00,T06,T12,T24 and T48) from two series of challenge (with one of >>> two different serotypes). In total 48 individual samples and two factors, >>> time and serotype (see the table below for details) >>> RNA-Seq profiles have been obtained using Illumina GAIIx with multiplexing. >>> >>> I have used edgeR (the Cox-Reid and GLM method) to obtain the genes >>> significantly affected to each time point relative to time zero (e.g. >>> T12-T00) - thus initially ignoring the potential difference between the >>> serotypes. >>> >>> But how can I obtain the genes significantly affected across the different >>> contrastes? >>> >>> >>> group lib.size norm.factors serotype time >>> AP_01 control 3734226 1.0575860 con T00 >>> AP_02 control 4528260 1.0581673 con T00 >>> AP_03 control 3648163 1.0594602 con T00 >>> AP_28 control 4671178 1.0430147 con T00 >>> AP_29 control 3746020 1.0528085 con T00 >>> AP_30 control 4471915 1.1277386 con T00 >>> AP_31 AP6.T06 7384334 0.9187757 AP6 T06 >>> AP_32 AP6.T06 3649621 0.9035999 AP6 T06 >>> AP_33 AP6.T06 5644324 0.8929802 AP6 T06 >>> AP_34 AP6.T06 3791540 0.9396600 AP6 T06 >>> AP_36 AP6.T06 3922243 0.8524249 AP6 T06 >>> AP_37 AP6.T12 3113854 1.0491183 AP6 T12 >>> AP_38 AP6.T12 2153867 1.0996506 AP6 T12 >>> AP_39 AP6.T12 2979503 1.0905274 AP6 T12 >>> AP_40 AP6.T12 5375493 1.0420513 AP6 T12 >>> AP_41 AP6.T12 3769654 0.9094147 AP6 T12 >>> AP_42 AP6.T12 2621303 1.1272673 AP6 T12 >>> AP_43 AP6.T24 3537847 1.0037276 AP6 T24 >>> AP_44 AP6.T24 3660552 1.0757808 AP6 T24 >>> AP_45 AP6.T24 3284038 1.0701603 AP6 T24 >>> AP_46 AP6.T24 3250374 1.0230341 AP6 T24 >>> AP_47 AP6.T24 7208387 0.9535068 AP6 T24 >>> AP_48 AP6.T24 4169075 0.9730747 AP6 T24 >>> AP_49 AP6.T48 5989902 0.9825794 AP6 T48 >>> AP_50 AP6.T48 3529028 0.9471979 AP6 T48 >>> AP_51 AP6.T48 5104071 1.0772029 AP6 T48 >>> AP_53 AP6.T48 4823387 0.9710606 AP6 T48 >>> AP_54 AP6.T48 3788201 1.0976409 AP6 T48 >>> AP_55 AP2.T06 4919578 0.7872065 AP2 T06 >>> AP_56 AP2.T06 4580068 0.9533078 AP2 T06 >>> AP_57 AP2.T06 3908180 0.9193207 AP2 T06 >>> AP_58 AP2.T06 3466801 0.9887874 AP2 T06 >>> AP_59 AP2.T06 4267998 0.8769085 AP2 T06 >>> AP_60 AP2.T12 4533905 0.9058305 AP2 T12 >>> AP_62 AP2.T12 5906089 0.9352388 AP2 T12 >>> AP_63 AP2.T12 3676318 1.1260072 AP2 T12 >>> AP_64 AP2.T12 2206081 1.0579246 AP2 T12 >>> AP_65 AP2.T12 3955338 1.0221930 AP2 T12 >>> AP_66 AP2.T12 3775918 0.9300664 AP2 T12 >>> AP_67 AP2.T24 3853681 0.9659259 AP2 T24 >>> AP_69 AP2.T24 3761829 0.9592286 AP2 T24 >>> AP_70 AP2.T24 4263273 1.0742397 AP2 T24 >>> AP_71 AP2.T24 4736798 0.9864702 AP2 T24 >>> AP_72 AP2.T24 6387114 1.0462401 AP2 T24 >>> AP_73 AP2.T48 3389351 1.0303610 AP2 T48 >>> AP_75 AP2.T48 1489023 1.1863821 AP2 T48 >>> AP_76 AP2.T48 3588175 1.0250639 AP2 T48 >>> AP_78 AP2.T48 3848562 0.9855582 AP2 T48 >>> >>> >>> sessionInfo() >>> R version 2.12.0 (2010-10-15) >>> Platform: x86_64-pc-mingw32/x64 (64-bit) >>> >>> locale: >>> [1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252 >>> LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C >>> LC_TIME=Danish_Denmark.1252 >>> >>> attached base packages: >>> [1] grDevices datasets splines graphics stats tcltk utils >>> methods base >>> >>> other attached packages: >>> [1] edgeR_1.8.2 svSocket_0.9-50 TinnR_1.0.3 R2HTML_2.2 >>> Hmisc_3.8-3 survival_2.35-8 >>> >>> loaded via a namespace (and not attached): >>> [1] cluster_1.13.1 grid_2.12.0 lattice_0.19-13 limma_3.6.6 >>> svMisc_0.9-60 tools_2.12.0 >>> >>> >>> Best regards >>> >>> Jakob Hedegaard >>> Project scientist >>> >>> AARHUS UNIVERSITY >>> Faculty of Agricultural Sciences >>> Dept. of Genetics and Biotechnology >>> Blichers Allé 20, P.O. BOX 50 >>> DK-8830 Tjele >>> Denmark >>> >>> _______________________________________________ >>> Bioc-sig-sequencing mailing list >>> Bioc-sig-sequencing@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing >> >> ------------------------------ >> Mark Robinson, PhD (Melb) >> Epigenetics Laboratory, Garvan >> Bioinformatics Division, WEHI >> e: mrobin...@wehi.edu.au >> e: m.robin...@garvan.org.au >> p: +61 (0)3 9345 2628 >> f: +61 (0)3 9347 0852 >> ------------------------------ >> >> >> ______________________________________________________________________ >> The information in this email is confidential and inte...{{dropped:24}} > > _______________________________________________ > Bioc-sig-sequencing mailing list > Bioc-sig-sequencing@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing ------------------------------ Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: mrobin...@wehi.edu.au e: m.robin...@garvan.org.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852 ------------------------------ ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}} _______________________________________________ Bioc-sig-sequencing mailing list Bioc-sig-sequencing@r-project.org https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing