Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
Hi Andy. I don't think you've gotten a response on this. Sorry for the delay -- holidays. Some comments below. On 31/12/2008, at 1:18 AM, Andy_Paparountas wrote: > > Hi all , > > I really find this conversation very interesting. I am trying to > analyze a set of 3 treatment and 3 control samples of MoGeneSt10 > array. Thus far with the code pwhite shared I was able to do RMA > Background correction , quantile normalization and got QC , RLE , > NUSE , density plots. > > Q1. Is there any code to get similar results to affyQCreport? or even > how can we use affyQCreport to get QC from these arrays? As far as I know, affyQCreport has not been ported to aroma.affymetrix. I usually make due with RLE, NUSE and density plots for my QC. If there is something specific in affyQCreport that you like, it may be easy to port over. Maybe you'd consider doing the implementation. > Q2. I tried to export my data to an AffyBatch object in order to play > around with older methods > ab <- extractAffyBatch(cs) > > but I got a Warning message: > "CDF enviroment package 'mogene10stv1cdf' not installed. The 'affy' > package will later try to download from Bioconductor and install it." > > of course 'mogene10stv1cdf' does not exist as far as I know , > instead we should use "mogene10st.db". > > But what should the exact code be to connect the normalized data to > the annotation contained inside "mogene10st.db" ? A couple points here. First, it looks like Bioconductor is not currently supporting the 'affy' way of doing things for these new (1.0 ST) chips. If you skim the BioC mailing list archives, the suggestion is to use the 'oligo' package or 'xps'. But, then you are outside the world of AffyBatch objects. So, it doesn't make sense to use aroma.affymetrix's 'extractAffyBatch' for these chips. Second, I believe 'mogene10st.db' only really maps the Gene 1.0 ST identifiers to GO attributes, UNIGENE ids, chromosome locations and a whole host of other things. I don't think the physical probe locations are present within 'mogene10st.db', so it is not a replacement for the CDF file/environment. Hope that helps. Mark > I would really appreciate some help here :) > > Thanks all. > > > On 5 ΔΡκ, 17:43, pwhite...@gmail.com wrote: >> Hi Mark, >> >> Thanks for adding flavor="oligo" to RmaPlm. I verified it with the >> new >> release and the HGU133Plus2 data I have and it all looks good. >> Pairs plots >> are attached. >> >> Thanks, >> >> Peter >> >> On Thu, Dec 4, 2008 at 5:41 PM, Mark Robinson >> wrote: >> >>> Thanks Peter. >> >>> Perhaps you can repeat this comparison after the next release (this >>> will be very soon!) and split the aroma.affymetrix comparison into: >> >>> - aroma.affy.oligo -- with RmaPlm(csN,flavor="oligo") >>> - aroma.affy.affyPLM -- with flavor="affyPLM" (as you've done >>> already) >> >>> Perhaps the best way to look at all of this at once is with a single >>> pairs() plot. >> >>> Cheers, >>> Mark >> >>> On 05/12/2008, at 9:01 AM, pwhite...@gmail.com wrote: >> Dear Mark and Henrik, >> I wanted to confirm that your summary was correct regarding the different flavors for probeset summarization. I downloaded the MAQC HG_U133_Plus_2 array data from the MAQC website: >> http://edkb.fda.gov/MAQC/MainStudy/upload/MAQC_AFX_123456_120CELs.zip >> I then ran the analysis of the arrays from site 1, using just the A and B samples, with aroma.affymetrix, affy, affyPLM and oligo (see below for the complete code I used to do this). Basically the aroma.affymetrix and affyPLM data was essentially identical. The affy and oligo data was also essentially identical. As observed with the Gene ST array data there were significant differences between aroma.affymetrix and affy or oligo. Plots are attached. >> The Gene ST arrays do not have any MM probes - as we are using RMA rather than GCRMA this should not have affected anything. >> Thanks, >> Peter >> #OLIGO ANALYSIS >> library(pd.hg.u133.plus.2) library(pdInfoBuilder) fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG- U133_Plus_2","CEL",full=T)[1:10] raw.oligo<-read.celfiles(filenames=fn,pkgname="pd.hg.u133.plus.2") eset.oligo<-rma(raw.oligo) data.oligo<-exprs(eset.oligo) >> #AFFY ANALYSIS >> library(affy) fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG- U133_Plus_2","CEL",full=T)[1:10] raw.affy <- ReadAffy(filenames=fn) eset.affy <- rma(raw.affy) data.affy <- exprs(eset.affy) >> #AFFY PLM ANALYSIS >> library(affyPLM) fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG- U133_Plus_2","CEL",full=T)[1:10] raw.affyPLM <- ReadAffy(filenames=fn) fit.affyPLM <- fitPLM(raw.affyPLM, verbos=9) data.affyPLM <- coefs(fit.affyPLM) #Analysis of MAQC on Human U113 Plus 2 >> setwd("G:\\BGC_EXPERIMENTS\\MAQC_Analysis") library(aroma.affymetr
Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
Hi all , I really find this conversation very interesting. I am trying to analyze a set of 3 treatment and 3 control samples of MoGeneSt10 array. Thus far with the code pwhite shared I was able to do RMA Background correction , quantile normalization and got QC , RLE , NUSE , density plots. Q1. Is there any code to get similar results to affyQCreport? or even how can we use affyQCreport to get QC from these arrays? Q2. I tried to export my data to an AffyBatch object in order to play around with older methods ab <- extractAffyBatch(cs) but I got a Warning message: "CDF enviroment package 'mogene10stv1cdf' not installed. The 'affy' package will later try to download from Bioconductor and install it." of course 'mogene10stv1cdf' does not exist as far as I know , instead we should use "mogene10st.db". But what should the exact code be to connect the normalized data to the annotation contained inside "mogene10st.db" ? I would really appreciate some help here :) Thanks all. On 5 Δεκ, 17:43, pwhite...@gmail.com wrote: > Hi Mark, > > Thanks for adding flavor="oligo" to RmaPlm. I verified it with the new > release and the HGU133Plus2 data I have and it all looks good. Pairs plots > are attached. > > Thanks, > > Peter > > On Thu, Dec 4, 2008 at 5:41 PM, Mark Robinson wrote: > > > Thanks Peter. > > > Perhaps you can repeat this comparison after the next release (this > > will be very soon!) and split the aroma.affymetrix comparison into: > > > - aroma.affy.oligo -- with RmaPlm(csN,flavor="oligo") > > - aroma.affy.affyPLM -- with flavor="affyPLM" (as you've done already) > > > Perhaps the best way to look at all of this at once is with a single > > pairs() plot. > > > Cheers, > > Mark > > > On 05/12/2008, at 9:01 AM, pwhite...@gmail.com wrote: > > > > Dear Mark and Henrik, > > > > I wanted to confirm that your summary was correct regarding the > > > different flavors for probeset summarization. I downloaded the MAQC > > > HG_U133_Plus_2 array data from the MAQC website: > > > >http://edkb.fda.gov/MAQC/MainStudy/upload/MAQC_AFX_123456_120CELs.zip > > > > I then ran the analysis of the arrays from site 1, using just the A > > > and B samples, with aroma.affymetrix, affy, affyPLM and oligo (see > > > below for the complete code I used to do this). Basically the > > > aroma.affymetrix and affyPLM data was essentially identical. The > > > affy and oligo data was also essentially identical. As observed with > > > the Gene ST array data there were significant differences between > > > aroma.affymetrix and affy or oligo. Plots are attached. > > > > The Gene ST arrays do not have any MM probes - as we are using RMA > > > rather than GCRMA this should not have affected anything. > > > > Thanks, > > > > Peter > > > > #OLIGO ANALYSIS > > > > library(pd.hg.u133.plus.2) > > > library(pdInfoBuilder) > > > fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG- > > > U133_Plus_2","CEL",full=T)[1:10] > > > raw.oligo<-read.celfiles(filenames=fn,pkgname="pd.hg.u133.plus.2") > > > eset.oligo<-rma(raw.oligo) > > > data.oligo<-exprs(eset.oligo) > > > > #AFFY ANALYSIS > > > > library(affy) > > > fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG- > > > U133_Plus_2","CEL",full=T)[1:10] > > > raw.affy <- ReadAffy(filenames=fn) > > > eset.affy <- rma(raw.affy) > > > data.affy <- exprs(eset.affy) > > > > #AFFY PLM ANALYSIS > > > > library(affyPLM) > > > fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG- > > > U133_Plus_2","CEL",full=T)[1:10] > > > raw.affyPLM <- ReadAffy(filenames=fn) > > > fit.affyPLM <- fitPLM(raw.affyPLM, verbos=9) > > > data.affyPLM <- coefs(fit.affyPLM) > > > #Analysis of MAQC on Human U113 Plus 2 > > > > setwd("G:\\BGC_EXPERIMENTS\\MAQC_Analysis") > > > library(aroma.affymetrix) > > > prefixName <- "MAQC_Data" > > > chip1 <- "HG-U133_Plus_2" > > > cdf <- AffymetrixCdfFile$fromChipType("HG-U133_Plus_2") > > > cs <- AffymetrixCelSet$byName(prefixName, cdf=cdf, chipType=chip1) > > > pattern <- "AFX_1_[AB]" > > > idxs <- grep(pattern, getNames(cs)) > > > cs <- extract(cs, idxs) > > > bc <- RmaBackgroundCorrection(cs) > > > csBC <- process(bc) > > > qn <- QuantileNormalization(csBC, typesToUpdate="pm") > > > csN <- process(qn) > > > plm <- RmaPlm(csN, flavor="affyPLM") #flavor="oligo", must > > > library(oligo) > > > fit(plm) > > > ces <- getChipEffectSet(plm) > > > getExprs <- function(ces, ...) { > > > cdf <- getCdf(ces) > > > theta <- extractMatrix(ces, ..., returnUgcMap=TRUE) > > > ugcMap <- attr(theta, "unitGroupCellMap") > > > un<-getUnitNames(cdf, ugcMap[,"unit"]) > > > rownames(theta) <- un > > > log2(theta) > > > } > > > data.aroma <- getExprs(ces) > > > > #COMPARING THE DATASETS > > > > > dim(data.affy) > > > [1] 54675 10 > > > > dim(data.affyPLM) > > > [1] 54675 10 > > > > dim(data.oligo) > > > [1] 54613 10 > > > > dim(data.aroma) > > > [1] 54675 10 > > > > #Unlike in the Gene ST analysis the packages do not return the > > > probes in the same order so be careful to reorder them. Also not > >
Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
Thanks Peter. Perhaps you can repeat this comparison after the next release (this will be very soon!) and split the aroma.affymetrix comparison into: - aroma.affy.oligo -- with RmaPlm(csN,flavor="oligo") - aroma.affy.affyPLM -- with flavor="affyPLM" (as you've done already) Perhaps the best way to look at all of this at once is with a single pairs() plot. Cheers, Mark On 05/12/2008, at 9:01 AM, [EMAIL PROTECTED] wrote: > Dear Mark and Henrik, > > I wanted to confirm that your summary was correct regarding the > different flavors for probeset summarization. I downloaded the MAQC > HG_U133_Plus_2 array data from the MAQC website: > > http://edkb.fda.gov/MAQC/MainStudy/upload/MAQC_AFX_123456_120CELs.zip > > I then ran the analysis of the arrays from site 1, using just the A > and B samples, with aroma.affymetrix, affy, affyPLM and oligo (see > below for the complete code I used to do this). Basically the > aroma.affymetrix and affyPLM data was essentially identical. The > affy and oligo data was also essentially identical. As observed with > the Gene ST array data there were significant differences between > aroma.affymetrix and affy or oligo. Plots are attached. > > The Gene ST arrays do not have any MM probes - as we are using RMA > rather than GCRMA this should not have affected anything. > > Thanks, > > Peter > > #OLIGO ANALYSIS > > library(pd.hg.u133.plus.2) > library(pdInfoBuilder) > fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG- > U133_Plus_2","CEL",full=T)[1:10] > raw.oligo<-read.celfiles(filenames=fn,pkgname="pd.hg.u133.plus.2") > eset.oligo<-rma(raw.oligo) > data.oligo<-exprs(eset.oligo) > > > #AFFY ANALYSIS > > library(affy) > fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG- > U133_Plus_2","CEL",full=T)[1:10] > raw.affy <- ReadAffy(filenames=fn) > eset.affy <- rma(raw.affy) > data.affy <- exprs(eset.affy) > > > #AFFY PLM ANALYSIS > > library(affyPLM) > fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG- > U133_Plus_2","CEL",full=T)[1:10] > raw.affyPLM <- ReadAffy(filenames=fn) > fit.affyPLM <- fitPLM(raw.affyPLM, verbos=9) > data.affyPLM <- coefs(fit.affyPLM) > #Analysis of MAQC on Human U113 Plus 2 > > setwd("G:\\BGC_EXPERIMENTS\\MAQC_Analysis") > library(aroma.affymetrix) > prefixName <- "MAQC_Data" > chip1 <- "HG-U133_Plus_2" > cdf <- AffymetrixCdfFile$fromChipType("HG-U133_Plus_2") > cs <- AffymetrixCelSet$byName(prefixName, cdf=cdf, chipType=chip1) > pattern <- "AFX_1_[AB]" > idxs <- grep(pattern, getNames(cs)) > cs <- extract(cs, idxs) > bc <- RmaBackgroundCorrection(cs) > csBC <- process(bc) > qn <- QuantileNormalization(csBC, typesToUpdate="pm") > csN <- process(qn) > plm <- RmaPlm(csN, flavor="affyPLM") #flavor="oligo", must > library(oligo) > fit(plm) > ces <- getChipEffectSet(plm) > getExprs <- function(ces, ...) { > cdf <- getCdf(ces) > theta <- extractMatrix(ces, ..., returnUgcMap=TRUE) > ugcMap <- attr(theta, "unitGroupCellMap") > un<-getUnitNames(cdf, ugcMap[,"unit"]) > rownames(theta) <- un > log2(theta) > } > data.aroma <- getExprs(ces) > > > #COMPARING THE DATASETS > > > dim(data.affy) > [1] 5467510 > > dim(data.affyPLM) > [1] 5467510 > > dim(data.oligo) > [1] 5461310 > > dim(data.aroma) > [1] 5467510 > > #Unlike in the Gene ST analysis the packages do not return the > probes in the same order so be careful to reorder them. Also not > that Oligo removes the control probes (AFFX*). > > sum(rownames(data.affyPLM)==rownames(data.affy)) > # [1] 54675 > o <- match(rownames(data.oligo), rownames(data.affy)) > data.affy <- data.affy[o,] > data.affyPLM <- data.affyPLM[o,] > sum(rownames(data.affy)==rownames(data.oligo)) > # [1] 54613 > o <- match(rownames(data.affy), rownames(data.aroma)) > data.aroma <- data.aroma[o,] > sum(rownames(data.affy)==rownames(data.aroma)) > # [1] 54613 > > e<- (data.aroma - data.affy) > mean(as.vector(e^2), na.rm=T) > # [1] 0.2119428 > sd(as.vector(e^2), na.rm=T) > # [1] 0.3704433 > > e <- (data.aroma - data.oligo) > mean(as.vector(e^2), na.rm=T) > # [1] 0.2104522 > sd(as.vector(e^2), na.rm=T) > # [1] 0.3688539 > > e<- (data.aroma - data.affyPLM) > mean(as.vector(e^2), na.rm=T) > # [1] 1.160118e-05 > sd(as.vector(e^2), na.rm=T) > # [1] 2.125207e-05 > > e<- (data.affy - data.oligo) > mean(as.vector(e^2), na.rm=T) > # [1] 1.345037e-05 > sd(as.vector(e^2), na.rm=T) > # [1] 4.111692e-05 > > plot(data.aroma[,1],data.affyPLM[,1],main="Comparison of Aroma and > AffyPLM Data", > col="red",cex=0.5) > abline(0,1, lwd=2) > savePlot(file="HGU133Plus2_Aroma_vs_AffyPLM", type="png") > > plot(data.affy[,1],data.oligo[,1],main="Comparison of Affy and Oligo > Data", > col="red",cex=0.5) > abline(0,1, lwd=2) > savePlot(file="HGU133Plus2_Affy_vs_Oligo", type="png") > > plot(data.aroma[,1],data.affy[,1],main="Comparison of Aroma and Affy > Data", > col="red",cex=0.5) > abline(0,1, lwd=2) > savePlot(file="HGU133Plus2_Aroma_vs_Affy", type="png") > > plot(data.aroma[,1],data.oligo[,1],main="Comparison of Ar
Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
Dear Mark and Henrik, I wanted to confirm that your summary was correct regarding the different flavors for probeset summarization. I downloaded the MAQC HG_U133_Plus_2 array data from the MAQC website: http://edkb.fda.gov/MAQC/MainStudy/upload/MAQC_AFX_123456_120CELs.zip I then ran the analysis of the arrays from site 1, using just the A and B samples, with aroma.affymetrix, affy, affyPLM and oligo (see below for the complete code I used to do this). Basically the aroma.affymetrix and affyPLM data was essentially identical. The affy and oligo data was also essentially identical. As observed with the Gene ST array data there were significant differences between aroma.affymetrix and affy or oligo. Plots are attached. The Gene ST arrays do not have any MM probes - as we are using RMA rather than GCRMA this should not have affected anything. Thanks, Peter #OLIGO ANALYSIS library(pd.hg.u133.plus.2) library(pdInfoBuilder) fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-U133_Plus_2","CEL",full=T)[1:10] raw.oligo<-read.celfiles(filenames=fn,pkgname="pd.hg.u133.plus.2") eset.oligo<-rma(raw.oligo) data.oligo<-exprs(eset.oligo) #AFFY ANALYSIS library(affy) fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-U133_Plus_2","CEL",full=T)[1:10] raw.affy <- ReadAffy(filenames=fn) eset.affy <- rma(raw.affy) data.affy <- exprs(eset.affy) #AFFY PLM ANALYSIS library(affyPLM) fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-U133_Plus_2","CEL",full=T)[1:10] raw.affyPLM <- ReadAffy(filenames=fn) fit.affyPLM <- fitPLM(raw.affyPLM, verbos=9) data.affyPLM <- coefs(fit.affyPLM) #Analysis of MAQC on Human U113 Plus 2 setwd("G:\\BGC_EXPERIMENTS\\MAQC_Analysis") library(aroma.affymetrix) prefixName <- "MAQC_Data" chip1 <- "HG-U133_Plus_2" cdf <- AffymetrixCdfFile$fromChipType("HG-U133_Plus_2") cs <- AffymetrixCelSet$byName(prefixName, cdf=cdf, chipType=chip1) pattern <- "AFX_1_[AB]" idxs <- grep(pattern, getNames(cs)) cs <- extract(cs, idxs) bc <- RmaBackgroundCorrection(cs) csBC <- process(bc) qn <- QuantileNormalization(csBC, typesToUpdate="pm") csN <- process(qn) plm <- RmaPlm(csN, flavor="affyPLM") #flavor="oligo", must library(oligo) fit(plm) ces <- getChipEffectSet(plm) getExprs <- function(ces, ...) { cdf <- getCdf(ces) theta <- extractMatrix(ces, ..., returnUgcMap=TRUE) ugcMap <- attr(theta, "unitGroupCellMap") un<-getUnitNames(cdf, ugcMap[,"unit"]) rownames(theta) <- un log2(theta) } data.aroma <- getExprs(ces) #COMPARING THE DATASETS > dim(data.affy) [1] 5467510 > dim(data.affyPLM) [1] 5467510 > dim(data.oligo) [1] 5461310 > dim(data.aroma) [1] 5467510 #Unlike in the Gene ST analysis the packages do not return the probes in the same order so be careful to reorder them. Also not that Oligo removes the control probes (AFFX*). sum(rownames(data.affyPLM)==rownames(data.affy)) # [1] 54675 o <- match(rownames(data.oligo), rownames(data.affy)) data.affy <- data.affy[o,] data.affyPLM <- data.affyPLM[o,] sum(rownames(data.affy)==rownames(data.oligo)) # [1] 54613 o <- match(rownames(data.affy), rownames(data.aroma)) data.aroma <- data.aroma[o,] sum(rownames(data.affy)==rownames(data.aroma)) # [1] 54613 e<- (data.aroma - data.affy) mean(as.vector(e^2), na.rm=T) # [1] 0.2119428 sd(as.vector(e^2), na.rm=T) # [1] 0.3704433 e <- (data.aroma - data.oligo) mean(as.vector(e^2), na.rm=T) # [1] 0.2104522 sd(as.vector(e^2), na.rm=T) # [1] 0.3688539 e<- (data.aroma - data.affyPLM) mean(as.vector(e^2), na.rm=T) # [1] 1.160118e-05 sd(as.vector(e^2), na.rm=T) # [1] 2.125207e-05 e<- (data.affy - data.oligo) mean(as.vector(e^2), na.rm=T) # [1] 1.345037e-05 sd(as.vector(e^2), na.rm=T) # [1] 4.111692e-05 plot(data.aroma[,1],data.affyPLM[,1],main="Comparison of Aroma and AffyPLM Data", col="red",cex=0.5) abline(0,1, lwd=2) savePlot(file="HGU133Plus2_Aroma_vs_AffyPLM", type="png") plot(data.affy[,1],data.oligo[,1],main="Comparison of Affy and Oligo Data", col="red",cex=0.5) abline(0,1, lwd=2) savePlot(file="HGU133Plus2_Affy_vs_Oligo", type="png") plot(data.aroma[,1],data.affy[,1],main="Comparison of Aroma and Affy Data", col="red",cex=0.5) abline(0,1, lwd=2) savePlot(file="HGU133Plus2_Aroma_vs_Affy", type="png") plot(data.aroma[,1],data.oligo[,1],main="Comparison of Aroma and Oligo Data", col="red",cex=0.5) abline(0,1, lwd=2) savePlot(file="HGU133Plus2_Aroma_vs_Oligo", type="png") plot(data.affy[,1],data.affyPLM[,1],main="Comparison of Affy and AffyPLM Data", col="red",cex=0.5) abline(0,1, lwd=2) savePlot(file="HGU133Plus2_Affy_vs_AffyPLM", type="png") # FYI CREATING HG_U133_PLUS_2 Oligo Annotation LIbrary setwd("P:\\ANNOTATION\\AffyAnnotation\\Human\\HG-U133_Plus_2") library(pdInfoBuilder) cdfFile <- "HG-U133_Plus_2.cdf" csvAnnoFile <- "HG-U133_Plus_2.na27.annot.csv" tabSeqFile <- "HG-U133_Plus_2.probe_tab" pkg <- new("AffyExpressionPDInfoPkgSeed", author="Peter White", email=" [EMAIL PROTECTED]", version="0.2.0", genomebuild="UCSC hg18, June 2006", chipName="hgu133plus2", manufacturer="affymetr
Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
Hi Mark, I ran your code below on the Mouse Gene ST CDF and agree that like the human array, the probes used by aroma.affymetrix and oligo are identical, appart from the missing 45 control probesets I mentioned previously (I listed them below as requested). For the Human array were the number of probesets the same using either method? length(un) [1] 35557 length(ids) [1] 35512 setdiff(un,ids) [1] 10338032 10338034 10338021 10338023 10338061 10338028 10338012 10338052 [9] 10338027 10338018 10338046 10338002 10338051 10338019 10338006 10338015 [17] 10338007 10338005 10338010 10338033 10338013 10338048 10338030 10338050 [25] 10338055 10338014 10338024 10338054 10338040 10338043 10338008 10338049 [33] 10338062 10338031 10338022 10338009 10338053 10338020 10338011 10338016 [41] 10338057 10338039 10338058 10338038 10338045 matches[1:5,] pdInfoBuilder unsupportedCDF union 1052992125 2525 1042373125 2525 1060380929 2929 1048604129 2929 10341702 4 4 4 table(matches[,3]-matches[,1]) 0 35512 table(matches[,3]-matches[,2]) 0 35512 Peter On Dec 3, 4:36 pm, Mark Robinson <[EMAIL PROTECTED]> wrote: > Hi all. > > First of all, thanks Peter for 1) doing this testing and 2) for > spelling everything out. I expect to refer people to this thread in > the future, so thanks for that. > > Just wanted to add 3 more tidbits of hopefully useful information. > > 1. I dug a bit into why flavor="oligo" doesn't work within > aroma.affymetrix. It turns out it was a simple fix. Since I don't > use it regularly (it doesn't give probe affinities!) and the > underlying 'oligo' functions had changed, it stopped working. Its > corrected now. I've checked in the fix, so flavor='oligo' will be > available in the next release. In my tests, it appears VERY close to > 'affy' ... and since its based on 'oligo' code, it should be VERY VERY > similar. > > ... > plm1 <- RmaPlm(csN,flavor="oligo") > fit(plm1,verbose=verbose) > ces <- getChipEffectSet(plm1) > data.aroma.oligo <- getExprs(ces) > ... > > > mean( (data.affy[,1]-data.aroma.oligo[,1])^2 ) > [1] 0.0003193267 > > 2. I dug a bit into the unsupported CDF and the 'platformDesign' > objects from oligo and from what I can tell, the probes used in the > 33252 units (I'm looking at Human) within aroma.affymetrix are > identical to the probes used within oligo (as built with > pdInfoBuilder) ... not a single probe no accounted for. In case you > haven't dug into pdInfoBuilder before and the SQLite db behind, here > are some commands you may find useful ... > > --- > library(pd.hugene.1.0.st.v1) > library(pdInfoBuilder) > fn <- dir("rawData/tissues/HuGene-1_0-st-v1","CEL",full=TRUE)[1:3] > x <- read.celfiles(filenames=fn,pkgname="pd.hugene.1.0.st.v1") > pd <-getPlatformDesign(x) > ff <- dbGetQuery(db(pd), "select * from pmfeature") > > # three 3 lines speed up the splitting ... > ffs <- split(ff, substr(ff$fsetid,1,4)) > ffs <- unlist( lapply(ffs, FUN=function(u) split(u,u$fsetid)), > recursive=FALSE) > names(ffs) <- substr(names(ffs), 6,nchar(names(ffs))) > > cs <- AffymetrixCelSet$fromName(name, chipType=chip) > cdf <- getCdf(cs) > cdfCells <- readCdfUnits(getPathname(cdf), units=NULL, readXY=FALSE, > readBases=FALSE, readExpos=FALSE, > readType=FALSE, > readDirection=FALSE,stratifyBy=c("pm"), readIndices=TRUE, verbose=0) > > un <- unique(ff$fsetid) > ids <- intersect(un,names(cdfCells)) > > mf <- match(ids,names(ffs)) > mc <- match(ids,names(cdfCells)) > > matches <- matrix(NA,nr=length(ids),nc=3) > rownames(matches) <- ids > colnames(matches) <- c("pdInfoBuilder","unsupportedCDF","union") > > for(i in 1:nrow(matches)) { > a <- cdfCells[[ ids[i] ]]$groups[[1]]$indices > b <- ffs[[ ids[i] ]]$fid > matches[i,1] <- length(b) > matches[i,2] <- length(a) > matches[i,3] <- length(union(a,b)) > cat(ids[i],"\n")} > > --- > > ... this gives .. > > matches[1:5,] > pdInfoBuilder unsupportedCDF union > 7981326 27 27 27 > 8095005 42 42 42 > 8100310 10 10 10 > 7948117 15 15 15 > 8155877 25 25 25 > > table(matches[,3]-matches[,1]) > 0 > 33252 > > table(matches[,3]-matches[,2]) > 0 > 33252 > > 3. This doesn't address the problem of the missing probesets. I'm > happy to go and collect these if people want them, but based on the > reply from Affymetrix (thanks to Mark Cowley for the leg work here), > they are probably 'low-coverage transcript clusters' that can be > 'safely ignored'. See: > > http://article.gmane.org/gmane.science.biology.informatics.conductor/... > > SUMMARY: > > [aroma.affymetrix flavor='oligo'] = rma in affy, rma in oligo > > [aroma.affymetrix flavor='affyPLM'] = fitPLM in
Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
On Dec 3, 8:45 pm, Mark Robinson <[EMAIL PROTECTED]> wrote: > On 04/12/2008, at 10:17 AM, Henrik Bengtsson wrote: > > > So, it all has to do with *how* the log-additive probe-level model is > > *fitted*, correct? > > Correct. Identical linear model, different procedure for fitting. > > (as a bit of an aside ... I think of these things in terms of > influence functions -- median polish will have a different IF than the > defaults in affyPLM's robust fit) > > M. > > > Thus, the model is the same but the algorithms > > differ. That gives us some sense of how much variance we get from > > using different algorithms regardless of model. Simulation studies > > (under the model) could then show if for instance one of the > > algorithms is more biased than others. > > > Thanks for fixing the flavor="oligo". It will be part of the next > > release. > > > Cheers > > > Henrik > > > On Wed, Dec 3, 2008 at 1:36 PM, Mark Robinson > > <[EMAIL PROTECTED]> wrote: > > >> Hi all. > > >> First of all, thanks Peter for 1) doing this testing and 2) for > >> spelling everything out. I expect to refer people to this thread in > >> the future, so thanks for that. > > >> Just wanted to add 3 more tidbits of hopefully useful information. > > >> 1. I dug a bit into why flavor="oligo" doesn't work within > >> aroma.affymetrix. It turns out it was a simple fix. Since I don't > >> use it regularly (it doesn't give probe affinities!) and the > >> underlying 'oligo' functions had changed, it stopped working. Its > >> corrected now. I've checked in the fix, so flavor='oligo' will be > >> available in the next release. In my tests, it appears VERY close to > >> 'affy' ... and since its based on 'oligo' code, it should be VERY > >> VERY > >> similar. > > >> ... > >> plm1 <- RmaPlm(csN,flavor="oligo") > >> fit(plm1,verbose=verbose) > >> ces <- getChipEffectSet(plm1) > >> data.aroma.oligo <- getExprs(ces) > >> ... > > >>> mean( (data.affy[,1]-data.aroma.oligo[,1])^2 ) > >> [1] 0.0003193267 > > >> 2. I dug a bit into the unsupported CDF and the 'platformDesign' > >> objects from oligo and from what I can tell, the probes used in the > >> 33252 units (I'm looking at Human) within aroma.affymetrix are > >> identical to the probes used within oligo (as built with > >> pdInfoBuilder) ... not a single probe no accounted for. In case you > >> haven't dug into pdInfoBuilder before and the SQLite db behind, here > >> are some commands you may find useful ... > > >> --- > >> library(pd.hugene.1.0.st.v1) > >> library(pdInfoBuilder) > >> fn <- dir("rawData/tissues/HuGene-1_0-st-v1","CEL",full=TRUE)[1:3] > >> x <- read.celfiles(filenames=fn,pkgname="pd.hugene.1.0.st.v1") > >> pd <-getPlatformDesign(x) > >> ff <- dbGetQuery(db(pd), "select * from pmfeature") > > >> # three 3 lines speed up the splitting ... > >> ffs <- split(ff, substr(ff$fsetid,1,4)) > >> ffs <- unlist( lapply(ffs, FUN=function(u) split(u,u$fsetid)), > >> recursive=FALSE) > >> names(ffs) <- substr(names(ffs), 6,nchar(names(ffs))) > > >> cs <- AffymetrixCelSet$fromName(name, chipType=chip) > >> cdf <- getCdf(cs) > >> cdfCells <- readCdfUnits(getPathname(cdf), units=NULL, readXY=FALSE, > >> readBases=FALSE, readExpos=FALSE, > >> readType=FALSE, > >> readDirection=FALSE,stratifyBy=c("pm"), readIndices=TRUE, verbose=0) > > >> un <- unique(ff$fsetid) > >> ids <- intersect(un,names(cdfCells)) > > >> mf <- match(ids,names(ffs)) > >> mc <- match(ids,names(cdfCells)) > > >> matches <- matrix(NA,nr=length(ids),nc=3) > >> rownames(matches) <- ids > >> colnames(matches) <- c("pdInfoBuilder","unsupportedCDF","union") > > >> for(i in 1:nrow(matches)) { > >> a <- cdfCells[[ ids[i] ]]$groups[[1]]$indices > >> b <- ffs[[ ids[i] ]]$fid > >> matches[i,1] <- length(b) > >> matches[i,2] <- length(a) > >> matches[i,3] <- length(union(a,b)) > >> cat(ids[i],"\n") > >> } > >> --- > > >> ... this gives .. > >>> matches[1:5,] > >> pdInfoBuilder unsupportedCDF union > >> 7981326 27 27 27 > >> 8095005 42 42 42 > >> 8100310 10 10 10 > >> 7948117 15 15 15 > >> 8155877 25 25 25 > >>> table(matches[,3]-matches[,1]) > >> 0 > >> 33252 > >>> table(matches[,3]-matches[,2]) > >> 0 > >> 33252 > > >> 3. This doesn't address the problem of the missing probesets. I'm > >> happy to go and collect these if people want them, but based on the > >> reply from Affymetrix (thanks to Mark Cowley for the leg work here), > >> they are probably 'low-coverage transcript clusters' that can be > >> 'safely ignored'. See: > > >>http://article.gmane.org/gmane.science.biology.informatics.conductor/... > > >> SUMMARY: > > >> [aroma.affymetrix flavor='oligo'] = rma in affy, rma in oligo > > >> [aroma.affymetrix flavor='affyPLM'] = fitPLM in affyPLM > > >> ... where '=' means 'for all intents and purposes equivalent', not > >> strictly equal. >
Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
On 04/12/2008, at 10:17 AM, Henrik Bengtsson wrote: > So, it all has to do with *how* the log-additive probe-level model is > *fitted*, correct? Correct. Identical linear model, different procedure for fitting. (as a bit of an aside ... I think of these things in terms of influence functions -- median polish will have a different IF than the defaults in affyPLM's robust fit) M. > Thus, the model is the same but the algorithms > differ. That gives us some sense of how much variance we get from > using different algorithms regardless of model. Simulation studies > (under the model) could then show if for instance one of the > algorithms is more biased than others. > > Thanks for fixing the flavor="oligo". It will be part of the next > release. > > Cheers > > Henrik > > On Wed, Dec 3, 2008 at 1:36 PM, Mark Robinson > <[EMAIL PROTECTED]> wrote: >> >> Hi all. >> >> First of all, thanks Peter for 1) doing this testing and 2) for >> spelling everything out. I expect to refer people to this thread in >> the future, so thanks for that. >> >> Just wanted to add 3 more tidbits of hopefully useful information. >> >> 1. I dug a bit into why flavor="oligo" doesn't work within >> aroma.affymetrix. It turns out it was a simple fix. Since I don't >> use it regularly (it doesn't give probe affinities!) and the >> underlying 'oligo' functions had changed, it stopped working. Its >> corrected now. I've checked in the fix, so flavor='oligo' will be >> available in the next release. In my tests, it appears VERY close to >> 'affy' ... and since its based on 'oligo' code, it should be VERY >> VERY >> similar. >> >> ... >> plm1 <- RmaPlm(csN,flavor="oligo") >> fit(plm1,verbose=verbose) >> ces <- getChipEffectSet(plm1) >> data.aroma.oligo <- getExprs(ces) >> ... >> >>> mean( (data.affy[,1]-data.aroma.oligo[,1])^2 ) >> [1] 0.0003193267 >> >> 2. I dug a bit into the unsupported CDF and the 'platformDesign' >> objects from oligo and from what I can tell, the probes used in the >> 33252 units (I'm looking at Human) within aroma.affymetrix are >> identical to the probes used within oligo (as built with >> pdInfoBuilder) ... not a single probe no accounted for. In case you >> haven't dug into pdInfoBuilder before and the SQLite db behind, here >> are some commands you may find useful ... >> >> --- >> library(pd.hugene.1.0.st.v1) >> library(pdInfoBuilder) >> fn <- dir("rawData/tissues/HuGene-1_0-st-v1","CEL",full=TRUE)[1:3] >> x <- read.celfiles(filenames=fn,pkgname="pd.hugene.1.0.st.v1") >> pd <-getPlatformDesign(x) >> ff <- dbGetQuery(db(pd), "select * from pmfeature") >> >> # three 3 lines speed up the splitting ... >> ffs <- split(ff, substr(ff$fsetid,1,4)) >> ffs <- unlist( lapply(ffs, FUN=function(u) split(u,u$fsetid)), >> recursive=FALSE) >> names(ffs) <- substr(names(ffs), 6,nchar(names(ffs))) >> >> cs <- AffymetrixCelSet$fromName(name, chipType=chip) >> cdf <- getCdf(cs) >> cdfCells <- readCdfUnits(getPathname(cdf), units=NULL, readXY=FALSE, >> readBases=FALSE, readExpos=FALSE, >> readType=FALSE, >> readDirection=FALSE,stratifyBy=c("pm"), readIndices=TRUE, verbose=0) >> >> un <- unique(ff$fsetid) >> ids <- intersect(un,names(cdfCells)) >> >> mf <- match(ids,names(ffs)) >> mc <- match(ids,names(cdfCells)) >> >> matches <- matrix(NA,nr=length(ids),nc=3) >> rownames(matches) <- ids >> colnames(matches) <- c("pdInfoBuilder","unsupportedCDF","union") >> >> for(i in 1:nrow(matches)) { >> a <- cdfCells[[ ids[i] ]]$groups[[1]]$indices >> b <- ffs[[ ids[i] ]]$fid >> matches[i,1] <- length(b) >> matches[i,2] <- length(a) >> matches[i,3] <- length(union(a,b)) >> cat(ids[i],"\n") >> } >> --- >> >> ... this gives .. >>> matches[1:5,] >>pdInfoBuilder unsupportedCDF union >> 798132627 2727 >> 809500542 4242 >> 810031010 1010 >> 794811715 1515 >> 815587725 2525 >>> table(matches[,3]-matches[,1]) >>0 >> 33252 >>> table(matches[,3]-matches[,2]) >>0 >> 33252 >> >> 3. This doesn't address the problem of the missing probesets. I'm >> happy to go and collect these if people want them, but based on the >> reply from Affymetrix (thanks to Mark Cowley for the leg work here), >> they are probably 'low-coverage transcript clusters' that can be >> 'safely ignored'. See: >> >> http://article.gmane.org/gmane.science.biology.informatics.conductor/19738 >> >> >> SUMMARY: >> >> [aroma.affymetrix flavor='oligo'] = rma in affy, rma in oligo >> >> [aroma.affymetrix flavor='affyPLM'] = fitPLM in affyPLM >> >> ... where '=' means 'for all intents and purposes equivalent', not >> strictly equal. >> >> Cheers, >> Mark >> >> >> >> >> >> On 04/12/2008, at 7:43 AM, Henrik Bengtsson wrote: >> >>> Hi, >>> >>> thanks for sharing all this. >>> >>> On Tue, Dec 2, 2008 at 11:54 AM, pwhiteusa <[EMAIL PROTECTED]> >>> wrote: Hi All, Here
Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
Hi, thanks Mark for this. So, it all has to do with *how* the log-additive probe-level model is *fitted*, correct? Thus, the model is the same but the algorithms differ. That gives us some sense of how much variance we get from using different algorithms regardless of model. Simulation studies (under the model) could then show if for instance one of the algorithms is more biased than others. Thanks for fixing the flavor="oligo". It will be part of the next release. Cheers Henrik On Wed, Dec 3, 2008 at 1:36 PM, Mark Robinson <[EMAIL PROTECTED]> wrote: > > Hi all. > > First of all, thanks Peter for 1) doing this testing and 2) for > spelling everything out. I expect to refer people to this thread in > the future, so thanks for that. > > Just wanted to add 3 more tidbits of hopefully useful information. > > 1. I dug a bit into why flavor="oligo" doesn't work within > aroma.affymetrix. It turns out it was a simple fix. Since I don't > use it regularly (it doesn't give probe affinities!) and the > underlying 'oligo' functions had changed, it stopped working. Its > corrected now. I've checked in the fix, so flavor='oligo' will be > available in the next release. In my tests, it appears VERY close to > 'affy' ... and since its based on 'oligo' code, it should be VERY VERY > similar. > > ... > plm1 <- RmaPlm(csN,flavor="oligo") > fit(plm1,verbose=verbose) > ces <- getChipEffectSet(plm1) > data.aroma.oligo <- getExprs(ces) > ... > > > mean( (data.affy[,1]-data.aroma.oligo[,1])^2 ) > [1] 0.0003193267 > > 2. I dug a bit into the unsupported CDF and the 'platformDesign' > objects from oligo and from what I can tell, the probes used in the > 33252 units (I'm looking at Human) within aroma.affymetrix are > identical to the probes used within oligo (as built with > pdInfoBuilder) ... not a single probe no accounted for. In case you > haven't dug into pdInfoBuilder before and the SQLite db behind, here > are some commands you may find useful ... > > --- > library(pd.hugene.1.0.st.v1) > library(pdInfoBuilder) > fn <- dir("rawData/tissues/HuGene-1_0-st-v1","CEL",full=TRUE)[1:3] > x <- read.celfiles(filenames=fn,pkgname="pd.hugene.1.0.st.v1") > pd <-getPlatformDesign(x) > ff <- dbGetQuery(db(pd), "select * from pmfeature") > > # three 3 lines speed up the splitting ... > ffs <- split(ff, substr(ff$fsetid,1,4)) > ffs <- unlist( lapply(ffs, FUN=function(u) split(u,u$fsetid)), > recursive=FALSE) > names(ffs) <- substr(names(ffs), 6,nchar(names(ffs))) > > cs <- AffymetrixCelSet$fromName(name, chipType=chip) > cdf <- getCdf(cs) > cdfCells <- readCdfUnits(getPathname(cdf), units=NULL, readXY=FALSE, > readBases=FALSE, readExpos=FALSE, > readType=FALSE, > readDirection=FALSE,stratifyBy=c("pm"), readIndices=TRUE, verbose=0) > > un <- unique(ff$fsetid) > ids <- intersect(un,names(cdfCells)) > > mf <- match(ids,names(ffs)) > mc <- match(ids,names(cdfCells)) > > matches <- matrix(NA,nr=length(ids),nc=3) > rownames(matches) <- ids > colnames(matches) <- c("pdInfoBuilder","unsupportedCDF","union") > > for(i in 1:nrow(matches)) { > a <- cdfCells[[ ids[i] ]]$groups[[1]]$indices > b <- ffs[[ ids[i] ]]$fid > matches[i,1] <- length(b) > matches[i,2] <- length(a) > matches[i,3] <- length(union(a,b)) > cat(ids[i],"\n") > } > --- > > ... this gives .. > > matches[1:5,] > pdInfoBuilder unsupportedCDF union > 798132627 2727 > 809500542 4242 > 810031010 1010 > 794811715 1515 > 815587725 2525 > > table(matches[,3]-matches[,1]) > 0 > 33252 > > table(matches[,3]-matches[,2]) > 0 > 33252 > > 3. This doesn't address the problem of the missing probesets. I'm > happy to go and collect these if people want them, but based on the > reply from Affymetrix (thanks to Mark Cowley for the leg work here), > they are probably 'low-coverage transcript clusters' that can be > 'safely ignored'. See: > > http://article.gmane.org/gmane.science.biology.informatics.conductor/19738 > > > SUMMARY: > > [aroma.affymetrix flavor='oligo'] = rma in affy, rma in oligo > > [aroma.affymetrix flavor='affyPLM'] = fitPLM in affyPLM > > ... where '=' means 'for all intents and purposes equivalent', not > strictly equal. > > Cheers, > Mark > > > > > > On 04/12/2008, at 7:43 AM, Henrik Bengtsson wrote: > >> Hi, >> >> thanks for sharing all this. >> >> On Tue, Dec 2, 2008 at 11:54 AM, pwhiteusa <[EMAIL PROTECTED]> >> wrote: >>> >>> Hi All, >>> >>> Here is the exact code I used to analyze Gene ST data for an >>> experiment performed with the MoGene-1_0-st-v1 array. >>> >>> AROMA.AFFYMETRIX >>> >>> library(aroma.affymetrix) >>> cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1",tags="r3") >>> prefixName <- "Pierson" >>> cs <- AffymetrixCelSet$byName(prefixName, cdf=cdf) >>> bc <- RmaBackgroundCorrection(cs) >>> csBC <- process(bc) >>> qn <- QuantileNormalization
Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
Hi all. First of all, thanks Peter for 1) doing this testing and 2) for spelling everything out. I expect to refer people to this thread in the future, so thanks for that. Just wanted to add 3 more tidbits of hopefully useful information. 1. I dug a bit into why flavor="oligo" doesn't work within aroma.affymetrix. It turns out it was a simple fix. Since I don't use it regularly (it doesn't give probe affinities!) and the underlying 'oligo' functions had changed, it stopped working. Its corrected now. I've checked in the fix, so flavor='oligo' will be available in the next release. In my tests, it appears VERY close to 'affy' ... and since its based on 'oligo' code, it should be VERY VERY similar. ... plm1 <- RmaPlm(csN,flavor="oligo") fit(plm1,verbose=verbose) ces <- getChipEffectSet(plm1) data.aroma.oligo <- getExprs(ces) ... > mean( (data.affy[,1]-data.aroma.oligo[,1])^2 ) [1] 0.0003193267 2. I dug a bit into the unsupported CDF and the 'platformDesign' objects from oligo and from what I can tell, the probes used in the 33252 units (I'm looking at Human) within aroma.affymetrix are identical to the probes used within oligo (as built with pdInfoBuilder) ... not a single probe no accounted for. In case you haven't dug into pdInfoBuilder before and the SQLite db behind, here are some commands you may find useful ... --- library(pd.hugene.1.0.st.v1) library(pdInfoBuilder) fn <- dir("rawData/tissues/HuGene-1_0-st-v1","CEL",full=TRUE)[1:3] x <- read.celfiles(filenames=fn,pkgname="pd.hugene.1.0.st.v1") pd <-getPlatformDesign(x) ff <- dbGetQuery(db(pd), "select * from pmfeature") # three 3 lines speed up the splitting ... ffs <- split(ff, substr(ff$fsetid,1,4)) ffs <- unlist( lapply(ffs, FUN=function(u) split(u,u$fsetid)), recursive=FALSE) names(ffs) <- substr(names(ffs), 6,nchar(names(ffs))) cs <- AffymetrixCelSet$fromName(name, chipType=chip) cdf <- getCdf(cs) cdfCells <- readCdfUnits(getPathname(cdf), units=NULL, readXY=FALSE, readBases=FALSE, readExpos=FALSE, readType=FALSE, readDirection=FALSE,stratifyBy=c("pm"), readIndices=TRUE, verbose=0) un <- unique(ff$fsetid) ids <- intersect(un,names(cdfCells)) mf <- match(ids,names(ffs)) mc <- match(ids,names(cdfCells)) matches <- matrix(NA,nr=length(ids),nc=3) rownames(matches) <- ids colnames(matches) <- c("pdInfoBuilder","unsupportedCDF","union") for(i in 1:nrow(matches)) { a <- cdfCells[[ ids[i] ]]$groups[[1]]$indices b <- ffs[[ ids[i] ]]$fid matches[i,1] <- length(b) matches[i,2] <- length(a) matches[i,3] <- length(union(a,b)) cat(ids[i],"\n") } --- ... this gives .. > matches[1:5,] pdInfoBuilder unsupportedCDF union 798132627 2727 809500542 4242 810031010 1010 794811715 1515 815587725 2525 > table(matches[,3]-matches[,1]) 0 33252 > table(matches[,3]-matches[,2]) 0 33252 3. This doesn't address the problem of the missing probesets. I'm happy to go and collect these if people want them, but based on the reply from Affymetrix (thanks to Mark Cowley for the leg work here), they are probably 'low-coverage transcript clusters' that can be 'safely ignored'. See: http://article.gmane.org/gmane.science.biology.informatics.conductor/19738 SUMMARY: [aroma.affymetrix flavor='oligo'] = rma in affy, rma in oligo [aroma.affymetrix flavor='affyPLM'] = fitPLM in affyPLM ... where '=' means 'for all intents and purposes equivalent', not strictly equal. Cheers, Mark On 04/12/2008, at 7:43 AM, Henrik Bengtsson wrote: > Hi, > > thanks for sharing all this. > > On Tue, Dec 2, 2008 at 11:54 AM, pwhiteusa <[EMAIL PROTECTED]> > wrote: >> >> Hi All, >> >> Here is the exact code I used to analyze Gene ST data for an >> experiment performed with the MoGene-1_0-st-v1 array. >> >> AROMA.AFFYMETRIX >> >> library(aroma.affymetrix) >> cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1",tags="r3") >> prefixName <- "Pierson" >> cs <- AffymetrixCelSet$byName(prefixName, cdf=cdf) >> bc <- RmaBackgroundCorrection(cs) >> csBC <- process(bc) >> qn <- QuantileNormalization(csBC, typesToUpdate="pm") >> csN <- process(qn) >> plm <- RmaPlm(csN, flavor="affyPLM") #flavor="oligo", must library >> (oligo) >> fit(plm) >> ces <- getChipEffectSet(plm) >> getExprs <- function(ces, ...) { >> cdf <- getCdf(ces) >> theta <- extractMatrix(ces, ..., returnUgcMap=TRUE) >> ugcMap <- attr(theta, "unitGroupCellMap") >> un<-getUnitNames(cdf, ugcMap[,"unit"]) >> rownames(theta) <- un >> log2(theta) >> } >> data.aroma <- getExprs(ces) > > I think that getExprs() call can be replaced by: > > data.aroma <- extractDataFrame(ces, addNames=TRUE) > > The difference is that the unit names will be in a separate column and > not as row names. You will also get group names and more, but those > you can drop if you want to. > > Ma
Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
Hi, thanks for sharing all this. On Tue, Dec 2, 2008 at 11:54 AM, pwhiteusa <[EMAIL PROTECTED]> wrote: > > Hi All, > > Here is the exact code I used to analyze Gene ST data for an > experiment performed with the MoGene-1_0-st-v1 array. > > AROMA.AFFYMETRIX > > library(aroma.affymetrix) > cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1",tags="r3") > prefixName <- "Pierson" > cs <- AffymetrixCelSet$byName(prefixName, cdf=cdf) > bc <- RmaBackgroundCorrection(cs) > csBC <- process(bc) > qn <- QuantileNormalization(csBC, typesToUpdate="pm") > csN <- process(qn) > plm <- RmaPlm(csN, flavor="affyPLM") #flavor="oligo", must library > (oligo) > fit(plm) > ces <- getChipEffectSet(plm) > getExprs <- function(ces, ...) { > cdf <- getCdf(ces) > theta <- extractMatrix(ces, ..., returnUgcMap=TRUE) > ugcMap <- attr(theta, "unitGroupCellMap") > un<-getUnitNames(cdf, ugcMap[,"unit"]) > rownames(theta) <- un > log2(theta) > } > data.aroma <- getExprs(ces) I think that getExprs() call can be replaced by: data.aroma <- extractDataFrame(ces, addNames=TRUE) The difference is that the unit names will be in a separate column and not as row names. You will also get group names and more, but those you can drop if you want to. Mark, is it correct that we can "deprecate" the suggestion to use getExprs()? Btw, is this documented somewhere online, or is this knowledge only from the mailing list? > > Easy! Now to get the same data using the Affy packages: > > > BIOCONDUCTOR AFFY > > You first need to create or download your mogene10stv1cdf library from > the Affy unsupported CDF file (https://stat.ethz.ch/pipermail/bioc- > devel/2007-October/001403.html has some detail on how to do this). > However, as Mark Robinson pointed out there are potential issues with > using the Affy unsupported CDF files. See the following for some > details: > > https://stat.ethz.ch/pipermail/bioconductor/2007-November/020188.html > > library(affy) > AffyRaw <- ReadAffy() > AffyEset <- rma(AffyRaw) > data.affy <- exprs(AffyEset) > > > BIOCONDUCTOR OLIGO > > Download all the required Affy annotation files to your Mouse Gene v1 > ST array directory: > > http://www.affymetrix.com/support/technical/byproduct.affx?product=mogene-1_0-st-v1 > > setwd("P:\\ANNOTATION\\AffyAnnotation\\Mouse\\MoGene-1_0-st-v1") > library(pdInfoBuilder) > pgfFile <- "MoGene-1_0-st-v1.r3.pgf" > clfFile <- "MoGene-1_0-st-v1.r3.clf" > transFile <- "MoGene-1_0-st-v1.na26.mm9.transcript.csv" > probeFile <- "MoGene-1_0-st-v1.probe.tab" > pkg <- new("AffyGenePDInfoPkgSeed", author="Peter White", > email="[EMAIL PROTECTED]", version="0.1.3", > genomebuild="UCSC mm9, July 2007", chipName="MoGene10stv1", > manufacturer="affymetrix", biocViews="AnnotationData", > pgfFile=pgfFile, clfFile=clfFile, transFile=transFile, > probeFile=probeFile) > makePdInfoPackage(pkg, destDir=".") > > #This takes a little while to make the Package. Once created you will > need to install the package from the Windows DOS prompt (navigate to > the annotation directory with the newly created pd package to be > installed): > > R CMD INSTALL pd.mogene.1.0.st.v1\ > > Note for this to work you need RTools and you Path variable set up > correctly as described at: > > http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset) > > Now return to R, set the working directory to your CEL file directory: > > library(pd.mogene.1.0.st.v1) > library(oligo) > OligoRaw<-read.celfiles(filenames=list.celfiles()) > OligoEset<-rma(OligoRaw) > data.oligo<-exprs(OligoEset) > > > COMPARING THE TWO DATASETS > > Here is what I did to compare the data generate by affy, oligo and > aroma.affymetrix: > > dim(data.aroma) > [1] 3551216 > dim(data.affy) > [1] 3551216 > length(grep(TRUE, rownames(data.affy)==rownames(data.aroma))) > [1] 35512 FYI, sum(rownames(data.affy)==rownames(data.aroma)) gives you the same. Replacing sum() with summary() will also work. > > The output from both the affy rma and aroma.affymetrix methods retains > the same order of probes and cel files so the two files can be > compared directly. That is probably because they work of the same CDF, but you should never rely on this/assume that this is always the case. If you do, you should at least verify that the unit names (and group names) match. > However, > > dim(data.oligo) > [1] 3555716 > > The normalized data file from the Oligo package includes an additional > 45 Transcript IDs (there's no annotation on what these are but they > contain anywhere from 9 to 489 probes per probeset). For the record, would you mind posting the names of these "additional" 45 units here? (I'm sure someone else will search the web later and find this thread very helpful). > Fixed this problem as follows: > > o <- match(rownames(data.aroma), rownames(data.oligo)) > data.oligo <- data.oligo[o,] > >> length(grep(TRUE, rownames(data.affy)==rownames(data.oligo))) > [1] 35512 >> length(grep(TRUE, rownames(data.aroma)==rownames(data.oligo))) > [1]
Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
Hi All, Here is the exact code I used to analyze Gene ST data for an experiment performed with the MoGene-1_0-st-v1 array. AROMA.AFFYMETRIX library(aroma.affymetrix) cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1",tags="r3") prefixName <- "Pierson" cs <- AffymetrixCelSet$byName(prefixName, cdf=cdf) bc <- RmaBackgroundCorrection(cs) csBC <- process(bc) qn <- QuantileNormalization(csBC, typesToUpdate="pm") csN <- process(qn) plm <- RmaPlm(csN, flavor="affyPLM") #flavor="oligo", must library (oligo) fit(plm) ces <- getChipEffectSet(plm) getExprs <- function(ces, ...) { cdf <- getCdf(ces) theta <- extractMatrix(ces, ..., returnUgcMap=TRUE) ugcMap <- attr(theta, "unitGroupCellMap") un<-getUnitNames(cdf, ugcMap[,"unit"]) rownames(theta) <- un log2(theta) } data.aroma <- getExprs(ces) Easy! Now to get the same data using the Affy packages: BIOCONDUCTOR AFFY You first need to create or download your mogene10stv1cdf library from the Affy unsupported CDF file (https://stat.ethz.ch/pipermail/bioc- devel/2007-October/001403.html has some detail on how to do this). However, as Mark Robinson pointed out there are potential issues with using the Affy unsupported CDF files. See the following for some details: https://stat.ethz.ch/pipermail/bioconductor/2007-November/020188.html library(affy) AffyRaw <- ReadAffy() AffyEset <- rma(AffyRaw) data.affy <- exprs(AffyEset) BIOCONDUCTOR OLIGO Download all the required Affy annotation files to your Mouse Gene v1 ST array directory: http://www.affymetrix.com/support/technical/byproduct.affx?product=mogene-1_0-st-v1 setwd("P:\\ANNOTATION\\AffyAnnotation\\Mouse\\MoGene-1_0-st-v1") library(pdInfoBuilder) pgfFile <- "MoGene-1_0-st-v1.r3.pgf" clfFile <- "MoGene-1_0-st-v1.r3.clf" transFile <- "MoGene-1_0-st-v1.na26.mm9.transcript.csv" probeFile <- "MoGene-1_0-st-v1.probe.tab" pkg <- new("AffyGenePDInfoPkgSeed", author="Peter White", email="[EMAIL PROTECTED]", version="0.1.3", genomebuild="UCSC mm9, July 2007", chipName="MoGene10stv1", manufacturer="affymetrix", biocViews="AnnotationData", pgfFile=pgfFile, clfFile=clfFile, transFile=transFile, probeFile=probeFile) makePdInfoPackage(pkg, destDir=".") #This takes a little while to make the Package. Once created you will need to install the package from the Windows DOS prompt (navigate to the annotation directory with the newly created pd package to be installed): R CMD INSTALL pd.mogene.1.0.st.v1\ Note for this to work you need RTools and you Path variable set up correctly as described at: http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset) Now return to R, set the working directory to your CEL file directory: library(pd.mogene.1.0.st.v1) library(oligo) OligoRaw<-read.celfiles(filenames=list.celfiles()) OligoEset<-rma(OligoRaw) data.oligo<-exprs(OligoEset) COMPARING THE TWO DATASETS Here is what I did to compare the data generate by affy, oligo and aroma.affymetrix: dim(data.aroma) [1] 3551216 dim(data.affy) [1] 3551216 length(grep(TRUE, rownames(data.affy)==rownames(data.aroma))) [1] 35512 The output from both the affy rma and aroma.affymetrix methods retains the same order of probes and cel files so the two files can be compared directly. However, dim(data.oligo) [1] 3555716 The normalized data file from the Oligo package includes an additional 45 Transcript IDs (there's no annotation on what these are but they contain anywhere from 9 to 489 probes per probeset). Fixed this problem as follows: o <- match(rownames(data.aroma), rownames(data.oligo)) data.oligo <- data.oligo[o,] > length(grep(TRUE, rownames(data.affy)==rownames(data.oligo))) [1] 35512 > length(grep(TRUE, rownames(data.aroma)==rownames(data.oligo))) [1] 35512 Finally, there was one more issue with the aroma data. All elements in the 18th row of the dataset were flagged Na. This transcript ID for this probeset was 10338063. Looking at the Affy annotation this appears to be a control probeset with 6,515 probes. Could it have been flagged Na by aroma.affymetrix becuase of this (it was OK with the oligo and affy rma analyses)?? e<- (data.aroma - data.affy) > mean(as.vector(e^2), na.rm=T) [1] 0.1253547 > sd(as.vector(e^2), na.rm=T) [1] 0.2717275 e <- (data.aroma - data.oligo) > mean(as.vector(e^2), na.rm=T) [1] 0.1239203 > sd(as.vector(e^2), na.rm=T) [1] 0.2653593 As you can see the data does not pass your mean and sd cutoffs of <0.0001. e<- (data.affy - data.oligo) > mean(as.vector(e^2), na.rm=T) [1] 0.001484371 > sd(as.vector(e^2), na.rm=T) [1] 0.002523521 The difference between the affy and oligo analysis is much less striking. To visualize these differences I did the following plot, as an example I am just showing the data from the first array but it is reflective of all 16 arrays: plot(data.aroma[,1],data.affy[,1],main="Comparison of Aroma and Affy Data",col="red",cex=0.5) abline(0,1, lwd=2) plot(data.aroma[,1],data.oligo[,1],main="Comparison of Aroma and Oligo Data",col="red",cex=0.5
[aroma.affymetrix] Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
Hi all. Just to follow up on these comments here. 'fitPLM' with default parameters in the affyPLM package should give practically identical results to the 'standard' pipeline (RMA bg corr + quantile + fit) within aroma.affymetrix, assuming the underlying annotation is the same. This was an easy comparison back in the day of 3' IVT arrays. Now, its a little more difficult. If anyone is willing, I'd be keen to know if these two actually do give the same results on the newer chips i.e. is the underlying annotation the same? I seem to recall that because these newer chips occasionally have probes that are shared amongst different probesets, that the older style affy package would not be able to handle it. For example, Jim MacDonald's post: http://article.gmane.org/gmane.science.biology.informatics.conductor/19184 Jim says there that you "don't want to use affy" for these chips (not 100% sure why). He suggests the whole pdInfoBuilder/oligo thing which at one time had some bugs, but is probably more stable now. I haven't dug deeper as to whether the annotation that 'fitPLM' uses by default ('hugene10st.db' presumably?) matches the annotation that would be used by aroma.affymetrix (the converted-to-binary 'unsupported' CDF file). I know Mark Cowley did find some inconsistencies: http://article.gmane.org/gmane.science.biology.informatics.conductor/19738 This makes me think that we may want to alternatively construct the XXGene 1.0 CDFs (XX=Hu,Mo,Ra) directly from the PGF/CLF files instead of from the unsupported CDF. I suspect that there will only be minor changes, so I haven't looked too deeply into it. Anyone want to check? In addition to what Henrik says about flavour="affyPLM" ... for a lot of my work, there is definitely additional value in using the auxiliary information from the PLMs (i.e. weights, residuals) ... you don't get this directly with oligo/median polish. Few more specific notes ... >> library(affy) >> TestData <- ReadAffy() >> TestEset <- rma(TestData) >> >> If you plot(AromaEset[,1],TestEset[,1]) you can visualize how >> different the data is. I assume you ensured the probesets and samples are in the same order? (Or, is this somehow covered by the plot method ...) I can't tell from this sequence of commands. I don't know what this plot looks like, so I don't whether to be alarmed or not. >> However, I noticed that my GeneST normalized data is quite different >> from the data that I produce using the Affy package. When looking at >> the controls on the array I see that the Aroma normalized data is >> between 5-10% lower than that produced by the Bioconductor Affy >> packages. However, for some probes this difference is can be quite >> large (values are averaged across 16 samples): >> >> ProbeBionconductorAroma >> 10341096 (neg_control) 6079 852 >> 10341735 (neg_control) 25 87 >> 10340969 (pos_control) 3953 1758 >> 10338477 (pos_control) 293 611 >> >> Ultimately, when my downstream analysis looks for differential >> expression the differences between the two analysis approaches become >> minimal, but I did notice that the Aroma package seems to call more >> control probes and probes with no known gene as being differentially >> expressed (i.e. it looks noisier). These probes were all pretty close >> to my 2 fold cutoff, and at 3 fold or greater the data looked the >> same. Tough to really tell. You'd probably want to average on the log2 scale, not the linear scale. This could be a difference in median- polish versus robust PLM, or could be a differnce in annotation. Requires some digging. Also, you may want to remove control probesets before doing DE analysis --- for one thing, you pay a slightly lesser penalty for multiple testing. Cheers, Mark -- Mark Robinson Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: [EMAIL PROTECTED] e: [EMAIL PROTECTED] p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852 -- --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)
Hi. a comment on RmaPlm and argument 'flavor': The RmaPlm class is only summarizing the probe signals - normalization etc are done before RmaPlm. The summarization model is the log-additive model with probe affinities and chip effects. The 'flavor' argument specifies which implementation of the fitting algorithm to use. The default is "affyPLM", which indicates that it uses the implementation in 'affyPLM', which now has moved to the 'preprocessCore' package. Both 'affyPLM' and 'preprocessCore' are developed and maintained by Ben Bolstad. The "oligo" flavor was using the fitting algorithm in the 'oligo' package, which I think in turn originates from the 'affy' package, which in turn was an early version of Ben Bolstad's code. I haven't tried this in awhile, and it appears that 'oligo' has been updated and the internal function for fitting the model is no longer available. This is why you get the error message. Since BB maintain preprocessCore and there has been a lot of improvements in the algorithm (e.g. it supports probe-specific weights and much more), I recommend that you use the "affyPLM" flavor. Since this is default, you do not have to specify the argument 'flavor' at all. For reproducibility of the RMA pipeline in aroma.affymetrix when compared with 'affyPLM', see Page 'Reproducibility of other implementations': http://groups.google.com/group/aroma-affymetrix/web/redundancy-tests As you see, the results are remarkable similar. FYI, for each update of the package this comparison is part of the redundancy testing. Note that only PMs are quantile normalized, this might be a reason for you observing differences. There is more meat on how QuantileNormalization and plotDensity() behaves on Page 'Empirical probe-signal densities and rank-based quantile normalization': http://groups.google.com/group/aroma-affymetrix/web/empirical-probe-signal-densities-and-rank-based-quantile-normalization I am also not sure how similar affyPLM and Affy is, but I trust affyPLM more than Affy, because I see affyPLM as a more up to date version of Affy. Please feel free to contribute with code for comparing toward Affy. If you do, please use the same data set. Cheers Henrik On Wed, Nov 26, 2008 at 12:44 PM, pwhiteusa <[EMAIL PROTECTED]> wrote: > > Hi Manasa, > > So I get exactly the same results with either of the following: > > plm <- RmaPlm(csN, flavor="affyPLM") #according to the documentation > this is the default > plm <- ExonRmaPlm(csN, flavor = "affyPLM", mergeGroups=TRUE) #seems to > take longer to run > > However, I noticed that my GeneST normalized data is quite different > from the data that I produce using the Affy package. When looking at > the controls on the array I see that the Aroma normalized data is > between 5-10% lower than that produced by the Bioconductor Affy > packages. However, for some probes this difference is can be quite > large (values are averaged across 16 samples): > > ProbeBionconductorAroma > 10341096 (neg_control) 6079 852 > 10341735 (neg_control) 25 87 > 10340969 (pos_control) 3953 1758 > 10338477 (pos_control) 293 611 > > Ultimately, when my downstream analysis looks for differential > expression the differences between the two analysis approaches become > minimal, but I did notice that the Aroma package seems to call more > control probes and probes with no known gene as being differentially > expressed (i.e. it looks noisier). These probes were all pretty close > to my 2 fold cutoff, and at 3 fold or greater the data looked the > same. > > Do you know where the packages diverge if their probe summation > approach or is it a difference in background correction? I did attempt > to use the flavor="oligo" (as described in the ?RmaPlm help file) but > this returned the following error: > >> plm2 <- RmaPlm(csN, flavor="oligo") >> fit(plm2) > Exception: The fit function for requested RMA PLM flavor failed: oligo > > Thanks, > > Peter > > P.S. Here is a summary of the code I used for the Bioconductor > approach: > > library(affy) > TestData <- ReadAffy() > TestEset <- rma(TestData) > > If you plot(AromaEset[,1],TestEset[,1]) you can visualize how > different the data is. > > On Nov 24, 11:42 pm, ManasaR <[EMAIL PROTECTED]> wrote: >> Hi, >> >> Just to say that i've been working with some GeneST1.0 data following >> the tips above and it has been a breeze thanks to you guys. I noticed >> something and thought i should mention it - though it might be trivial >> and most users might have already discovered it. Just in case there >> are people like me out there :-) >> >> With regards to the first reply Mark made above and the following >> point, >> >> >1. Process Gene 1.0 ST data much the same as Exon array data, except >> >that you'll need to replace 'ExonRmaPlm' with 'RmaPlm' >> >> RmaPlm does not have "mergeGroups" as an argument. Im assuming t