Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

Mark Robinson Mon, 12 Jan 2009 18:08:04 -0800

Hi Andy.

I don't think you've gotten a response on this.  Sorry for the delay  
-- holidays.  Some comments below.



On 31/12/2008, at 1:18 AM, Andy_Paparountas wrote:

>
> Hi all ,
>
> I really find this conversation very interesting. I am trying to
> analyze a set of 3 treatment and 3 control samples of MoGeneSt10
> array. Thus far with the code pwhite shared I was able to do RMA
> Background correction , quantile normalization and got QC , RLE ,
> NUSE , density plots.
>
> Q1.  Is there any code to get similar results to affyQCreport? or even
> how can we use affyQCreport to get QC from these arrays?

As far as I know, affyQCreport has not been ported to  
aroma.affymetrix.  I usually make due with RLE, NUSE and density plots  
for my QC.  If there is something specific in affyQCreport that you  
like, it may be easy to port over.  Maybe you'd consider doing the  
implementation.



> Q2. I tried to export my data to an AffyBatch object in order to play
> around with older methods
> ab <- extractAffyBatch(cs)
>
> but I got a Warning message:
> "CDF enviroment package 'mogene10stv1cdf' not installed. The 'affy'
> package will later try to download from Bioconductor and install it."
>
> of course  'mogene10stv1cdf' does not exist as far as I know ,
> instead  we should use "mogene10st.db".
>
> But what should the exact code be to connect the normalized data to
> the annotation contained inside "mogene10st.db" ?

A couple points here.  First, it looks like Bioconductor is not  
currently supporting the 'affy' way of doing things for these new (1.0  
ST) chips.  If you skim the BioC mailing list archives, the suggestion  
is to use the 'oligo' package or 'xps'.  But, then you are outside the  
world of AffyBatch objects.  So, it doesn't make sense to use  
aroma.affymetrix's 'extractAffyBatch' for these chips.

Second, I believe 'mogene10st.db' only really maps the Gene 1.0 ST  
identifiers to GO attributes, UNIGENE ids, chromosome locations and a  
whole host of other things.  I don't think the physical probe  
locations are present within 'mogene10st.db', so it is not a  
replacement for the CDF file/environment.

Hope that helps.

Mark



> I would really appreciate some help here :)
>
> Thanks all.
>
>
> On 5 Ξ”Ξ΅ΞΊ, 17:43, pwhite...@gmail.com wrote:
>> Hi Mark,
>>
>> Thanks for adding flavor="oligo" to RmaPlm. I verified it with the  
>> new
>> release and the HGU133Plus2 data I have and it all looks good.  
>> Pairs plots
>> are attached.
>>
>> Thanks,
>>
>> Peter
>>
>> On Thu, Dec 4, 2008 at 5:41 PM, Mark Robinson  
>> <mrobin...@wehi.edu.au> wrote:
>>
>>> Thanks Peter.
>>
>>> Perhaps you can repeat this comparison after the next release (this
>>> will be very soon!) and split the aroma.affymetrix comparison into:
>>
>>> - aroma.affy.oligo -- with RmaPlm(csN,flavor="oligo")
>>> - aroma.affy.affyPLM -- with flavor="affyPLM" (as you've done  
>>> already)
>>
>>> Perhaps the best way to look at all of this at once is with a single
>>> pairs() plot.
>>
>>> Cheers,
>>> Mark
>>
>>> On 05/12/2008, at 9:01 AM, pwhite...@gmail.com wrote:
>>
>>>> Dear Mark and Henrik,
>>
>>>> I wanted to confirm that your summary was correct regarding the
>>>> different flavors for probeset summarization. I downloaded the MAQC
>>>> HG_U133_Plus_2 array data from the MAQC website:
>>
>>>> http://edkb.fda.gov/MAQC/MainStudy/upload/MAQC_AFX_123456_120CELs.zip
>>
>>>> I then ran the analysis of the arrays from site 1, using just the A
>>>> and B samples, with aroma.affymetrix, affy, affyPLM and oligo (see
>>>> below for the complete code I used to do this). Basically the
>>>> aroma.affymetrix and affyPLM data was essentially identical. The
>>>> affy and oligo data was also essentially identical. As observed  
>>>> with
>>>> the Gene ST array data there were significant differences between
>>>> aroma.affymetrix and affy or oligo. Plots are attached.
>>
>>>> The Gene ST arrays do not have any MM probes - as we are using RMA
>>>> rather than GCRMA this should not have affected anything.
>>
>>>> Thanks,
>>
>>>> Peter
>>
>>>> #OLIGO ANALYSIS
>>
>>>> library(pd.hg.u133.plus.2)
>>>> library(pdInfoBuilder)
>>>> fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-
>>>> U133_Plus_2","CEL",full=T)[1:10]
>>>> raw.oligo<-read.celfiles(filenames=fn,pkgname="pd.hg.u133.plus.2")
>>>> eset.oligo<-rma(raw.oligo)
>>>> data.oligo<-exprs(eset.oligo)
>>
>>>> #AFFY ANALYSIS
>>
>>>> library(affy)
>>>> fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-
>>>> U133_Plus_2","CEL",full=T)[1:10]
>>>> raw.affy <- ReadAffy(filenames=fn)
>>>> eset.affy <- rma(raw.affy)
>>>> data.affy <- exprs(eset.affy)
>>
>>>> #AFFY PLM ANALYSIS
>>
>>>> library(affyPLM)
>>>> fn <- dir("G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-
>>>> U133_Plus_2","CEL",full=T)[1:10]
>>>> raw.affyPLM <- ReadAffy(filenames=fn)
>>>> fit.affyPLM <- fitPLM(raw.affyPLM, verbos=9)
>>>> data.affyPLM <- coefs(fit.affyPLM)
>>>> #Analysis of MAQC on Human U113 Plus 2
>>
>>>> setwd("G:\\BGC_EXPERIMENTS\\MAQC_Analysis")
>>>> library(aroma.affymetrix)
>>>> prefixName <- "MAQC_Data"
>>>> chip1 <- "HG-U133_Plus_2"
>>>> cdf <- AffymetrixCdfFile$fromChipType("HG-U133_Plus_2")
>>>> cs <- AffymetrixCelSet$byName(prefixName, cdf=cdf, chipType=chip1)
>>>> pattern <- "AFX_1_[AB]"
>>>> idxs <- grep(pattern, getNames(cs))
>>>> cs <- extract(cs, idxs)
>>>> bc <- RmaBackgroundCorrection(cs)
>>>> csBC <- process(bc)
>>>> qn <- QuantileNormalization(csBC, typesToUpdate="pm")
>>>> csN <- process(qn)
>>>> plm <- RmaPlm(csN, flavor="affyPLM")  #flavor="oligo", must
>>>> library(oligo)
>>>> fit(plm)
>>>> ces <- getChipEffectSet(plm)
>>>> getExprs <- function(ces, ...) {
>>>>   cdf <- getCdf(ces)
>>>>   theta <- extractMatrix(ces, ..., returnUgcMap=TRUE)
>>>>   ugcMap <- attr(theta, "unitGroupCellMap")
>>>>   un<-getUnitNames(cdf, ugcMap[,"unit"])
>>>>   rownames(theta) <- un
>>>>   log2(theta)
>>>> }
>>>> data.aroma <- getExprs(ces)
>>
>>>> #COMPARING THE DATASETS
>>
>>>>> dim(data.affy)
>>>> [1] 54675    10
>>>>> dim(data.affyPLM)
>>>> [1] 54675    10
>>>>> dim(data.oligo)
>>>> [1] 54613    10
>>>>> dim(data.aroma)
>>>> [1] 54675    10
>>
>>>> #Unlike in the Gene ST analysis the packages do not return the
>>>> probes in the same order so be careful to reorder them. Also not
>>>> that Oligo removes the control probes (AFFX*).
>>
>>>> sum(rownames(data.affyPLM)==rownames(data.affy))
>>>> # [1] 54675
>>>> o <- match(rownames(data.oligo), rownames(data.affy))
>>>> data.affy <- data.affy[o,]
>>>> data.affyPLM <- data.affyPLM[o,]
>>>> sum(rownames(data.affy)==rownames(data.oligo))
>>>> # [1] 54613
>>>> o <- match(rownames(data.affy), rownames(data.aroma))
>>>> data.aroma <- data.aroma[o,]
>>>> sum(rownames(data.affy)==rownames(data.aroma))
>>>> # [1] 54613
>>
>>>> e<- (data.aroma - data.affy)
>>>> mean(as.vector(e^2), na.rm=T)
>>>> # [1] 0.2119428
>>>> sd(as.vector(e^2), na.rm=T)
>>>> # [1] 0.3704433
>>
>>>> e <- (data.aroma - data.oligo)
>>>> mean(as.vector(e^2), na.rm=T)
>>>> # [1] 0.2104522
>>>> sd(as.vector(e^2), na.rm=T)
>>>> # [1] 0.3688539
>>
>>>> e<- (data.aroma - data.affyPLM)
>>>> mean(as.vector(e^2), na.rm=T)
>>>> # [1] 1.160118e-05
>>>> sd(as.vector(e^2), na.rm=T)
>>>> # [1] 2.125207e-05
>>
>>>> e<- (data.affy - data.oligo)
>>>> mean(as.vector(e^2), na.rm=T)
>>>> # [1] 1.345037e-05
>>>> sd(as.vector(e^2), na.rm=T)
>>>> # [1] 4.111692e-05
>>
>>>> plot(data.aroma[,1],data.affyPLM[,1],main="Comparison of Aroma and
>>>> AffyPLM Data",
>>>>   col="red",cex=0.5)
>>>> abline(0,1, lwd=2)
>>>> savePlot(file="HGU133Plus2_Aroma_vs_AffyPLM", type="png")
>>
>>>> plot(data.affy[,1],data.oligo[,1],main="Comparison of Affy and  
>>>> Oligo
>>>> Data",
>>>>   col="red",cex=0.5)
>>>> abline(0,1, lwd=2)
>>>> savePlot(file="HGU133Plus2_Affy_vs_Oligo", type="png")
>>
>>>> plot(data.aroma[,1],data.affy[,1],main="Comparison of Aroma and  
>>>> Affy
>>>> Data",
>>>>   col="red",cex=0.5)
>>>> abline(0,1, lwd=2)
>>>> savePlot(file="HGU133Plus2_Aroma_vs_Affy", type="png")
>>
>>>> plot(data.aroma[,1],data.oligo[,1],main="Comparison of Aroma and
>>>> Oligo Data",
>>>>   col="red",cex=0.5)
>>>> abline(0,1, lwd=2)
>>>> savePlot(file="HGU133Plus2_Aroma_vs_Oligo", type="png")
>>
>>>> plot(data.affy[,1],data.affyPLM[,1],main="Comparison of Affy and
>>>> AffyPLM Data",
>>>>   col="red",cex=0.5)
>>>> abline(0,1, lwd=2)
>>>> savePlot(file="HGU133Plus2_Affy_vs_AffyPLM", type="png")
>>
>>>> # FYI CREATING HG_U133_PLUS_2 Oligo Annotation LIbrary
>>
>>>> setwd("P:\\ANNOTATION\\AffyAnnotation\\Human\\HG-U133_Plus_2")
>>>> library(pdInfoBuilder)
>>>> cdfFile <- "HG-U133_Plus_2.cdf"
>>>> csvAnnoFile <- "HG-U133_Plus_2.na27.annot.csv"
>>>> tabSeqFile <- "HG-U133_Plus_2.probe_tab"
>>>> pkg <- new("AffyExpressionPDInfoPkgSeed", author="Peter White",
>>>> email="peter.wh...@nationwidechildrens.org", version="0.2.0",
>>>> genomebuild="UCSC hg18,  June 2006", chipName="hgu133plus2",
>>>> manufacturer="affymetrix", biocViews="AnnotationData",
>>>> cdfFile=cdfFile, csvAnnoFile=csvAnnoFile, tabSeqFile=tabSeqFile)
>>>> makePdInfoPackage(pkg, destDir=".")
>>
>>>> On Thu, Dec 4, 2008 at 4:38 PM, pwhiteusa <pwhite...@gmail.com>  
>>>> wrote:
>>
>>>> On Dec 3, 8:45 pm, Mark Robinson <mrobin...@wehi.edu.au> wrote:
>>>>> On 04/12/2008, at 10:17 AM, Henrik Bengtsson wrote:
>>
>>>>>> So, it all has to do with *how* the log-additive probe-level
>>>> model is
>>>>>> *fitted*, correct?
>>
>>>>> Correct.  Identical linear model, different procedure for fitting.
>>
>>>>> (as a bit of an aside ... I think of these things in terms of
>>>>> influence functions -- median polish will have a different IF than
>>>> the
>>>>> defaults in affyPLM's robust fit)
>>
>>>>> M.
>>
>>>>>> Thus, the model is the same but the algorithms
>>>>>> differ.  That gives us some sense of how much variance we get  
>>>>>> from
>>>>>> using different algorithms regardless of model.   Simulation
>>>> studies
>>>>>> (under the model) could then show if for instance one of the
>>>>>> algorithms is more biased than others.
>>
>>>>>> Thanks for fixing the flavor="oligo".  It will be part of the  
>>>>>> next
>>>>>> release.
>>
>>>>>> Cheers
>>
>>>>>> Henrik
>>
>>>>>> On Wed, Dec 3, 2008 at 1:36 PM, Mark Robinson
>>>>>> <mrobin...@wehi.edu.au> wrote:
>>
>>>>>>> Hi all.
>>
>>>>>>> First of all, thanks Peter for 1) doing this testing and 2) for
>>>>>>> spelling everything out.  I expect to refer people to this
>>>> thread in
>>>>>>> the future, so thanks for that.
>>
>>>>>>> Just wanted to add 3 more tidbits of hopefully useful
>>>> information.
>>
>>>>>>> 1. I dug a bit into why flavor="oligo" doesn't work within
>>>>>>> aroma.affymetrix.  It turns out it was a simple fix.  Since I
>>>> don't
>>>>>>> use it regularly (it doesn't give probe affinities!) and the
>>>>>>> underlying 'oligo' functions had changed, it stopped working.
>>>> Its
>>>>>>> corrected now.  I've checked in the fix, so flavor='oligo' will
>>>> be
>>>>>>> available in the next release.  In my tests, it appears VERY
>>>> close to
>>>>>>> 'affy' ... and since its based on 'oligo' code, it should be  
>>>>>>> VERY
>>>>>>> VERY
>>>>>>> similar.
>>
>>>>>>> ...
>>>>>>> plm1 <- RmaPlm(csN,flavor="oligo")
>>>>>>> fit(plm1,verbose=verbose)
>>>>>>> ces <- getChipEffectSet(plm1)
>>>>>>> data.aroma.oligo <- getExprs(ces)
>>>>>>> ...
>>
>>>>>>>> mean( (data.affy[,1]-data.aroma.oligo[,1])^2 )
>>>>>>> [1] 0.0003193267
>>
>>>>>>> 2. I dug a bit into the unsupported CDF and the 'platformDesign'
>>>>>>> objects from oligo and from what I can tell, the probes used in
>>>> the
>>>>>>> 33252 units (I'm looking at Human) within aroma.affymetrix are
>>>>>>> identical to the probes used within oligo (as built with
>>>>>>> pdInfoBuilder) ... not a single probe no accounted for.  In
>>>> case you
>>>>>>> haven't dug into pdInfoBuilder before and the SQLite db behind,
>>>> here
>>>>>>> are some commands you may find useful ...
>>
>>>>>>> -------
>>>>>>> library(pd.hugene.1.0.st.v1)
>>>>>>> library(pdInfoBuilder)
>>>>>>> fn <- dir("rawData/tissues/HuGene-1_0-st-v1","CEL",full=TRUE)
>>>> [1:3]
>>>>>>> x <- read.celfiles(filenames=fn,pkgname="pd.hugene.1.0.st.v1")
>>
>> ...
>>
>> Ξ΄ΞΉΞ±Ξ²Ξ¬ΟƒΟ„Ξ΅  
>> Ο€Ξ΅Ο ΞΉΟƒΟƒΟŒΟ„Ξ΅Ο Ξ± Β»
>>
>>  HGU133Plus2_Aroma_vs_Affy_Pairs.png
>> 15KΞ•ΞΌΟ†Ξ¬Ξ½ΞΉΟƒΞ·ΞœΞ΅Ο„Ξ 
>> ±Ο†ΟŒΟ Ο„Ο‰ΟƒΞ·
>>
>>  HGU133Plus2_Aroma_vs_AffyPLM_Pairs.png
>> 15KΞ•ΞΌΟ†Ξ¬Ξ½ΞΉΟƒΞ·ΞœΞ΅Ο„Ξ 
>> ±Ο†ΟŒΟ Ο„Ο‰ΟƒΞ·
>>
>>  HGU133Plus2_Aroma_vs_Oligo_Pairs.png
>> 15KΞ•ΞΌΟ†Ξ¬Ξ½ΞΉΟƒΞ·ΞœΞ΅Ο„Ξ 
>> ±Ο†ΟŒΟ Ο„Ο‰ΟƒΞ·
>
> >

------------------------------
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------





--~--~---------~--~----~------------~-------~--~----~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

Reply via email to