Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2009-01-12 Thread Mark Robinson

Hi Andy.

I don't think you've gotten a response on this.  Sorry for the delay  
-- holidays.  Some comments below.


On 31/12/2008, at 1:18 AM, Andy_Paparountas wrote:


 Hi all ,

 I really find this conversation very interesting. I am trying to
 analyze a set of 3 treatment and 3 control samples of MoGeneSt10
 array. Thus far with the code pwhite shared I was able to do RMA
 Background correction , quantile normalization and got QC , RLE ,
 NUSE , density plots.

 Q1.  Is there any code to get similar results to affyQCreport? or even
 how can we use affyQCreport to get QC from these arrays?

As far as I know, affyQCreport has not been ported to  
aroma.affymetrix.  I usually make due with RLE, NUSE and density plots  
for my QC.  If there is something specific in affyQCreport that you  
like, it may be easy to port over.  Maybe you'd consider doing the  
implementation.



 Q2. I tried to export my data to an AffyBatch object in order to play
 around with older methods
 ab - extractAffyBatch(cs)

 but I got a Warning message:
 CDF enviroment package 'mogene10stv1cdf' not installed. The 'affy'
 package will later try to download from Bioconductor and install it.

 of course  'mogene10stv1cdf' does not exist as far as I know ,
 instead  we should use mogene10st.db.

 But what should the exact code be to connect the normalized data to
 the annotation contained inside mogene10st.db ?

A couple points here.  First, it looks like Bioconductor is not  
currently supporting the 'affy' way of doing things for these new (1.0  
ST) chips.  If you skim the BioC mailing list archives, the suggestion  
is to use the 'oligo' package or 'xps'.  But, then you are outside the  
world of AffyBatch objects.  So, it doesn't make sense to use  
aroma.affymetrix's 'extractAffyBatch' for these chips.

Second, I believe 'mogene10st.db' only really maps the Gene 1.0 ST  
identifiers to GO attributes, UNIGENE ids, chromosome locations and a  
whole host of other things.  I don't think the physical probe  
locations are present within 'mogene10st.db', so it is not a  
replacement for the CDF file/environment.

Hope that helps.

Mark



 I would really appreciate some help here :)

 Thanks all.


 On 5 ΔΡκ, 17:43, pwhite...@gmail.com wrote:
 Hi Mark,

 Thanks for adding flavor=oligo to RmaPlm. I verified it with the  
 new
 release and the HGU133Plus2 data I have and it all looks good.  
 Pairs plots
 are attached.

 Thanks,

 Peter

 On Thu, Dec 4, 2008 at 5:41 PM, Mark Robinson  
 mrobin...@wehi.edu.au wrote:

 Thanks Peter.

 Perhaps you can repeat this comparison after the next release (this
 will be very soon!) and split the aroma.affymetrix comparison into:

 - aroma.affy.oligo -- with RmaPlm(csN,flavor=oligo)
 - aroma.affy.affyPLM -- with flavor=affyPLM (as you've done  
 already)

 Perhaps the best way to look at all of this at once is with a single
 pairs() plot.

 Cheers,
 Mark

 On 05/12/2008, at 9:01 AM, pwhite...@gmail.com wrote:

 Dear Mark and Henrik,

 I wanted to confirm that your summary was correct regarding the
 different flavors for probeset summarization. I downloaded the MAQC
 HG_U133_Plus_2 array data from the MAQC website:

 http://edkb.fda.gov/MAQC/MainStudy/upload/MAQC_AFX_123456_120CELs.zip

 I then ran the analysis of the arrays from site 1, using just the A
 and B samples, with aroma.affymetrix, affy, affyPLM and oligo (see
 below for the complete code I used to do this). Basically the
 aroma.affymetrix and affyPLM data was essentially identical. The
 affy and oligo data was also essentially identical. As observed  
 with
 the Gene ST array data there were significant differences between
 aroma.affymetrix and affy or oligo. Plots are attached.

 The Gene ST arrays do not have any MM probes - as we are using RMA
 rather than GCRMA this should not have affected anything.

 Thanks,

 Peter

 #OLIGO ANALYSIS

 library(pd.hg.u133.plus.2)
 library(pdInfoBuilder)
 fn - dir(G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-
 U133_Plus_2,CEL,full=T)[1:10]
 raw.oligo-read.celfiles(filenames=fn,pkgname=pd.hg.u133.plus.2)
 eset.oligo-rma(raw.oligo)
 data.oligo-exprs(eset.oligo)

 #AFFY ANALYSIS

 library(affy)
 fn - dir(G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-
 U133_Plus_2,CEL,full=T)[1:10]
 raw.affy - ReadAffy(filenames=fn)
 eset.affy - rma(raw.affy)
 data.affy - exprs(eset.affy)

 #AFFY PLM ANALYSIS

 library(affyPLM)
 fn - dir(G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-
 U133_Plus_2,CEL,full=T)[1:10]
 raw.affyPLM - ReadAffy(filenames=fn)
 fit.affyPLM - fitPLM(raw.affyPLM, verbos=9)
 data.affyPLM - coefs(fit.affyPLM)
 #Analysis of MAQC on Human U113 Plus 2

 setwd(G:\\BGC_EXPERIMENTS\\MAQC_Analysis)
 library(aroma.affymetrix)
 prefixName - MAQC_Data
 chip1 - HG-U133_Plus_2
 cdf - AffymetrixCdfFile$fromChipType(HG-U133_Plus_2)
 cs - AffymetrixCelSet$byName(prefixName, cdf=cdf, chipType=chip1)
 pattern - AFX_1_[AB]
 idxs - grep(pattern, getNames(cs))
 cs - extract(cs, idxs)
 bc - RmaBackgroundCorrection(cs)
 csBC - process(bc)
 qn - 

Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-12-30 Thread Andy_Paparountas

Hi all ,

I really find this conversation very interesting. I am trying to
analyze a set of 3 treatment and 3 control samples of MoGeneSt10
array. Thus far with the code pwhite shared I was able to do RMA
Background correction , quantile normalization and got QC , RLE ,
NUSE , density plots.

Q1.  Is there any code to get similar results to affyQCreport? or even
how can we use affyQCreport to get QC from these arrays?

Q2. I tried to export my data to an AffyBatch object in order to play
around with older methods
ab - extractAffyBatch(cs)

but I got a Warning message:
CDF enviroment package 'mogene10stv1cdf' not installed. The 'affy'
package will later try to download from Bioconductor and install it.

of course  'mogene10stv1cdf' does not exist as far as I know ,
instead  we should use mogene10st.db.

But what should the exact code be to connect the normalized data to
the annotation contained inside mogene10st.db ?

I would really appreciate some help here :)

Thanks all.


On 5 Δεκ, 17:43, pwhite...@gmail.com wrote:
 Hi Mark,

 Thanks for adding flavor=oligo to RmaPlm. I verified it with the new
 release and the HGU133Plus2 data I have and it all looks good. Pairs plots
 are attached.

 Thanks,

 Peter

 On Thu, Dec 4, 2008 at 5:41 PM, Mark Robinson mrobin...@wehi.edu.au wrote:

  Thanks Peter.

  Perhaps you can repeat this comparison after the next release (this
  will be very soon!) and split the aroma.affymetrix comparison into:

  - aroma.affy.oligo -- with RmaPlm(csN,flavor=oligo)
  - aroma.affy.affyPLM -- with flavor=affyPLM (as you've done already)

  Perhaps the best way to look at all of this at once is with a single
  pairs() plot.

  Cheers,
  Mark

  On 05/12/2008, at 9:01 AM, pwhite...@gmail.com wrote:

   Dear Mark and Henrik,

   I wanted to confirm that your summary was correct regarding the
   different flavors for probeset summarization. I downloaded the MAQC
   HG_U133_Plus_2 array data from the MAQC website:

  http://edkb.fda.gov/MAQC/MainStudy/upload/MAQC_AFX_123456_120CELs.zip

   I then ran the analysis of the arrays from site 1, using just the A
   and B samples, with aroma.affymetrix, affy, affyPLM and oligo (see
   below for the complete code I used to do this). Basically the
   aroma.affymetrix and affyPLM data was essentially identical. The
   affy and oligo data was also essentially identical. As observed with
   the Gene ST array data there were significant differences between
   aroma.affymetrix and affy or oligo. Plots are attached.

   The Gene ST arrays do not have any MM probes - as we are using RMA
   rather than GCRMA this should not have affected anything.

   Thanks,

   Peter

   #OLIGO ANALYSIS

   library(pd.hg.u133.plus.2)
   library(pdInfoBuilder)
   fn - dir(G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-
   U133_Plus_2,CEL,full=T)[1:10]
   raw.oligo-read.celfiles(filenames=fn,pkgname=pd.hg.u133.plus.2)
   eset.oligo-rma(raw.oligo)
   data.oligo-exprs(eset.oligo)

   #AFFY ANALYSIS

   library(affy)
   fn - dir(G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-
   U133_Plus_2,CEL,full=T)[1:10]
   raw.affy - ReadAffy(filenames=fn)
   eset.affy - rma(raw.affy)
   data.affy - exprs(eset.affy)

   #AFFY PLM ANALYSIS

   library(affyPLM)
   fn - dir(G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-
   U133_Plus_2,CEL,full=T)[1:10]
   raw.affyPLM - ReadAffy(filenames=fn)
   fit.affyPLM - fitPLM(raw.affyPLM, verbos=9)
   data.affyPLM - coefs(fit.affyPLM)
   #Analysis of MAQC on Human U113 Plus 2

   setwd(G:\\BGC_EXPERIMENTS\\MAQC_Analysis)
   library(aroma.affymetrix)
   prefixName - MAQC_Data
   chip1 - HG-U133_Plus_2
   cdf - AffymetrixCdfFile$fromChipType(HG-U133_Plus_2)
   cs - AffymetrixCelSet$byName(prefixName, cdf=cdf, chipType=chip1)
   pattern - AFX_1_[AB]
   idxs - grep(pattern, getNames(cs))
   cs - extract(cs, idxs)
   bc - RmaBackgroundCorrection(cs)
   csBC - process(bc)
   qn - QuantileNormalization(csBC, typesToUpdate=pm)
   csN - process(qn)
   plm - RmaPlm(csN, flavor=affyPLM)  #flavor=oligo, must
   library(oligo)
   fit(plm)
   ces - getChipEffectSet(plm)
   getExprs - function(ces, ...) {
     cdf - getCdf(ces)
     theta - extractMatrix(ces, ..., returnUgcMap=TRUE)
     ugcMap - attr(theta, unitGroupCellMap)
     un-getUnitNames(cdf, ugcMap[,unit])
     rownames(theta) - un
     log2(theta)
   }
   data.aroma - getExprs(ces)

   #COMPARING THE DATASETS

dim(data.affy)
   [1] 54675    10
dim(data.affyPLM)
   [1] 54675    10
dim(data.oligo)
   [1] 54613    10
dim(data.aroma)
   [1] 54675    10

   #Unlike in the Gene ST analysis the packages do not return the
   probes in the same order so be careful to reorder them. Also not
   that Oligo removes the control probes (AFFX*).

   sum(rownames(data.affyPLM)==rownames(data.affy))
   # [1] 54675
   o - match(rownames(data.oligo), rownames(data.affy))
   data.affy - data.affy[o,]
   data.affyPLM - data.affyPLM[o,]
   sum(rownames(data.affy)==rownames(data.oligo))
   # [1] 54613
   o - match(rownames(data.affy), 

Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-12-04 Thread pwhiteusa
Dear Mark and Henrik,

I wanted to confirm that your summary was correct regarding the different
flavors for probeset summarization. I downloaded the MAQC HG_U133_Plus_2
array data from the MAQC website:

http://edkb.fda.gov/MAQC/MainStudy/upload/MAQC_AFX_123456_120CELs.zip

I then ran the analysis of the arrays from site 1, using just the A and B
samples, with aroma.affymetrix, affy, affyPLM and oligo (see below for the
complete code I used to do this). Basically the aroma.affymetrix and affyPLM
data was essentially identical. The affy and oligo data was also essentially
identical. As observed with the Gene ST array data there were significant
differences between aroma.affymetrix and affy or oligo. Plots are attached.

The Gene ST arrays do not have any MM probes - as we are using RMA rather
than GCRMA this should not have affected anything.

Thanks,

Peter

#OLIGO ANALYSIS

library(pd.hg.u133.plus.2)
library(pdInfoBuilder)
fn -
dir(G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-U133_Plus_2,CEL,full=T)[1:10]
raw.oligo-read.celfiles(filenames=fn,pkgname=pd.hg.u133.plus.2)
eset.oligo-rma(raw.oligo)
data.oligo-exprs(eset.oligo)


#AFFY ANALYSIS

library(affy)
fn -
dir(G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-U133_Plus_2,CEL,full=T)[1:10]
raw.affy - ReadAffy(filenames=fn)
eset.affy - rma(raw.affy)
data.affy - exprs(eset.affy)


#AFFY PLM ANALYSIS

library(affyPLM)
fn -
dir(G:\\BGC_EXPERIMENTS\\MAQC_Data\\HG-U133_Plus_2,CEL,full=T)[1:10]
raw.affyPLM - ReadAffy(filenames=fn)
fit.affyPLM - fitPLM(raw.affyPLM, verbos=9)
data.affyPLM - coefs(fit.affyPLM)
#Analysis of MAQC on Human U113 Plus 2

setwd(G:\\BGC_EXPERIMENTS\\MAQC_Analysis)
library(aroma.affymetrix)
prefixName - MAQC_Data
chip1 - HG-U133_Plus_2
cdf - AffymetrixCdfFile$fromChipType(HG-U133_Plus_2)
cs - AffymetrixCelSet$byName(prefixName, cdf=cdf, chipType=chip1)
pattern - AFX_1_[AB]
idxs - grep(pattern, getNames(cs))
cs - extract(cs, idxs)
bc - RmaBackgroundCorrection(cs)
csBC - process(bc)
qn - QuantileNormalization(csBC, typesToUpdate=pm)
csN - process(qn)
plm - RmaPlm(csN, flavor=affyPLM)  #flavor=oligo, must library(oligo)
fit(plm)
ces - getChipEffectSet(plm)
getExprs - function(ces, ...) {
  cdf - getCdf(ces)
  theta - extractMatrix(ces, ..., returnUgcMap=TRUE)
  ugcMap - attr(theta, unitGroupCellMap)
  un-getUnitNames(cdf, ugcMap[,unit])
  rownames(theta) - un
  log2(theta)
}
data.aroma - getExprs(ces)


#COMPARING THE DATASETS

 dim(data.affy)
[1] 5467510
 dim(data.affyPLM)
[1] 5467510
 dim(data.oligo)
[1] 5461310
 dim(data.aroma)
[1] 5467510

#Unlike in the Gene ST analysis the packages do not return the probes in the
same order so be careful to reorder them. Also not that Oligo removes the
control probes (AFFX*).

sum(rownames(data.affyPLM)==rownames(data.affy))
# [1] 54675
o - match(rownames(data.oligo), rownames(data.affy))
data.affy - data.affy[o,]
data.affyPLM - data.affyPLM[o,]
sum(rownames(data.affy)==rownames(data.oligo))
# [1] 54613
o - match(rownames(data.affy), rownames(data.aroma))
data.aroma - data.aroma[o,]
sum(rownames(data.affy)==rownames(data.aroma))
# [1] 54613

e- (data.aroma - data.affy)
mean(as.vector(e^2), na.rm=T)
# [1] 0.2119428
sd(as.vector(e^2), na.rm=T)
# [1] 0.3704433

e - (data.aroma - data.oligo)
mean(as.vector(e^2), na.rm=T)
# [1] 0.2104522
sd(as.vector(e^2), na.rm=T)
# [1] 0.3688539

e- (data.aroma - data.affyPLM)
mean(as.vector(e^2), na.rm=T)
# [1] 1.160118e-05
sd(as.vector(e^2), na.rm=T)
# [1] 2.125207e-05

e- (data.affy - data.oligo)
mean(as.vector(e^2), na.rm=T)
# [1] 1.345037e-05
sd(as.vector(e^2), na.rm=T)
# [1] 4.111692e-05

plot(data.aroma[,1],data.affyPLM[,1],main=Comparison of Aroma and AffyPLM
Data,
  col=red,cex=0.5)
abline(0,1, lwd=2)
savePlot(file=HGU133Plus2_Aroma_vs_AffyPLM, type=png)

plot(data.affy[,1],data.oligo[,1],main=Comparison of Affy and Oligo Data,
  col=red,cex=0.5)
abline(0,1, lwd=2)
savePlot(file=HGU133Plus2_Affy_vs_Oligo, type=png)

plot(data.aroma[,1],data.affy[,1],main=Comparison of Aroma and Affy Data,
  col=red,cex=0.5)
abline(0,1, lwd=2)
savePlot(file=HGU133Plus2_Aroma_vs_Affy, type=png)

plot(data.aroma[,1],data.oligo[,1],main=Comparison of Aroma and Oligo
Data,
  col=red,cex=0.5)
abline(0,1, lwd=2)
savePlot(file=HGU133Plus2_Aroma_vs_Oligo, type=png)

plot(data.affy[,1],data.affyPLM[,1],main=Comparison of Affy and AffyPLM
Data,
  col=red,cex=0.5)
abline(0,1, lwd=2)
savePlot(file=HGU133Plus2_Affy_vs_AffyPLM, type=png)


# FYI CREATING HG_U133_PLUS_2 Oligo Annotation LIbrary

setwd(P:\\ANNOTATION\\AffyAnnotation\\Human\\HG-U133_Plus_2)
library(pdInfoBuilder)
cdfFile - HG-U133_Plus_2.cdf
csvAnnoFile - HG-U133_Plus_2.na27.annot.csv
tabSeqFile - HG-U133_Plus_2.probe_tab
pkg - new(AffyExpressionPDInfoPkgSeed, author=Peter White, email=
[EMAIL PROTECTED], version=0.2.0, genomebuild=UCSC
hg18,  June 2006, chipName=hgu133plus2, manufacturer=affymetrix,
biocViews=AnnotationData, cdfFile=cdfFile, csvAnnoFile=csvAnnoFile,
tabSeqFile=tabSeqFile)
makePdInfoPackage(pkg, destDir=.)


On Thu, Dec 

Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-12-03 Thread Henrik Bengtsson
Hi,

thanks for sharing all this.

On Tue, Dec 2, 2008 at 11:54 AM, pwhiteusa [EMAIL PROTECTED] wrote:

 Hi All,

 Here is the exact code I used to analyze Gene ST data for an
 experiment performed with the MoGene-1_0-st-v1 array.

 AROMA.AFFYMETRIX

 library(aroma.affymetrix)
 cdf - AffymetrixCdfFile$fromChipType(MoGene-1_0-st-v1,tags=r3)
 prefixName - Pierson
 cs - AffymetrixCelSet$byName(prefixName, cdf=cdf)
 bc - RmaBackgroundCorrection(cs)
 csBC - process(bc)
 qn - QuantileNormalization(csBC, typesToUpdate=pm)
 csN - process(qn)
 plm - RmaPlm(csN, flavor=affyPLM)  #flavor=oligo, must library
 (oligo)
 fit(plm)
 ces - getChipEffectSet(plm)
 getExprs - function(ces, ...) {
  cdf - getCdf(ces)
  theta - extractMatrix(ces, ..., returnUgcMap=TRUE)
  ugcMap - attr(theta, unitGroupCellMap)
  un-getUnitNames(cdf, ugcMap[,unit])
  rownames(theta) - un
  log2(theta)
 }
 data.aroma - getExprs(ces)

I think that getExprs() call can be replaced by:

data.aroma - extractDataFrame(ces, addNames=TRUE)

The difference is that the unit names will be in a separate column and
not as row names. You will also get group names and more, but those
you can drop if you want to.

Mark, is it correct that we can deprecate the suggestion to use
getExprs()?  Btw, is this documented somewhere online, or is this
knowledge only from the mailing list?


 Easy! Now to get the same data using the Affy packages:


 BIOCONDUCTOR AFFY

 You first need to create or download your mogene10stv1cdf library from
 the Affy unsupported CDF file (https://stat.ethz.ch/pipermail/bioc-
 devel/2007-October/001403.html has some detail on how to do this).
 However, as Mark Robinson pointed out there are potential issues with
 using the Affy unsupported CDF files. See the following for some
 details:

 https://stat.ethz.ch/pipermail/bioconductor/2007-November/020188.html

 library(affy)
 AffyRaw - ReadAffy()
 AffyEset - rma(AffyRaw)
 data.affy - exprs(AffyEset)


 BIOCONDUCTOR OLIGO

 Download all the required Affy annotation files to your Mouse Gene v1
 ST array directory:

 http://www.affymetrix.com/support/technical/byproduct.affx?product=mogene-1_0-st-v1

 setwd(P:\\ANNOTATION\\AffyAnnotation\\Mouse\\MoGene-1_0-st-v1)
 library(pdInfoBuilder)
 pgfFile - MoGene-1_0-st-v1.r3.pgf
 clfFile - MoGene-1_0-st-v1.r3.clf
 transFile - MoGene-1_0-st-v1.na26.mm9.transcript.csv
 probeFile - MoGene-1_0-st-v1.probe.tab
 pkg - new(AffyGenePDInfoPkgSeed, author=Peter White,
 email=[EMAIL PROTECTED], version=0.1.3,
 genomebuild=UCSC mm9,  July 2007, chipName=MoGene10stv1,
 manufacturer=affymetrix, biocViews=AnnotationData,
 pgfFile=pgfFile, clfFile=clfFile, transFile=transFile,
 probeFile=probeFile)
 makePdInfoPackage(pkg, destDir=.)

 #This takes a little while to make the Package. Once created you will
 need to install the package from the Windows DOS prompt (navigate to
 the annotation directory with the newly created pd package to be
 installed):

 R CMD INSTALL pd.mogene.1.0.st.v1\

 Note for this to work you need RTools and you Path variable set up
 correctly as described at:

 http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows-toolset)

 Now return to R, set the working directory to your CEL file directory:

 library(pd.mogene.1.0.st.v1)
 library(oligo)
 OligoRaw-read.celfiles(filenames=list.celfiles())
 OligoEset-rma(OligoRaw)
 data.oligo-exprs(OligoEset)


 COMPARING THE TWO DATASETS

 Here is what I did to compare the data generate by affy, oligo and
 aroma.affymetrix:

 dim(data.aroma)
 [1] 3551216
 dim(data.affy)
 [1] 3551216
 length(grep(TRUE, rownames(data.affy)==rownames(data.aroma)))
 [1] 35512

FYI, sum(rownames(data.affy)==rownames(data.aroma)) gives you the
same.  Replacing sum() with summary() will also work.


 The output from both the affy rma and aroma.affymetrix methods retains
 the same order of probes and cel files so the two files can be
 compared directly.

That is probably because they work of the same CDF, but you should
never rely on this/assume that this is always the case.  If you do,
you should at least verify that the unit names (and group names)
match.

 However,

 dim(data.oligo)
 [1] 3555716

 The normalized data file from the Oligo package includes an additional
 45 Transcript IDs (there's no annotation on what these are but they
 contain anywhere from 9 to 489 probes per probeset).

For the record, would you mind posting the names of these additional
45 units here?  (I'm sure someone else will search the web later and
find this thread very helpful).

 Fixed this problem as follows:

 o - match(rownames(data.aroma), rownames(data.oligo))
 data.oligo - data.oligo[o,]

 length(grep(TRUE, rownames(data.affy)==rownames(data.oligo)))
 [1] 35512
 length(grep(TRUE, rownames(data.aroma)==rownames(data.oligo)))
 [1] 35512

 Finally, there was one more issue with the aroma data. All elements in
 the 18th row of the dataset were flagged Na. This transcript ID for
 this probeset was 10338063. Looking at the Affy 

Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-12-03 Thread Mark Robinson

Hi all.

First of all, thanks Peter for 1) doing this testing and 2) for  
spelling everything out.  I expect to refer people to this thread in  
the future, so thanks for that.

Just wanted to add 3 more tidbits of hopefully useful information.

1. I dug a bit into why flavor=oligo doesn't work within  
aroma.affymetrix.  It turns out it was a simple fix.  Since I don't  
use it regularly (it doesn't give probe affinities!) and the  
underlying 'oligo' functions had changed, it stopped working.  Its  
corrected now.  I've checked in the fix, so flavor='oligo' will be  
available in the next release.  In my tests, it appears VERY close to  
'affy' ... and since its based on 'oligo' code, it should be VERY VERY  
similar.

...
plm1 - RmaPlm(csN,flavor=oligo)
fit(plm1,verbose=verbose)
ces - getChipEffectSet(plm1)
data.aroma.oligo - getExprs(ces)
...

  mean( (data.affy[,1]-data.aroma.oligo[,1])^2 )
[1] 0.0003193267

2. I dug a bit into the unsupported CDF and the 'platformDesign'  
objects from oligo and from what I can tell, the probes used in the  
33252 units (I'm looking at Human) within aroma.affymetrix are  
identical to the probes used within oligo (as built with  
pdInfoBuilder) ... not a single probe no accounted for.  In case you  
haven't dug into pdInfoBuilder before and the SQLite db behind, here  
are some commands you may find useful ...

---
library(pd.hugene.1.0.st.v1)
library(pdInfoBuilder)
fn - dir(rawData/tissues/HuGene-1_0-st-v1,CEL,full=TRUE)[1:3]
x - read.celfiles(filenames=fn,pkgname=pd.hugene.1.0.st.v1)
pd -getPlatformDesign(x)
ff - dbGetQuery(db(pd), select * from pmfeature)

# three 3 lines speed up the splitting ...
ffs - split(ff, substr(ff$fsetid,1,4))
ffs - unlist( lapply(ffs, FUN=function(u) split(u,u$fsetid)),  
recursive=FALSE)
names(ffs) - substr(names(ffs), 6,nchar(names(ffs)))

cs - AffymetrixCelSet$fromName(name, chipType=chip)
cdf - getCdf(cs)
cdfCells - readCdfUnits(getPathname(cdf), units=NULL, readXY=FALSE,  
readBases=FALSE, readExpos=FALSE,
  readType=FALSE,  
readDirection=FALSE,stratifyBy=c(pm), readIndices=TRUE, verbose=0)

un - unique(ff$fsetid)
ids - intersect(un,names(cdfCells))

mf - match(ids,names(ffs))
mc - match(ids,names(cdfCells))

matches - matrix(NA,nr=length(ids),nc=3)
rownames(matches) - ids
colnames(matches) - c(pdInfoBuilder,unsupportedCDF,union)

for(i in 1:nrow(matches)) {
   a - cdfCells[[ ids[i] ]]$groups[[1]]$indices
   b - ffs[[ ids[i] ]]$fid
   matches[i,1] - length(b)
   matches[i,2] - length(a)
   matches[i,3] - length(union(a,b))
   cat(ids[i],\n)
}
---

... this gives ..
  matches[1:5,]
 pdInfoBuilder unsupportedCDF union
798132627 2727
809500542 4242
810031010 1010
794811715 1515
815587725 2525
  table(matches[,3]-matches[,1])
 0
33252
  table(matches[,3]-matches[,2])
 0
33252

3.  This doesn't address the problem of the missing probesets.  I'm  
happy to go and collect these if people want them, but based on the  
reply from Affymetrix (thanks to Mark Cowley for the leg work here),  
they are probably 'low-coverage transcript clusters' that can be  
'safely ignored'.  See:

http://article.gmane.org/gmane.science.biology.informatics.conductor/19738


SUMMARY:

[aroma.affymetrix flavor='oligo'] = rma in affy, rma in oligo

[aroma.affymetrix flavor='affyPLM'] = fitPLM in affyPLM

... where '=' means 'for all intents and purposes equivalent', not  
strictly equal.

Cheers,
Mark





On 04/12/2008, at 7:43 AM, Henrik Bengtsson wrote:

 Hi,

 thanks for sharing all this.

 On Tue, Dec 2, 2008 at 11:54 AM, pwhiteusa [EMAIL PROTECTED]  
 wrote:

 Hi All,

 Here is the exact code I used to analyze Gene ST data for an
 experiment performed with the MoGene-1_0-st-v1 array.

 AROMA.AFFYMETRIX

 library(aroma.affymetrix)
 cdf - AffymetrixCdfFile$fromChipType(MoGene-1_0-st-v1,tags=r3)
 prefixName - Pierson
 cs - AffymetrixCelSet$byName(prefixName, cdf=cdf)
 bc - RmaBackgroundCorrection(cs)
 csBC - process(bc)
 qn - QuantileNormalization(csBC, typesToUpdate=pm)
 csN - process(qn)
 plm - RmaPlm(csN, flavor=affyPLM)  #flavor=oligo, must library
 (oligo)
 fit(plm)
 ces - getChipEffectSet(plm)
 getExprs - function(ces, ...) {
 cdf - getCdf(ces)
 theta - extractMatrix(ces, ..., returnUgcMap=TRUE)
 ugcMap - attr(theta, unitGroupCellMap)
 un-getUnitNames(cdf, ugcMap[,unit])
 rownames(theta) - un
 log2(theta)
 }
 data.aroma - getExprs(ces)

 I think that getExprs() call can be replaced by:

 data.aroma - extractDataFrame(ces, addNames=TRUE)

 The difference is that the unit names will be in a separate column and
 not as row names. You will also get group names and more, but those
 you can drop if you want to.

 Mark, is it correct that we can deprecate the suggestion to use
 getExprs()?  Btw, is this documented somewhere online, or is this
 knowledge only from the 

Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-12-03 Thread Henrik Bengtsson

Hi,

thanks Mark for this.

So, it all has to do with *how* the log-additive probe-level model is
*fitted*, correct?  Thus, the model is the same but the algorithms
differ.  That gives us some sense of how much variance we get from
using different algorithms regardless of model.   Simulation studies
(under the model) could then show if for instance one of the
algorithms is more biased than others.

Thanks for fixing the flavor=oligo.  It will be part of the next release.

Cheers

Henrik

On Wed, Dec 3, 2008 at 1:36 PM, Mark Robinson [EMAIL PROTECTED] wrote:

 Hi all.

 First of all, thanks Peter for 1) doing this testing and 2) for
 spelling everything out.  I expect to refer people to this thread in
 the future, so thanks for that.

 Just wanted to add 3 more tidbits of hopefully useful information.

 1. I dug a bit into why flavor=oligo doesn't work within
 aroma.affymetrix.  It turns out it was a simple fix.  Since I don't
 use it regularly (it doesn't give probe affinities!) and the
 underlying 'oligo' functions had changed, it stopped working.  Its
 corrected now.  I've checked in the fix, so flavor='oligo' will be
 available in the next release.  In my tests, it appears VERY close to
 'affy' ... and since its based on 'oligo' code, it should be VERY VERY
 similar.

 ...
 plm1 - RmaPlm(csN,flavor=oligo)
 fit(plm1,verbose=verbose)
 ces - getChipEffectSet(plm1)
 data.aroma.oligo - getExprs(ces)
 ...

   mean( (data.affy[,1]-data.aroma.oligo[,1])^2 )
 [1] 0.0003193267

 2. I dug a bit into the unsupported CDF and the 'platformDesign'
 objects from oligo and from what I can tell, the probes used in the
 33252 units (I'm looking at Human) within aroma.affymetrix are
 identical to the probes used within oligo (as built with
 pdInfoBuilder) ... not a single probe no accounted for.  In case you
 haven't dug into pdInfoBuilder before and the SQLite db behind, here
 are some commands you may find useful ...

 ---
 library(pd.hugene.1.0.st.v1)
 library(pdInfoBuilder)
 fn - dir(rawData/tissues/HuGene-1_0-st-v1,CEL,full=TRUE)[1:3]
 x - read.celfiles(filenames=fn,pkgname=pd.hugene.1.0.st.v1)
 pd -getPlatformDesign(x)
 ff - dbGetQuery(db(pd), select * from pmfeature)

 # three 3 lines speed up the splitting ...
 ffs - split(ff, substr(ff$fsetid,1,4))
 ffs - unlist( lapply(ffs, FUN=function(u) split(u,u$fsetid)),
 recursive=FALSE)
 names(ffs) - substr(names(ffs), 6,nchar(names(ffs)))

 cs - AffymetrixCelSet$fromName(name, chipType=chip)
 cdf - getCdf(cs)
 cdfCells - readCdfUnits(getPathname(cdf), units=NULL, readXY=FALSE,
 readBases=FALSE, readExpos=FALSE,
  readType=FALSE,
 readDirection=FALSE,stratifyBy=c(pm), readIndices=TRUE, verbose=0)

 un - unique(ff$fsetid)
 ids - intersect(un,names(cdfCells))

 mf - match(ids,names(ffs))
 mc - match(ids,names(cdfCells))

 matches - matrix(NA,nr=length(ids),nc=3)
 rownames(matches) - ids
 colnames(matches) - c(pdInfoBuilder,unsupportedCDF,union)

 for(i in 1:nrow(matches)) {
   a - cdfCells[[ ids[i] ]]$groups[[1]]$indices
   b - ffs[[ ids[i] ]]$fid
   matches[i,1] - length(b)
   matches[i,2] - length(a)
   matches[i,3] - length(union(a,b))
   cat(ids[i],\n)
 }
 ---

 ... this gives ..
   matches[1:5,]
 pdInfoBuilder unsupportedCDF union
 798132627 2727
 809500542 4242
 810031010 1010
 794811715 1515
 815587725 2525
   table(matches[,3]-matches[,1])
 0
 33252
   table(matches[,3]-matches[,2])
 0
 33252

 3.  This doesn't address the problem of the missing probesets.  I'm
 happy to go and collect these if people want them, but based on the
 reply from Affymetrix (thanks to Mark Cowley for the leg work here),
 they are probably 'low-coverage transcript clusters' that can be
 'safely ignored'.  See:

 http://article.gmane.org/gmane.science.biology.informatics.conductor/19738


 SUMMARY:

 [aroma.affymetrix flavor='oligo'] = rma in affy, rma in oligo

 [aroma.affymetrix flavor='affyPLM'] = fitPLM in affyPLM

 ... where '=' means 'for all intents and purposes equivalent', not
 strictly equal.

 Cheers,
 Mark





 On 04/12/2008, at 7:43 AM, Henrik Bengtsson wrote:

 Hi,

 thanks for sharing all this.

 On Tue, Dec 2, 2008 at 11:54 AM, pwhiteusa [EMAIL PROTECTED]
 wrote:

 Hi All,

 Here is the exact code I used to analyze Gene ST data for an
 experiment performed with the MoGene-1_0-st-v1 array.

 AROMA.AFFYMETRIX

 library(aroma.affymetrix)
 cdf - AffymetrixCdfFile$fromChipType(MoGene-1_0-st-v1,tags=r3)
 prefixName - Pierson
 cs - AffymetrixCelSet$byName(prefixName, cdf=cdf)
 bc - RmaBackgroundCorrection(cs)
 csBC - process(bc)
 qn - QuantileNormalization(csBC, typesToUpdate=pm)
 csN - process(qn)
 plm - RmaPlm(csN, flavor=affyPLM)  #flavor=oligo, must library
 (oligo)
 fit(plm)
 ces - getChipEffectSet(plm)
 getExprs - function(ces, ...) {
 cdf - getCdf(ces)
 theta - extractMatrix(ces, ..., 

Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-12-03 Thread Mark Robinson


On 04/12/2008, at 10:17 AM, Henrik Bengtsson wrote:

 So, it all has to do with *how* the log-additive probe-level model is
 *fitted*, correct?


Correct.  Identical linear model, different procedure for fitting.

(as a bit of an aside ... I think of these things in terms of  
influence functions -- median polish will have a different IF than the  
defaults in affyPLM's robust fit)

M.


 Thus, the model is the same but the algorithms
 differ.  That gives us some sense of how much variance we get from
 using different algorithms regardless of model.   Simulation studies
 (under the model) could then show if for instance one of the
 algorithms is more biased than others.

 Thanks for fixing the flavor=oligo.  It will be part of the next  
 release.

 Cheers

 Henrik

 On Wed, Dec 3, 2008 at 1:36 PM, Mark Robinson  
 [EMAIL PROTECTED] wrote:

 Hi all.

 First of all, thanks Peter for 1) doing this testing and 2) for
 spelling everything out.  I expect to refer people to this thread in
 the future, so thanks for that.

 Just wanted to add 3 more tidbits of hopefully useful information.

 1. I dug a bit into why flavor=oligo doesn't work within
 aroma.affymetrix.  It turns out it was a simple fix.  Since I don't
 use it regularly (it doesn't give probe affinities!) and the
 underlying 'oligo' functions had changed, it stopped working.  Its
 corrected now.  I've checked in the fix, so flavor='oligo' will be
 available in the next release.  In my tests, it appears VERY close to
 'affy' ... and since its based on 'oligo' code, it should be VERY  
 VERY
 similar.

 ...
 plm1 - RmaPlm(csN,flavor=oligo)
 fit(plm1,verbose=verbose)
 ces - getChipEffectSet(plm1)
 data.aroma.oligo - getExprs(ces)
 ...

 mean( (data.affy[,1]-data.aroma.oligo[,1])^2 )
 [1] 0.0003193267

 2. I dug a bit into the unsupported CDF and the 'platformDesign'
 objects from oligo and from what I can tell, the probes used in the
 33252 units (I'm looking at Human) within aroma.affymetrix are
 identical to the probes used within oligo (as built with
 pdInfoBuilder) ... not a single probe no accounted for.  In case you
 haven't dug into pdInfoBuilder before and the SQLite db behind, here
 are some commands you may find useful ...

 ---
 library(pd.hugene.1.0.st.v1)
 library(pdInfoBuilder)
 fn - dir(rawData/tissues/HuGene-1_0-st-v1,CEL,full=TRUE)[1:3]
 x - read.celfiles(filenames=fn,pkgname=pd.hugene.1.0.st.v1)
 pd -getPlatformDesign(x)
 ff - dbGetQuery(db(pd), select * from pmfeature)

 # three 3 lines speed up the splitting ...
 ffs - split(ff, substr(ff$fsetid,1,4))
 ffs - unlist( lapply(ffs, FUN=function(u) split(u,u$fsetid)),
 recursive=FALSE)
 names(ffs) - substr(names(ffs), 6,nchar(names(ffs)))

 cs - AffymetrixCelSet$fromName(name, chipType=chip)
 cdf - getCdf(cs)
 cdfCells - readCdfUnits(getPathname(cdf), units=NULL, readXY=FALSE,
 readBases=FALSE, readExpos=FALSE,
 readType=FALSE,
 readDirection=FALSE,stratifyBy=c(pm), readIndices=TRUE, verbose=0)

 un - unique(ff$fsetid)
 ids - intersect(un,names(cdfCells))

 mf - match(ids,names(ffs))
 mc - match(ids,names(cdfCells))

 matches - matrix(NA,nr=length(ids),nc=3)
 rownames(matches) - ids
 colnames(matches) - c(pdInfoBuilder,unsupportedCDF,union)

 for(i in 1:nrow(matches)) {
  a - cdfCells[[ ids[i] ]]$groups[[1]]$indices
  b - ffs[[ ids[i] ]]$fid
  matches[i,1] - length(b)
  matches[i,2] - length(a)
  matches[i,3] - length(union(a,b))
  cat(ids[i],\n)
 }
 ---

 ... this gives ..
 matches[1:5,]
pdInfoBuilder unsupportedCDF union
 798132627 2727
 809500542 4242
 810031010 1010
 794811715 1515
 815587725 2525
 table(matches[,3]-matches[,1])
0
 33252
 table(matches[,3]-matches[,2])
0
 33252

 3.  This doesn't address the problem of the missing probesets.  I'm
 happy to go and collect these if people want them, but based on the
 reply from Affymetrix (thanks to Mark Cowley for the leg work here),
 they are probably 'low-coverage transcript clusters' that can be
 'safely ignored'.  See:

 http://article.gmane.org/gmane.science.biology.informatics.conductor/19738


 SUMMARY:

 [aroma.affymetrix flavor='oligo'] = rma in affy, rma in oligo

 [aroma.affymetrix flavor='affyPLM'] = fitPLM in affyPLM

 ... where '=' means 'for all intents and purposes equivalent', not
 strictly equal.

 Cheers,
 Mark





 On 04/12/2008, at 7:43 AM, Henrik Bengtsson wrote:

 Hi,

 thanks for sharing all this.

 On Tue, Dec 2, 2008 at 11:54 AM, pwhiteusa [EMAIL PROTECTED]
 wrote:

 Hi All,

 Here is the exact code I used to analyze Gene ST data for an
 experiment performed with the MoGene-1_0-st-v1 array.

 AROMA.AFFYMETRIX

 library(aroma.affymetrix)
 cdf - AffymetrixCdfFile$fromChipType(MoGene-1_0-st-v1,tags=r3)
 prefixName - Pierson
 cs - AffymetrixCelSet$byName(prefixName, cdf=cdf)
 bc - RmaBackgroundCorrection(cs)
 csBC - 

[aroma.affymetrix] Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-11-26 Thread Henrik Bengtsson

Hi.

a comment on RmaPlm and argument 'flavor':  The RmaPlm class is only
summarizing the probe signals - normalization etc are done before
RmaPlm.   The summarization model is the log-additive model with probe
affinities and chip effects.   The 'flavor' argument specifies which
implementation of the fitting algorithm to use.

The default is affyPLM, which indicates that it uses the
implementation in 'affyPLM', which now has moved to the
'preprocessCore' package.  Both 'affyPLM' and 'preprocessCore' are
developed and maintained by Ben Bolstad.

The oligo flavor was using the fitting algorithm in the 'oligo'
package, which I think in turn originates from the 'affy' package,
which in turn was an early version of Ben Bolstad's code.  I haven't
tried this in awhile, and it appears that 'oligo' has been updated and
the internal function for fitting the model is no longer available.
This is why you get the error message.

Since BB maintain preprocessCore and there has been a lot of
improvements in the algorithm (e.g. it supports probe-specific weights
and much more), I recommend that you use the affyPLM flavor.  Since
this is default, you do not have to specify the argument 'flavor' at
all.

For reproducibility of the RMA pipeline in aroma.affymetrix when
compared with 'affyPLM', see Page 'Reproducibility of other
implementations':

  http://groups.google.com/group/aroma-affymetrix/web/redundancy-tests

As you see, the results are remarkable similar.  FYI, for each update
of the package this comparison is part of the redundancy testing.
Note that only PMs are quantile normalized, this might be a reason for
you observing differences.  There is more meat on how
QuantileNormalization and plotDensity() behaves on Page 'Empirical
probe-signal densities and rank-based quantile normalization':

  
http://groups.google.com/group/aroma-affymetrix/web/empirical-probe-signal-densities-and-rank-based-quantile-normalization

I am also not sure how similar affyPLM and Affy is, but I trust
affyPLM more than Affy, because I see affyPLM as a more up to date
version of Affy.  Please feel free to contribute with code for
comparing toward Affy.  If you do, please use the same data set.

Cheers

Henrik


On Wed, Nov 26, 2008 at 12:44 PM, pwhiteusa [EMAIL PROTECTED] wrote:

 Hi Manasa,

 So I get exactly the same results with either of the following:

 plm - RmaPlm(csN, flavor=affyPLM) #according to the documentation
 this is the default
 plm - ExonRmaPlm(csN, flavor = affyPLM, mergeGroups=TRUE) #seems to
 take longer to run

 However, I noticed that my GeneST normalized data is quite different
 from the data that I produce using the Affy package. When looking at
 the controls on the array I see that the Aroma normalized data is
 between 5-10% lower than that produced by the Bioconductor Affy
 packages. However, for some probes this difference is can be quite
 large (values are averaged across 16 samples):

 ProbeBionconductorAroma
 10341096 (neg_control)   6079 852
 10341735 (neg_control)   25   87
 10340969 (pos_control)   3953 1758
 10338477 (pos_control)   293  611

 Ultimately, when my downstream analysis looks for differential
 expression the differences between the two analysis approaches become
 minimal, but I did notice that the Aroma package seems to call more
 control probes and probes with no known gene as being differentially
 expressed (i.e. it looks noisier). These probes were all pretty close
 to my 2 fold cutoff, and at 3 fold or greater the data looked the
 same.

 Do you know where the packages diverge if their probe summation
 approach or is it a difference in background correction? I did attempt
 to use the flavor=oligo (as described in the ?RmaPlm help file) but
 this returned the following error:

 plm2 - RmaPlm(csN, flavor=oligo)
 fit(plm2)
 Exception: The fit function for requested RMA PLM flavor failed: oligo

 Thanks,

 Peter

 P.S. Here is a summary of the code I used for the Bioconductor
 approach:

 library(affy)
 TestData - ReadAffy()
 TestEset - rma(TestData)

 If you plot(AromaEset[,1],TestEset[,1]) you can visualize how
 different the data is.

 On Nov 24, 11:42 pm, ManasaR [EMAIL PROTECTED] wrote:
 Hi,

 Just to say that i've been working with some GeneST1.0 data following
 the tips above and it has been a breeze thanks to you guys. I noticed
 something and thought i should mention it - though it might be trivial
 and most users might have already discovered it. Just in case there
 are people like me out there :-)

 With regards to the first reply Mark made above and the following
 point,

 1. Process Gene 1.0 ST data much the same as Exon array data, except
 that you'll need to replace 'ExonRmaPlm' with 'RmaPlm'

 RmaPlm does not have mergeGroups as an argument. Im assuming this
 might be a default but do correct me if im wrong since i do want
 values for each transcript separately. So 

[aroma.affymetrix] Re: Reproducing RMA with Gene ST data (Was: Re: [aroma.affymetrix] Re: How do you analyze Gene ST Data?)

2008-11-26 Thread Mark Robinson

Hi all.

Just to follow up on these comments here.

'fitPLM' with default parameters in the affyPLM package should give  
practically identical results to the 'standard' pipeline (RMA bg corr  
+ quantile + fit) within aroma.affymetrix, assuming the underlying  
annotation is the same.  This was an easy comparison back in the day  
of 3' IVT arrays.  Now, its a little more difficult.

If anyone is willing, I'd be keen to know if these two actually do  
give the same results on the newer chips i.e. is the underlying  
annotation the same?  I seem to recall that because these newer chips  
occasionally have probes that are shared amongst different probesets,  
that the older style affy package would not be able to handle it.  For  
example, Jim MacDonald's post:

http://article.gmane.org/gmane.science.biology.informatics.conductor/19184

Jim says there that you don't want to use affy for these chips (not  
100% sure why).  He suggests the whole pdInfoBuilder/oligo thing which  
at one time had some bugs, but is probably more stable now.  I haven't  
dug deeper as to whether the annotation that 'fitPLM' uses by default  
('hugene10st.db' presumably?) matches the annotation that would be  
used by aroma.affymetrix (the converted-to-binary 'unsupported' CDF  
file).  I know Mark Cowley did find some inconsistencies:

http://article.gmane.org/gmane.science.biology.informatics.conductor/19738

This makes me think that we may want to alternatively construct the  
XXGene 1.0 CDFs (XX=Hu,Mo,Ra) directly from the PGF/CLF files instead  
of from the unsupported CDF.  I suspect that there will only be minor  
changes, so I haven't looked too deeply into it.

Anyone want to check?

In addition to what Henrik says about flavour=affyPLM ... for a lot  
of my work, there is definitely additional value in using the  
auxiliary information from the PLMs (i.e. weights, residuals) ... you  
don't get this directly with oligo/median polish.


Few more specific notes ...

 library(affy)
 TestData - ReadAffy()
 TestEset - rma(TestData)

 If you plot(AromaEset[,1],TestEset[,1]) you can visualize how
 different the data is.


I assume you ensured the probesets and samples are in the same order?   
(Or, is this somehow covered by the plot method ...)  I can't tell  
from this sequence of commands.  I don't know what this plot looks  
like, so I don't whether to be alarmed or not.


 However, I noticed that my GeneST normalized data is quite different
 from the data that I produce using the Affy package. When looking at
 the controls on the array I see that the Aroma normalized data is
 between 5-10% lower than that produced by the Bioconductor Affy
 packages. However, for some probes this difference is can be quite
 large (values are averaged across 16 samples):

 ProbeBionconductorAroma
 10341096 (neg_control)   6079 852
 10341735 (neg_control)   25   87
 10340969 (pos_control)   3953 1758
 10338477 (pos_control)   293  611

 Ultimately, when my downstream analysis looks for differential
 expression the differences between the two analysis approaches become
 minimal, but I did notice that the Aroma package seems to call more
 control probes and probes with no known gene as being differentially
 expressed (i.e. it looks noisier). These probes were all pretty close
 to my 2 fold cutoff, and at 3 fold or greater the data looked the
 same.


Tough to really tell.  You'd probably want to average on the log2  
scale, not the linear scale.  This could be a difference in median- 
polish versus robust PLM, or could be a differnce in annotation.   
Requires some digging.

Also, you may want to remove control probesets before doing DE  
analysis --- for one thing, you pay a slightly lesser penalty for  
multiple testing.

Cheers,
Mark


--
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: [EMAIL PROTECTED]
e: [EMAIL PROTECTED]
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
--





--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---