[aroma.affymetrix] Re: Gene-Level Summarization of Expression Data

Mark Robinson Sun, 17 Jan 2010 01:59:39 -0800

Hi Randy.

From that error message, it looks like a mix of CDF files was being used (my guess is that 54675 corresponds to the number of Affymetrix probesets, whereas 30625 corresponds to the number of probesets in the RefSeq reorganization). Can you post the code you ran?


Cheers,
Mark

On 16-Jan-10, at 11:41 AM, Randy Gobbel wrote:

I'm also trying to get gene-level expression values, using
HG-U133_Plus_2 data.  I downloaded the custom CDF that combines probes
into probesets that correspond to RefSeq genes, linked from the
aroma.affymetrix group page for this chip type (Hs133P_Hs_REFSEQ.cdf),
and ran the same set of commands. It works up to the point of trying
to extract expression values, then dies with:

Exception: Range of argument 'indices' is out of range [1,30625]:
[1,54675]

At this point, I'm not sure what to do next. Suggestions?  It looks
like you were the creator of the CDF--is it the right one for this?

-Randy

On Jun 19 2009, 10:08 pm, Mark Robinson <mrobin...@wehi.edu.au> wrote:

Hi Steve.

I don't know how common this is.  Basically, a colleague found a gene
that was very differentially expressed when analyzing with the
Affymetrix probeset definitions, and found virtually nothing when using
the custom CDF that bundles all the probes for a gene together.  The
reason was simple.  There were several probesets designed for this gene,
and presumably they measure different isoforms.  The probes for
the DE probeset showed the difference, but all the other probesets
didn't.  When you use a robust linear model like RMA, outliers get
downweighted.  Because the DE probes accounted for a small proportion
of the probes (I think there were 3 or 4 other probesets at this
locus), their effect got washed out.
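
The washing-out effect described above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the data are simulated (4 probesets of 11 probes at one locus, 6 arrays, a +2 log2 shift in one probeset only), and base R's medpolish() stands in for the robust probe-level fit; the model fitted by RmaPlm in aroma.affymetrix may differ in detail. All variable and function names are made up for illustration.

```r
set.seed(1)

nSamples   <- 6                                   # 3 control + 3 treated arrays
group      <- rep(c("control", "treated"), each = 3)
nProbesets <- 4                                   # probesets at the same locus
nProbes    <- 11                                  # probes per probeset
probeset   <- rep(seq_len(nProbesets), each = nProbes)
nTotal     <- nProbesets * nProbes

# Simulated log2 intensities: baseline + probe affinity + small noise
affinity <- rnorm(nTotal, sd = 1)
noise    <- matrix(rnorm(nTotal * nSamples, sd = 0.1), ncol = nSamples)
y <- 8 + affinity + noise                         # 44 probes x 6 arrays

# Only probeset 1 responds: +2 on the log2 scale in the treated arrays
isDE <- probeset == 1
isTreated <- group == "treated"
y[isDE, isTreated] <- y[isDE, isTreated] + 2

# Chip effects (one per array) from a median-polish fit of probes x arrays
chipEffects <- function(x) {
  mp <- medpolish(x, trace.iter = FALSE)
  mp$overall + mp$col
}

# Treated-minus-control difference of the mean chip effects
deEstimate <- function(x) {
  ce <- chipEffects(x)
  mean(ce[isTreated]) - mean(ce[!isTreated])
}

deProbeset <- deEstimate(y[isDE, ])   # original definition: probeset 1 alone
deGene     <- deEstimate(y)           # custom-CDF style: all 44 probes pooled

cat(sprintf("probeset 1 only: %.2f\nall probes:      %.2f\n",
            deProbeset, deGene))
```

In this setup the DE probes make up a quarter of the pooled probes, so the robust fit treats them as outliers and the pooled estimate stays close to zero, while the estimate from probeset 1 alone recovers most of the simulated shift.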

So, it's a tradeoff.  Sometimes (perhaps most of the time) you gain by
lumping them all together ... more information, more power to detect
changes.  But, sometimes (perhaps rarely) it can mislead.  I'm sure
I'm not the only one to observe such things.  The probe-level data
(usually?) doesn't lie.  But, since you are comparing across
platforms, you will undoubtedly find this as you go along.  Different
microarray designs often measure slightly different things.

One other thing.  If your CDF is not already in binary format, be sure
to convert it using affxparser's convertCdf().  Having this info stored
in binary format will make the processing much faster.  I think the MBNI
custom CDFs are text.
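
A minimal sketch of that conversion, assuming affxparser is installed and that a text CDF sits at the path shown. Both file paths and the helper name convertIfNeeded() are hypothetical; adjust them to your own annotationData/ layout.

```r
# Hypothetical locations of the text (ASCII) CDF and the binary CDF to write
textCdf   <- "downloads/Hs133P_Hs_REFSEQ.text.cdf"
binaryCdf <- "annotationData/chipTypes/Hs133P_Hs_REFSEQ/Hs133P_Hs_REFSEQ.cdf"

# Convert only if the source exists, affxparser is available, and the
# binary file has not been written yet. Returns TRUE if the binary CDF
# exists afterwards, FALSE if nothing could be done.
convertIfNeeded <- function(src, dest) {
  if (!file.exists(src) || !requireNamespace("affxparser", quietly = TRUE)) {
    return(invisible(FALSE))
  }
  if (!file.exists(dest)) {
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    affxparser::convertCdf(src, dest)   # writes a binary CDF
  }
  invisible(file.exists(dest))
}

converted <- convertIfNeeded(textCdf, binaryCdf)
```

The guard keeps an existing binary file from being overwritten, so the call is safe to leave at the top of an analysis script.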

Cheers,
Mark

On 20/06/2009, at 6:55 AM, Steve P wrote:

Mark,

Thanks for the information. That is very helpful.

I want to do the latter, which is to "combine probesets such that all
probes for a given gene (by some definition -- RefSeq, Ensembl, etc)
are used to arrive at the summarized value."

I was able to obtain a custom CDF for the U133-A array, so I will try
that approach.  But part of the reason I want to do this is to be able
to compare values across platforms, so I may need to find/build a
custom CDF for the other platform.

I would appreciate any cautionary advice you have about summarizing at
the gene level.

Regards,
-Steve

On Jun 17, 9:56 am, Steve Piccolo <steve.picc...@gmail.com> wrote:

Yesterday I posted this question to the list, but the spam blocker
didn't let it through. Below my question is a response from Mark Robinson.

---------------------------------------------------------------------------

Following the example provided at
http://groups.google.com/group/aroma-affymetrix/web/gene-1-0-st-array...
I am running the following code:

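library(aroma.affymetrix)

chipType <- "HT_HG-U133A"
dataSet <- "myData"

# Verbose output with timestamps, to follow progress
verbose <- Arguments$getVerbose(-8, timestamp=TRUE)

# Locate the CDF for the chip type and set up the CEL set
cdf <- AffymetrixCdfFile$byChipType(chipType)
cs <- AffymetrixCelSet$byName(dataSet, cdf=cdf)

# RMA background correction followed by quantile normalization
bc <- RmaBackgroundCorrection(cs)
csBC <- process(bc, verbose=verbose)
qn <- QuantileNormalization(csBC)
csN <- process(qn, verbose=verbose)

# Fit the RMA probe-level model
plm <- RmaPlm(csN)
fit(plm, verbose=verbose)

# Extract the chip effects (one row per unit defined in the CDF)
ces <- getChipEffectSet(plm)
gExprs <- extractDataFrame(ces, units=NULL, addNames=TRUE)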

This seems to be working beautifully.

However, I'm doing an analysis that requires my expression values to
be summarized at the gene level rather than the probeset level.

In the gExprs object that results from the above analysis, I get a
data.frame object in which each row contains expression values for a
given probeset across all samples. What I would love to see in each
row is an expression value for a given gene. I believe RMA has the
ability to do this, but I'm not sure how to do it via
aroma.affymetrix.

Any suggestions? I'm happy to provide any more details that would be
helpful.

Regards,
-Steve

---------------------------------------------------------------------------

Hi Steve.

As to your question, it depends on what you need.  When you say you want
every row to be a gene, do you just want to know the gene name that goes
with the probeset identifier, or do you want to combine probesets such that
all probes for a given gene (by some definition -- RefSeq, Ensembl, etc) are
used to arrive at the summarized value (a la the MBNI CustomCDF)?

If the former, then there are annotation packages within R.
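
For the annotation route, a lookup along these lines is one possibility. It is a sketch only: it assumes the Bioconductor packages AnnotationDbi and hgu133a.db are installed, and hgu133a.db is an assumption about which annotation package fits the array (an HT array may need a different one). The helper name probesetToSymbol() is made up for illustration.

```r
# Map probeset identifiers to gene symbols with a Bioconductor annotation
# package. Returns NULL if the required packages are not installed.
probesetToSymbol <- function(ids, annPkg = "hgu133a.db") {
  if (!requireNamespace("AnnotationDbi", quietly = TRUE) ||
      !requireNamespace(annPkg, quietly = TRUE)) {
    return(NULL)
  }
  # The annotation database object has the same name as its package
  db <- getExportedValue(annPkg, annPkg)
  AnnotationDbi::select(db, keys = ids,
                        columns = "SYMBOL", keytype = "PROBEID")
}

# Example call with one probeset identifier from the HG-U133 family
ann <- probesetToSymbol("1007_s_at")
```

A probeset can map to more than one symbol, so the result may have more rows than identifiers. It can then be joined to the expression data frame on the probeset identifier column.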

If the latter, I have a few cautionary tales of doing this, since the
different probesets for a given locus can be measuring different variants.
But if you still want to do this, we need to make a CDF file specific to
the annotation you want.  For the standard HG-U133 arrays, I know the MBNI
guys made the CDFs and we could use those within aroma.affymetrix.  I don't
know if they build custom CDFs for the HT- arrays.

Hope that gets you started.

Cheers,
Mark
------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------


______________________________________________________________________
The information in this email is confidential and intended solely for the 
addressee.
You must not disclose, forward, print or use it without the permission of the 
sender.
______________________________________________________________________

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
