Re: getAverageFile() or calculateBaseline() for the CN reference? (Was: Re: [aroma.affymetrix] AromaUnitTotalCnBinarySet vs CnChipEffectSet)

Henrik Bengtsson Fri, 15 Apr 2011 17:17:29 -0700

Hi again.

On Tue, Apr 12, 2011 at 3:44 PM, Kai <wangz...@gmail.com> wrote:
> Hi Henrik,


[snip]

> Another confusion I have is the difference between
> "calculateBaseline()" and "getAverageFile()". I am trying to process a
> set of Hapmap samples and use their pooled average as a general copy
> number reference. So which function should I use?

They basically differ in how the copy-neutral baseline is estimated
for ChrX and ChrY.

getAverageFile() doesn't treat these chromosomes differently from any
others and therefore also ignores whether a sample is XX or XY.  Thus,
when calculating relative test-versus-reference CNs, C = 2*T/R, using
reference signals based on getAverageFile() is likely to give biased
estimates on ChrX and ChrY.  For instance, if your T is an XX sample
and all other samples in the data set are XY, then the 'R' signals
based on getAverageFile() will basically correspond to CN=1.  Since
the 'T' signals are roughly twice as large as these 'R' signals,
C = 2*T/R ~= 2*2/1 = 4.  That is, the estimated relative CNs for T on
ChrX will be 4 instead of 2.
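
The arithmetic above can be checked with a small numeric sketch.  This
is plain Python, not aroma.affymetrix code: the signal values, the
sample counts and the use of a median as the pooled average are all
assumptions made for illustration only.

```python
import statistics


def relative_cn(test_signal, reference_signal):
    """Relative copy number as defined in the text: C = 2*T/R."""
    return 2.0 * test_signal / reference_signal


# Hypothetical ChrX signals at one locus, in units proportional to
# the true copy number (made-up values, not real data).
xy_signals = [1.0] * 9   # nine XY samples: one copy of ChrX each
xx_signal = 2.0          # the XX test sample: two copies of ChrX

# A pooled average that ignores XX/XY status (median assumed here).
pooled_reference = statistics.median(xy_signals + [xx_signal])

# The pooled reference corresponds to CN=1, so the XX sample looks
# like CN=4 on ChrX instead of CN=2.
chrx_cn = relative_cn(xx_signal, pooled_reference)
print(pooled_reference, chrx_cn)
```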

The calculateBaseline() method can control for this bias, if (and only
if) it gets some clues about which samples are XX and which are XY.
The details of this method are explained in Section 3.2.7 'Reference
signals' of the CRMA manuscript:

H. Bengtsson, R. Irizarry, B. Carvalho & T.P. Speed, Estimation and
assessment of raw copy numbers at the single locus level,
Bioinformatics 2008, 24, pp759-767.
http://aroma-project.org/publications/

The easiest way to let calculateBaseline() know which samples are XX
and which are XY, is to add XX and XY tags to the names of the data
files, e.g. NA12003,XY.CEL and NA12004,XX.CEL.  Alternatively, you can
create a so-called Sample Annotation File (SAF) and place it in
annotationData/samples/.  I've created one for the HapMap 270 data
set, which I'm happy to share with you:

http://aroma-project.org/data/annotationData/samples/HapMap270.saf
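
To make the file-name tag convention concrete, here is a minimal
sketch of how a comma-separated name such as NA12003,XY.CEL splits
into a sample name and its tags.  It is plain Python written for
illustration; parse_tags() and sex_from_tags() are hypothetical
helpers, not functions of aroma.affymetrix, and the package's own
parsing may differ in its details.

```python
from pathlib import Path


def parse_tags(filename):
    """Split 'NAME,TAG1,TAG2.EXT' into (name, [tags]).

    Hypothetical helper: assumes the part before the extension is a
    comma-separated list whose first element is the sample name.
    """
    fullname = Path(filename).stem      # drop the '.CEL' extension
    name, *tags = fullname.split(",")
    return name, tags


def sex_from_tags(tags):
    """Return 'XX' or 'XY' if such a tag is present, else None."""
    for tag in tags:
        if tag in ("XX", "XY"):
            return tag
    return None


for filename in ["NA12003,XY.CEL", "NA12004,XX.CEL"]:
    name, tags = parse_tags(filename)
    print(name, sex_from_tags(tags))
```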

If you create similar SAF files for other HapMap samples, please
consider sharing them with us, since it takes a lot of work to set
them up.

To summarize, if you don't mind a bias in the ChrX & ChrY CNs, that
is, a global shift on the log-ratio scale, then you can equally well
use getAverageFile().  The CBS segmentation method will still pick up
the same change points.  If you mind the bias, you need to use the
bias-corrected calculateBaseline() method.
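
The "global shift" point can be illustrated numerically.  The sketch
below is plain Python with made-up signals; the jumps() helper is a
naive stand-in used only to show where the values change, and it is
not the CBS algorithm.

```python
import math


def log2_ratios(test, reference):
    """Per-locus log2(T/R)."""
    return [math.log2(t / r) for t, r in zip(test, reference)]


def jumps(values, tol=1e-9):
    """Indices where the value changes from the previous locus."""
    return [i for i in range(1, len(values))
            if abs(values[i] - values[i - 1]) > tol]


# Hypothetical ChrX signals for an XX sample with a gain from locus 3.
test = [2.0, 2.0, 2.0, 3.0, 3.0, 3.0]
unbiased_ref = [2.0] * 6   # reference corresponding to CN=2
biased_ref = [1.0] * 6     # reference corresponding to CN=1

unbiased = log2_ratios(test, unbiased_ref)
biased = log2_ratios(test, biased_ref)

# The bias is the same constant offset at every locus ...
shifts = [b - u for b, u in zip(biased, unbiased)]
# ... so the change point is found at the same position either way.
print(shifts, jumps(unbiased), jumps(biased))
```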

Hope this helps

Henrik

>
> Thank you very much.
>
> Best,
> Kai
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
> version of the package, 2) to report the output of sessionInfo() and 
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups 
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
>

