Hi again.

On Tue, Apr 12, 2011 at 3:44 PM, Kai <wangz...@gmail.com> wrote:
> Hi Henrik,
[snip]
> Another confusion I have is the difference between
> "calculateBaseline()" and "getAverageFile()". I am trying to process a
> set of Hapmap samples and use their pooled average as a general copy
> number reference. So which function should I use?

They basically differ in how the copy-neutral baseline is estimated for ChrX and ChrY. getAverageFile() does not treat these chromosomes differently from any others, and therefore also ignores whether a sample is XX or XY. Thus, when calculating relative test-versus-reference CNs, C = 2*T/R, reference signals based on getAverageFile() are likely to be biased for ChrX and ChrY. For instance, if your T is an XX sample and all other samples in the data set are XY, then the ChrX 'R' signals based on getAverageFile() will basically correspond to CN=1. Since the 'T' signals are roughly twice as large as these 'R' signals, C = 2*T/R ~= 2*2/1 = 4. That is, the estimated CNs for T on ChrX will be about 4 instead of 2.

The calculateBaseline() method can control for this bias, if (and only if) it is given some clues about which samples are XX and which are XY. The details of this method are explained in Section 3.2.7 'Reference signals' of the CRMA manuscript:

H. Bengtsson, R. Irizarry, B. Carvalho & T.P. Speed, Estimation and assessment of raw copy numbers at the single locus level, Bioinformatics 2008, 24, pp. 759-767.
http://aroma-project.org/publications/

The easiest way to let calculateBaseline() know which samples are XX and which are XY is to add XX and XY tags to the names of the data files, e.g. NA12003,XY.CEL and NA12004,XX.CEL. Alternatively, you can create a so-called Sample Annotation File (SAF) and place it in annotationData/samples/. I've created one for the HapMap 270 data set, which I'm happy to share with you:

http://aroma-project.org/data/annotationData/samples/HapMap270.saf

If you create similar SAF files for other HapMap samples, please consider sharing them with us, since setting them up is a lot of work.
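For anyone who wants to see the arithmetic of the XX-versus-XY example spelled out, here is a toy Python sketch. It assumes idealised, noise-free signals that are linear in copy number (one copy of ChrX gives a signal of 1.0). The file names, the helper sex_from_filename() and the rescaling used for the "sex-aware" reference are hypothetical illustrations of the idea only; they are not the actual calculateBaseline() algorithm, which is described in the CRMA paper.

```python
# Toy illustration of the ChrX bias in C = 2*T/R (not aroma.affymetrix code).
# Assumptions: noise-free signals that are linear in copy number, with one
# copy of ChrX giving a signal of 1.0; file names and helpers are hypothetical.
from math import log2
from statistics import mean

SIGNAL_PER_COPY = 1.0
CHRX_COPIES = {"XX": 2, "XY": 1}


def sex_from_filename(filename: str) -> str:
    """Hypothetical helper: return the XX/XY tag of e.g. 'REF1,XY.CEL'."""
    stem = filename.rsplit(".", 1)[0]
    for tag in stem.split(",")[1:]:
        if tag in CHRX_COPIES:
            return tag
    raise ValueError(f"no XX/XY tag in {filename!r}")


def relative_cn(t: float, r: float) -> float:
    """Relative copy number C = 2*T/R."""
    return 2.0 * t / r


files = ["REF1,XY.CEL", "REF2,XY.CEL", "REF3,XY.CEL", "TEST,XX.CEL"]
sexes = {f.split(",")[0]: sex_from_filename(f) for f in files}
chrx_signal = {n: SIGNAL_PER_COPY * CHRX_COPIES[s] for n, s in sexes.items()}
pool = [n for n in sexes if n != "TEST"]

# Naive pooled average: ignores sex, like averaging all arrays as-is.
r_naive = mean(chrx_signal[n] for n in pool)

# Simplified sex-aware reference: rescale each reference sample to the
# signal expected at CN=2 before averaging (a stand-in for the idea only).
r_adjusted = mean(chrx_signal[n] * 2 / CHRX_COPIES[sexes[n]] for n in pool)

c_naive = relative_cn(chrx_signal["TEST"], r_naive)
c_adjusted = relative_cn(chrx_signal["TEST"], r_adjusted)
print(f"naive C = {c_naive}, adjusted C = {c_adjusted}")  # naive C = 4.0, adjusted C = 2.0
print(f"log2 shift = {log2(c_naive / c_adjusted)}")  # log2 shift = 1.0
```

With three XY references and an XX test sample, the naive pooled reference gives C = 4 on ChrX, i.e. a constant shift of +1 on the log2 scale, whereas the rescaled reference gives the expected C = 2.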
To summarize, if you don't mind a bias in the ChrX and ChrY CNs, that is, a constant shift of those chromosomes on the log-ratio scale, then you can equally well use getAverageFile(); the CBS segmentation method will still pick up the same change points. If you do mind the bias, you need to use calculateBaseline(), which corrects for it.

Hope this helps

Henrik

>
> Thank you very much.
>
> Best,
> Kai
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/