[aroma.affymetrix] Re: Clarification of Raw CN estimates and CBS mean
CORRECTION:

    ref.theta <- extractMatrix(Normal)
    theta <- extractMatrix(Tumor)
    C <- 2 * theta/ref.theta

On Mon, Dec 12, 2011 at 1:09 PM, Gregory W greg.d.w...@gmail.com wrote:
> Hello,
>
> Many thanks for the site and quick feedback to the discussion board. I was hoping to get a little clarification about the difference between these two statistics: Raw CN and CBS mean.
>
> After running CBS I get the following data frame:
>
>     cbs <- CbsModel(tumor, normal, min.width=5, alpha = .01)
>     fit(cbs, min.width=5, verbose=0, force=TRUE)
>     regions <- getRegions(cbs)
>     regions <- regions[[1]][,1:5]
>     head(regions)
>
>       chromosome    start     stop    mean count
>     1          1    51599 25455929  0.0870 14665
>     2          1 25465716 25519534 -1.2750    29 ***
>     3          1 25519574 26308518  0.0994   369
>     4          1 26313794 27104892  0.4272   336
>     5          1 27108799 38042779  0.1022  6161
>     6          1 38044082 38063269 -0.6952     8
>
> Where column mean is the result from DNAcopy segmentation for the particular region -- which is just the mean log-ratio value of all probes in the region. And I'm assuming that since I ran doCRMAv2 on the data, the CBS code above is performed on normalized tumor and normal data.
>
> Focusing on region 2, shown above with trailing ***, if I run:
>
>     ref.theta <- extractMatrix(ref.average)
>     theta <- extractMatrix(file.group)
>     C <- 2 * theta/ref.theta
>
> And then calculate the mean raw CN using matrix C for the same probes in region 2 used to calculate the CBS mean of -1.2750: does this CN mean relate in any way to the CBS mean? I realize the log ratio is logged and centered around 0 for the segmented data, whereas CN is around 2, but aside from simple transformations can you help me understand the difference between these two statistics?
>
> Does the mean CN for probes included in a particular CBS segmented region have any value?
>
> Many thanks!
> Greg

-- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example.
You received this message because you are subscribed to the Google Groups aroma.affymetrix group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/
Re: [aroma.affymetrix] Re: Clarification of Raw CN estimates and CBS mean
Hi,

On Mon, Dec 12, 2011 at 1:15 PM, Greg Wall greg.d.w...@gmail.com wrote:
> CORRECTION:
>
>     ref.theta <- extractMatrix(Normal)
>     theta <- extractMatrix(Tumor)
>     C <- 2 * theta/ref.theta
>
> On Mon, Dec 12, 2011 at 1:09 PM, Gregory W greg.d.w...@gmail.com wrote:
>> Hello,
>>
>> Many thanks for the site and quick feedback to the discussion board. I was hoping to get a little clarification about the difference between these two statistics: Raw CN and CBS mean.
>>
>> After running CBS I get the following data frame:
>>
>>     cbs <- CbsModel(tumor, normal, min.width=5, alpha = .01)
>>     fit(cbs, min.width=5, verbose=0, force=TRUE)
>>     regions <- getRegions(cbs)
>>     regions <- regions[[1]][,1:5]
>>     head(regions)
>>
>>       chromosome    start     stop    mean count
>>     1          1    51599 25455929  0.0870 14665
>>     2          1 25465716 25519534 -1.2750    29 ***
>>     3          1 25519574 26308518  0.0994   369
>>     4          1 26313794 27104892  0.4272   336
>>     5          1 27108799 38042779  0.1022  6161
>>     6          1 38044082 38063269 -0.6952     8
>>
>> Where column mean is the result from DNAcopy segmentation for the particular region -- which is just the mean log-ratio value of all probes in the region. And I'm assuming that since I ran doCRMAv2 on the data, the CBS code above is performed on normalized tumor and normal data.
>>
>> Focusing on region 2, shown above with trailing ***, if I run:
>>
>>     ref.theta <- extractMatrix(ref.average)
>>     theta <- extractMatrix(file.group)
>>     C <- 2 * theta/ref.theta
>>
>> And then calculate the mean raw CN using matrix C for the same probes in region 2 used to calculate the CBS mean of -1.2750: does this CN mean relate in any way to the CBS mean? I realize the log ratio is logged and centered around 0 for the segmented data, whereas CN is around 2, but aside from simple transformations can you help me understand the difference between these two statistics?

There are two CN metrics here:

    non-logged CN ratios: C <- 2 * theta/ref.theta
    log2 CN ratios:       M <- log2(theta/ref.theta)

Ignoring non-defined signals, the bijective relationship is M <- log2(C/2) and C <- 2 * 2^M.
The CbsModel segments M (log2 CN ratios) and reports the mean of these M values per segment. The reason is that this is what is used in the CBS papers. In the Aroma framework, we try to represent everything on the original scale, because that way we avoid issues such as taking the log of non-positive values. Thus, 'theta' and 'ref.theta' are on the non-log scale. Moreover, and perhaps of less value to this explanation, I personally prefer to think of CN data on the non-log scale, especially since we can indeed have zero or very small copy numbers, and then M is undefined or minus infinity.

>> Does the mean CN for probes included in a particular CBS segmented region have any value?

I don't understand this question.

/Henrik

>> Many thanks!
>> Greg
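To make the relationship between the two scales concrete, here is a small R sketch. The `theta` and `ref.theta` vectors are invented for illustration and are not taken from any data set in this thread. It shows that M and C carry the same information per locus, but that the per-segment mean of M back-transforms to the geometric mean of C, which is never larger than the arithmetic mean of C when all ratios are positive.

```r
# Hypothetical non-log signals for five loci in one segment (made-up numbers)
theta     <- c(410, 520, 380, 450, 610)      # tumor
ref.theta <- c(1000, 1100, 950, 1020, 1200)  # matched normal

# The two CN metrics from the reply above
C <- 2 * theta / ref.theta       # non-logged CN ratios (about 2 for a diploid locus)
M <- log2(theta / ref.theta)     # log2 CN ratios (about 0 for a diploid locus)

# Per locus, the mapping between the two is one-to-one
stopifnot(isTRUE(all.equal(M, log2(C / 2))))
stopifnot(isTRUE(all.equal(C, 2 * 2^M)))

# Per segment, the two means are not simple transformations of each other
meanC    <- mean(C)           # arithmetic mean on the CN scale
meanC.gm <- 2 * 2^mean(M)     # back-transformed mean of M = geometric mean of C
stopifnot(meanC.gm <= meanC)

print(c(arithmetic = meanC, geometric = meanC.gm))
```

For the segment discussed above, a CBS mean of M = -1.2750 back-transforms to 2 * 2^(-1.2750), which is roughly 0.83 on the C scale. Provided all ratios in the segment are positive, the plain average of C over the same 29 loci would be at least that large.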
Re: [aroma.affymetrix] Combining data from multiple chip types
Hi.

On Fri, Dec 9, 2011 at 7:02 PM, Steven McKinney smckin...@bccrc.ca wrote:
> Dear Henrik
>
> Thanks to your instructions, I have set up and run initial processing on my tumour - normal pairs for copy number analysis. However I am as yet unable to determine how to hand off the processed data to CbsModel or related functions.
>
> If I put together a list of tumour and normal sample file names, it looks like this:
>
>     # Split data set in (tumour, normal) pairs
>     allNames <- NULL
>     allFullNames <- NULL
>     for (chipType in names(dsList)) {
>       ces <- dsList[[chipType]];
>       allNames <- unique(c(allNames, getNames(ces)))
>       allFullNames <- unique(c(allFullNames, getFullNames(ces)))
>     }
>     pairNames <- vector(mode = "list", length = length(allNames))
>     names(pairNames) <- allNames
>     for ( ni in seq(along = allNames) ) {
>       namei <- allNames[ni]
>       pairNames[[namei]][["Tum"]] <- allFullNames[grepl("Tum", allFullNames) & grepl(namei, allFullNames)]
>       pairNames[[namei]][["Nor"]] <- allFullNames[grepl("Nor", allFullNames) & grepl(namei, allFullNames)]
>     }
>     pairNames[c(1, 13, 30)]
>
>     $`8920330`
>     $`8920330`$Tum
>     [1] 8920330,Tum,s1,Mapping250K_Nsp,total
>     [2] 8920330,Tum,s2,Mapping250K_Sty,total
>
>     $`8920330`$Nor
>     [1] 8920330,Nor,s1,GenomeWideSNP_6,total
>
>     $`9027278`
>                                      Tum                                  Nor
>     9027278,Tum,s2,Mapping250K_Sty,total 9027278,Nor,s1,GenomeWideSNP_6,total
>
>     $`8401618`
>     $`8401618`$Tum
>     [1] 8401618,Tum,s1,Mapping250K_Nsp,total
>     [2] 8401618,Tum,s1,Mapping250K_Sty,total
>
>     $`8401618`$Nor
>     [1] 8401618,Nor,s1,Mapping250K_Nsp,total
>     [2] 8401618,Nor,s1,Mapping250K_Sty,total
>
> So for patient 8920330, the tumour sample was processed on both Nsp and Sty 250K chips while the normal sample was processed on a GenomeWideSNP_6 chip.
>
> For patient 9027278, the tumour sample succeeded only on the Sty 250K chip and the normal sample was processed on a GenomeWideSNP_6 chip.
>
> For patient 8401618, the tumour sample was processed on both Nsp and Sty 250K chips and the normal sample was also processed on both Nsp and Sty 250K chips.
> The above three scenarios encapsulate all combinations - I have several patients in each of the above three scenarios.
>
> At this point I am not able to figure out how to hand off such lists of samples to CbsModel and related functions.
>
> If I run the following code
>
>     sets <- list(Tum=list(), Nor=list());
>     for (chipType in names(dsList)) {
>       ces <- dsList[[chipType]];
>       for (type in names(sets)) {
>         idxs <- grep(type, getFullNames(ces));
>         sets[[type]][[chipType]] <- extract(ces, idxs);
>       }
>     }
>     cns <- CbsModel(sets$Tum, sets$Nor);
>
> I get the following error message:
>
>     Error in list(`CbsModel(sets$Tum, sets$Nor)` = environment, `extend(CopyNumberSegmentationModel(cesTuple = cesTuple, ...), CbsModel, .` = environment, :
>     [2011-12-09 17:31:46] Exception: Argument 'x' is of length 1 although the range ([0,0]) implies that is should be empty.

So, the above was new information to me. When you do paired tumor-normal segmentation, the tumor and the normal CNs need to be for the same set of loci, which they are not when they originate from different chip types. The error is a side effect of this.

In order to do paired tumor-normal segmentation with tumors and normals coming from different chip types/platforms, you have to map one of them to the set of loci of the other, or both of them to a common set of loci. That is not readily available in the Aroma framework, meaning you have to do the mapping and segmentation manually (outside aroma).

How many samples do you need to process?

/Henrik

> Full error message at end of this email.
>
> Can you suggest an alternate strategy to hand off patients to routines such as CbsModel for the above scenarios?
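As a rough illustration of what mapping both samples to a common set of loci could look like outside Aroma, the base-R sketch below joins two per-locus tables on chromosome and position and computes log2 ratios on the shared loci only. Everything in it is an assumption made for illustration: the data frames `tum` and `nor`, their column names and their values are invented, and in practice the positions and signals would first have to be exported from the processed data sets. Signals from different chip types are not necessarily on comparable scales either, so the sketch divides each sample by its own median before taking the ratio. That scaling is a crude stand-in, not a method proposed in this thread.

```r
# Hypothetical per-locus tables for one patient (all values made up).
# 'tum' stands for a tumour run on one chip type, 'nor' for a normal on another.
tum <- data.frame(
  chromosome = c(1, 1, 1, 2, 2),
  position   = c(51599, 61000, 75000, 12000, 30000),
  theta      = c(900, 450, 480, 1000, 1100)
)
nor <- data.frame(
  chromosome = c(1, 1, 2, 2, 2),
  position   = c(51599, 75000, 12000, 30000, 45000),
  theta      = c(2000, 2100, 1900, 2050, 2000)
)

# Keep only loci present in both tables (same chromosome and position)
common <- merge(tum, nor, by = c("chromosome", "position"),
                suffixes = c(".tum", ".nor"))
common <- common[order(common$chromosome, common$position), ]

# Crude per-sample scaling (an assumption, see above), then log2 ratios
relTum <- common$theta.tum / median(common$theta.tum)
relNor <- common$theta.nor / median(common$theta.nor)
common$M <- log2(relTum / relNor)

print(common[, c("chromosome", "position", "M")])
```

The resulting `M` values, together with `chromosome` and `position`, have the shape that DNAcopy's `CNA()` followed by `segment()` takes as input, which is one way the manual segmentation step could then be carried out.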
> Any information appreciated
>
> Steve
>
> Details:
>
>     sessionInfo()
>     R version 2.13.2 (2011-09-30)
>     Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
>     locale:
>     [1] C
>
>     attached base packages:
>     [1] stats graphics grDevices utils datasets methods base
>
>     other attached packages:
>      [1] DNAcopy_1.26.0    sfit_0.2.0         aroma.affymetrix_2.3.0
>      [4] affxparser_1.24.0 aroma.apd_0.2.0    R.huge_0.3.0
>      [7] aroma.core_2.3.2  aroma.light_1.22.0 matrixStats_0.4.0
>     [10] R.rsp_0.6.9       R.cache_0.5.2      R.filesets_1.1.3
>     [13] digest_0.5.1      R.utils_1.9.3      R.oo_1.8.3
>     [16] R.methodsS3_1.2.1
>
>     loaded via a namespace (and not attached):
>     [1] tools_2.13.2
>
>     Warning message:
>     'DESCRIPTION' file has 'Encoding' field and re-encoding is not possible
>
>     log <- verbose <- Arguments$getVerbose(-10, timestamp=TRUE)
>     options(digits=5)
>     cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags = "Full")
>     print(cdf)
>     AffymetrixCdfFile:
>     Path: annotationData/chipTypes/GenomeWideSNP_6
>     Filename: GenomeWideSNP_6,Full.cdf
>     Filesize: 470.44MB
>     Chip type: GenomeWideSNP_6,Full
>     RAM: 0.00MB
>     File format: v4 (binary; XDA)
>     Dimension: 2572x2680
>     Number of cells: