[aroma.affymetrix] Re: Clarification of Raw CN estimates and CBS mean

2011-12-12 Thread Greg Wall
CORRECTION:

ref.theta <- extractMatrix(Normal)
theta <- extractMatrix(Tumor)
C <- 2 * theta/ref.theta


On Mon, Dec 12, 2011 at 1:09 PM, Gregory W greg.d.w...@gmail.com wrote:

 Hello,

 Many thanks for the site and quick feedback to the discussion board.

 I was hoping to get a little clarification about the difference
 between these two statistics: Raw CN and CBS mean.

 After running CBS I get the following data frame:

 cbs <- CbsModel(tumor, normal, min.width=5, alpha = .01)
 fit(cbs, min.width=5, verbose=0, force=TRUE)
 regions <- getRegions(cbs)
 regions <- regions[[1]][,1:5]
 head(regions)

   chromosome    start     stop    mean count
 1          1    51599 25455929  0.0870 14665
 2          1 25465716 25519534 -1.2750    29 ***
 3          1 25519574 26308518  0.0994   369
 4          1 26313794 27104892  0.4272   336
 5          1 27108799 38042779  0.1022  6161
 6          1 38044082 38063269 -0.6952     8


 Where column mean is the result from DNAcopy segmentation for the
 particular region -- which is just the mean log-ratio-value of all
 probes in the region.  And I'm assuming that since I ran doCRMAv2 on
 the data, the CBS code above is performed on normalized tumor and
 normal data.


 Focusing on region 2, shown above with trailing ***, if I run:

 ref.theta <- extractMatrix(ref.average)
 theta <- extractMatrix(file.group)
 C <- 2 * theta/ref.theta

 And then calculate the mean raw CN using matrix C for the same probes
 in region 2 used to calculate the CBS mean of -1.2750: does this CN
 mean relate in any way to the CBS mean?  I realize the log ratio is
 logged and centered around 0 for the segmented data, whereas CN is
 around 2, but aside from simple transformations can you help me
 understand the difference between these two statistics?

 Does the mean CN for probes included in a particular CBS-segmented
 region have any value?

 Many thanks!
 Greg





-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
aroma.affymetrix group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/


Re: [aroma.affymetrix] Re: Clarification of Raw CN estimates and CBS mean

2011-12-12 Thread Henrik Bengtsson
Hi,



On Mon, Dec 12, 2011 at 1:15 PM, Greg Wall greg.d.w...@gmail.com wrote:
 CORRECTION:

 ref.theta <- extractMatrix(Normal)
 theta     <- extractMatrix(Tumor)
 C         <- 2 * theta/ref.theta


 On Mon, Dec 12, 2011 at 1:09 PM, Gregory W greg.d.w...@gmail.com wrote:

 Hello,

 Many thanks for the site and quick feedback to the discussion board.

 I was hoping to get a little clarification about the difference
 between these two statistics: Raw CN and CBS mean.

 After running CBS I get the following data frame:

 cbs <- CbsModel(tumor, normal, min.width=5, alpha = .01)
 fit(cbs, min.width=5, verbose=0, force=TRUE)
 regions <- getRegions(cbs)
 regions <- regions[[1]][,1:5]
 head(regions)

  chromosome    start     stop    mean count
 1          1    51599 25455929  0.0870 14665
 2          1 25465716 25519534 -1.2750    29 ***
 3          1 25519574 26308518  0.0994   369
 4          1 26313794 27104892  0.4272   336
 5          1 27108799 38042779  0.1022  6161
 6          1 38044082 38063269 -0.6952     8


 Where column mean is the result from DNAcopy segmentation for the
 particular region -- which is just the mean log-ratio-value of all
 probes in the region.  And I'm assuming that since I ran doCRMAv2 on
 the data, the CBS code above is performed on normalized tumor and
 normal data.


 Focusing on region 2, shown above with trailing ***, if I run:

 ref.theta <- extractMatrix(ref.average)
 theta     <- extractMatrix(file.group)
 C         <- 2 * theta/ref.theta

 And then calculate the mean raw CN using matrix C for the same probes
 in region 2 used to calculate the CBS mean of -1.2750: does this CN
 mean relate in any way to the CBS mean?  I realize the log ratio is
 logged and centered around 0 for the segmented data, whereas CN is
 around 2, but aside from simple transformations can you help me
 understand the difference between these two statistics?

There are two CN metrics here:

non-logged CN ratios: C <- 2 * theta/ref.theta
log2 CN ratios: M <- log2(theta/ref.theta)

Ignoring non-defined signals, the bijective relationship is M <-
log2(C/2) and C <- 2 * 2^M.

The CbsModel segments M (log2 CN ratios) and reports the mean of these
M values per segment.  The reason is that this is the scale used in
the CBS papers.
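To illustrate how the two summaries relate, here is a minimal sketch with made-up theta values (not data from this thread). It assumes the segment mean is the plain arithmetic mean of M over the loci in the segment, as described above; all variable names other than theta, ref.theta, C and M are hypothetical.

```r
# Hypothetical chip-effect estimates for five loci in one segment
ref.theta <- c(1000, 1200, 900, 1100, 1000)  # normal (reference)
theta     <- c( 400,  520, 380,  430,  450)  # tumor

C <- 2 * theta / ref.theta      # non-logged CN ratios (centered near 2)
M <- log2(theta / ref.theta)    # log2 CN ratios (centered near 0)

meanM <- mean(M)                # segment mean on the log2 scale
meanC <- mean(C)                # mean raw CN over the same loci

# Back-transforming the mean of M gives the geometric mean of C ...
geoMeanC <- 2 * 2^meanM
print(all.equal(geoMeanC, exp(mean(log(C)))))

# ... which cannot exceed the arithmetic mean of C when all C > 0
print(geoMeanC <= meanC)
```

For the segment discussed above, a mean of -1.2750 on the M scale back-transforms to 2 * 2^(-1.2750), roughly 0.83, on the C scale. The mean of C over the same loci will generally be somewhat larger than that, with the gap growing as the spread of the values increases.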

In the Aroma framework, we try to represent everything on the original
scale, because that way we avoid issues such as taking the log of
non-positive values.  Thus, 'theta' and 'ref.theta' are on the non-log
scale.

Moreover, though perhaps less relevant to this explanation, I
personally prefer to think of CN data on the non-log scale, especially
since we can indeed have zero or very small copy numbers, in which
case M is undefined or minus infinity.
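The point about zero or non-positive values can be seen in a small sketch. The numbers below are made up for illustration and are not taken from any data set in this thread.

```r
# Hypothetical non-logged CN ratios for a segment with a deep deletion:
# one locus is exactly zero and one is slightly negative
C <- c(0.10, 0.00, 0.05, -0.02, 0.08)

# Log-scale version; log2() of a negative number warns and returns NaN
M <- suppressWarnings(log2(C / 2))
print(M)   # includes -Inf (from 0) and NaN (from the negative value)

# The mean on the non-log scale is well defined ...
print(mean(C))

# ... whereas the mean on the log scale is not finite
print(is.finite(mean(M)))

# A log-scale mean exists only after dropping the offending loci
print(mean(M[is.finite(M)]))
```

Dropping loci to make the log-scale mean computable changes which data points contribute to the summary, which is one reason to keep working on the non-log scale.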


 Does the mean CN for probes included in a particular CBS-segmented
 region have any value?

I don't understand this question.

/Henrik


 Many thanks!
 Greg






Re: [aroma.affymetrix] Combining data from multiple chip types

2011-12-12 Thread Henrik Bengtsson
Hi.

On Fri, Dec 9, 2011 at 7:02 PM, Steven McKinney smckin...@bccrc.ca wrote:
 Dear Henrik

 Thanks to your instructions, I have set up and run initial processing on my
 tumour - normal pairs for copy number analysis.  However I am as yet
 unable to determine how to hand off the processed data to CbsModel
 or related functions.

 If I put together a list of tumour and normal sample file names,
 it looks like this:

 # Split data set in (tumour, normal) pairs
 allNames <- NULL
 allFullNames <- NULL
 for (chipType in names(dsList)) {
   ces <- dsList[[chipType]];
   allNames <- unique(c(allNames, getNames(ces)))
   allFullNames <- unique(c(allFullNames, getFullNames(ces)))
 }
 pairNames <- vector(mode = "list", length = length(allNames))
 names(pairNames) <- allNames
 for ( ni in seq(along = allNames) ) {
   namei <- allNames[ni]
   pairNames[[namei]][["Tum"]] <- allFullNames[grepl("Tum", allFullNames) &
                                               grepl(namei, allFullNames)]
   pairNames[[namei]][["Nor"]] <- allFullNames[grepl("Nor", allFullNames) &
                                               grepl(namei, allFullNames)]
 }
 pairNames[c(1, 13, 30)]
 $`8920330`
 $`8920330`$Tum
 [1] "8920330,Tum,s1,Mapping250K_Nsp,total"
 [2] "8920330,Tum,s2,Mapping250K_Sty,total"

 $`8920330`$Nor
 [1] "8920330,Nor,s1,GenomeWideSNP_6,total"


 $`9027278`
                                    Tum                                    Nor
 "9027278,Tum,s2,Mapping250K_Sty,total" "9027278,Nor,s1,GenomeWideSNP_6,total"

 $`8401618`
 $`8401618`$Tum
 [1] "8401618,Tum,s1,Mapping250K_Nsp,total"
 [2] "8401618,Tum,s1,Mapping250K_Sty,total"

 $`8401618`$Nor
 [1] "8401618,Nor,s1,Mapping250K_Nsp,total"
 [2] "8401618,Nor,s1,Mapping250K_Sty,total"

 So for patient 8920330, the tumour sample was processed on both Nsp and Sty
 250K chips, while the normal sample was processed on a GenomeWideSNP_6 chip.

 For patient 9027278, the tumour sample succeeded only on the Sty 250K chip and
 the normal sample was processed on a GenomeWideSNP_6 chip.

 For patient 8401618, the tumour sample was processed on both Nsp and Sty 250K
 chips and the normal sample was also processed on both Nsp and Sty 250K chips.

 The above three scenarios encapsulate all combinations - I have several
 patients in each of the above three scenarios.

 At this point I am not able to figure out how to hand off such lists of
 samples to CbsModel and related functions.

 If I run the following code


 sets <- list(Tum=list(), Nor=list());
 for (chipType in names(dsList)) {
  ces <- dsList[[chipType]];
  for (type in names(sets)) {
    idxs <- grep(type, getFullNames(ces));
    sets[[type]][[chipType]] <- extract(ces, idxs);
  }
 }

 cns <- CbsModel(sets$Tum, sets$Nor);


 I get the following error message:

 Error in list(`CbsModel(sets$Tum, sets$Nor)` = environment, 
 `extend(CopyNumberSegmentationModel(cesTuple = cesTuple, ...), CbsModel, .` 
 = environment,  :

 [2011-12-09 17:31:46] Exception: Argument 'x' is of length 1 although the 
 range ([0,0]) implies that is should be empty.

So, the above was new information to me.  When you do tumor-normal
paired segmentation the tumor and the normal CNs need to be for the
same set of loci, which they are not when they originate from
different chip types.  The error is a side effect due to this.
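One way to see which patients are affected is to compare the chip types on the tumour and normal sides. The sketch below uses base R only and assumes that every full name has the form patient,type,replicate,chipType,total, as in the listing quoted above; the helper sameChipTypes() is hypothetical and not part of aroma.

```r
# Full names copied from the listing quoted above
fullNames <- c(
  "8920330,Tum,s1,Mapping250K_Nsp,total",
  "8920330,Tum,s2,Mapping250K_Sty,total",
  "8920330,Nor,s1,GenomeWideSNP_6,total",
  "9027278,Tum,s2,Mapping250K_Sty,total",
  "9027278,Nor,s1,GenomeWideSNP_6,total",
  "8401618,Tum,s1,Mapping250K_Nsp,total",
  "8401618,Tum,s1,Mapping250K_Sty,total",
  "8401618,Nor,s1,Mapping250K_Nsp,total",
  "8401618,Nor,s1,Mapping250K_Sty,total"
)

# Split "patient,type,replicate,chipType,total" into columns
parts <- do.call(rbind, strsplit(fullNames, ",", fixed = TRUE))
info <- data.frame(patient = parts[, 1], type = parts[, 2],
                   chipType = parts[, 4], stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if tumour and normal use the same chip types
sameChipTypes <- function(df) {
  setequal(df$chipType[df$type == "Tum"], df$chipType[df$type == "Nor"])
}

# One logical value per patient
matched <- sapply(split(info, info$patient), sameChipTypes)
print(matched)
```

Only patients for which this check is TRUE have tumour and normal signals on the same set of loci; the others need the mapping step before paired segmentation.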

In order to do paired tumor-normal segmentation with tumors and
normals coming from different chip types/platforms, you have to map
one of them to the set of loci of the other, or both of them to a
common set of loci.  That is not readily available in the Aroma
framework, meaning you have to do the mapping and segmentation
manually (outside aroma).  How many samples do you need to process?
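As a rough illustration of the "common set of loci" idea, the sketch below matches loci by exact genomic position using base R. The data frames, their column names, and the values are made up; they are not an aroma data structure. Whether signals from different platforms are comparable without further normalization is a separate question that this sketch does not address.

```r
# Made-up locus-level signals for one patient (hypothetical columns)
tumor <- data.frame(chromosome = c(1, 1, 1, 2),
                    position   = c(100, 250, 400, 150),
                    theta      = c(800, 450, 500, 1100))
normal <- data.frame(chromosome = c(1, 1, 2, 2),
                     position   = c(100, 400, 150, 300),
                     theta      = c(1000, 1000, 1000, 900))

# Keep only loci present on both platforms (exact position match)
common <- merge(tumor, normal, by = c("chromosome", "position"),
                suffixes = c(".tumor", ".normal"))
common <- common[order(common$chromosome, common$position), ]

# Paired log2 CN ratios on the common loci
common$M <- log2(common$theta.tumor / common$theta.normal)
print(common)
```

The resulting chromosome, position and M columns are the kind of input that could then be handed to a segmentation method outside aroma, for example via CNA() and segment() in the DNAcopy package.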

/Henrik



 Full error message at end of this email.

 Can you suggest an alternate strategy to hand off patients to routines
 such as CbsModel for the above scenarios?


 Any information appreciated

 Steve



 Details:

 sessionInfo()
 R version 2.13.2 (2011-09-30)
 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

 locale:
 [1] C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
  [1] DNAcopy_1.26.0         sfit_0.2.0             aroma.affymetrix_2.3.0
  [4] affxparser_1.24.0      aroma.apd_0.2.0        R.huge_0.3.0
  [7] aroma.core_2.3.2       aroma.light_1.22.0     matrixStats_0.4.0
 [10] R.rsp_0.6.9            R.cache_0.5.2          R.filesets_1.1.3
 [13] digest_0.5.1           R.utils_1.9.3          R.oo_1.8.3
 [16] R.methodsS3_1.2.1

 loaded via a namespace (and not attached):
 [1] tools_2.13.2
 Warning message:
 'DESCRIPTION' file has 'Encoding' field and re-encoding is not possible




 log <- verbose <- Arguments$getVerbose(-10, timestamp=TRUE)
 options(digits=5)
 cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags = "Full")
 print(cdf)
 AffymetrixCdfFile:
 Path: annotationData/chipTypes/GenomeWideSNP_6
 Filename: GenomeWideSNP_6,Full.cdf
 Filesize: 470.44MB
 Chip type: GenomeWideSNP_6,Full
 RAM: 0.00MB
 File format: v4 (binary; XDA)
 Dimension: 2572x2680
 Number of cells: