Re: [aroma.affymetrix] Combining data from multiple chip types

Henrik Bengtsson Mon, 12 Dec 2011 21:30:02 -0800

Hi.

On Fri, Dec 9, 2011 at 7:02 PM, Steven McKinney <smckin...@bccrc.ca> wrote:
> Dear Henrik
>
> Thanks to your instructions, I have set up and run initial processing on my
> tumour - normal pairs for copy number analysis.  However I am as yet
> unable to determine how to hand off the processed data to CbsModel
> or related functions.
>
> If I put together a list of tumour and normal sample file names,
> it looks like this:
>
>> # Split data set in (tumour, normal) pairs
>> allNames <- NULL
>> allFullNames <- NULL
>> for (chipType in names(dsList)) {
> +   ces <- dsList[[chipType]];
> +   allNames <- unique(c(allNames, getNames(ces)))
> +   allFullNames <- unique(c(allFullNames, getFullNames(ces)))
> + }
>> pairNames <- vector(mode = "list", length = length(allNames))
>> names(pairNames) <- allNames
>> for ( ni in seq(along = allNames) ) {
> +   namei <- allNames[ni]
> +   pairNames[[namei]][["Tum"]] <- allFullNames[grepl("Tum", allFullNames) &
> +       grepl(namei, allFullNames)]
> +   pairNames[[namei]][["Nor"]] <- allFullNames[grepl("Nor", allFullNames) &
> +       grepl(namei, allFullNames)]
> + }
>> pairNames[c(1, 13, 30)]
> $`8920330`
> $`8920330`$Tum
> [1] "8920330,Tum,s1,Mapping250K_Nsp,total"
> [2] "8920330,Tum,s2,Mapping250K_Sty,total"
>
> $`8920330`$Nor
> [1] "8920330,Nor,s1,GenomeWideSNP_6,total"
>
>
> $`9027278`
>                                   Tum                                    Nor
> "9027278,Tum,s2,Mapping250K_Sty,total" "9027278,Nor,s1,GenomeWideSNP_6,total"
>
> $`8401618`
> $`8401618`$Tum
> [1] "8401618,Tum,s1,Mapping250K_Nsp,total"
> [2] "8401618,Tum,s1,Mapping250K_Sty,total"
>
> $`8401618`$Nor
> [1] "8401618,Nor,s1,Mapping250K_Nsp,total"
> [2] "8401618,Nor,s1,Mapping250K_Sty,total"
>
> So for patient 8920330, the tumour sample was processed on both Nsp and Sty
> 250K chips, while the normal sample was processed on a GenomeWideSNP_6 chip.
>
> For patient 9027278, the tumour sample succeeded only on the Sty 250K chip and
> the normal sample was processed on a GenomeWideSNP_6 chip.
>
> For patient 8401618, the tumour sample was processed on both Nsp and Sty
> 250K chips and the normal sample was also processed on both Nsp and Sty
> 250K chips.
>
> The above three scenarios encapsulate all combinations - I have several
> patients in each of the above three scenarios.
>
> At this point I am not able to figure out how to hand off such lists of
> samples to CbsModel and related functions.
>
> If I run the following code
>
>
> sets <- list(Tum=list(), Nor=list());
> for (chipType in names(dsList)) {
>  ces <- dsList[[chipType]];
>  for (type in names(sets)) {
>    idxs <- grep(type, getFullNames(ces));
>    sets[[type]][[chipType]] <- extract(ces, idxs);
>  }
> }
>
> cns <- CbsModel(sets$Tum, sets$Nor);
>
>
> I get the following error message:
>
> Error in list(`CbsModel(sets$Tum, sets$Nor)` = <environment>, 
> `extend(CopyNumberSegmentationModel(cesTuple = cesTuple, ...), "CbsModel", .` 
> = <environment>,  :
>
> [2011-12-09 17:31:46] Exception: Argument 'x' is of length 1 although the 
> range ([0,0]) implies that is should be empty.


So, the above was new information to me.  When you do tumor-normal
paired segmentation the tumor and the normal CNs need to be for the
same set of loci, which they are not when they originate from
different chip types.  The error is a side effect due to this.
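
As a rough illustration of where the error may come from: in the
printout quoted further down, sets$Tum$GenomeWideSNP_6 has 0 files, and
the traceback passes through getFile(this, 1), which is consistent with
asking an empty set for its first file.  The base-R sketch below only
counts tumours and normals per chip type.  The full names are copied
from the listing above; using a plain character vector in place of an
aroma data set is an assumption made for illustration.

```r
# Full names taken from the quoted listing; a character vector stands in
# for an aroma data set here (an assumption made for illustration only).
fullNames <- list(
  GenomeWideSNP_6 = c("8920330,Nor,s1,GenomeWideSNP_6,total",
                      "9027278,Nor,s1,GenomeWideSNP_6,total"),
  Mapping250K_Sty = c("8920330,Tum,s2,Mapping250K_Sty,total",
                      "9027278,Tum,s2,Mapping250K_Sty,total",
                      "8401618,Nor,s1,Mapping250K_Sty,total")
)

# Count how many arrays of each type ("Tum", "Nor") each chip type holds.
counts <- sapply(fullNames, function(fn) {
  c(Tum = sum(grepl(",Tum,", fn, fixed = TRUE)),
    Nor = sum(grepl(",Nor,", fn, fixed = TRUE)))
})
print(counts)

# Chip types without any tumour array would give an empty subset.
emptyTum <- names(which(counts["Tum", ] == 0))
print(emptyTum)
```

A check of this kind before building the model would at least show
which (type, chip type) combinations are empty.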

In order to do paired tumor-normal segmentation with tumors and
normals coming from different chip types/platforms, you have to map
one of them to the set of loci of the other, or both of them to a
common set of loci.  That is not readily available in the Aroma
framework, meaning you have to do the mapping and segmentation
"manually" (outside aroma).  How many samples do you need to process?

/Henrik
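
P.S. To make the "manual" route a bit more concrete, here is a minimal
base-R sketch of mapping a tumor and a normal to a common set of loci
and forming paired log-ratios.  It is only a sketch under assumptions:
the data frames, their column names (chromosome, position, cn) and all
values are made up, and it assumes that the two platforms share some
loci at identical positions.

```r
# Toy per-locus total CN estimates (made-up values, for illustration only).
tumour <- data.frame(
  chromosome = c(1, 1, 1, 2),
  position   = c(1000, 2500, 4000, 1500),
  cn         = c(2.1, 3.9, 4.2, 1.0)
)
normal <- data.frame(
  chromosome = c(1, 1, 1, 2, 2),
  position   = c(1000, 2000, 4000, 1500, 3000),
  cn         = c(2.0, 2.1, 1.9, 2.0, 2.2)
)

# Keep only loci present on both platforms (same chromosome and position).
paired <- merge(tumour, normal, by = c("chromosome", "position"),
                suffixes = c(".tum", ".nor"))
paired <- paired[order(paired$chromosome, paired$position), ]

# Paired log2 ratio of tumour over normal at the common loci.
paired$logRatio <- log2(paired$cn.tum / paired$cn.nor)
print(paired)
```

The resulting chromosome, position and logRatio columns could then be
passed to a segmentation routine outside aroma, for instance the
DNAcopy package that is attached in the quoted session; that step is
not shown here.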

>
>
> Full error message at end of this email.
>
> Can you suggest an alternate strategy to hand off patients to routines
> such as CbsModel for the above scenarios?
>
>
> Any information appreciated
>
> Steve
>
>
>
> Details:
>
>> sessionInfo()
> R version 2.13.2 (2011-09-30)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] DNAcopy_1.26.0         sfit_0.2.0             aroma.affymetrix_2.3.0
>  [4] affxparser_1.24.0      aroma.apd_0.2.0        R.huge_0.3.0
>  [7] aroma.core_2.3.2       aroma.light_1.22.0     matrixStats_0.4.0
> [10] R.rsp_0.6.9            R.cache_0.5.2          R.filesets_1.1.3
> [13] digest_0.5.1           R.utils_1.9.3          R.oo_1.8.3
> [16] R.methodsS3_1.2.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.13.2
> Warning message:
> 'DESCRIPTION' file has 'Encoding' field and re-encoding is not possible
>>
>
>
>
>> log <- verbose <- Arguments$getVerbose(-10, timestamp=TRUE)
>> options(digits=5)
>> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags = "Full")
>> print(cdf)
> AffymetrixCdfFile:
> Path: annotationData/chipTypes/GenomeWideSNP_6
> Filename: GenomeWideSNP_6,Full.cdf
> Filesize: 470.44MB
> Chip type: GenomeWideSNP_6,Full
> RAM: 0.00MB
> File format: v4 (binary; XDA)
> Dimension: 2572x2680
> Number of cells: 6892960
> Number of units: 1881415
> Cells per unit: 3.66
> Number of QC units: 4
>> gi <- getGenomeInformation(cdf)
>> print(gi)
> UgpGenomeInformation:
> Name: GenomeWideSNP_6
> Tags: Full,na31,hg19,HB20110328
> Full name: GenomeWideSNP_6,Full,na31,hg19,HB20110328
> Pathname: 
> annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,Full,na31,hg19,HB20110328.ugp
> File size: 8.97 MB (9407867 bytes)
> RAM: 0.00 MB
> Chip type: GenomeWideSNP_6,Full
>> si <- getSnpInformation(cdf)
>> print(si)
> UflSnpInformation:
> Name: GenomeWideSNP_6
> Tags: Full,na31,hg19,HB20110328
> Full name: GenomeWideSNP_6,Full,na31,hg19,HB20110328
> Pathname: 
> annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,Full,na31,hg19,HB20110328.ufl
> File size: 7.18 MB (7526452 bytes)
> RAM: 0.00 MB
> Chip type: GenomeWideSNP_6,Full
> Number of enzymes: 2
>> acs <- AromaCellSequenceFile$byChipType(getChipType(cdf, fullname = FALSE))
>> print(acs)
> AromaCellSequenceFile:
> Name: GenomeWideSNP_6
> Tags: HB20080710
> Full name: GenomeWideSNP_6,HB20080710
> Pathname: 
> annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,HB20080710.acs
> File size: 170.92 MB (179217531 bytes)
> RAM: 0.00 MB
> Number of data rows: 6892960
> File format: v1
> Dimensions: 6892960x26
> Column classes: raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, 
> raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw
> Number of bytes per column: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
> Footer: <createdOn>20080710 22:47:02 
> PDT</createdOn><platform>Affymetrix</platform><chipType>GenomeWideSNP_6</chipType><srcFile><filename>GenomeWideSNP_6.probe_tab</filename><filesize>341479928</filesize><checksum>2037c033c09fd8f7c06bd042a77aef15</checksum></srcFile><srcFile2><filename>GenomeWideSNP_6.CN_probe_tab</filename><filesize>96968290</filesize><checksum>3dc2d3178f5eafdbea9c8b6eca88a89c</checksum></srcFile2>
> Chip type: GenomeWideSNP_6
> Platform: Affymetrix
>
>> csR <- AffymetrixCelSet$byName("TumourNormal", cdf = cdf)
>> print(csR)
> AffymetrixCelSet:
> Name: TumourNormal
> Tags:
> Path: rawData/TumourNormal/GenomeWideSNP_6
> Platform: Affymetrix
> Chip type: GenomeWideSNP_6,Full
> Number of arrays: 29
> Names: 8920330, 8922989, 8923725, ..., 9325860 [29]
> Time period: 2008-06-18 11:11:40 -- 2009-04-08 00:55:10
> Total file size: 1910.30MB
> RAM: 0.03MB
>> cs <- csR
>> cdf5N <- AffymetrixCdfFile$byChipType("Mapping250K_Nsp")
>> print(cdf5N)
> AffymetrixCdfFile:
> Path: annotationData/chipTypes/Mapping250K_Nsp
> Filename: Mapping250K_Nsp.cdf
> Filesize: 185.45MB
> Chip type: Mapping250K_Nsp
> RAM: 0.00MB
> File format: v4 (binary; XDA)
> Dimension: 2560x2560
> Number of cells: 6553600
> Number of units: 262338
> Cells per unit: 24.98
> Number of QC units: 6
>> gi5N <- getGenomeInformation(cdf5N)
>> print(gi5N)
> UgpGenomeInformation:
> Name: Mapping250K_Nsp
> Tags: na31,HB20101007
> Full name: Mapping250K_Nsp,na31,HB20101007
> Pathname: 
> annotationData/chipTypes/Mapping250K_Nsp/Mapping250K_Nsp,na31,HB20101007.ugp
> File size: 1.25 MB (1312308 bytes)
> RAM: 0.00 MB
> Chip type: Mapping250K_Nsp
>> si5N <- getSnpInformation(cdf5N)
>> print(si5N)
> UflSnpInformation:
> Name: Mapping250K_Nsp
> Tags: na31,HB20101007
> Full name: Mapping250K_Nsp,na31,HB20101007
> Pathname: 
> annotationData/chipTypes/Mapping250K_Nsp/Mapping250K_Nsp,na31,HB20101007.ufl
> File size: 512.98 kB (525291 bytes)
> RAM: 0.00 MB
> Chip type: Mapping250K_Nsp
> Number of enzymes: 1
>> acs5N <- AromaCellSequenceFile$byChipType(getChipType(cdf5N, fullname = 
>> FALSE))
>> print(acs5N)
> AromaCellSequenceFile:
> Name: Mapping250K_Nsp
> Tags: HB20080710
> Full name: Mapping250K_Nsp,HB20080710
> Pathname: 
> annotationData/chipTypes/Mapping250K_Nsp/Mapping250K_Nsp,HB20080710.acs
> File size: 162.50 MB (170394014 bytes)
> RAM: 0.00 MB
> Number of data rows: 6553600
> File format: v1
> Dimensions: 6553600x26
> Column classes: raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, 
> raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw
> Number of bytes per column: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
> Footer: <createdOn>20080710 20:51:42 
> PDT</createdOn><platform>Affymetrix</platform><chipType>Mapping250K_Nsp</chipType><srcFile><filename>Mapping250K_Nsp_probe_tab</filename><filesize>192825014</filesize><checksum>bf2921824b9c14285f12b4bea18babda</checksum></srcFile>
> Chip type: Mapping250K_Nsp
> Platform: Affymetrix
>> csR5N <- AffymetrixCelSet$byName("TumourNormal", cdf = cdf5N)
> print(csR5N)
> cs5N <- csR5N
> cdf5S <- AffymetrixCdfFile$byChipType("Mapping250K_Sty")
> print(cdf5S)
> gi5S <- getGenomeInformation(cdf5S)
> print(gi5S)
> si5S <- getSnpInformation(cdf5S)
> print(si5S)
> acs5S <- AromaCellSequenceFile$byChipType(getChipType(cdf5S, fullname = 
> FALSE))
> print(acs5S)
> csR5S <- AffymetrixCelSet$byName("TumourNormal", cdf = cdf5S)
> print(csR5S)
> cs5S <- csR5S
>> AffymetrixCelSet:
> Name: TumourNormal
> Tags:
> Path: rawData/TumourNormal/Mapping250K_Nsp
> Platform: Affymetrix
> Chip type: Mapping250K_Nsp
> Number of arrays: 44
> Names: 8401618, 8401618, 8826931, ..., 9325860 [44]
> Time period: 2009-07-21 13:16:08 -- 2011-07-05 12:02:25
> Total file size: 2758.55MB
> RAM: 0.05MB
>> > > AffymetrixCdfFile:
> Path: annotationData/chipTypes/Mapping250K_Sty
> Filename: Mapping250K_Sty.cdf
> Filesize: 175.83MB
> Chip type: Mapping250K_Sty
> RAM: 0.00MB
> File format: v4 (binary; XDA)
> Dimension: 2560x2560
> Number of cells: 6553600
> Number of units: 238378
> Cells per unit: 27.49
> Number of QC units: 6
>> > UgpGenomeInformation:
> Name: Mapping250K_Sty
> Tags: na31,HB20101007
> Full name: Mapping250K_Sty,na31,HB20101007
> Pathname: 
> annotationData/chipTypes/Mapping250K_Sty/Mapping250K_Sty,na31,HB20101007.ugp
> File size: 1.14 MB (1192508 bytes)
> RAM: 0.00 MB
> Chip type: Mapping250K_Sty
>> > UflSnpInformation:
> Name: Mapping250K_Sty
> Tags: na31,HB20101007
> Full name: Mapping250K_Sty,na31,HB20101007
> Pathname: 
> annotationData/chipTypes/Mapping250K_Sty/Mapping250K_Sty,na31,HB20101007.ufl
> File size: 466.18 kB (477371 bytes)
> RAM: 0.00 MB
> Chip type: Mapping250K_Sty
> Number of enzymes: 1
>> > AromaCellSequenceFile:
> Name: Mapping250K_Sty
> Tags: HB20080710
> Full name: Mapping250K_Sty,HB20080710
> Pathname: 
> annotationData/chipTypes/Mapping250K_Sty/Mapping250K_Sty,HB20080710.acs
> File size: 162.50 MB (170394014 bytes)
> RAM: 0.00 MB
> Number of data rows: 6553600
> File format: v1
> Dimensions: 6553600x26
> Column classes: raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, 
> raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw
> Number of bytes per column: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
> Footer: <createdOn>20080710 21:11:58 
> PDT</createdOn><platform>Affymetrix</platform><chipType>Mapping250K_Sty</chipType><srcFile><filename>Mapping250K_Sty_probe_tab</filename><filesize>192816862</filesize><checksum>3e359c89f80ecdf1185a3dee55e40d3e</checksum></srcFile>
> Chip type: Mapping250K_Sty
> Platform: Affymetrix
>
>> dsList <- list()
>> dsList[["GenomeWideSNP_6"]] <- doCRMAv2("TumourNormal", chipType = 
>> "GenomeWideSNP_6,Full")
>
> Loading required package: sfit
> sfit v0.2.0 (2011-05-15) successfully loaded. See ?sfit for help.
>
>> dsList[["Mapping250K_Nsp"]] <- doCRMAv2("TumourNormal", chipType = 
>> "Mapping250K_Nsp")
>> dsList[["Mapping250K_Sty"]] <- doCRMAv2("TumourNormal", chipType = 
>> "Mapping250K_Sty")
>
>
>
>
>
> There were 50 or more warnings (use warnings() to see the first 50)
>> There were 50 or more warnings (use warnings() to see the first 50)
>> There were 50 or more warnings (use warnings() to see the first 50)
>> warnings()
> Warning messages:
> 1: In readBin(con, what = "integer", size = 4, n = 1, signed = FALSE,  ... :
>  'signed = FALSE' is only valid for integers of sizes 1 and 2
> 2: In readBin(con, what = "integer", size = 4, n = 1, signed = FALSE,  ... :
>  'signed = FALSE' is only valid for integers of sizes 1 and 2
> ...
> 50: In readBin(con, what = "integer", size = 4, n = 1, signed = FALSE,  ... :
>  'signed = FALSE' is only valid for integers of sizes 1 and 2
>
>> print(dsList)
> $GenomeWideSNP_6
> AromaUnitTotalCnBinarySet:
> Name: TumourNormal
> Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Full name: TumourNormal,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Number of files: 29
> Names: 8920330, 8922989, 8923725, ..., 9325860 [29]
> Path (to the first file): 
> totalAndFracBData/TumourNormal,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6
> Total file size: 208.15 MB
> RAM: 0.03MB
>
> $Mapping250K_Nsp
> AromaUnitTotalCnBinarySet:
> Name: TumourNormal
> Tags: ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Full name: TumourNormal,ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Number of files: 44
> Names: 8401618, 8401618, 8826931, ..., 9325860 [44]
> Path (to the first file): 
> totalAndFracBData/TumourNormal,ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY/Mapping250K_Nsp
> Total file size: 44.05 MB
> RAM: 0.05MB
>
> $Mapping250K_Sty
> AromaUnitTotalCnBinarySet:
> Name: TumourNormal
> Tags: ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Full name: TumourNormal,ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Number of files: 45
> Names: 8401618, 8401618, 8826931, ..., 9325860 [45]
> Path (to the first file): 
> totalAndFracBData/TumourNormal,ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY/Mapping250K_Sty
> Total file size: 40.94 MB
> RAM: 0.05MB
>
>
>
>
> sets <- list(Tum=list(), Nor=list());
> for (chipType in names(dsList)) {
>  ces <- dsList[[chipType]];
>  for (type in names(sets)) {
>    idxs <- grep(type, getFullNames(ces));
>    sets[[type]][[chipType]] <- extract(ces, idxs);
>  }
> }
>
>
>> sets
> $Tum
> $Tum$GenomeWideSNP_6
> AromaUnitTotalCnBinarySet:
> Name: NA
> Full name: NA
> Number of files: 0
> Path (to the first file): NA
> Total file size: 0.00 MB
> RAM: 0.00MB
>
> $Tum$Mapping250K_Nsp
> AromaUnitTotalCnBinarySet:
> Name: TumourNormal
> Tags: ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Full name: TumourNormal,ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Number of files: 36
> Names: 8401618, 8826931, 8920330, ..., 9325860 [36]
> Path (to the first file): 
> totalAndFracBData/TumourNormal,ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY/Mapping250K_Nsp
> Total file size: 36.04 MB
> RAM: 0.04MB
>
> $Tum$Mapping250K_Sty
> AromaUnitTotalCnBinarySet:
> Name: TumourNormal
> Tags: ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Full name: TumourNormal,ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Number of files: 37
> Names: 8401618, 8826931, 8920330, ..., 9325860 [37]
> Path (to the first file): 
> totalAndFracBData/TumourNormal,ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY/Mapping250K_Sty
> Total file size: 33.66 MB
> RAM: 0.04MB
>
>
> $Nor
> $Nor$GenomeWideSNP_6
> AromaUnitTotalCnBinarySet:
> Name: TumourNormal
> Tags: ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Full name: TumourNormal,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Number of files: 29
> Names: 8920330, 8922989, 8923725, ..., 9325860 [29]
> Path (to the first file): 
> totalAndFracBData/TumourNormal,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY/GenomeWideSNP_6
> Total file size: 208.15 MB
> RAM: 0.03MB
>
> $Nor$Mapping250K_Nsp
> AromaUnitTotalCnBinarySet:
> Name: TumourNormal
> Tags: ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Full name: TumourNormal,ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Number of files: 8
> Names: 8401618, 8826931, 9000365, ..., 9144105 [8]
> Path (to the first file): 
> totalAndFracBData/TumourNormal,ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY/Mapping250K_Nsp
> Total file size: 8.01 MB
> RAM: 0.01MB
>
> $Nor$Mapping250K_Sty
> AromaUnitTotalCnBinarySet:
> Name: TumourNormal
> Tags: ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Full name: TumourNormal,ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY
> Number of files: 8
> Names: 8401618, 8826931, 9000365, ..., 9144105 [8]
> Path (to the first file): 
> totalAndFracBData/TumourNormal,ACC,-XY,BPN,-XY,AVG,A+B,FLN,-XY/Mapping250K_Sty
> Total file size: 7.28 MB
> RAM: 0.01MB
>
>
>> cns <- CbsModel(sets$Tum, sets$Nor);
> Loading required package: DNAcopy
>
> **************************************************************************
>   The plan to change the data format for CNA object has been postponed
>  in order to ensure backward compatibility with older versions of DNAcopy
> **************************************************************************
>
> Error in list(`CbsModel(sets$Tum, sets$Nor)` = <environment>, 
> `extend(CopyNumberSegmentationModel(cesTuple = cesTuple, ...), "CbsModel", .` 
> = <environment>,  :
>
> [2011-12-09 17:31:46] Exception: Argument 'x' is of length 1 although the 
> range ([0,0]) implies that is should be empty.
>  at throw(Exception(...))
>  at throw.default(sprintf("Argument 'x' is of length %d although the range 
> ([%s
>  at throw(sprintf("Argument 'x' is of length %d although the range ([%s,%s]) 
> im
>  at getIndices.Arguments(static, ..., length = length)
>  at getIndices(static, ..., length = length)
>  at method(static, ...)
>  at Arguments$getIndex(idx, max = n)
>  at getFile.GenericDataFileSet(this, 1)
>  at getFile(this, 1)
>  at getChipType(getFile(this, 1), ...)
>  at getChipType.AromaUnitSignalBinarySet(X[[1]], ...)
>  at FUN(X[[1]], ...)
>  at lapply(X, FUN, ...)
>  at sapply.default(res, FUN = getChipType)
>  at sapply(res, FUN = getChipType)
>  at getSets.AromaMicroarrayDataSetTuple(this)
>  at getSets(this)
>  at hasAlleleBFractions.CopyNumberDataSetTuple(cesTuple)
>  at hasAlleleBFractions(cesTuple)
>  at CopyNumberChromosomalModel(...)
>  at extend(CopyNumberChromosomal
>>
>
>
>
> Steven McKinney, Ph.D.
>
> Statistician
> Molecular Oncology and Breast Cancer Program
> British Columbia Cancer Research Centre
>
> email: smckinney +at+ bccrc +dot+ ca
>
> tel: 604-675-8000 x7561
>
> BCCRC
> Molecular Oncology
> 675 West 10th Ave, Floor 4
> Vancouver B.C.
> V5Z 1L3
> Canada
>
> ________________________________________
> From: aroma-affymetrix@googlegroups.com [aroma-affymetrix@googlegroups.com] 
> On Behalf Of Henrik Bengtsson [henrik.bengts...@gmail.com]
> Sent: December 2, 2011 12:48 PM
> To: aroma-affymetrix@googlegroups.com
> Subject: Re: [aroma.affymetrix] Combining data from multiple chip types
>
> Hi.
>
> Yes, the Aroma framework can handle this.
>
> On Fri, Dec 2, 2011 at 12:19 PM, Steven McKinney <smckin...@bccrc.ca> wrote:
>> Hi all,
>>
>> I am running an analysis on Affymetrix SNP6, 250K Nsp and 250K Sty chip 
>> types.
>> For various reasons, patient samples were assessed either on SNP6 chips or
>> on 500K chipsets (250K Nsp and 250K Sty).  To further complicate things,
>> an occasional 250K Nsp chip processing failed, so some patients have data
>> only on a 250K Sty chip.
>
> Ok, so each sample is processed on one of:
>
> 1. GenomeWideSNP_6
> 2. Mapping250K_Sty
> 3. Mapping250K_Sty & Mapping250K_Nsp
>
>>
>> I see on the web page
>>
>>   http://www.aroma-project.org/features
>>
>> the description
>>
>> COPY-NUMBER ANALYSIS:
>> * Paired & non-paired copy-number analysis: All generations, i.e. 10K, 100K, 
>> 500K, 5.0 & 6.0. CBS & GLAD * segmentation methods.  Combine data from 
>> multiple chip types.
>>
>>
>> My question is, at what point can data from multiple chip types be combined?
>>
>> As I start my aroma.affymetrix analytic pipeline (shown below), I first 
>> process the
>> GenomeWideSNP_6 chips, then the 250K Nsp, then the 250K Sty.  Is this 
>> appropriate,
>> or is there a way to combine processing of all chip types from the start?
>>
>> If not from the start, at what step can I combine data?
>
> You can safely preprocess the different chip types independently.  For
> simplicity, use doCRMAv2();
>
>  http://aroma-project.org/blocks/doCRMAv2
>
> Note argument 'plm'.   Also, as mentioned, if you are interested in
> allele-specific analysis (e.g. LOH), use doASCRMAv2() in place of
> doCRMAv2().
>
> It is at the segmentation step that you need to care about merging chip
> types.  The segmentation model classes of the Aroma framework (e.g.
> CbsModel), will take care of the merging by simply interweaving the
> loci/total CN estimates from multiple chip types (if such are
> available for the sample currently being segmented).  Using
> do[AS]CRMAv2(), you will basically get an AromaUnitTotalCnBinarySet
> for each chip type.  If you place those in an R list, e.g.
>
> dsList <- list();
> dsList[["GenomeWideSNP_6"]] <- doCRMAv2(..., chipType="GenomeWideSNP_6");
> dsList[["Mapping250K_Nsp"]] <- doCRMAv2(...,
> chipType="Mapping250K_Nsp", plm="RmaPlm");
> dsList[["Mapping250K_Sty"]] <- doCRMAv2(...,
> chipType="Mapping250K_Sty", plm="RmaPlm");
>
> You can simply do
>
> sm <- CbsModel(dsList);
>
> and proceed as illustrated in vignette 'Total copy-number segmentation
> (non-paired CBS)' [http://aroma-project.org/vignettes/NonPairedCBS].
> This idea of merging chip types, is also used in vignette 'Vignette:
> Total copy number analysis using CRMA v1 (10K, 100K, 500K)'
> [http://aroma-project.org/vignettes/CRMAv1].
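
The "interweaving" described above can be pictured with a small base-R
sketch: the loci of one sample from two chip types are stacked and
ordered by genomic position, so that the segmentation sees one denser
series.  The positions and CN values are made up, and the sketch is not
the CbsModel implementation.

```r
# Toy total CN estimates for one sample on two chip types (made-up values).
nsp <- data.frame(position = c(100, 400, 900), cn = c(2.0, 2.1, 3.0))
sty <- data.frame(position = c(250, 600, 1200), cn = c(1.9, 2.9, 3.1))
nsp$chipType <- "Mapping250K_Nsp"
sty$chipType <- "Mapping250K_Sty"

# Interweave: stack the loci and order them along the chromosome.
merged <- rbind(nsp, sty)
merged <- merged[order(merged$position), ]
rownames(merged) <- NULL
print(merged)
```

In this toy example the merged series has six loci instead of three per
chip type, which is where the gain in power to detect change points
would come from.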
>
> What you need to be careful about is how your array files are named,
> because that is key for CbsModel to be able to identify which array
> files map to the same sample/individual.  This is also mentioned in the
> "CRMAv1" vignette.  Note that you do not physically have to rename
> your array/CEL files.  Instead you can utilize so called full-name
> translators, cf. how-to page 'How to: Use fullname translators to
> rename data files'
> [http://aroma-project.org/howtos/setFullNamesTranslator].  These can
> be applied after doing preprocessing (e.g. CRMAv2), so you don't have
> to worry about that until segmentation.
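
As a sketch of what such a translator might look like: a plain R
function that rewrites names so that the part before the first comma
identifies the sample.  The input file-name pattern ("Tum_8920330_s1")
and the function name toPatientFirst are hypothetical; the output
format mirrors the full names shown elsewhere in this thread.  How the
function is registered on a data set is described in the how-to page
and is not shown here.

```r
# Hypothetical translator: file names like "Tum_8920330_s1" are an
# assumption; the target format "<patient>,<type>,<rest>" mirrors the
# full names shown elsewhere in this thread.
toPatientFirst <- function(names, ...) {
  # Swap the leading type and patient ID, then use commas as separators.
  sub("^(Tum|Nor)_([0-9]+)_(.*)$", "\\2,\\1,\\3", names)
}

rawNames <- c("Tum_8920330_s1", "Nor_8920330_s1", "Tum_9027278_s2")
fullNames <- toPatientFirst(rawNames)
print(fullNames)

# The sample name is the part before the first comma, so the tumour and
# the normal of one patient end up with the same name.
sampleNames <- sub(",.*$", "", fullNames)
print(sampleNames)
```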
>
>
> Potential problems: In the merging step, there is nothing specific
> that is done to make sure that the CN estimates from the different
> chip types to be merged are on the same scale, i.e. same observed CN
> mean levels for the same underlying/true CN level.  It simply assumes
> that this has been taken care of by the preprocessing method.  I'd
> say, small discrepancies are alright because merging will still
> increase the power to detect change points, which is the number one
> objective of segmentation methods such as CBS.  If there are large
> discrepancies (which I doubt you'll see), you may have to normalize CN
> estimates to be on the same linear scale, cf. vignette 'MSCN:
> Multi-source copy-number normalization'
> [http://aroma-project.org/vignettes/MSCN].  As you can see in the MSCN
> paper (Bengtsson et al. 2009; http://aroma-project.org/publications/),
> bringing estimates onto the same scale improves the power to detect
> change points compared to not doing so before merging.
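
A quick way to judge whether the chip types are on different scales is
to compare a robust location of the log2 CN estimates per chip type for
the same sample.  The sketch below uses made-up values and a plain
median shift; it is not the MSCN method, only a crude check under the
assumption that both chip types cover regions with the same true CN.

```r
# Toy log2 CN estimates for the same sample on two chip types (made up).
# In this toy example both are assumed to cover the same true CN level.
log2cn <- list(
  Mapping250K_Nsp = c(0.02, -0.05, 0.01, 0.04, -0.02),
  Mapping250K_Sty = c(0.30, 0.25, 0.35, 0.28, 0.32)
)

# Robust location per chip type, and the offset relative to the first one.
centers <- sapply(log2cn, median)
offsets <- centers - centers[1]
print(offsets)

# Crude fix: shift each chip type so that the medians agree.
# This is a plain median shift, not the MSCN method.
shifted <- mapply(function(x, d) x - d, log2cn, offsets, SIMPLIFY = FALSE)
print(sapply(shifted, median))
```

If the offsets are small relative to the noise, merging without further
normalization is probably fine, in line with the remark above.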
>
> Hope this helps get you started
>
> Henrik
>
>>
>> Any advice, or pointers to documentation on this issue of combining data 
>> from multiple chip types that
>> I have not yet found, would be appreciated.
>>
>> Best
>>
>> Steve
>>
>>
>> require("aroma.affymetrix")
>>
>> log <- verbose <- Arguments$getVerbose(-9, timestamp=TRUE)
>> ## Don't display too many decimals.
>> options(digits=5)
>>
>> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags = "Full")
>> print(cdf)
>>
>> gi <- getGenomeInformation(cdf)
>> print(gi)
>>
>> si <- getSnpInformation(cdf)
>> print(si)
>>
>> acs <- AromaCellSequenceFile$byChipType(getChipType(cdf, fullname = FALSE))
>> print(acs)
>>
>> csR <- AffymetrixCelSet$byName("Primary", cdf = cdf)
>> print(csR)
>>
>> cs <- csR
>>
>> par(mar = c(4, 4, 4, 1) + 0.1)
>> plotDensity(cs, lwd = 2, ylim = c(-0.1, 0.80))
>> stext(side = 3, pos = 0, getFullName(cs))
>> filename <- sprintf("%s,%s,plotDensity.pdf", getFullName(cs), 
>> getChipType(cs))
>> dev.print(pdf, file = filename, width = 7, height = 5)
>>
>> ### 500K
>>
>>
>> cdf5N <- AffymetrixCdfFile$byChipType("Mapping250K_Nsp")
>> print(cdf5N)
>>
>> gi5N <- getGenomeInformation(cdf5N)
>> print(gi5N)
>>
>> si5N <- getSnpInformation(cdf5N)
>> print(si5N)
>>
>> acs5N <- AromaCellSequenceFile$byChipType(getChipType(cdf5N, fullname = 
>> FALSE))
>> print(acs5N)
>>
>> csR5N <- AffymetrixCelSet$byName("Primary", cdf = cdf5N)
>> print(csR5N)
>>
>> cs5N <- csR5N
>>
>> par(mar = c(4, 4, 4, 1) + 0.1)
>> plotDensity(cs5N, lwd = 2, ylim = c(-0.1, 0.80))
>> stext(side = 3, pos = 0, getFullName(cs5N))
>> filename5N <- sprintf("%s,%s,plotDensity.pdf", getFullName(cs5N), 
>> getChipType(cs5N))
>> dev.print(pdf, file = filename5N, width = 7, height = 5)
>>
>> ... etc...
>>
>>
>>
>> Steven McKinney, Ph.D.
>>
>> Statistician
>> Molecular Oncology and Breast Cancer Program
>> British Columbia Cancer Research Centre
>>
>> email: smckinney +at+ bccrc +dot+ ca
>>
>>
>> BCCRC
>> Molecular Oncology
>> 675 West 10th Ave, Floor 4
>> Vancouver B.C.
>> V5Z 1L3
>> Canada
>>
>> --
>> When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
>> version of the package, 2) to report the output of sessionInfo() and 
>> traceback(), and 3) to post a complete code example.
>>
>>
>> You received this message because you are subscribed to the Google Groups 
>> "aroma.affymetrix" group with website http://www.aroma-project.org/.
>> To post to this group, send email to aroma-affymetrix@googlegroups.com
>> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
>
>

