Re: [aroma.affymetrix] Combining data from multiple chip types

Henrik Bengtsson Fri, 02 Dec 2011 14:55:09 -0800

Thank you.  Much appreciated.  /Henrik

On Fri, Dec 2, 2011 at 1:23 PM, Steven McKinney <smckin...@bccrc.ca> wrote:
> This indeed does help get me started.
>
> I have one follow-up question:
> How do I nominate you for Sainthood?
>
> Thanks very much for your excellent package
> and valuable guidance.
>
>
> Steven McKinney
>
>
>> -----Original Message-----
>> From: aroma-affymetrix@googlegroups.com [mailto:aroma-affymet...@googlegroups.com] On Behalf Of Henrik Bengtsson
>> Sent: December-02-11 12:49 PM
>> To: aroma-affymetrix@googlegroups.com
>> Subject: Re: [aroma.affymetrix] Combining data from multiple chip types
>>
>> Hi.
>>
>> Yes, the Aroma framework can handle this.
>>
>> On Fri, Dec 2, 2011 at 12:19 PM, Steven McKinney <smckin...@bccrc.ca> wrote:
>> > Hi all,
>> >
>> > I am running an analysis on Affymetrix SNP6, 250K Nsp and 250K Sty chip types.
>> > For various reasons, patient samples were assessed either on SNP6 chips or
>> > on 500K chipsets (250K Nsp and 250K Sty).  To further complicate things,
>> > an occasional 250K Nsp chip processing failed, so some patients have data
>> > only on a 250K Sty chip.
>>
>> Ok, so each sample is processed on either of:
>>
>> 1. GenomeWideSNP_6
>> 2. Mapping250K_Sty
>> 3. Mapping250K_Sty & Mapping250K_Nsp
>>
>> >
>> > I see on the web page
>> >
>> >   http://www.aroma-project.org/features
>> >
>> > the description
>> >
>> > COPY-NUMBER ANALYSIS:
>> > * Paired & non-paired copy-number analysis: All generations, i.e. 10K,
>> > 100K, 500K, 5.0 & 6.0. CBS & GLAD segmentation methods.  Combine data
>> > from multiple chip types.
>> >
>> >
>> > My question is, at what point can data from multiple chip types be combined?
>> >
>> > As I start my aroma.affymetrix analytic pipeline (shown below), I first
>> > process the GenomeWideSNP_6 chips, then the 250K Nsp, then the 250K Sty.
>> > Is this appropriate, or is there a way to combine processing of all chip
>> > types from the start?
>> >
>> > If not from the start, at what step can I combine data?
>>
>> You can safely preprocess the different chip types independently.  For
>> simplicity, use doCRMAv2():
>>
>>   http://aroma-project.org/blocks/doCRMAv2
>>
>> Note the argument 'plm'.  Also, as mentioned, if you are interested in
>> allele-specific analysis (e.g. LOH), use doASCRMAv2() in place of
>> doCRMAv2().
>>
>> It is only at the segmentation step that you need to care about merging
>> chip types.  The segmentation model classes of the Aroma framework (e.g.
>> CbsModel) will take care of the merging by simply interweaving the
>> loci/total CN estimates from multiple chip types (if such are
>> available for the sample currently being segmented).  Using
>> do[AS]CRMAv2(), you will basically get an AromaUnitTotalCnBinarySet
>> for each chip type.  If you place those in an R list, e.g.
>>
>> dsList <- list();
>> dsList[["GenomeWideSNP_6"]] <- doCRMAv2(..., chipType="GenomeWideSNP_6");
>> dsList[["Mapping250K_Nsp"]] <- doCRMAv2(..., chipType="Mapping250K_Nsp", plm="RmaCnPlm");
>> dsList[["Mapping250K_Sty"]] <- doCRMAv2(..., chipType="Mapping250K_Sty", plm="RmaCnPlm");
>>
>> You can simply do
>>
>> sm <- CbsModel(dsList);
>>
>> and proceed as illustrated in the vignette 'Total copy-number segmentation
>> (non-paired CBS)' [http://aroma-project.org/vignettes/NonPairedCBS].
>> The same idea of merging chip types is also used in the vignette
>> 'Vignette: Total copy number analysis using CRMA v1 (10K, 100K, 500K)'
>> [http://aroma-project.org/vignettes/CRMAv1].
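To make "interweaving" concrete: for one sample, the total CN estimates from each chip type are pooled and ordered along the genome, so the segmentation method sees a single, denser sequence of loci. The base-R sketch below only illustrates that idea and is not CbsModel's actual code; the helper `interweaveLoci()`, the per-chip-type data frames and their `chromosome`/`position`/`cn` columns are hypothetical stand-ins for what the AromaUnitTotalCnBinarySet objects hold.

```r
# Hypothetical helper (not part of aroma.affymetrix): merge per-chip-type
# total CN estimates for ONE sample by stacking them and ordering by
# genomic position.
interweaveLoci <- function(cnList) {
  # Tag each estimate with the chip type it came from.
  parts <- lapply(names(cnList), function(chipType) {
    df <- cnList[[chipType]]
    df$chipType <- chipType
    df
  })
  merged <- do.call(rbind, parts)
  # Order by chromosome, then by physical position within chromosome.
  merged <- merged[order(merged$chromosome, merged$position), ]
  rownames(merged) <- NULL
  merged
}

# Made-up estimates for one patient run on the 500K chipset.
cnList <- list(
  Mapping250K_Nsp = data.frame(chromosome = c(1, 1, 2),
                               position   = c(1000, 5000, 2000),
                               cn         = c(2.1, 1.9, 2.0)),
  Mapping250K_Sty = data.frame(chromosome = c(1, 2),
                               position   = c(3000, 1000),
                               cn         = c(2.0, 3.1))
)

merged <- interweaveLoci(cnList)
print(merged)
```

A sample that only has a Sty chip would just pass a one-element list; nothing else changes, which is why the mixed design above needs no special handling at this step.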
>>
>> What you need to be careful about is how your array files are named,
>> because that is key for CbsModel to be able to identify which array
>> files map to the same sample/individual.  This is also mentioned in the
>> "CRMAv1" vignette.  Note that you do not physically have to rename
>> your array/CEL files.  Instead, you can use so-called full-name
>> translators, cf. how-to page 'How to: Use fullname translators to
>> rename data files'
>> [http://aroma-project.org/howtos/setFullNamesTranslator].  These can
>> be applied after preprocessing (e.g. CRMAv2), so you don't have to
>> worry about that until segmentation.
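In the Aroma naming convention, a full name is "<name>,<tag1>,<tag2>,...", and the name (the part before the first comma) is what identifies a sample across chip types. The base-R sketch below shows what a translator function might do for a made-up naming scheme where the patient ID and chip are joined by an underscore; the raw names and the regular expression are assumptions about your files, not something aroma provides. The translator takes the original full names and returns new ones, in the `function(names, ...)` style used on the how-to page.

```r
# Hypothetical raw array names: patient ID and chip are separated by an
# underscore, so each full string would be read as a distinct sample name.
rawNames <- c("Patient01_GWS6", "Patient02_Nsp", "Patient02_Sty",
              "Patient03_Sty")

# Translator: turn "<patient>_<chip>" into "<patient>,<chip>" so that the
# patient ID becomes the name and the chip label becomes a tag.
# The regular expression assumes the made-up naming scheme above.
fnt <- function(names, ...) {
  sub("^(Patient[0-9]+)_(.*)$", "\\1,\\2", names)
}

fullNames <- fnt(rawNames)

# The sample name is the part before the first comma; arrays sharing a
# name are treated as the same individual.
sampleNames <- sub(",.*$", "", fullNames)
arraysBySample <- split(fullNames, sampleNames)
print(arraysBySample)
```

Once the regular expression gives the grouping you expect, the same function would presumably be attached to each chip type's data set via setFullNamesTranslator(), as described on the how-to page, without renaming any CEL files.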
>>
>>
>> Potential problems: In the merging step, nothing specific is done to
>> make sure that the CN estimates from the different chip types are on
>> the same scale, i.e. that they have the same observed CN mean levels
>> for the same underlying/true CN level.  It simply assumes that this
>> has been taken care of by the preprocessing method.  I'd say small
>> discrepancies are alright, because merging will still increase the
>> power to detect change points, which is the number one objective of
>> segmentation methods such as CBS.  If there are large discrepancies
>> (which I doubt you'll see), you may have to normalize the CN estimates
>> to be on the same linear scale, cf. vignette 'MSCN: Multi-source
>> copy-number normalization' [http://aroma-project.org/vignettes/MSCN].
>> As you can see in the MSCN paper (Bengtsson et al. 2009;
>> http://aroma-project.org/publications/), bringing the estimates onto
>> the same scale before merging improves the power to detect change
>> points compared with merging without doing so.
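To illustrate what "not on the same scale" means and what correcting it involves, here is a deliberately simplified stand-in for MSCN: a plain linear regression that maps chip type B's CN levels onto chip type A's. It is not the MSCN method. It assumes you already have mean CN levels for a few genomic regions observed by both chip types (for instance from segmenting each chip type separately); all numbers are synthetic, and `rescaleToA()` is a hypothetical helper.

```r
# Synthetic region-level mean CN for the same regions on two chip types.
# Chip type B is deliberately "compressed" relative to chip type A.
meanA <- c(1.0, 2.0, 3.0, 4.0)
meanB <- 0.2 + 0.8 * meanA

# Fit an affine map from B's scale onto A's scale (ordinary least squares).
fitAB <- lm(meanA ~ meanB)

# Apply the fitted map to locus-level estimates from chip type B.
rescaleToA <- function(cnB) {
  unname(coef(fitAB)[1] + coef(fitAB)[2] * cnB)
}

cnB <- c(1.8, 2.6)  # raw estimates from B for two loci
print(rescaleToA(cnB))
```

Here B's value 1.8 corresponds to A's level 2 and 2.6 to A's level 3, so after rescaling, a true change between those levels shows up as the same step size on both chip types once their loci are interwoven.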
>>
>> Hope this helps get you started
>>
>> Henrik
>>
>> >
>> > Any advice, or pointers to documentation on this issue of combining data
>> > from multiple chip types that I have not yet found, would be appreciated.
>> >
>> > Best
>> >
>> > Steve
>> >
>> >
>> > require("aroma.affymetrix")
>> >
>> > log <- verbose <- Arguments$getVerbose(-9, timestamp=TRUE)
>> > ## Don't display too many decimals.
>> > options(digits=5)
>> >
>> > cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags = "Full")
>> > print(cdf)
>> >
>> > gi <- getGenomeInformation(cdf)
>> > print(gi)
>> >
>> > si <- getSnpInformation(cdf)
>> > print(si)
>> >
>> > acs <- AromaCellSequenceFile$byChipType(getChipType(cdf, fullname = FALSE))
>> > print(acs)
>> >
>> > csR <- AffymetrixCelSet$byName("Primary", cdf = cdf)
>> > print(csR)
>> >
>> > cs <- csR
>> >
>> > par(mar = c(4, 4, 4, 1) + 0.1)
>> > plotDensity(cs, lwd = 2, ylim = c(-0.1, 0.80))
>> > stext(side = 3, pos = 0, getFullName(cs))
>> > filename <- sprintf("%s,%s,plotDensity.pdf", getFullName(cs), getChipType(cs))
>> > dev.print(pdf, file = filename, width = 7, height = 5)
>> >
>> > ### 500K
>> >
>> >
>> > cdf5N <- AffymetrixCdfFile$byChipType("Mapping250K_Nsp")
>> > print(cdf5N)
>> >
>> > gi5N <- getGenomeInformation(cdf5N)
>> > print(gi5N)
>> >
>> > si5N <- getSnpInformation(cdf5N)
>> > print(si5N)
>> >
>> > acs5N <- AromaCellSequenceFile$byChipType(getChipType(cdf5N, fullname = FALSE))
>> > print(acs5N)
>> >
>> > csR5N <- AffymetrixCelSet$byName("Primary", cdf = cdf5N)
>> > print(csR5N)
>> >
>> > cs5N <- csR5N
>> >
>> > par(mar = c(4, 4, 4, 1) + 0.1)
>> > plotDensity(cs5N, lwd = 2, ylim = c(-0.1, 0.80))
>> > stext(side = 3, pos = 0, getFullName(cs5N))
>> > filename5N <- sprintf("%s,%s,plotDensity.pdf", getFullName(cs5N), getChipType(cs5N))
>> > dev.print(pdf, file = filename5N, width = 7, height = 5)
>> >
>> > ... etc.
>> >
>> >
>> >
>> > Steven McKinney, Ph.D.
>> >
>> > Statistician
>> > Molecular Oncology and Breast Cancer Program
>> > British Columbia Cancer Research Centre
>> >
>> > email: smckinney +at+ bccrc +dot+ ca
>> >
>> >
>> > BCCRC
>> > Molecular Oncology
>> > 675 West 10th Ave, Floor 4
>> > Vancouver B.C.
>> > V5Z 1L3
>> > Canada
>> >
>> > --
>> > When reporting problems on aroma.affymetrix, make sure 1) to run the
>> > latest version of the package, 2) to report the output of sessionInfo() and
>> > traceback(), and 3) to post a complete code example.
>> >
>> >
>> > You received this message because you are subscribed to the Google Groups
>> > "aroma.affymetrix" group with website http://www.aroma-project.org/.
>> > To post to this group, send email to aroma-affymetrix@googlegroups.com
>> > To unsubscribe and other options, go to http://www.aroma-project.org/forum/
>>
>

