Re: [aroma.affymetrix] parallel computing with aroma copy number segmentation

Henrik Bengtsson Tue, 04 Mar 2014 17:41:35 -0800

On Mon, Mar 3, 2014 at 10:46 PM, Andrew Smith <andrewksm...@gmail.com> wrote:
> I would also like to run segmentation in parallel, but how do you do this if
> you are using a separate reference set?


It depends on the type of reference set.  For a tumor-normal pair
analysis, nothing needs to be done.  But if it's a pool of arrays whose
average should be calculated and used as a reference for all arrays,
you need to calculate this average *before* launching any parallel
processing, e.g.

dfR <- getAverageFile(reference)
cbs <- CbsModel(cesN, dfR)
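
To illustrate why the pooled average has to be fixed up front, here is a small base-R sketch on simulated data.  It is not aroma code: the matrix 'theta' and the use of the median as the "robust average" are assumptions made for illustration only.  If each worker computed a reference from just its own subset of arrays, the ratios for the same array would depend on how the arrays happened to be split.

```r
# Toy illustration only: simulated signals, not aroma.affymetrix data structures.
set.seed(1)
theta <- matrix(rlnorm(6 * 8, meanlog = 8), nrow = 6,
                dimnames = list(paste0("locus", 1:6), paste0("array", 1:8)))

# Reference computed ONCE from the full pool of arrays
# (the median is used here as a stand-in for a robust average).
refAll <- apply(theta, 1, median)

# Log2 ratios for arrays 1:4 relative to the pooled reference
ratiosPooled <- log2(theta[, 1:4] / refAll)

# What a worker would get if it averaged only its own subset instead
refSubset <- apply(theta[, 1:4], 1, median)
ratiosSubset <- log2(theta[, 1:4] / refSubset)

# The two sets of ratios differ, so the reference must be shared
differs <- !isTRUE(all.equal(ratiosPooled, ratiosSubset))
print(differs)
```

Computing the reference once, before any worker starts, means every worker divides by the same values.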

>
> This is what I am running in a single process (that I would like to
> parallelize):
>
> cbs <- CbsModel(cesN, reference)
> ce <- ChromosomeExplorer(cbs)
> process(ce, chromosomes = 1:23, zooms = c(1, 16))
>
> Could I do the same as your previous response, i.e. have each processor
> extract out a subset of the arrays and process them
> with the above, then run the above on a single CPU over all arrays to get
> the final results (which should run fast since all the
> computationally intensive intermediate results will have been memoized on
> disk)?

Yes, then you can simply do:

process(ce, arrays=arraysForOneComputeNode, chromosomes = 1:23, zooms = c(1, 16))

changing arraysForOneComputeNode for each compute node.  This
ignores the fact that the nodes will compete to update a few common
files, but it should be all right.  You can always refresh those by
running:

updateSetupExplorerFile(ce)
writeRegions(ce, chromosomes=1:23)

at the very end.
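
One way to come up with a value of arraysForOneComputeNode for each node is to partition the array indices up front.  The following is only a sketch: the number of arrays and the number of nodes are made-up values, and the process() call is left as a comment because it needs the 'ce' object from above.  splitIndices() is from the 'parallel' package that ships with R.

```r
library(parallel)  # ships with R; provides splitIndices()

nbrOfArrays <- 300L  # hypothetical; in practice the number of arrays in the data set
nbrOfNodes <- 4L     # hypothetical number of compute nodes

# Partition the array indices into one contiguous chunk per node
chunks <- splitIndices(nbrOfArrays, nbrOfNodes)

# On compute node 'k', only that node's chunk would be processed
k <- 2L
arraysForOneComputeNode <- chunks[[k]]
# process(ce, arrays = arraysForOneComputeNode, chromosomes = 1:23, zooms = c(1, 16))

# Sanity check: every array is assigned to exactly one node
stopifnot(length(unlist(chunks)) == nbrOfArrays,
          all(sort(unlist(chunks)) == seq_len(nbrOfArrays)))
```

Each node then only needs to know its own 'k'.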

Hope this helps,

/Henrik

>
> Overall, I am running the first 4 steps of the CRMAv2 vignette
> (http://www.aroma-project.org/vignettes/CRMAv2) and then the above
> for segmentation. I want to parallelize this, and my strategy is to have
> each processor extract a subset of the arrays and run this on the subset,
> then run it again on a single CPU for all the arrays. And I'm doing the
> parallelization using the multicore package's mclapply.
>
> Does this seem okay or would you recommend any changes?
>
> thanks,
> Andrew Smith
>
> On Friday, December 3, 2010 1:18:23 AM UTC-5, Henrik Bengtsson wrote:
>>
>> Hi.
>>
>> On Thu, Dec 2, 2010 at 10:24 AM, Kai <wang...@gmail.com> wrote:
>> > Hi Henrik,
>> >
>> > I was trying to run segmentation on a large set of ~300 SNP genotyping
>> > array profiles. Currently I am loading all the profiles into aroma and
>> > running the segmenter on them one by one, which takes a really long time.
>> > I was wondering whether there is a way to break the computation into
>> > parts and run all the parts simultaneously. Specifically:
>> >
>> > 1) Is there a way to load only a subset of profiles in a project into
>> > a "AromaUnitTotalCnBinarySet"?
>>
>> It is true that ds <- AromaUnitTotalCnBinarySet$byName(...) will set up
>> a set containing all data files.  However, after that you can always
>> subset using extract(), e.g. ds <- extract(ds, 1:5);
>>
>> > 2) Is there a way to run segmentation on single, or a subset of
>> > profiles in a "AromaUnitTotalCnBinarySet"?
>>
>> Yes, either by subsetting from the start, as above, or simply by
>> specifying the 'arrays' argument to fit()/process() [below].  Note
>> that the latter is the more convenient approach, for two reasons.
>> First, if the reference used to calculate CN ratios is the robust
>> average across all samples, then you do not have to worry about
>> getting it correct.  (This is not a problem in your particular case
>> because you set cbs$.calculateRatios <- FALSE.)  Second, the final
>> ChromosomeExplorer HTML page correctly lists all samples.  If you
>> subset immediately after setting up the data set (as above), then
>> you have to make sure to process() one ChromosomeExplorer on all
>> arrays.
>>
>> > 3) Assuming that one can generate the CBS segmentation model on single
>> > profiles, is there a way to load them together back in to aroma and
>> > pass to the "ChromosomeExplorer"?
>>
>> Yes. The simplest way is to run what you are doing below once
>> at the very end.  Already segmented and processed samples will be
>> skipped, as will already generated image files etc.
>>
>> >
>> > The current process I have (which performs segmentation one sample at
>> > a time) is implemented as follows:
>> >
>> > # segment by CBS model
>> > ds = AromaUnitTotalCnBinarySet$byName("dataset,paired",chipType="HumanOmni1-Quad");
>> >
>> > cbs = CbsModel(ds);
>> > cbs$.calculateRatios = FALSE;
>> >
>> > fit(cbs, chromosomes=c(1:23), min.width=5, undo.splits="sdundo",
>> > undo.SD=1, verbose=2);
>> >
>> > # display data and segmentation in ChromosomeExplorer
>> > ce = ChromosomeExplorer(cbs);
>> > process(ce,chromosomes=c(1:23));
>> >
>> > Any other suggestion you may provide is also highly appreciated. Thank
>> > you very much.
>>
>> library("aroma.core");
>> verbose <- Arguments$getVerbose(-4, timestamp=TRUE);
>>
>> # Arrays that your host should process
>> arrays <- 5:8;  # Change for different hosts
>>
>> # Setup the complete data set
>> ds <-
>> AromaUnitTotalCnBinarySet$byName("dataset,paired",chipType="HumanOmni1-Quad");
>>
>> # Setup the "complete" segmentation model
>> cbs <- CbsModel(ds, min.width=5, undo.splits="sdundo", undo.SD=1);
>> cbs$.calculateRatios <- FALSE;
>>
>> # Fit a subset of the arrays
>> fit(cbs, arrays=arrays, chromosomes=c(1:23), verbose=verbose);
>>
>> # Setup the "complete" ChromosomeExplorer, ...
>> ce <- ChromosomeExplorer(cbs);
>>
>> # ...but generate PNG image files only for a subset
>> process(ce, arrays=arrays, chromosomes=c(1:23), verbose=verbose);
>>
>> Note that you actually do not have to call fit() explicitly, because
>> process() will do it implicitly (with the same arguments).
>>
>> If you want to call the above script from the command line, then
>> replace the arrays <- ... line with
>>
>> arrays <- commandArgs(asValue=TRUE)$arrays;
>> arrays <- eval(parse(text=arrays));
>> arrays <- Arguments$getIndices(arrays);
>>
>> and then you can run the above by:
>>
>> R --no-save --args --arrays=1:5 < script.R
>>
>> Hope this helps
>>
>> Henrik
>>
>> >
>> > Best,
>> > Kai
>> >
>> > --
>> > When reporting problems on aroma.affymetrix, make sure 1) to run the
>> > latest version of the package, 2) to report the output of sessionInfo() and
>> > traceback(), and 3) to post a complete code example.
>> >
>> >
>> > You received this message because you are subscribed to the Google
>> > Groups "aroma.affymetrix" group with website http://www.aroma-project.org/.
>> > To post to this group, send email to aroma-af...@googlegroups.com
>>
>> > To unsubscribe and other options, go to
>> > http://www.aroma-project.org/forum/
>> >
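
The --arrays=1:5 pattern in the quoted 2010 script relies on eval(parse()) to turn the command-line text into indices.  A possible base-R alternative, which never evaluates the text as code, is sketched below.  The helper parseArrays() is hypothetical (it is not part of aroma or R.utils) and only understands comma-separated indices and from:to ranges.

```r
# Hypothetical helper: turn "1:5" or "1,3,7:9" into an integer vector
# without eval(parse()), so text from the command line is never run as code.
parseArrays <- function(spec) {
  parts <- strsplit(spec, ",", fixed = TRUE)[[1]]
  idxs <- lapply(parts, function(part) {
    bounds <- as.integer(strsplit(trimws(part), ":", fixed = TRUE)[[1]])
    if (length(bounds) < 1L || length(bounds) > 2L ||
        anyNA(bounds) || any(bounds < 1L)) {
      stop("Invalid array specification: ", part)
    }
    # A single index gives from == to; a range gives from:to
    seq(from = bounds[1], to = bounds[length(bounds)])
  })
  unlist(idxs)
}

# Pick out '--arrays=<spec>' from the arguments following '--args'
args <- commandArgs(trailingOnly = TRUE)
spec <- sub("^--arrays=", "", grep("^--arrays=", args, value = TRUE))
arrays <- if (length(spec) == 1L) parseArrays(spec) else NULL

print(parseArrays("1:5"))
print(parseArrays("1,3,7:9"))
```

If no --arrays argument is given, 'arrays' is left as NULL here; what the script should do in that case is a design choice left open.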

