On Mon, Mar 3, 2014 at 10:46 PM, Andrew Smith <andrewksm...@gmail.com> wrote:
> I would also like to run segmentation in parallel, but how do you do this if
> you are using a separate reference set.

It depends on the type of reference set. If it's for a tumor-normal pair
analysis, nothing needs to be done. If it's a pool of arrays for which
the average should be calculated and used as a reference for all arrays,
you need to calculate this average *before* launching any parallel
processing, e.g.

  dfR <- getAverageFile(reference)
  cbs <- CbsModel(cesN, dfR)

>
> This is what I am running in a single process (that I would like to
> parallelize):
>
> cbs <- CbsModel(cesN, reference)
> ce <- ChromosomeExplorer(cbs)
> process(ce, chromosomes = 1:23, zooms = c(1, 16))
>
> Could I do the same as your previous response, i.e. have each processor
> extract out a subset of the arrays and process them
> with the above, then run the above on a single CPU over all arrays to get
> the final results (which should run fast since all the
> computationally intensive intermediate results will have been memoized on
> disk)?

Yes, then you can simply do:

  process(ce, arrays=arraysForOneComputeNode, chromosomes = 1:23, zooms = c(1, 16))

changing arraysForOneComputeNode for each compute node. This ignores
the fact that the nodes will compete to update a few common files, but
it should be alright. You can always refresh those files at the very
end by running:

  updateSetupExplorerFile(ce)
  writeRegions(ce, chromosomes=1:23)

Hope this helps,

/Henrik

>
> Overall, I am running the first 4 steps of the CRMAv2 vignette
> (http://www.aroma-project.org/vignettes/CRMAv2) and then the above
> for segmentation. I want to parallelize this, and my strategy is to have
> each processor extract a subset of the arrays and run this on the subset,
> then run it again on a single CPU for all the arrays. And I'm doing the
> parallelization using the multicore package's mclapply.
>
> Does this seem okay or would you recommend any changes?
>
> thanks,
> Andrew Smith
>
> On Friday, December 3, 2010 1:18:23 AM UTC-5, Henrik Bengtsson wrote:
>>
>> Hi.
>>
>> On Thu, Dec 2, 2010 at 10:24 AM, Kai <wang...@gmail.com> wrote:
>> > Hi Henrik,
>> >
>> > I was trying to run segmentation on a large set of ~300 SNP genotyping
>> > array profiles. Currently I am loading all the profiles into aroma and
>> > run the segmenter on them one by one, which takes a really long time.
>> > I was wondering whether there is a way to break the computation into
>> > parts and run all the parts simultaneously. Specifically:
>> >
>> > 1) Is there a way to load only a subset of profiles in a project into
>> > a "AromaUnitTotalCnBinarySet"?
>>
>> It is true that ds <- AromaUnitTotalCnBinarySet$byName(...) will set up
>> a set containing all data files. However, after that you can always
>> subset using extract(), e.g. ds <- extract(ds, 1:5);
>>
>> > 2) Is there a way to run segmentation on single, or a subset of
>> > profiles in a "AromaUnitTotalCnBinarySet"?
>>
>> Yes, either by subsetting as above from the start or simply by
>> specifying the 'arrays' argument to fit()/process() [below]. Note
>> that the latter is the more convenient approach for two reasons.
>> First, if the reference used to calculate CN ratios is the robust
>> average across all samples, then you do not have to worry about
>> getting it correct. (This is not a problem in your particular case
>> because you use the cbs$.calculateRatios <- FALSE feature.) Second,
>> the final ChromosomeExplorer HTML page correctly lists all samples. If
>> you subset immediately after setting up the data set (as above), then
>> you basically have to make sure to process() one ChromosomeExplorer on
>> all arrays.
>>
>> > 3) Assuming that one can generate the CBS segmentation model on single
>> > profiles, is there a way to load them together back in to aroma and
>> > pass to the "ChromosomeExplorer"?
>>
>> Yes. The simplest way is to run what you are doing below once
>> at the very end.
>> Already segmented and processed samples will be
>> skipped, as will already generated image files etc.
>>
>> >
>> > The current process I have (which performs segmentation one sample at
>> > a time) is implemented as follows:
>> >
>> > # segment by CBS model
>> > ds = AromaUnitTotalCnBinarySet$byName("dataset,paired",chipType="HumanOmni1-Quad");
>> >
>> > cbs = CbsModel(ds);
>> > cbs$.calculateRatios = FALSE;
>> >
>> > fit(cbs, chromosomes=c(1:23), min.width=5, undo.splits="sdundo",
>> > undo.SD=1, verbose=2);
>> >
>> > # display data and segmentation in ChromosomeExplorer
>> > ce = ChromosomeExplorer(cbs);
>> > process(ce,chromosomes=c(1:23));
>> >
>> > Any other suggestion you may provide is also highly appreciated. Thank
>> > you very much.
>>
>> library("aroma.core");
>> verbose <- Arguments$getVerbose(-4, timestamp=TRUE);
>>
>> # Arrays that your host should process
>> arrays <- 5:8; # Change for different hosts
>>
>> # Setup the complete data set
>> ds <- AromaUnitTotalCnBinarySet$byName("dataset,paired",chipType="HumanOmni1-Quad");
>>
>> # Setup the "complete" segmentation model
>> cbs <- CbsModel(ds, min.width=5, undo.splits="sdundo", undo.SD=1);
>> cbs$.calculateRatios <- FALSE;
>>
>> # Fit a subset of the arrays
>> fit(cbs, arrays=arrays, chromosomes=c(1:23), verbose=verbose);
>>
>> # Setup the "complete" ChromosomeExplorer, ...
>> ce <- ChromosomeExplorer(cbs);
>>
>> # ...but generate PNG image files only for a subset
>> process(ce, arrays=arrays, chromosomes=c(1:23), verbose=verbose);
>>
>> Note that you actually do not have to call fit() explicitly, because
>> process() will do it implicitly (with the same arguments).
>>
>> If you want to call the above script from the command line, then
>> replace the arrays <- ...
>> line with
>>
>> arrays <- commandArgs(asValue=TRUE)$arrays;
>> arrays <- eval(parse(text=arrays));
>> arrays <- Arguments$getIndices(arrays);
>>
>> and then you can run the above by:
>>
>> R --args --arrays=1:5 < script.R
>>
>> Hope this helps
>>
>> Henrik
>>
>> >
>> > Best,
>> > Kai
>> >
>> > --
>> > When reporting problems on aroma.affymetrix, make sure 1) to run the
>> > latest version of the package, 2) to report the output of sessionInfo() and
>> > traceback(), and 3) to post a complete code example.
>> >
>> >
>> > You received this message because you are subscribed to the Google
>> > Groups "aroma.affymetrix" group with website http://www.aroma-project.org/.
>> > To post to this group, send email to aroma-af...@googlegroups.com
>>
>> > To unsubscribe and other options, go to
>> > http://www.aroma-project.org/forum/
>> >
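
The thread describes giving each compute node its own subset of arrays through the --arrays option. The index ranges can be generated rather than typed by hand. What follows is only a minimal base-R sketch under stated assumptions: splitArrays() and its arguments are hypothetical names, not part of any aroma.* package, and the counts (300 arrays, 4 nodes) are illustrative. It only builds the command lines; it does not run any segmentation.

```r
# Hypothetical helper (NOT an aroma.* function): split the array indices
# 1..nbrOfArrays into contiguous, roughly equal chunks, one per compute node.
splitArrays <- function(nbrOfArrays, nbrOfNodes) {
  idxs <- seq_len(nbrOfArrays)
  # Map each index to a node number in 1..nbrOfNodes
  groups <- ceiling(idxs * nbrOfNodes / nbrOfArrays)
  unname(split(idxs, groups))
}

# Illustrative numbers only: ~300 arrays spread over 4 nodes
chunks <- splitArrays(nbrOfArrays=300, nbrOfNodes=4)

# Format each contiguous chunk as a "from:to" string for --arrays
ranges <- sapply(chunks, function(idxs) sprintf("%d:%d", min(idxs), max(idxs)))

# One command line per node, following the R --args form used in the thread
cmds <- sprintf("R --args --arrays=%s < script.R", ranges)
cat(cmds, sep="\n")
```

Each printed line is meant to be submitted on a different node. Once all nodes have finished, a single process() call over all arrays, as suggested in the thread, regenerates the shared ChromosomeExplorer files.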