Re: [aroma.affymetrix] Speeding up RmaBackgroundCorrection

Damian Plichta Fri, 07 Mar 2014 08:06:44 -0800

Hi Henrik,

Lowering memory helped - it's drinks on me when we meet.


It has been running for approximately 7 days now ("ETA for unit type 
'expression': 20140320 23:20:26"). With the updated affxparser can I speed 
it up by increasing the memory burden? The current memory allocation is 
approximately 3 Gbytes. I don't want to cancel the run if it means losing 
the progress though.

Best,
Damian

On Tuesday, March 4, 2014 8:32:17 PM UTC-5, Henrik Bengtsson wrote:
>
> Did lowering "memory/ram" solve your problem? 
>
> Also, an updated version of affxparser that should no longer overflow 
> in the integer multiplication is available (on Bioconductor). 
>
> Cheers, 
>
> Henrik 
>
> On Thu, Feb 27, 2014 at 12:36 PM, Henrik Bengtsson 
> <henrik.b...@aroma-project.org> wrote: 
> > Congratulations Damian, 
> > 
> > I think you're the first one to hit a limit of the Aroma Framework 
> > (remind me to buy you a drink whenever you see me in person). 
> > 
> > I narrowed it down to the affxparser(*) package and I'll investigate 
> > further on how to fix this.  It should not occur and I'm confident 
> > that it can be avoided internally.  In the meantime, try to lower 
> > your 'memory/ram' setting, e.g. setOption(aromaSettings, "memory/ram", 
> > 10.0) or less.  I'm not 100% sure it'll help, but if it does, that's a 
> > good clue (for me) on what's causing it. 
> > 
> > /Henrik 
> > 
> > DETAILS: The below illustrates the issue in affxparser::readCelUnits(): 
> > 
> >> .Machine$integer.max 
> > [1] 2147483647 
> >> nbrOfArrays <- 5622L 
> >> .Machine$integer.max / nbrOfArrays 
> > [1] 381978.6 
> >> nbrOfCells <- 381978L 
> >> nbrOfCells * nbrOfArrays 
> > [1] 2147480316 
> >> nbrOfCells <- 381979L 
> >> nbrOfCells * nbrOfArrays 
> > [1] NA 
> > Warning message: 
> > In nbrOfCells * nbrOfArrays : NAs produced by integer overflow 
> > 
> > By decreasing 'memory/ram' I *hope* that 'nbrOfCells' effectively 
> > becomes smaller. 
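The overflow shown in the DETAILS above can be reproduced, and sidestepped, in plain R. This is a minimal sketch of one common workaround (coercing to double before multiplying); the thread does not say how the updated affxparser actually fixes it, so the approach here is an assumption and not a description of that fix. The variable names `asInteger`, `asDouble` and `maxCells` are made up for the illustration.

```r
nbrOfArrays <- 5622L
nbrOfCells  <- 381979L

# integer * integer: the product exceeds .Machine$integer.max, so R returns NA
# (with a warning, suppressed here)
asInteger <- suppressWarnings(nbrOfCells * nbrOfArrays)
print(is.na(asInteger))

# Coerce one operand to double before multiplying; the product is then exact
asDouble <- as.double(nbrOfCells) * nbrOfArrays
print(asDouble > .Machine$integer.max)

# Largest number of cells per read that keeps the integer product in range
maxCells <- .Machine$integer.max %/% nbrOfArrays
print(maxCells)
```

The last value is the threshold implied by the session above: with 5622 arrays, any read of more than `maxCells` cells at once pushes the integer product out of range, which is why a smaller 'memory/ram' setting might help.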
> > 
> > 
> > On Wed, Feb 26, 2014 at 9:15 PM, Damian Plichta 
> > <damian....@gmail.com> wrote: 
> >> Hi Henrik, 
> >> 
> >> Thank you, that was helpful. 
> >> 
> >> I ran into another problem though. I am trying to perform 
> >> ExonRmaPlm(csQN, mergeGroups=TRUE) but this produces the following error: 
> >> 
> >> 20140226 23:25:33|       Identifying CDF cell indices...done 
> >> Error in vector("double", nbrOfCells * nbrOfArrays) : 
> >>   vector size cannot be NA 
> >> In addition: Warning message: 
> >> In nbrOfCells * nbrOfArrays : NAs produced by integer overflow 
> >> 20140226 23:28:35|      Reading probe intensities from 5622 arrays...done 
> >> 20140226 23:28:35|     Fitting chunk #1 of 1 of 'expression' units (code=1) with various dimensions...done 
> >> 20140226 23:28:35|    Unit dimension #3 (various dimensions) of 3...done 
> >> 20140226 23:28:35|   Fitting the model by unit dimensions (at least for the large classes)...done 
> >> 20140226 23:28:35|  Unit type #1 ('expression') of 1...done 
> >> 20140226 23:28:35| Fitting ExonRmaPlm for each unit type separately...done 
> >> 20140226 23:28:35|Fitting model of class ExonRmaPlm...done 
> >> 
> >> I tested whether it worked anyway, but the expression is zero across 
> >> all arrays when I access it. 
> >> 
> >> Do you know what could be causing the problem? 
> >> 
> >> Best, 
> >> Damian 
> >> 
> >> 
> >> The code I ran is below: 
> >> 
> >> library(aroma.affymetrix) 
> >> library(aroma.core) 
> >> setOption(aromaSettings, "memory/ram", 500.0); 
> >> verbose <- Arguments$getVerbose(-8, timestamp=TRUE) 
> >> chipType <- "HuEx-1_0-st-v2-core" 
> >> cdf <- AffymetrixCdfFile$byChipType(chipType) 
> >> #print(cdf) 
> >> cs <- AffymetrixCelSet$byName("experiment1", cdf=cdf) 
> >> bc <- RmaBackgroundCorrection(cs) 
> >> csBC <- process(bc,verbose=verbose) 
> >> qn <- QuantileNormalization(csBC, typesToUpdate="pm") 
> >> target <- getTargetDistribution(qn, verbose=verbose) 
> >> qn <- QuantileNormalization(csBC, typesToUpdate="pm", 
> >>                             targetDistribution=target) 
> >> csQN <- process(qn, verbose=verbose) 
> >> csPLM <- ExonRmaPlm(csQN, mergeGroups=TRUE) 
> >> fit(csPLM, verbose=verbose) 
> >> date() 
> >> ces <- getChipEffectSet(csPLM) 
> >> gExprs <- extractDataFrame(ces, units=1:3, addNames=TRUE) 
> >> 
> >> 
> >>> sessionInfo() 
> >> R version 3.0.2 (2013-09-25) 
> >> Platform: x86_64-unknown-linux-gnu (64-bit) 
> >> 
> >> locale: 
> >>  [1] LC_CTYPE=C                 LC_NUMERIC=C 
> >>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8 
> >>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8 
> >>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C 
> >>  [9] LC_ADDRESS=C               LC_TELEPHONE=C 
> >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C 
> >> 
> >> attached base packages: 
> >> [1] stats     graphics  grDevices utils     datasets  methods   base 
> >> 
> >> other attached packages: 
> >>  [1] preprocessCore_1.23.0   aroma.light_1.31.8      matrixStats_0.8.14 
> >>  [4] aroma.affymetrix_2.11.1 aroma.core_2.11.0       R.devices_2.8.2 
> >>  [7] R.filesets_2.3.0        R.utils_1.29.8          R.oo_1.17.0 
> >> [10] affxparser_1.34.0       R.methodsS3_1.6.1 
> >> 
> >> loaded via a namespace (and not attached): 
> >> [1] aroma.apd_0.4.0 base64enc_0.1-1 digest_0.6.4    DNAcopy_1.35.1 
> >> [5] PSCBS_0.40.4    R.cache_0.9.2   R.huge_0.6.0    R.rsp_0.9.28 
> >> [9] tools_3.0.2 
> >> 
> >> On Thursday, February 20, 2014 1:21:25 PM UTC-5, Henrik Bengtsson wrote: 
> >>> 
> >>> On Tue, Feb 18, 2014 at 7:30 PM, Damian Plichta 
> >>> <damian....@gmail.com> wrote: 
> >>> > Thanks, that helped a lot. It took me less than 3 hours to perform the 
> >>> > background correction. 
> >>> > 
> >>> > Now I'm wondering if for the next step, quantile normalization, I could 
> >>> > do a similar trick. Is there a way to precompute the target empirical 
> >>> > distribution based on all arrays and then do the normalization on chunks 
> >>> > of data (thus in an independent manner)? I can see the option 
> >>> > targetDistribution under QuantileNormalization. 
> >>> 
> >>> # Calculate the target distribution based on *all* arrays [not parallelized] 
> >>> qn <- QuantileNormalization(dsC, typesToUpdate="pm") 
> >>> target <- getTargetDistribution(qn, verbose=verbose) 
> >>> 
> >>> # Normalize array by array toward the same target distribution [in chunks] 
> >>> dsCs <- extract(dsC, 1:100) 
> >>> qn <- QuantileNormalization(dsCs, typesToUpdate="pm", 
> >>>                             targetDistribution=target) 
> >>> csNs <- process(qn, verbose=verbose) 
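Why a fixed target distribution makes the chunks independent can be seen in a toy example in base R. This is only a sketch of textbook quantile normalization on made-up numbers; it is not the code inside QuantileNormalization, which may handle ties, missing values and interpolation differently. The helper `normalizeArrays()` and the matrix `x` are hypothetical.

```r
# Toy data: 5 probes x 3 arrays (made-up values, no ties within an array)
x <- cbind(a1 = c(5, 2, 3, 4, 1),
           a2 = c(4, 1, 6, 2, 3),
           a3 = c(3, 4, 6, 8, 9))

# Step 1 (needs all arrays): the target is the mean of the sorted intensities
target <- rowMeans(apply(x, 2, sort))

# Step 2 (one array at a time): replace each value by the target value
# that has the same rank
normalizeArrays <- function(m, target) {
  out <- m
  for (j in seq_len(ncol(m))) {
    out[, j] <- target[rank(m[, j], ties.method = "first")]
  }
  out
}

# Normalizing in two separate chunks gives the same result as one pass
chunk1    <- normalizeArrays(x[, 1:2, drop = FALSE], target)
chunk2    <- normalizeArrays(x[, 3, drop = FALSE], target)
allAtOnce <- normalizeArrays(x, target)
print(isTRUE(all.equal(cbind(chunk1, chunk2), allAtOnce)))
```

Only step 1 looks at every array; once `target` exists, step 2 touches a single array at a time, so it can be split across chunks or machines in any way.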
> >>> 
> >>> Hope this helps 
> >>> 
> >>> /Henrik 
> >>> 
> >>> > 
> >>> > Kind regards, 
> >>> > 
> >>> > Damian Plichta 
> >>> > 
> >>> > On Monday, February 17, 2014 4:03:54 PM UTC-5, Henrik Bengtsson wrote: 
> >>> >> 
> >>> >> Hi. 
> >>> >> 
> >>> >> On Sun, Feb 16, 2014 at 6:53 PM, Damian Plichta 
> >>> >> <damian....@gmail.com> wrote: 
> >>> >> > Hi, 
> >>> >> > 
> >>> >> > I'm processing around 5500 Affymetrix exon arrays. The 
> >>> >> > RmaBackgroundCorrection() is pretty slow, 1-2 minutes/array. I played 
> >>> >> > with setOption(aromaSettings, "memory/ram", X) and increased X up to 100 
> >>> >> > but it didn't have any effect on this stage of analysis. 
> >>> >> 
> >>> >> If you don't notice any difference in processing time by changing 
> >>> >> "memory/ram" from the default (1.0) to 100, then the memory is not 
> >>> >> your bottleneck. 
> >>> >> > 
> >>> >> > Any way to speed the process up? 
> >>> >> 
> >>> >> If you haven't already, make sure to read "How to: Improve processing 
> >>> >> time": 
> >>> >> 
> >>> >>   http://aroma-project.org/howtos/ImproveProcessingTime 
> >>> >> 
> >>> >> If you have access to multiple machines on the same file system, you 
> >>> >> can do poor man's parallel processing for the *background correction*, 
> >>> >> because each array is corrected independently of the others.  You can 
> >>> >> do this by processing a subset of arrays per computer, e.g. 
> >>> >> 
> >>> >> dsR <- AffymetrixCelSet$byName("MyDataSet", chipType="HuEx-1_0-st-v2") 
> >>> >> dsS <- extract(dsR, 1:100) 
> >>> >> bg <- RmaBackgroundCorrection(dsS) 
> >>> >> dsC <- process(bg, verbose=verbose) 
> >>> >> 
> >>> >> Repeat on another machine with 101:200, and so on. 
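The per-machine ranges (1:100, 101:200, ...) do not have to be typed by hand. A minimal sketch in base R, assuming 5622 arrays (the count that appears in the log output elsewhere in this thread) and a chunk size of 100; the job id `k` and the names `chunks` and `arrayIdxs` are hypothetical, and how `k` reaches each machine is left open.

```r
nbrOfArrays <- 5622L  # array count from the log output elsewhere in this thread
chunkSize   <- 100L   # same chunk size as the 1:100 example

# Split 1..nbrOfArrays into consecutive chunks of at most chunkSize indices
idxs   <- seq_len(nbrOfArrays)
chunks <- split(idxs, ceiling(idxs / chunkSize))
print(length(chunks))

# Hypothetical: 'k' identifies this machine/job; it is simply hard-coded here
k <- 2L
arrayIdxs <- chunks[[k]]
print(range(arrayIdxs))

# With aroma.affymetrix loaded and 'dsR' set up as in the example above,
# job k would then run:
#   dsS <- extract(dsR, arrayIdxs)
#   bg  <- RmaBackgroundCorrection(dsS)
#   dsC <- process(bg, verbose=verbose)
```

The last chunk is shorter than the others when the array count is not a multiple of the chunk size, so no index is skipped or repeated.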
> >>> >> 
> >>> >> When all arrays have been background corrected, you can move back to 
> >>> >> your original script - all arrays background corrected are already 
> >>> >> saved to file and will therefore not be redone. 
> >>> >> 
> >>> >> /Henrik 
> >>> >> 
> >>> >> > 
> >>> >> > Kind regards, 
> >>> >> > 
> >>> >> > Damian Plichta 
> >>> >> > 
> >>> >> > -- 
> >>> >> > -- 
> >>> >> > When reporting problems on aroma.affymetrix, make sure 1) to run the 
> >>> >> > latest version of the package, 2) to report the output of sessionInfo() 
> >>> >> > and traceback(), and 3) to post a complete code example. 
> >>> >> > 
> >>> >> > 
> >>> >> > You received this message because you are subscribed to the Google 
> >>> >> > Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. 
> >>> >> > To post to this group, send email to aroma-af...@googlegroups.com 
> >>> >> > To unsubscribe and other options, go to 
> >>> >> > http://www.aroma-project.org/forum/ 
> >>> >> > 
> >>> >> > --- 
> >>> >> > You received this message because you are subscribed to the Google 
> >>> >> > Groups "aroma.affymetrix" group. 
> >>> >> > To unsubscribe from this group and stop receiving emails from it, 
> >>> >> > send an email to aroma-affymetr...@googlegroups.com. 
> >>> >> > For more options, visit https://groups.google.com/groups/opt_out. 
> >> 
>

-- 
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

--- 
You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to aroma-affymetrix+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
