Make Result an immutable: immutable Result pvalue::Float64 i::Int64 j::Int64 end
that way the numbers are stored in the S array directly and all its memory is preallocated. Otherwise S just holds pointers to the Results objects. gc time is only 2% for me then (but I didn't run it for more than 10min). However, why the gc time increases in the original code, I don't know. I don't think it should. On Tue, 2015-04-07 at 16:26, Adam Labadorf <[email protected]> wrote: > I moved compute! out of the main function and pass the ratios as you > suggested and the performance is a bit better but I still notice the gc > time increasing albeit more slowly, new code below. The csv I'm using is > available at > https://www.dropbox.com/s/uqeisg1vx027gjc/all_mRNA_nonzero_norm_counts_trim.csv?dl=0 > > using HypothesisTests, ArrayViews > > type Result > pvalue::Float64 > i::Int64 > j::Int64 > end > > function readtable(fn) > fp = readcsv(fn) > columns = fp[1,2:end] > rows = fp[2:end,1] > data = float(fp[2:end,2:end]) > return (columns,rows,data) > end > @time (cols, genes, counts) = > readtable("../all_mRNA_nonzero_norm_counts_trim.csv") > > h_cols = find([c[1] == 'H' for c in cols]) > c_cols = find([c[1] == 'C' for c in cols]) > > # restrict to HD and control > # add a pseudocount to avoid division by zero errors > counts = counts[:,[h_cols;c_cols]] + 0.01 > > h_cols = 1:length(h_cols) > c_cols = (length(h_cols)+1):size(counts,2) > > # arrays are stored in column order, transpose counts to make > # accessing more efficient > counts = transpose(counts) > > function > compute!(S::Array{Result,1},ratios::Array{Float64,1},tot_i::Int64,i::Int64,j::Int64,h_cols::UnitRange{Int64},c_cols::UnitRange{Int64},M::Int64) > t = UnequalVarianceTTest(view(ratios,h_cols),view(ratios,c_cols)) > S[tot_i] = Result(pvalue(t),i,j) > end > > function main(counts,genes,h_cols,c_cols) > > N = size(genes,1) > M = size(counts,1) > > ratios = Array(Float64,M) > > tot_i = 0 > tot = (N^2-N)/2 > > S = Array(Result,round(Int,tot)) > > for i=1:N-1 > @time for j=(i+1):N > tot_i += 1 > > # use a sigmoid function to compress log ratios to [-10,10] > b = 10 > for k=1:M > ratios[k] = b*(2/(1+exp(-log2(counts[k,i]/counts[k,j])/(b/2)))-1) > end > > compute!(S,ratios,tot_i,i,j,h_cols,c_cols,M) > end > @show (tot_i,tot,tot_i/tot) > end > > end > > S = main(counts,genes,h_cols,c_cols) > > > > On Tuesday, April 7, 2015 at 8:47:06 AM UTC-4, Mauro wrote: >> >> >> On Tue, 2015-04-07 at 06:14, Adam Labadorf <[email protected] >> <javascript:>> wrote: >> > Thanks for the replies. I took your suggestions (and reread the scope >> > section of the docs) and am still experiencing the gc creep. Below is >> the >> > complete program, with the notable changes that I wrapped the main >> > computation in a function and eliminated all references to global >> variables >> > inside. I'm also using the most recent nightly build of 0.4. Overall >> this >> > version of the code is much faster, but there is still significant >> slowdown >> > as the computation progresses. Is this expected? Do you see anything I'm >> > doing wrong? >> > >> > # julia v0.4.0-dev+4159 >> > using HypothesisTests, ArrayViews >> > >> > type Result >> > pvalue::Float64 >> > i::Int64 >> > j::Int64 >> > end >> > >> > function readtable(fn) >> > fp = readcsv(fn) >> > columns = fp[1,2:end] >> > rows = fp[:,2:end] >> > data = float(fp[2:end,2:end]) >> > return (columns,rows,data) >> > end >> > @time (cols, genes, counts) = >> > readtable("../all_mRNA_nonzero_norm_counts_trim.csv") >> > >> > h_cols = find([c[1] == 'H' for c in cols]) >> > c_cols = find([c[1] == 'C' for c in cols]) >> > >> > # filter out genes with zeros since it messes with the ratios >> > nonzero = mapslices(x -> !any(x.==0),counts,2) >> > counts = counts[find(nonzero),[h_cols;c_cols]] >> > >> > # slices seem to be faster >> > h_cols = 1:length(h_cols) >> > c_cols = (length(h_cols)+1):size(counts,2) >> > >> > # arrays are stored in column order, transpose counts to make >> > # accessing more efficient >> > counts = transpose(counts) >> > >> > genes = genes[find(nonzero)] >> > >> > function main(counts,genes,h_cols,c_cols) >> > >> > N = size(genes,1) >> > M = size(counts,1) >> > >> > ratios = Array(Float64,M) >> > function >> > >> compute!(S::Array{Result,1},counts::Array{Float64,2},tot_i::Int64,i::Int64,j::Int64,h_cols::UnitRange{Int64},c_cols::UnitRange{Int64},M::Int64) >> >> >> > for k=1:M >> > ratios[k] = counts[k,i]/counts[k,j] >> > end >> > t = UnequalVarianceTTest(view(ratios,h_cols),view(ratios,c_cols)) >> > S[tot_i] = Result(pvalue(t),i,j) >> > end >> >> Sadly, nested function are often bad as type-inference does not work >> properly, as Tim suggested. Consider this example: >> >> function a(n) >> aa(x,y) = x*y >> out = 0 >> for i=1:n >> out += aa(i,i) >> end >> out >> end >> >> bb(x,y) = x*y >> function b(n) >> out = 0 >> for i=1:n >> out += bb(i,i) >> end >> out >> end >> n = 10^7 >> @time a(n) >> @time a(n) # elapsed time: 0.680312065 seconds (1220 MB allocated, 2.71% >> gc time in 56 pauses with 0 full sweep) >> b(n) >> @time b(n) # elapsed time: 3.086e-6 seconds (192 bytes allocated) >> >> @code_warntype a(n) # see how the return type of the function is not >> inferred! >> @code_warntype b(n) >> >> Move compute! out of main and it should be better. >> >> >> > tot_i = 0 >> > tot = (N^2-N)/2 >> > >> > S = Array(Result,round(Int,tot)) >> > >> > for i=1:N-1 >> > @time for j=(i+1):N >> > tot_i += 1 >> > compute!(S,counts,tot_i,i,j,h_cols,c_cols,M) >> > end >> > end >> > >> > end >> > >> > S = main(counts,genes,h_cols,c_cols) >> > >> > >> > And the output: >> > >> > elapsed time: 0.427719149 seconds (23 MB allocated, 39.90% gc time in 2 >> > pauses with 0 full sweep) >> > elapsed time: 0.031006382 seconds (14 MB allocated) >> > elapsed time: 0.131579099 seconds (14 MB allocated, 73.64% gc time in 1 >> > pauses with 1 full sweep) >> > elapsed time: 0.140120717 seconds (14 MB allocated, 73.58% gc time in 1 >> > pauses with 0 full sweep) >> > elapsed time: 0.030248237 seconds (14 MB allocated) >> > ... >> > elapsed time: 0.507894781 seconds (5 MB allocated, 97.65% gc time in 1 >> > pauses with 0 full sweep) >> > elapsed time: 0.011821657 seconds (5 MB allocated) >> > elapsed time: 0.011610651 seconds (5 MB allocated) >> > elapsed time: 0.011816277 seconds (5 MB allocated) >> > elapsed time: 0.50779098 seconds (5 MB allocated, 97.65% gc time in 1 >> > pauses with 0 full sweep) >> > elapsed time: 0.011997168 seconds (5 MB allocated) >> > elapsed time: 0.011721667 seconds (5 MB allocated) >> > elapsed time: 0.011561071 seconds (5 MB allocated) >> >> This looks ok-ish to me. The program runs, allocates memory and every >> so often the memory is garbage collected. What is not ok that the gc >> runs after only 20MB is allocated and that it takes so long. But at >> that point all the memory is used up, right? Maybe that is why it takes >> so long then? >> >> > On Saturday, April 4, 2015 at 12:38:46 PM UTC-4, Patrick O'Leary wrote: >> >> >> >> Silly me, ignoring all the commented out lines assuming they were >> >> comments...yes, this is almost certainly it. >> >> >> >> On Saturday, April 4, 2015 at 3:24:50 AM UTC-5, Tim Holy wrote: >> >>> >> >>> Devectorization should never slow anything down. If it does, then you >> >>> have >> >>> some other problem. Here, M is a global variable, and that's probably >> >>> what's >> >>> killing you. Performance tip #1: >> >>> http://docs.julialang.org/en/latest/manual/performance-tips/ >> >>> >> >>> --Tim >> >>> >> >>> On Friday, April 03, 2015 09:43:51 AM Adam Labadorf wrote: >> >>> > Hi, >> >>> > >> >>> > I am struggling with an issue related to garbage collection taking >> up >> >>> the >> >>> > vast majority (>99%) of compute time on a simple nested for loop. >> Code >> >>> > excerpt below: >> >>> > >> >>> > # julia version 0.3.7 >> >>> > # counts is an MxN matrix of Float64 >> >>> > # N=15000 >> >>> > # M=108 >> >>> > # h_cols and c_cols are indices \in {1:M} >> >>> > using HypothesisTests, ArrayViews >> >>> > ratios = Array(Float64,M) >> >>> > function compute!(S,tot_i::Int64,i::Int64,j::Int64) >> >>> > ratios = view(counts,:,i)./view(counts,:,j) >> >>> > #for k=1:M >> >>> > # ratios[k] = counts[k,i]/counts[k,j] >> >>> > #end >> >>> > #ratios = counts[:,i]./counts[:,j] >> >>> > t = UnequalVarianceTTest(ratios[h_cols],ratios[c_cols]) >> >>> > S[tot_i] = (pvalue(t),i,j) >> >>> > end >> >>> > >> >>> > for i=1:N-1 >> >>> > @time for j=(i+1):N >> >>> > tot_i += 1 >> >>> > compute!(S,tot_i,i,j) >> >>> > end >> >>> > end >> >>> > >> >>> > The loop begins fast, output from time: >> >>> > >> >>> > elapsed time: 1.023850054 seconds (62027220 bytes allocated) >> >>> > elapsed time: 0.170916977 seconds (45785624 bytes allocated) >> >>> > elapsed time: 0.171598156 seconds (45782760 bytes allocated) >> >>> > elapsed time: 0.173866309 seconds (45779896 bytes allocated) >> >>> > elapsed time: 0.170267172 seconds (45777032 bytes allocated) >> >>> > elapsed time: 0.171754713 seconds (45774168 bytes allocated) >> >>> > elapsed time: 0.170110142 seconds (45771304 bytes allocated) >> >>> > elapsed time: 0.175199053 seconds (45768440 bytes allocated) >> >>> > elapsed time: 0.179893161 seconds (45765576 bytes allocated) >> >>> > elapsed time: 0.212172824 seconds (45762712 bytes allocated) >> >>> > elapsed time: 0.252750549 seconds (45759848 bytes allocated) >> >>> > elapsed time: 0.254874855 seconds (45756984 bytes allocated) >> >>> > elapsed time: 0.231003319 seconds (45754120 bytes allocated) >> >>> > elapsed time: 0.235060195 seconds (45751256 bytes allocated) >> >>> > elapsed time: 0.235379355 seconds (45748392 bytes allocated) >> >>> > elapsed time: 0.927622743 seconds (45746168 bytes allocated, 77.65% >> gc >> >>> time) >> >>> > elapsed time: 0.9132931 seconds (45742664 bytes allocated, 78.35% gc >> >>> time) >> >>> > >> >>> > But as soon as it starts doing gc the % time spent in increases >> almost >> >>> > indefinitely, output from time much later: >> >>> > >> >>> > elapsed time: 0.174122929 seconds (36239160 bytes allocated) >> >>> > elapsed time: 18.535572658 seconds (36236168 bytes allocated, 99.22% >> gc >> >>> > time) >> >>> > elapsed time: 19.189478819 seconds (36233176 bytes allocated, 99.26% >> gc >> >>> > time) >> >>> > elapsed time: 21.812889439 seconds (36230184 bytes allocated, 99.35% >> gc >> >>> > time) >> >>> > elapsed time: 22.182467723 seconds (36227192 bytes allocated, 99.30% >> gc >> >>> > time) >> >>> > elapsed time: 0.169849999 seconds (36224200 bytes allocated) >> >>> > >> >>> > The inner loop, despite iterating over fewer and fewer indices has >> >>> > massively increased the gc, and therefore overall, execution time. I >> >>> have >> >>> > tried many things, including creating the compute function, >> >>> devectorizing >> >>> > the ratios calculation (which really slowed things down), using view >> >>> and >> >>> > sub in various places, profiling with --trace-allocation=all but I >> >>> can't >> >>> > figure out why this happens or how to fix it. Doing gc for 22 >> seconds >> >>> > obviously kills the performance, and since there are about 22M >> >>> iterations >> >>> > this is prohibitive. Can anyone suggest something I can try to >> improve >> >>> the >> >>> > performance here? >> >>> > >> >>> > Thanks, >> >>> > Adam >> >>> >> >>> >> >>
