Glad that helped!  (Also, not all immutables are inlined that way, only
those for which isbits(T) == true.)
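
For example, a quick REPL check (0.3/0.4-era syntax; MutableResult is just
a made-up counterpart for comparison, not from the code below):

immutable Result          # all fields are plain bits types => isbits
    pvalue::Float64
    i::Int64
    j::Int64
end

type MutableResult        # same fields, but mutable
    pvalue::Float64
    i::Int64
    j::Int64
end

isbits(Result)                    # true: elements stored inline in an Array{Result}
isbits(MutableResult)             # false: mutable types are never isbits
sizeof(Array(Result, 10))         # 240 bytes: ten 24-byte structs, stored inline
sizeof(Array(MutableResult, 10))  # 80 bytes on 64-bit: just ten pointers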

Maybe the gc time increased because each collection had to scan more and
more live heap objects as more and more Result instances were created?
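
A rough (unscientific) way to watch that effect; IRes and MRes below are
made-up stand-ins for the immutable and mutable versions of Result:

immutable IRes; p::Float64; i::Int64; j::Int64; end
type MRes;      p::Float64; i::Int64; j::Int64; end

n = 10^6
a = [IRes(0.0, k, k) for k = 1:n]  # one flat buffer; nothing extra for the gc to trace
b = [MRes(0.0, k, k) for k = 1:n]  # 10^6 separate heap objects the mark phase must visit

gc(); @time gc()  # while b is live, every full collection has ~10^6 extra objects to scan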

On Tue, 2015-04-07 at 18:24, Adam Labadorf <alabad...@gmail.com> wrote:
> Yes, that gives a massive improvement and I see the same gc behavior you 
> got. I could notice the gc increasing slowly after just watching for a 
> minute or two, and that seems to have stopped now. I'll know for sure when 
> I run the whole thing and will report back.
>
> Thanks so much to all of you for your swift replies, this was my first 
> julia experience and it's great to have such a supportive community.
>
> On Tuesday, April 7, 2015 at 11:40:49 AM UTC-4, Mauro wrote:
>>
>> Make Result an immutable: 
>>
>> immutable Result 
>>   pvalue::Float64 
>>   i::Int64 
>>   j::Int64 
>> end 
>>
>> that way the numbers are stored in the S array directly and all its 
>> memory is preallocated.  Otherwise S just holds pointers to the Result 
>> objects.  gc time is only 2% for me then (but I didn't run it for more 
>> than 10min). 
>>
>> However, why the gc time increases in the original code, I don't know. 
>> I don't think it should. 
>>
>> On Tue, 2015-04-07 at 16:26, Adam Labadorf <alab...@gmail.com> wrote: 
>> > I moved compute! out of the main function and pass the ratios as you 
>> > suggested; the performance is a bit better, but I still notice the gc 
>> > time increasing, albeit more slowly. New code below. The csv I'm using 
>> > is available at 
>> > https://www.dropbox.com/s/uqeisg1vx027gjc/all_mRNA_nonzero_norm_counts_trim.csv?dl=0 
>> > 
>> > using HypothesisTests, ArrayViews 
>> > 
>> > type Result 
>> >   pvalue::Float64 
>> >   i::Int64 
>> >   j::Int64 
>> > end 
>> > 
>> > function readtable(fn) 
>> >   fp = readcsv(fn) 
>> >   columns = fp[1,2:end] 
>> >   rows = fp[2:end,1] 
>> >   data = float(fp[2:end,2:end]) 
>> >   return (columns,rows,data) 
>> > end 
>> > @time (cols, genes, counts) = readtable("../all_mRNA_nonzero_norm_counts_trim.csv") 
>> > 
>> > h_cols = find([c[1] == 'H' for c in cols]) 
>> > c_cols = find([c[1] == 'C' for c in cols]) 
>> > 
>> > # restrict to HD and control 
>> > # add a pseudocount to avoid division by zero errors 
>> > counts = counts[:,[h_cols;c_cols]] + 0.01 
>> > 
>> > h_cols = 1:length(h_cols) 
>> > c_cols = (length(h_cols)+1):size(counts,2) 
>> > 
>> > # arrays are stored in column order, transpose counts to make 
>> > # accessing more efficient 
>> > counts = transpose(counts) 
>> > 
>> > function compute!(S::Array{Result,1},ratios::Array{Float64,1},tot_i::Int64,i::Int64,j::Int64,h_cols::UnitRange{Int64},c_cols::UnitRange{Int64},M::Int64) 
>> >   t = UnequalVarianceTTest(view(ratios,h_cols),view(ratios,c_cols)) 
>> >   S[tot_i] = Result(pvalue(t),i,j) 
>> > end 
>> > 
>> > function main(counts,genes,h_cols,c_cols) 
>> > 
>> >   N = size(genes,1) 
>> >   M = size(counts,1) 
>> > 
>> >   ratios = Array(Float64,M) 
>> > 
>> >   tot_i = 0 
>> >   tot = (N^2-N)/2 
>> > 
>> >   S = Array(Result,round(Int,tot)) 
>> > 
>> >   for i=1:N-1 
>> >     @time for j=(i+1):N 
>> >       tot_i += 1 
>> > 
>> >       # use a sigmoid function to compress log ratios to [-10,10] 
>> >       b = 10 
>> >       for k=1:M 
>> >         ratios[k] = b*(2/(1+exp(-log2(counts[k,i]/counts[k,j])/(b/2)))-1) 
>> >       end 
>> > 
>> >       compute!(S,ratios,tot_i,i,j,h_cols,c_cols,M) 
>> >     end 
>> >     @show (tot_i,tot,tot_i/tot) 
>> >   end 
>> > 
>> >   return S 
>> > end 
>> > 
>> > S = main(counts,genes,h_cols,c_cols) 
>> > 
>> > 
>> > 
>> > On Tuesday, April 7, 2015 at 8:47:06 AM UTC-4, Mauro wrote: 
>> >> 
>> >> 
>> >> On Tue, 2015-04-07 at 06:14, Adam Labadorf <alab...@gmail.com> wrote: 
>> >> > Thanks for the replies. I took your suggestions (and reread the scope 
>> >> > section of the docs) and am still experiencing the gc creep. Below is 
>> >> > the complete program, with the notable changes that I wrapped the main 
>> >> > computation in a function and eliminated all references to global 
>> >> > variables inside. I'm also using the most recent nightly build of 0.4. 
>> >> > Overall this version of the code is much faster, but there is still 
>> >> > significant slowdown as the computation progresses. Is this expected? 
>> >> > Do you see anything I'm doing wrong? 
>> >> > 
>> >> > # julia v0.4.0-dev+4159 
>> >> > using HypothesisTests, ArrayViews 
>> >> > 
>> >> > type Result 
>> >> >   pvalue::Float64 
>> >> >   i::Int64 
>> >> >   j::Int64 
>> >> > end 
>> >> > 
>> >> > function readtable(fn) 
>> >> >   fp = readcsv(fn) 
>> >> >   columns = fp[1,2:end] 
>> >> >   rows = fp[2:end,1] 
>> >> >   data = float(fp[2:end,2:end]) 
>> >> >   return (columns,rows,data) 
>> >> > end 
>> >> > @time (cols, genes, counts) = readtable("../all_mRNA_nonzero_norm_counts_trim.csv") 
>> >> > 
>> >> > h_cols = find([c[1] == 'H' for c in cols]) 
>> >> > c_cols = find([c[1] == 'C' for c in cols]) 
>> >> > 
>> >> > # filter out genes with zeros since it messes with the ratios 
>> >> > nonzero = mapslices(x -> !any(x.==0),counts,2) 
>> >> > counts = counts[find(nonzero),[h_cols;c_cols]] 
>> >> > 
>> >> > # slices seem to be faster 
>> >> > h_cols = 1:length(h_cols) 
>> >> > c_cols = (length(h_cols)+1):size(counts,2) 
>> >> > 
>> >> > # arrays are stored in column order, transpose counts to make 
>> >> > # accessing more efficient 
>> >> > counts = transpose(counts) 
>> >> > 
>> >> > genes = genes[find(nonzero)] 
>> >> > 
>> >> > function main(counts,genes,h_cols,c_cols) 
>> >> > 
>> >> >   N = size(genes,1) 
>> >> >   M = size(counts,1) 
>> >> > 
>> >> >   ratios = Array(Float64,M) 
>> >> >   function compute!(S::Array{Result,1},counts::Array{Float64,2},tot_i::Int64,i::Int64,j::Int64,h_cols::UnitRange{Int64},c_cols::UnitRange{Int64},M::Int64) 
>> >> >       for k=1:M 
>> >> >         ratios[k] = counts[k,i]/counts[k,j] 
>> >> >       end 
>> >> >       t = UnequalVarianceTTest(view(ratios,h_cols),view(ratios,c_cols)) 
>> >> >       S[tot_i] = Result(pvalue(t),i,j) 
>> >> >   end 
>> >> 
>> >> Sadly, nested functions are often bad, as type inference does not work 
>> >> properly on them, as Tim suggested. Consider this example: 
>> >> 
>> >> function a(n) 
>> >>     aa(x,y) = x*y 
>> >>     out = 0 
>> >>     for i=1:n 
>> >>         out += aa(i,i) 
>> >>     end 
>> >>     out 
>> >> end 
>> >> 
>> >> bb(x,y) = x*y 
>> >> function b(n) 
>> >>     out = 0 
>> >>     for i=1:n 
>> >>         out += bb(i,i) 
>> >>     end 
>> >>     out 
>> >> end 
>> >> n = 10^7 
>> >> @time a(n) 
>> >> @time a(n)  # elapsed time: 0.680312065 seconds (1220 MB allocated, 2.71% gc time in 56 pauses with 0 full sweep) 
>> >> b(n) 
>> >> @time b(n)  # elapsed time: 3.086e-6 seconds (192 bytes allocated) 
>> >> 
>> >> @code_warntype a(n) # see how the return type of the function is not 
>> >> inferred! 
>> >> @code_warntype b(n) 
>> >> 
>> >> Move compute! out of main and it should be better. 
>> >> 
>> >> 
>> >> >   tot_i = 0 
>> >> >   tot = (N^2-N)/2 
>> >> > 
>> >> >   S = Array(Result,round(Int,tot)) 
>> >> > 
>> >> >   for i=1:N-1 
>> >> >     @time for j=(i+1):N 
>> >> >       tot_i += 1 
>> >> >       compute!(S,counts,tot_i,i,j,h_cols,c_cols,M) 
>> >> >     end 
>> >> >   end 
>> >> > 
>> >> >   return S 
>> >> > end 
>> >> > 
>> >> > S = main(counts,genes,h_cols,c_cols) 
>> >> > 
>> >> > 
>> >> > And the output: 
>> >> > 
>> >> > elapsed time: 0.427719149 seconds (23 MB allocated, 39.90% gc time in 2 pauses with 0 full sweep) 
>> >> > elapsed time: 0.031006382 seconds (14 MB allocated) 
>> >> > elapsed time: 0.131579099 seconds (14 MB allocated, 73.64% gc time in 1 pauses with 1 full sweep) 
>> >> > elapsed time: 0.140120717 seconds (14 MB allocated, 73.58% gc time in 1 pauses with 0 full sweep) 
>> >> > elapsed time: 0.030248237 seconds (14 MB allocated) 
>> >> > ... 
>> >> > elapsed time: 0.507894781 seconds (5 MB allocated, 97.65% gc time in 1 pauses with 0 full sweep) 
>> >> > elapsed time: 0.011821657 seconds (5 MB allocated) 
>> >> > elapsed time: 0.011610651 seconds (5 MB allocated) 
>> >> > elapsed time: 0.011816277 seconds (5 MB allocated) 
>> >> > elapsed time: 0.50779098 seconds (5 MB allocated, 97.65% gc time in 1 pauses with 0 full sweep) 
>> >> > elapsed time: 0.011997168 seconds (5 MB allocated) 
>> >> > elapsed time: 0.011721667 seconds (5 MB allocated) 
>> >> > elapsed time: 0.011561071 seconds (5 MB allocated) 
>> >> 
>> >> This looks ok-ish to me.  The program runs, allocates memory, and every 
>> >> so often the memory is garbage collected.  What is not ok is that the gc 
>> >> runs after only 20MB is allocated and that it takes so long.  But at 
>> >> that point all the memory is used up, right?  Maybe that is why it 
>> >> takes so long then? 
>> >> 
>> >> > On Saturday, April 4, 2015 at 12:38:46 PM UTC-4, Patrick O'Leary wrote: 
>> >> >> 
>> >> >> Silly me, ignoring all the commented-out lines assuming they were 
>> >> >> comments... yes, this is almost certainly it. 
>> >> >> 
>> >> >> On Saturday, April 4, 2015 at 3:24:50 AM UTC-5, Tim Holy wrote: 
>> >> >>> 
>> >> >>> Devectorization should never slow anything down. If it does, then 
>> >> >>> you have some other problem. Here, M is a global variable, and 
>> >> >>> that's probably what's killing you. Performance tip #1: 
>> >> >>> http://docs.julialang.org/en/latest/manual/performance-tips/ 
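>> >> >>> 
>> >> >>> A minimal sketch of why a non-constant global hurts (f_global and 
>> >> >>> f_arg are made-up names, not from the code above): the type of a 
>> >> >>> non-constant global like M can change at any time, so the compiler 
>> >> >>> cannot specialize code that reads it. 
>> >> >>> 
>> >> >>> M = 108                # non-constant global 
>> >> >>> 
>> >> >>> function f_global() 
>> >> >>>     s = 0.0 
>> >> >>>     for k = 1:M        # M's type is unknown here: boxed values, allocations 
>> >> >>>         s += k 
>> >> >>>     end 
>> >> >>>     s 
>> >> >>> end 
>> >> >>> 
>> >> >>> function f_arg(M::Int) 
>> >> >>>     s = 0.0 
>> >> >>>     for k = 1:M        # specialized on Int: no allocation 
>> >> >>>         s += k 
>> >> >>>     end 
>> >> >>>     s 
>> >> >>> end 
>> >> >>> 
>> >> >>> # @time f_global() allocates on every iteration; @time f_arg(108) does not. 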
>> >> >>> 
>> >> >>> --Tim 
>> >> >>> 
>> >> >>> On Friday, April 03, 2015 09:43:51 AM Adam Labadorf wrote: 
>> >> >>> > Hi, 
>> >> >>> > 
>> >> >>> > I am struggling with an issue related to garbage collection 
>> >> >>> > taking up the vast majority (>99%) of compute time on a simple 
>> >> >>> > nested for loop. Code excerpt below: 
>> >> >>> > 
>> >> >>> > # julia version 0.3.7 
>> >> >>> > # counts is an MxN matrix of Float64 
>> >> >>> > # N=15000 
>> >> >>> > # M=108 
>> >> >>> > # h_cols and c_cols are indices \in {1:M} 
>> >> >>> > using HypothesisTests, ArrayViews 
>> >> >>> > ratios = Array(Float64,M) 
>> >> >>> > function compute!(S,tot_i::Int64,i::Int64,j::Int64) 
>> >> >>> >     ratios = view(counts,:,i)./view(counts,:,j) 
>> >> >>> >     #for k=1:M 
>> >> >>> >     #  ratios[k] = counts[k,i]/counts[k,j] 
>> >> >>> >     #end 
>> >> >>> >     #ratios = counts[:,i]./counts[:,j] 
>> >> >>> >     t = UnequalVarianceTTest(ratios[h_cols],ratios[c_cols]) 
>> >> >>> >     S[tot_i] = (pvalue(t),i,j) 
>> >> >>> > end 
>> >> >>> > 
>> >> >>> > for i=1:N-1 
>> >> >>> >   @time for j=(i+1):N 
>> >> >>> >     tot_i += 1 
>> >> >>> >     compute!(S,tot_i,i,j) 
>> >> >>> >   end 
>> >> >>> > end 
>> >> >>> > 
>> >> >>> > The loop begins fast, output from time: 
>> >> >>> > 
>> >> >>> > elapsed time: 1.023850054 seconds (62027220 bytes allocated) 
>> >> >>> > elapsed time: 0.170916977 seconds (45785624 bytes allocated) 
>> >> >>> > elapsed time: 0.171598156 seconds (45782760 bytes allocated) 
>> >> >>> > elapsed time: 0.173866309 seconds (45779896 bytes allocated) 
>> >> >>> > elapsed time: 0.170267172 seconds (45777032 bytes allocated) 
>> >> >>> > elapsed time: 0.171754713 seconds (45774168 bytes allocated) 
>> >> >>> > elapsed time: 0.170110142 seconds (45771304 bytes allocated) 
>> >> >>> > elapsed time: 0.175199053 seconds (45768440 bytes allocated) 
>> >> >>> > elapsed time: 0.179893161 seconds (45765576 bytes allocated) 
>> >> >>> > elapsed time: 0.212172824 seconds (45762712 bytes allocated) 
>> >> >>> > elapsed time: 0.252750549 seconds (45759848 bytes allocated) 
>> >> >>> > elapsed time: 0.254874855 seconds (45756984 bytes allocated) 
>> >> >>> > elapsed time: 0.231003319 seconds (45754120 bytes allocated) 
>> >> >>> > elapsed time: 0.235060195 seconds (45751256 bytes allocated) 
>> >> >>> > elapsed time: 0.235379355 seconds (45748392 bytes allocated) 
>> >> >>> > elapsed time: 0.927622743 seconds (45746168 bytes allocated, 77.65% gc time) 
>> >> >>> > elapsed time: 0.9132931 seconds (45742664 bytes allocated, 78.35% gc time) 
>> >> >>> > 
>> >> >>> > But as soon as it starts doing gc, the % time spent in gc 
>> >> >>> > increases almost indefinitely; output from time much later: 
>> >> >>> > 
>> >> >>> > elapsed time: 0.174122929 seconds (36239160 bytes allocated) 
>> >> >>> > elapsed time: 18.535572658 seconds (36236168 bytes allocated, 99.22% gc time) 
>> >> >>> > elapsed time: 19.189478819 seconds (36233176 bytes allocated, 99.26% gc time) 
>> >> >>> > elapsed time: 21.812889439 seconds (36230184 bytes allocated, 99.35% gc time) 
>> >> >>> > elapsed time: 22.182467723 seconds (36227192 bytes allocated, 99.30% gc time) 
>> >> >>> > elapsed time: 0.169849999 seconds (36224200 bytes allocated) 
>> >> >>> > 
>> >> >>> > The inner loop, despite iterating over fewer and fewer indices, 
>> >> >>> > has massively increased the gc time, and therefore the overall 
>> >> >>> > execution time. I have tried many things, including creating the 
>> >> >>> > compute function, devectorizing the ratios calculation (which 
>> >> >>> > really slowed things down), using view and sub in various places, 
>> >> >>> > and profiling with --trace-allocation=all, but I can't figure out 
>> >> >>> > why this happens or how to fix it. Doing gc for 22 seconds 
>> >> >>> > obviously kills the performance, and since there are about 22M 
>> >> >>> > iterations this is prohibitive. Can anyone suggest something I 
>> >> >>> > can try to improve the performance here? 
>> >> >>> > 
>> >> >>> > Thanks, 
>> >> >>> > Adam 
>> >> >>> 
>> >> >>> 
>> >> 
>> >> 