Silly me, ignoring all the commented out lines assuming they were comments...yes, this is almost certainly it.
On Saturday, April 4, 2015 at 3:24:50 AM UTC-5, Tim Holy wrote: > > Devectorization should never slow anything down. If it does, then you have > some other problem. Here, M is a global variable, and that's probably > what's > killing you. Performance tip #1: > http://docs.julialang.org/en/latest/manual/performance-tips/ > > --Tim > > On Friday, April 03, 2015 09:43:51 AM Adam Labadorf wrote: > > Hi, > > > > I am struggling with an issue related to garbage collection taking up > the > > vast majority (>99%) of compute time on a simple nested for loop. Code > > excerpt below: > > > > # julia version 0.3.7 > > # counts is an MxN matrix of Float64 > > # N=15000 > > # M=108 > > # h_cols and c_cols are indices \in {1:M} > > using HypothesisTests, ArrayViews > > ratios = Array(Float64,M) > > function compute!(S,tot_i::Int64,i::Int64,j::Int64) > > ratios = view(counts,:,i)./view(counts,:,j) > > #for k=1:M > > # ratios[k] = counts[k,i]/counts[k,j] > > #end > > #ratios = counts[:,i]./counts[:,j] > > t = UnequalVarianceTTest(ratios[h_cols],ratios[c_cols]) > > S[tot_i] = (pvalue(t),i,j) > > end > > > > for i=1:N-1 > > @time for j=(i+1):N > > tot_i += 1 > > compute!(S,tot_i,i,j) > > end > > end > > > > The loop begins fast, output from time: > > > > elapsed time: 1.023850054 seconds (62027220 bytes allocated) > > elapsed time: 0.170916977 seconds (45785624 bytes allocated) > > elapsed time: 0.171598156 seconds (45782760 bytes allocated) > > elapsed time: 0.173866309 seconds (45779896 bytes allocated) > > elapsed time: 0.170267172 seconds (45777032 bytes allocated) > > elapsed time: 0.171754713 seconds (45774168 bytes allocated) > > elapsed time: 0.170110142 seconds (45771304 bytes allocated) > > elapsed time: 0.175199053 seconds (45768440 bytes allocated) > > elapsed time: 0.179893161 seconds (45765576 bytes allocated) > > elapsed time: 0.212172824 seconds (45762712 bytes allocated) > > elapsed time: 0.252750549 seconds (45759848 bytes allocated) > > elapsed time: 0.254874855 seconds (45756984 bytes allocated) > > elapsed time: 0.231003319 seconds (45754120 bytes allocated) > > elapsed time: 0.235060195 seconds (45751256 bytes allocated) > > elapsed time: 0.235379355 seconds (45748392 bytes allocated) > > elapsed time: 0.927622743 seconds (45746168 bytes allocated, 77.65% gc > time) > > elapsed time: 0.9132931 seconds (45742664 bytes allocated, 78.35% gc > time) > > > > But as soon as it starts doing gc the % time spent in increases almost > > indefinitely, output from time much later: > > > > elapsed time: 0.174122929 seconds (36239160 bytes allocated) > > elapsed time: 18.535572658 seconds (36236168 bytes allocated, 99.22% gc > > time) > > elapsed time: 19.189478819 seconds (36233176 bytes allocated, 99.26% gc > > time) > > elapsed time: 21.812889439 seconds (36230184 bytes allocated, 99.35% gc > > time) > > elapsed time: 22.182467723 seconds (36227192 bytes allocated, 99.30% gc > > time) > > elapsed time: 0.169849999 seconds (36224200 bytes allocated) > > > > The inner loop, despite iterating over fewer and fewer indices has > > massively increased the gc, and therefore overall, execution time. I > have > > tried many things, including creating the compute function, > devectorizing > > the ratios calculation (which really slowed things down), using view and > > sub in various places, profiling with --trace-allocation=all but I can't > > figure out why this happens or how to fix it. Doing gc for 22 seconds > > obviously kills the performance, and since there are about 22M > iterations > > this is prohibitive. Can anyone suggest something I can try to improve > the > > performance here? > > > > Thanks, > > Adam > >