Re: [julia-users] Extreme garbage collection time
just a moment to applaud this exemplary thread !!! excellent closing post ... love the Julia community. best, cdm
Re: [julia-users] Extreme garbage collection time
Out of curiosity, what happens if you take the compute! function out of main and pass ratios (and any other needed variables) as another argument?

--Tim

On Monday, April 06, 2015 09:14:24 PM Adam Labadorf wrote:

Thanks for the replies. I took your suggestions (and reread the scope section of the docs) and am still experiencing the gc creep. Below is the complete program; the notable changes are that I wrapped the main computation in a function and eliminated all references to global variables inside it. I'm also using the most recent nightly build of 0.4. Overall this version of the code is much faster, but there is still significant slowdown as the computation progresses. Is this expected? Do you see anything I'm doing wrong?

# julia v0.4.0-dev+4159
using HypothesisTests, ArrayViews

type Result
    pvalue::Float64
    i::Int64
    j::Int64
end

function readtable(fn)
    fp = readcsv(fn)
    columns = fp[1,2:end]
    rows = fp[:,2:end]
    data = float(fp[2:end,2:end])
    return (columns,rows,data)
end

@time (cols, genes, counts) = readtable("../all_mRNA_nonzero_norm_counts_trim.csv")

h_cols = find([c[1] == 'H' for c in cols])
c_cols = find([c[1] == 'C' for c in cols])

# filter out genes with zeros since it messes with the ratios
nonzero = mapslices(x -> !any(x.==0), counts, 2)
counts = counts[find(nonzero),[h_cols;c_cols]]

# slices seem to be faster
h_cols = 1:length(h_cols)
c_cols = (length(h_cols)+1):size(counts,2)

# arrays are stored in column order, transpose counts to make
# accessing more efficient
counts = transpose(counts)
genes = genes[find(nonzero)]

function main(counts,genes,h_cols,c_cols)
    N = size(genes,1)
    M = size(counts,1)
    ratios = Array(Float64,M)
    function compute!(S::Array{Result,1},counts::Array{Float64,2},tot_i::Int64,i::Int64,j::Int64,h_cols::UnitRange{Int64},c_cols::UnitRange{Int64},M::Int64)
        for k=1:M
            ratios[k] = counts[k,i]/counts[k,j]
        end
        t = UnequalVarianceTTest(view(ratios,h_cols),view(ratios,c_cols))
        S[tot_i] = Result(pvalue(t),i,j)
    end
    tot_i = 0
    tot = (N^2-N)/2
    S = Array(Result,round(Int,tot))
    for i=1:N-1
        @time for j=(i+1):N
            tot_i += 1
            compute!(S,counts,tot_i,i,j,h_cols,c_cols,M)
        end
    end
end

S = main(counts,genes,h_cols,c_cols)

And the output:

elapsed time: 0.427719149 seconds (23 MB allocated, 39.90% gc time in 2 pauses with 0 full sweep)
elapsed time: 0.031006382 seconds (14 MB allocated)
elapsed time: 0.131579099 seconds (14 MB allocated, 73.64% gc time in 1 pauses with 1 full sweep)
elapsed time: 0.140120717 seconds (14 MB allocated, 73.58% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.030248237 seconds (14 MB allocated)
...
elapsed time: 0.507894781 seconds (5 MB allocated, 97.65% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.011821657 seconds (5 MB allocated)
elapsed time: 0.011610651 seconds (5 MB allocated)
elapsed time: 0.011816277 seconds (5 MB allocated)
elapsed time: 0.50779098 seconds (5 MB allocated, 97.65% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.011997168 seconds (5 MB allocated)
elapsed time: 0.011721667 seconds (5 MB allocated)
elapsed time: 0.011561071 seconds (5 MB allocated)

On Saturday, April 4, 2015 at 12:38:46 PM UTC-4, Patrick O'Leary wrote:

Silly me, ignoring all the commented-out lines assuming they were comments... yes, this is almost certainly it.

On Saturday, April 4, 2015 at 3:24:50 AM UTC-5, Tim Holy wrote:

Devectorization should never slow anything down. If it does, then you have some other problem. Here, M is a global variable, and that's probably what's killing you. Performance tip #1: http://docs.julialang.org/en/latest/manual/performance-tips/

--Tim

On Friday, April 03, 2015 09:43:51 AM Adam Labadorf wrote:

Hi, I am struggling with an issue related to garbage collection taking up the vast majority (99%) of compute time on a simple nested for loop. Code excerpt below:

# julia version 0.3.7
# counts is an MxN matrix of Float64
# N=15000
# M=108
# h_cols and c_cols are indices \in {1:M}
using HypothesisTests, ArrayViews

ratios = Array(Float64,M)

function compute!(S,tot_i::Int64,i::Int64,j::Int64)
    ratios = view(counts,:,i)./view(counts,:,j)
    #for k=1:M
    #    ratios[k] = counts[k,i]/counts[k,j]
    #end
    #ratios = counts[:,i]./counts[:,j]
    t = UnequalVarianceTTest(ratios[h_cols],ratios[c_cols])
    S[tot_i] = (pvalue(t),i,j)
end

for i=1:N-1
    @time for j=(i+1):N
        tot_i += 1
        compute!(S,tot_i,i,j)
    end
end

The loop begins fast, output from time:

elapsed time: 1.023850054 seconds (62027220 bytes allocated)
elapsed time: 0.170916977 seconds (45785624 bytes allocated)
elapsed time: 0.171598156 seconds (45782760 bytes allocated)
elapsed time: 0.173866309
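Tim's advice above (don't read non-const globals inside hot code; pass everything the loop needs as typed arguments, and preallocate scratch buffers) can be sketched as a minimal, self-contained example. This is an illustration in modern (1.x) Julia syntax with made-up data, not the poster's actual program:

```julia
# Sketch of the advice above: no globals inside the hot function, and a
# preallocated buffer that is overwritten in place on every call.
function fill_ratios!(ratios::Vector{Float64}, counts::Matrix{Float64},
                      i::Int, j::Int)
    for k in 1:size(counts, 1)
        ratios[k] = counts[k, i] / counts[k, j]  # no per-call allocation
    end
    return ratios
end

counts = rand(108, 15000) .+ 0.01                 # stand-in for the real data
ratios = Vector{Float64}(undef, size(counts, 1))  # allocated once, reused
fill_ratios!(ratios, counts, 1, 2)
```

Because fill_ratios! receives everything it touches as a typed argument, the compiler can infer concrete types and the loop compiles to allocation-free code, which is exactly what the global-variable version prevents.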
Re: [julia-users] Extreme garbage collection time
Can you post your csv file so we can actually run the program?

On Tuesday, April 7, 2015 at 6:21:18 AM UTC+2, Adam Labadorf wrote: [...]
Re: [julia-users] Extreme garbage collection time
On Tue, 2015-04-07 at 06:14, Adam Labadorf alabad...@gmail.com wrote: [...]

Sadly, nested functions are often bad, as type inference does not work properly on them, as Tim suggested. Consider this example:

function a(n)
    aa(x,y) = x*y
    out = 0
    for i=1:n
        out += aa(i,i)
    end
    out
end

bb(x,y) = x*y

function b(n)
    out = 0
    for i=1:n
        out += bb(i,i)
    end
    out
end

n = 10^7
@time a(n)
@time a(n)
# elapsed time: 0.680312065 seconds (1220 MB allocated, 2.71% gc time in 56 pauses with 0 full sweep)
b(n)
@time b(n)
# elapsed time: 3.086e-6 seconds (192 bytes allocated)
@code_warntype a(n)  # see how the return type of the function is not inferred!
@code_warntype b(n)

Move compute! out of main and it should be better.

[...]

This looks ok-ish to me. The program runs, allocates memory, and every so often the memory is garbage collected. What is not ok is that the gc runs after only 20 MB have been allocated, and that it takes so long. But at that point all the memory is used up, right? Maybe that is why it takes so long then?
Re: [julia-users] Extreme garbage collection time
I moved compute! out of the main function and pass the ratios as you suggested, and the performance is a bit better, but I still notice the gc time increasing, albeit more slowly; new code below. The csv I'm using is available at https://www.dropbox.com/s/uqeisg1vx027gjc/all_mRNA_nonzero_norm_counts_trim.csv?dl=0

using HypothesisTests, ArrayViews

type Result
    pvalue::Float64
    i::Int64
    j::Int64
end

function readtable(fn)
    fp = readcsv(fn)
    columns = fp[1,2:end]
    rows = fp[2:end,1]
    data = float(fp[2:end,2:end])
    return (columns,rows,data)
end

@time (cols, genes, counts) = readtable("../all_mRNA_nonzero_norm_counts_trim.csv")

h_cols = find([c[1] == 'H' for c in cols])
c_cols = find([c[1] == 'C' for c in cols])

# restrict to HD and control
# add a pseudocount to avoid division by zero errors
counts = counts[:,[h_cols;c_cols]] + 0.01

h_cols = 1:length(h_cols)
c_cols = (length(h_cols)+1):size(counts,2)

# arrays are stored in column order, transpose counts to make
# accessing more efficient
counts = transpose(counts)

function compute!(S::Array{Result,1},ratios::Array{Float64,1},tot_i::Int64,i::Int64,j::Int64,h_cols::UnitRange{Int64},c_cols::UnitRange{Int64},M::Int64)
    t = UnequalVarianceTTest(view(ratios,h_cols),view(ratios,c_cols))
    S[tot_i] = Result(pvalue(t),i,j)
end

function main(counts,genes,h_cols,c_cols)
    N = size(genes,1)
    M = size(counts,1)
    ratios = Array(Float64,M)
    tot_i = 0
    tot = (N^2-N)/2
    S = Array(Result,round(Int,tot))
    for i=1:N-1
        @time for j=(i+1):N
            tot_i += 1
            # use a sigmoid function to compress log ratios to [-10,10]
            b = 10
            for k=1:M
                ratios[k] = b*(2/(1+exp(-log2(counts[k,i]/counts[k,j])/(b/2)))-1)
            end
            compute!(S,ratios,tot_i,i,j,h_cols,c_cols,M)
        end
        @show (tot_i,tot,tot_i/tot)
    end
end

S = main(counts,genes,h_cols,c_cols)

On Tuesday, April 7, 2015 at 8:47:06 AM UTC-4, Mauro wrote: [...]
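The sigmoid transform in the inner loop of the new code is a scaled logistic applied to the log2 ratio: equal counts map to 0, and arbitrarily large or small ratios are squashed into the open interval (-b, b). Written as a standalone function (a restatement of the expression in the loop, not new behavior):

```julia
# b*(2/(1+exp(-log2(x/y)/(b/2)))-1): a logistic squashing of log2 ratios
# into (-b, b). compress(x, x) is exactly 0; extreme ratios approach ±b.
compress(x, y, b=10.0) = b * (2 / (1 + exp(-log2(x / y) / (b / 2))) - 1)

compress(1.0, 1.0)   # 0.0 for equal counts
compress(1e6, 1.0)   # close to +10 but never reaches it
compress(1.0, 1e6)   # close to -10
```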
Re: [julia-users] Extreme garbage collection time
Make Result an immutable:

immutable Result
    pvalue::Float64
    i::Int64
    j::Int64
end

That way the numbers are stored in the S array directly and all of its memory is preallocated. Otherwise S just holds pointers to the Result objects. gc time is only 2% for me then (but I didn't run it for more than 10 min). However, why the gc time increases in the original code, I don't know. I don't think it should.

On Tue, 2015-04-07 at 16:26, Adam Labadorf alabad...@gmail.com wrote: [...]
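Mauro's point about storage can be made concrete. When every field of an immutable is itself a plain-bits type, the whole value is isbits, so an Array of them is one contiguous preallocated block with nothing for the gc to trace; a mutable type is heap-allocated, so the array holds pointers instead. A sketch in modern (1.x) syntax, where `immutable`/`type` became `struct`/`mutable struct`:

```julia
struct InlineResult          # all-bits fields: stored inline in an Array
    pvalue::Float64
    i::Int64
    j::Int64
end

mutable struct BoxedResult   # heap-allocated: an Array of these holds pointers
    pvalue::Float64
    i::Int64
    j::Int64
end

isbitstype(InlineResult)   # true
isbitstype(BoxedResult)    # false
sizeof(InlineResult)       # 24: three 8-byte fields, contiguous in memory
```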
Re: [julia-users] Extreme garbage collection time
Yes, that gives a massive improvement, and I see the same gc behavior you got. Before, I could see the gc time increasing slowly after watching for just a minute or two, and that seems to have stopped now. I'll know for sure when I run the whole thing and will report back. Thanks so much to all of you for your swift replies; this was my first Julia experience and it's great to have such a supportive community.

On Tuesday, April 7, 2015 at 11:40:49 AM UTC-4, Mauro wrote: [...]
Re: [julia-users] Extreme garbage collection time
Glad that helped! (Also, not all immutables are inlined that way, only when isbits == true.) Maybe the gc time increased because it had to look at more and more objects as more and more Result instances were created?

On Tue, 2015-04-07 at 18:24, Adam Labadorf alabad...@gmail.com wrote: [...]
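Mauro's caveat that only isbits immutables are inlined can be illustrated: an immutable whose fields are not all plain bits fails isbits, so arrays of it still store references the gc must walk. A hypothetical example (1.x syntax):

```julia
# An immutable struct is only stored inline when all of its fields are
# bits types; a String field makes the whole struct a reference type
# from the array's point of view.
struct LabeledResult
    pvalue::Float64
    label::String        # heap reference => not isbits
end

isbitstype(LabeledResult)   # false: Arrays of these hold references
```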
Re: [julia-users] Extreme garbage collection time
The code ran to completion at least 17x faster than the equivalent python implementation. Out of curiosity, I collected the elapsed time and % gc time statistics from @time and plotted them across the whole run. This was run on a shared cluster, so the spikes in time are probably due to other things going on on the node. The linear decrease in elapsed time confirms that the immutable type change fixed this issue. The trend in % time spent in gc is linear until the end, when the cost of gc became greater than the computation of the inner loop, which makes sense. Not sure if anyone else cares about this plot besides me, but I figured I'd toss it out there anyway.

On Tuesday, April 7, 2015 at 2:46:48 PM UTC-4, Mauro wrote: [...]
Re: [julia-users] Extreme garbage collection time
The code ran to completion at least 17x faster than the equivalent Python implementation. Out of curiosity, I collected the elapsed time and % time spent in gc statistics from @time and plotted them across the whole run. This was run on a shared cluster, so the spikes in time are probably due to other things going on with the node. The linear decrease in elapsed time confirms that the immutable type change fixed this issue. The trend in % gc time is linear until the end, when the cost of gc was greater than the computation of the inner loop, which makes sense. Not sure if anyone else cares about this plot besides me, but figured I'd toss it out there anyway.

https://lh3.googleusercontent.com/-ftbz3MuaUfQ/VSRTBC_sJVI/DvA/_MZ96knbNsE/s1600/time_gc_julia.png

On Tuesday, April 7, 2015 at 2:46:48 PM UTC-4, Mauro wrote:

Glad that helped! (Also, not all immutables are inlined that way, only when isbits==true.) Maybe the gc time increased because it had to look at more and more objects as more and more Result instances were created?

On Tue, 2015-04-07 at 18:24, Adam Labadorf alab...@gmail.com wrote:

Yes, that gives a massive improvement and I see the same gc behavior you got. I could notice the gc increasing slowly after just watching for a minute or two, and that seems to have stopped now. I'll know for sure when I run the whole thing and will report back. Thanks so much to all of you for your swift replies; this was my first Julia experience and it's great to have such a supportive community.

On Tuesday, April 7, 2015 at 11:40:49 AM UTC-4, Mauro wrote:

Make Result an immutable:

immutable Result
    pvalue::Float64
    i::Int64
    j::Int64
end

That way the numbers are stored in the S array directly and all its memory is preallocated. Otherwise S just holds pointers to the Result objects. gc time is only 2% for me then (but I didn't run it for more than 10 min). However, why the gc time increases in the original code, I don't know. I don't think it should.
On Tue, 2015-04-07 at 16:26, Adam Labadorf alab...@gmail.com wrote:

I moved compute! out of the main function and passed the ratios as you suggested; the performance is a bit better, but I still notice the gc time increasing, albeit more slowly. New code below. The csv I'm using is available at https://www.dropbox.com/s/uqeisg1vx027gjc/all_mRNA_nonzero_norm_counts_trim.csv?dl=0

using HypothesisTests, ArrayViews

type Result
    pvalue::Float64
    i::Int64
    j::Int64
end

function readtable(fn)
    fp = readcsv(fn)
    columns = fp[1,2:end]
    rows = fp[2:end,1]
    data = float(fp[2:end,2:end])
    return (columns,rows,data)
end

@time (cols, genes, counts) = readtable("../all_mRNA_nonzero_norm_counts_trim.csv")

h_cols = find([c[1] == 'H' for c in cols])
c_cols = find([c[1] == 'C' for c in cols])

# restrict to HD and control
# add a pseudocount to avoid division by zero errors
counts = counts[:,[h_cols;c_cols]] + 0.01

h_cols = 1:length(h_cols)
c_cols = (length(h_cols)+1):size(counts,2)

# arrays are stored in column order, transpose counts to make
# accessing more efficient
counts = transpose(counts)

function compute!(S::Array{Result,1},ratios::Array{Float64,1},tot_i::Int64,i::Int64,j::Int64,h_cols::UnitRange{Int64},c_cols::UnitRange{Int64},M::Int64)
    t = UnequalVarianceTTest(view(ratios,h_cols),view(ratios,c_cols))
    S[tot_i] = Result(pvalue(t),i,j)
end

function main(counts,genes,h_cols,c_cols)
    N = size(genes,1)
    M = size(counts,1)
    ratios = Array(Float64,M)
    tot_i = 0
    tot = (N^2-N)/2
    S = Array(Result,round(Int,tot))
    for i=1:N-1
        @time for j=(i+1):N
            tot_i += 1
            # use a sigmoid function to compress log ratios to [-10,10]
            b = 10
            for k=1:M
                ratios[k] = b*(2/(1+exp(-log2(counts[k,i]/counts[k,j])/(b/2)))-1)
            end
            compute!(S,ratios,tot_i,i,j,h_cols,c_cols,M)
        end
        @show (tot_i,tot,tot_i/tot)
    end
end

S = main(counts,genes,h_cols,c_cols)

On Tuesday, April 7, 2015 at 8:47:06 AM UTC-4, Mauro wrote:

On Tue, 2015-04-07 at 06:14, Adam Labadorf alab...@gmail.com wrote:

Thanks for the replies. I took your suggestions (and reread the scope section of the docs) and am still experiencing the gc creep. Below is the complete program, with the notable changes that I wrapped the main computation in a function and eliminated all references to global variables inside. I'm also using the most recent nightly build of 0.4. Overall this version of the code is much faster, but there is still significant slowdown as the computation progresses.
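Mauro's `immutable Result` suggestion above can be sketched in modern Julia (>= 1.0) syntax, where `immutable` became `struct` and `type` became `mutable struct`; `isbitstype` shows why the fix eliminates the GC pressure:

```julia
# Sketch of the immutable fix from this thread, syntax updated to Julia >= 1.0.
# An isbits struct is stored inline in a Vector, so preallocating S allocates
# all of its memory up front and the GC has no per-element objects to track.

struct Result                  # immutable and isbits: stored inline in arrays
    pvalue::Float64
    i::Int64
    j::Int64
end

mutable struct MutableResult   # mutable: the array holds GC-tracked pointers
    pvalue::Float64
    i::Int64
    j::Int64
end

println(isbitstype(Result))         # true  -> inline storage, no GC tracking
println(isbitstype(MutableResult))  # false -> one heap object per element

S = Vector{Result}(undef, 3)        # fully preallocated
S[1] = Result(0.05, 1, 2)
```

With ~22M `Result` elements as in the thread, the mutable version gives the collector 22M objects to scan on every pause, which matches the steadily growing gc times reported above.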
Re: [julia-users] Extreme garbage collection time
Thanks for the replies. I took your suggestions (and reread the scope section of the docs) and am still experiencing the gc creep. Below is the complete program, with the notable changes that I wrapped the main computation in a function and eliminated all references to global variables inside. I'm also using the most recent nightly build of 0.4. Overall this version of the code is much faster, but there is still significant slowdown as the computation progresses. Is this expected? Do you see anything I'm doing wrong?

# julia v0.4.0-dev+4159

using HypothesisTests, ArrayViews

type Result
    pvalue::Float64
    i::Int64
    j::Int64
end

function readtable(fn)
    fp = readcsv(fn)
    columns = fp[1,2:end]
    rows = fp[:,2:end]
    data = float(fp[2:end,2:end])
    return (columns,rows,data)
end

@time (cols, genes, counts) = readtable("../all_mRNA_nonzero_norm_counts_trim.csv")

h_cols = find([c[1] == 'H' for c in cols])
c_cols = find([c[1] == 'C' for c in cols])

# filter out genes with zeros since it messes with the ratios
nonzero = mapslices(x -> !any(x.==0),counts,2)
counts = counts[find(nonzero),[h_cols;c_cols]]

# slices seem to be faster
h_cols = 1:length(h_cols)
c_cols = (length(h_cols)+1):size(counts,2)

# arrays are stored in column order, transpose counts to make
# accessing more efficient
counts = transpose(counts)
genes = genes[find(nonzero)]

function main(counts,genes,h_cols,c_cols)
    N = size(genes,1)
    M = size(counts,1)
    ratios = Array(Float64,M)
    function compute!(S::Array{Result,1},counts::Array{Float64,2},tot_i::Int64,i::Int64,j::Int64,h_cols::UnitRange{Int64},c_cols::UnitRange{Int64},M::Int64)
        for k=1:M
            ratios[k] = counts[k,i]/counts[k,j]
        end
        t = UnequalVarianceTTest(view(ratios,h_cols),view(ratios,c_cols))
        S[tot_i] = Result(pvalue(t),i,j)
    end
    tot_i = 0
    tot = (N^2-N)/2
    S = Array(Result,round(Int,tot))
    for i=1:N-1
        @time for j=(i+1):N
            tot_i += 1
            compute!(S,counts,tot_i,i,j,h_cols,c_cols,M)
        end
    end
end

S = main(counts,genes,h_cols,c_cols)

And the output:

elapsed time: 0.427719149 seconds (23 MB allocated, 39.90% gc time in 2 pauses with 0 full sweep)
elapsed time: 0.031006382 seconds (14 MB allocated)
elapsed time: 0.131579099 seconds (14 MB allocated, 73.64% gc time in 1 pauses with 1 full sweep)
elapsed time: 0.140120717 seconds (14 MB allocated, 73.58% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.030248237 seconds (14 MB allocated)
...
elapsed time: 0.507894781 seconds (5 MB allocated, 97.65% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.011821657 seconds (5 MB allocated)
elapsed time: 0.011610651 seconds (5 MB allocated)
elapsed time: 0.011816277 seconds (5 MB allocated)
elapsed time: 0.50779098 seconds (5 MB allocated, 97.65% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.011997168 seconds (5 MB allocated)
elapsed time: 0.011721667 seconds (5 MB allocated)
elapsed time: 0.011561071 seconds (5 MB allocated)

On Saturday, April 4, 2015 at 12:38:46 PM UTC-4, Patrick O'Leary wrote:

Silly me, ignoring all the commented-out lines assuming they were comments... yes, this is almost certainly it.

On Saturday, April 4, 2015 at 3:24:50 AM UTC-5, Tim Holy wrote:

Devectorization should never slow anything down. If it does, then you have some other problem. Here, M is a global variable, and that's probably what's killing you. Performance tip #1: http://docs.julialang.org/en/latest/manual/performance-tips/

--Tim

On Friday, April 03, 2015 09:43:51 AM Adam Labadorf wrote:

Hi, I am struggling with an issue related to garbage collection taking up the vast majority (99%) of compute time on a simple nested for loop. Code excerpt below:

# julia version 0.3.7
# counts is an MxN matrix of Float64
# N=15000
# M=108
# h_cols and c_cols are indices \in {1:M}

using HypothesisTests, ArrayViews

ratios = Array(Float64,M)

function compute!(S,tot_i::Int64,i::Int64,j::Int64)
    ratios = view(counts,:,i)./view(counts,:,j)
    #for k=1:M
    #    ratios[k] = counts[k,i]/counts[k,j]
    #end
    #ratios = counts[:,i]./counts[:,j]
    t = UnequalVarianceTTest(ratios[h_cols],ratios[c_cols])
    S[tot_i] = (pvalue(t),i,j)
end

for i=1:N-1
    @time for j=(i+1):N
        tot_i += 1
        compute!(S,tot_i,i,j)
    end
end

The loop begins fast; output from @time:

elapsed time: 1.023850054 seconds (62027220 bytes allocated)
elapsed time: 0.170916977 seconds (45785624 bytes allocated)
elapsed time: 0.171598156 seconds (45782760 bytes allocated)
elapsed time: 0.173866309 seconds (45779896 bytes allocated)
elapsed time: 0.170267172 seconds (45777032 bytes allocated)
elapsed time: 0.171754713 seconds (45774168 bytes allocated)
elapsed time: 0.170110142 seconds (45771304 bytes allocated)
elapsed time: 0.175199053 seconds (45768440 bytes allocated)
elapsed time: 0.179893161 seconds (45765576 bytes allocated)
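The preallocated-buffer pattern in the `main`/`compute!` code above can be reduced to a small self-contained sketch; `fill_ratios!` and the toy matrix are made up for illustration:

```julia
# Allocation-free inner loop: one ratio buffer is allocated once and refilled
# for every (i, j) column pair, so the hot loop itself allocates nothing and
# creates no garbage for the GC.

function fill_ratios!(ratios::Vector{Float64}, counts::Matrix{Float64},
                      i::Int, j::Int)
    @inbounds for k in eachindex(ratios)
        ratios[k] = counts[k, i] / counts[k, j]
    end
    return ratios
end

counts = [1.0 2.0;
          4.0 8.0]                                 # toy 2x2 matrix
ratios = Vector{Float64}(undef, size(counts, 1))   # preallocated once
fill_ratios!(ratios, counts, 2, 1)                 # column 2 over column 1
println(ratios)                                    # [2.0, 2.0]
```

Because `ratios` is passed in rather than captured from an enclosing scope, the compiler can infer its type and the loop compiles down to plain array arithmetic.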
Re: [julia-users] Extreme garbage collection time
Devectorization should never slow anything down. If it does, then you have some other problem. Here, M is a global variable, and that's probably what's killing you. Performance tip #1: http://docs.julialang.org/en/latest/manual/performance-tips/

--Tim

On Friday, April 03, 2015 09:43:51 AM Adam Labadorf wrote:

Hi, I am struggling with an issue related to garbage collection taking up the vast majority (99%) of compute time on a simple nested for loop. Code excerpt below:

# julia version 0.3.7
# counts is an MxN matrix of Float64
# N=15000
# M=108
# h_cols and c_cols are indices \in {1:M}

using HypothesisTests, ArrayViews

ratios = Array(Float64,M)

function compute!(S,tot_i::Int64,i::Int64,j::Int64)
    ratios = view(counts,:,i)./view(counts,:,j)
    #for k=1:M
    #    ratios[k] = counts[k,i]/counts[k,j]
    #end
    #ratios = counts[:,i]./counts[:,j]
    t = UnequalVarianceTTest(ratios[h_cols],ratios[c_cols])
    S[tot_i] = (pvalue(t),i,j)
end

for i=1:N-1
    @time for j=(i+1):N
        tot_i += 1
        compute!(S,tot_i,i,j)
    end
end

The loop begins fast; output from @time:

elapsed time: 1.023850054 seconds (62027220 bytes allocated)
elapsed time: 0.170916977 seconds (45785624 bytes allocated)
elapsed time: 0.171598156 seconds (45782760 bytes allocated)
elapsed time: 0.173866309 seconds (45779896 bytes allocated)
elapsed time: 0.170267172 seconds (45777032 bytes allocated)
elapsed time: 0.171754713 seconds (45774168 bytes allocated)
elapsed time: 0.170110142 seconds (45771304 bytes allocated)
elapsed time: 0.175199053 seconds (45768440 bytes allocated)
elapsed time: 0.179893161 seconds (45765576 bytes allocated)
elapsed time: 0.212172824 seconds (45762712 bytes allocated)
elapsed time: 0.252750549 seconds (45759848 bytes allocated)
elapsed time: 0.254874855 seconds (45756984 bytes allocated)
elapsed time: 0.231003319 seconds (45754120 bytes allocated)
elapsed time: 0.235060195 seconds (45751256 bytes allocated)
elapsed time: 0.235379355 seconds (45748392 bytes allocated)
elapsed time: 0.927622743 seconds (45746168 bytes allocated, 77.65% gc time)
elapsed time: 0.9132931 seconds (45742664 bytes allocated, 78.35% gc time)

But as soon as it starts doing gc, the % time spent in gc increases almost indefinitely; output from @time much later:

elapsed time: 0.174122929 seconds (36239160 bytes allocated)
elapsed time: 18.535572658 seconds (36236168 bytes allocated, 99.22% gc time)
elapsed time: 19.189478819 seconds (36233176 bytes allocated, 99.26% gc time)
elapsed time: 21.812889439 seconds (36230184 bytes allocated, 99.35% gc time)
elapsed time: 22.182467723 seconds (36227192 bytes allocated, 99.30% gc time)
elapsed time: 0.16984 seconds (36224200 bytes allocated)

The inner loop, despite iterating over fewer and fewer indices, has massively increased the gc time and therefore the overall execution time. I have tried many things, including creating the compute! function, devectorizing the ratios calculation (which really slowed things down), using view and sub in various places, and profiling with --trace-allocation=all, but I can't figure out why this happens or how to fix it. Doing gc for 22 seconds obviously kills the performance, and since there are about 22M iterations this is prohibitive. Can anyone suggest something I can try to improve the performance here?

Thanks, Adam
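Tim's performance tip #1 can be illustrated with a minimal sketch (function names invented for the example): a non-const global defeats type inference inside functions, while the same value passed as an argument lets the compiler specialize.

```julia
# Performance tip #1: avoid non-const globals in hot code.

M = 108                    # non-const global, like M in the original code

function sum_global(n)     # reads the global: every access is dynamic
    s = 0.0
    for _ in 1:n
        s += M             # type of M unknown at compile time
    end
    return s
end

function sum_arg(m, n)     # same computation with m as an argument
    s = 0.0
    for _ in 1:n
        s += m             # type known, loop compiles to tight machine code
    end
    return s
end

println(sum_global(10))    # 1080.0
println(sum_arg(108, 10))  # 1080.0, but typically far faster at scale
```

The results are identical; only the argument version is type-stable, which is why in the thread's code a global `M` (and `counts`, `S`, `tot_i`) made the loop both slow and allocation-heavy.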
Re: [julia-users] Extreme garbage collection time
Silly me, ignoring all the commented-out lines assuming they were comments... yes, this is almost certainly it.

On Saturday, April 4, 2015 at 3:24:50 AM UTC-5, Tim Holy wrote:

Devectorization should never slow anything down. If it does, then you have some other problem. Here, M is a global variable, and that's probably what's killing you. Performance tip #1: http://docs.julialang.org/en/latest/manual/performance-tips/

--Tim

On Friday, April 03, 2015 09:43:51 AM Adam Labadorf wrote:

Hi, I am struggling with an issue related to garbage collection taking up the vast majority (99%) of compute time on a simple nested for loop. Code excerpt below:

# julia version 0.3.7
# counts is an MxN matrix of Float64
# N=15000
# M=108
# h_cols and c_cols are indices \in {1:M}

using HypothesisTests, ArrayViews

ratios = Array(Float64,M)

function compute!(S,tot_i::Int64,i::Int64,j::Int64)
    ratios = view(counts,:,i)./view(counts,:,j)
    #for k=1:M
    #    ratios[k] = counts[k,i]/counts[k,j]
    #end
    #ratios = counts[:,i]./counts[:,j]
    t = UnequalVarianceTTest(ratios[h_cols],ratios[c_cols])
    S[tot_i] = (pvalue(t),i,j)
end

for i=1:N-1
    @time for j=(i+1):N
        tot_i += 1
        compute!(S,tot_i,i,j)
    end
end

The loop begins fast; output from @time:

elapsed time: 1.023850054 seconds (62027220 bytes allocated)
elapsed time: 0.170916977 seconds (45785624 bytes allocated)
elapsed time: 0.171598156 seconds (45782760 bytes allocated)
elapsed time: 0.173866309 seconds (45779896 bytes allocated)
elapsed time: 0.170267172 seconds (45777032 bytes allocated)
elapsed time: 0.171754713 seconds (45774168 bytes allocated)
elapsed time: 0.170110142 seconds (45771304 bytes allocated)
elapsed time: 0.175199053 seconds (45768440 bytes allocated)
elapsed time: 0.179893161 seconds (45765576 bytes allocated)
elapsed time: 0.212172824 seconds (45762712 bytes allocated)
elapsed time: 0.252750549 seconds (45759848 bytes allocated)
elapsed time: 0.254874855 seconds (45756984 bytes allocated)
elapsed time: 0.231003319 seconds (45754120 bytes allocated)
elapsed time: 0.235060195 seconds (45751256 bytes allocated)
elapsed time: 0.235379355 seconds (45748392 bytes allocated)
elapsed time: 0.927622743 seconds (45746168 bytes allocated, 77.65% gc time)
elapsed time: 0.9132931 seconds (45742664 bytes allocated, 78.35% gc time)

But as soon as it starts doing gc, the % time spent in gc increases almost indefinitely; output from @time much later:

elapsed time: 0.174122929 seconds (36239160 bytes allocated)
elapsed time: 18.535572658 seconds (36236168 bytes allocated, 99.22% gc time)
elapsed time: 19.189478819 seconds (36233176 bytes allocated, 99.26% gc time)
elapsed time: 21.812889439 seconds (36230184 bytes allocated, 99.35% gc time)
elapsed time: 22.182467723 seconds (36227192 bytes allocated, 99.30% gc time)
elapsed time: 0.16984 seconds (36224200 bytes allocated)

The inner loop, despite iterating over fewer and fewer indices, has massively increased the gc time and therefore the overall execution time. I have tried many things, including creating the compute! function, devectorizing the ratios calculation (which really slowed things down), using view and sub in various places, and profiling with --trace-allocation=all, but I can't figure out why this happens or how to fix it. Doing gc for 22 seconds obviously kills the performance, and since there are about 22M iterations this is prohibitive. Can anyone suggest something I can try to improve the performance here?

Thanks, Adam
[julia-users] Extreme garbage collection time
Hi, I am struggling with an issue related to garbage collection taking up the vast majority (99%) of compute time on a simple nested for loop. Code excerpt below:

# julia version 0.3.7
# counts is an MxN matrix of Float64
# N=15000
# M=108
# h_cols and c_cols are indices \in {1:M}

using HypothesisTests, ArrayViews

ratios = Array(Float64,M)

function compute!(S,tot_i::Int64,i::Int64,j::Int64)
    ratios = view(counts,:,i)./view(counts,:,j)
    #for k=1:M
    #    ratios[k] = counts[k,i]/counts[k,j]
    #end
    #ratios = counts[:,i]./counts[:,j]
    t = UnequalVarianceTTest(ratios[h_cols],ratios[c_cols])
    S[tot_i] = (pvalue(t),i,j)
end

for i=1:N-1
    @time for j=(i+1):N
        tot_i += 1
        compute!(S,tot_i,i,j)
    end
end

The loop begins fast; output from @time:

elapsed time: 1.023850054 seconds (62027220 bytes allocated)
elapsed time: 0.170916977 seconds (45785624 bytes allocated)
elapsed time: 0.171598156 seconds (45782760 bytes allocated)
elapsed time: 0.173866309 seconds (45779896 bytes allocated)
elapsed time: 0.170267172 seconds (45777032 bytes allocated)
elapsed time: 0.171754713 seconds (45774168 bytes allocated)
elapsed time: 0.170110142 seconds (45771304 bytes allocated)
elapsed time: 0.175199053 seconds (45768440 bytes allocated)
elapsed time: 0.179893161 seconds (45765576 bytes allocated)
elapsed time: 0.212172824 seconds (45762712 bytes allocated)
elapsed time: 0.252750549 seconds (45759848 bytes allocated)
elapsed time: 0.254874855 seconds (45756984 bytes allocated)
elapsed time: 0.231003319 seconds (45754120 bytes allocated)
elapsed time: 0.235060195 seconds (45751256 bytes allocated)
elapsed time: 0.235379355 seconds (45748392 bytes allocated)
elapsed time: 0.927622743 seconds (45746168 bytes allocated, 77.65% gc time)
elapsed time: 0.9132931 seconds (45742664 bytes allocated, 78.35% gc time)

But as soon as it starts doing gc, the % time spent in gc increases almost indefinitely; output from @time much later:

elapsed time: 0.174122929 seconds (36239160 bytes allocated)
elapsed time: 18.535572658 seconds (36236168 bytes allocated, 99.22% gc time)
elapsed time: 19.189478819 seconds (36233176 bytes allocated, 99.26% gc time)
elapsed time: 21.812889439 seconds (36230184 bytes allocated, 99.35% gc time)
elapsed time: 22.182467723 seconds (36227192 bytes allocated, 99.30% gc time)
elapsed time: 0.16984 seconds (36224200 bytes allocated)

The inner loop, despite iterating over fewer and fewer indices, has massively increased the gc time and therefore the overall execution time. I have tried many things, including creating the compute! function, devectorizing the ratios calculation (which really slowed things down), using view and sub in various places, and profiling with --trace-allocation=all, but I can't figure out why this happens or how to fix it. Doing gc for 22 seconds obviously kills the performance, and since there are about 22M iterations this is prohibitive. Can anyone suggest something I can try to improve the performance here?

Thanks, Adam
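One way to keep the vectorized style of `compute!` above without its per-call allocation is in-place broadcasting; a minimal sketch in modern Julia, where `@view` is built in (the thread predates this and used the ArrayViews package):

```julia
# `ratios = view(counts,:,i) ./ view(counts,:,j)` builds a fresh array on
# every call, which is exactly the garbage that drove gc time to 99%.
# Broadcasting with `.=` into a preallocated buffer reuses the same memory.

counts = [1.0 3.0;
          2.0 6.0]                                   # toy 2x2 matrix
ratios = similar(counts, size(counts, 1))            # allocated once

ratios .= @view(counts[:, 2]) ./ @view(counts[:, 1]) # in-place, no new array
println(ratios)                                      # [3.0, 3.0]
```

Inside the thread's double loop this line would replace the allocating division, and `ratios` would be created once outside the loop, as the later messages in the thread converge on.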