If you're using sub or slice, your performance should be vastly better on 0.4 than on 0.3, but as you observed it will still be awful. In the long run (maybe even by the time 0.4 is released?), we hope that in such loops sub/slice won't actually create a new object and allocate memory; the compiler should elide it transparently. I think the main roadblock is that immutables with tuple fields (see the definition of SubArray) currently do not "inline" the tuple and instead hold a reference to a heap-allocated object. That basically prevents further optimization. For improvements, the main issue to watch is https://github.com/JuliaLang/julia/issues/8974.
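Until that lands, the allocation can be avoided by hand. Here is a minimal sketch of a devectorized inner loop, assuming the same 2xN layout as your points array. The name devectorized is mine, not something from your code, and I pass the array in as an argument rather than building it inside the function:

```julia
# Hypothetical helper: the same pair loop as slow(), but indexing the matrix
# element-wise so that no temporary column is created on each iteration.
function devectorized(points)
    n_points = size(points, 2)
    cum = 0.0
    for i in 1:n_points
        for j in (i+1):n_points
            cum += points[1, j]   # read the x coordinate directly, no slice
        end
    end
    return cum
end
```

Called as devectorized(points), it visits the same elements in the same order as slow(), so it should return the same sum for the same input.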
Manual devectorization or manual immutables like your Point type are currently
your best bet.

Best,
--Tim

On Sunday, February 01, 2015 06:35:05 AM Kristoffer Carlsson wrote:
> I have two versions of an example function that calculates a number by
> looping over all pairs of points. In the first one I use a 2d-array and
> access points with [:,i] syntax to get the coordinates. In the second
> version of the function I instead create an array of Point-types (each
> Point has an x and a y coordinate). I then access the coordinates like
> point.x, point.y etc.
>
> These two functions differ vastly in run time and memory usage.
>
> This is the first function:
>
> function slow()
>     srand(1234)
>     points = randn(2, 5000)
>     n_points::Int = size(points,2)
>     cum = 0.0
>     for i in 1:n_points
>         for j in (i+1):n_points
>             point_2 = points[:, j]
>             cum += point_2[1]
>         end
>     end
>     return cum
> end
>
> This is the fast version with the Point types:
>
> immutable Point
>     x::Float64
>     y::Float64
> end
>
>
> function fast()
>     srand(1234)
>     points = randn(2, 5000)
>     n_points = size(points, 2)
>     cum = 0.0
>
>
>     # Create array of points
>     points_vec = Point[]
>     for i in 1:n_points
>         push!(points_vec, Point(points[1,i], points[2,i]))
>     end
>
>
>     for i in 1:n_points
>         for j in (i+1):n_points
>             point_2 = points_vec[j]
>             cum += point_2.x
>         end
>     end
>     return cum
> end
>
>
> Running
> @time println(slow())
> @time println(fast())
>
> now gives:
>
> -23952.535945302105
> elapsed time: 0.954317047 seconds (1055 MB allocated, 3.78% gc time in 48
> pauses with 0 full sweep)
>
> -23952.535945302105
> elapsed time: 0.025171914 seconds (1 MB allocated)
>
> The slow version takes roughly 40 times longer and consumes about 1000x
> the memory.
> Running the functions with the memory tracker gives:
>
>          -
>          -
>          -
>          - function slow()
>      28688     srand(1234)
>      80048     points = randn(2, 5000)
>          0     n_points::Int = size(points,2)
>          0     cum = 0.0
>          0     for i in 1:n_points
>          0         for j in (i+1):n_points
> 1099780000             point_2 = points[:, j]
>          0             cum += point_2[1]
>          -         end
>          -     end
>          0     return cum
>          - end
>          -
>          -
>          -
>          - immutable Point
>          -     x::Float64
>          -     y::Float64
>          - end
>          -
>          - function fast()
>    2540964     srand(1234)
>      80048     points = randn(2, 5000)
>          0     n_points = size(points, 2)
>          0     cum = 0.0
>          -
>          -     # Create array of points
>         48     points_vec = Point[]
>          0     for i in 1:n_points
>     263112         push!(points_vec, Point(points[1,i], points[2,i]))
>          -     end
>          -
>          0     for i in 1:n_points
>          0         for j in (i+1):n_points
>          0             point_2 = points_vec[j]
>          0             cum += point_2.x
>          -         end
>          -     end
>          0     return cum
>          - end
>          -
>          -
>          -
>          - @time println(slow())
>          - @time println(fast())
>          -
>          -
>
> So what seems to take all the memory is
> point_2 = points[:, j]
>
> Maybe some copying is performed when slicing, but I have tried replacing it
> with sub and slice etc. (which shouldn't copy?) and it just gets worse. Are
> there some alignment issues?
>
> I have tried both in 0.3.5 and 0.4 with the same results.
>
> Any help?
>
> Best regards,
> Kristoffer Carlsson