If you're using sub or slice, your performance should be vastly better on 0.4
than on 0.3, but as you observed it will still be awful. In the long run (maybe
even by the time 0.4 is released?), we hope that in such loops sub/slice won't
actually create a new object and allocate memory; the compiler will elide the
allocation transparently. I think the main roadblock is that immutables with
tuple fields (see the definition of SubArray) currently do not "inline" the
tuple, instead holding a reference to a heap-allocated object. That basically
prevents further optimization. For improvements, the main issue to watch is
https://github.com/JuliaLang/julia/issues/8974.

Manual devectorization or manual immutables like your Point type are currently 
your best bet.
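
For reference, here is a minimal sketch of what manual devectorization could
look like for your slow() loop. It assumes the points matrix is passed in as an
argument, and the name sum_first_coords is only illustrative. The idea is to
index the element directly instead of slicing out a column first, so no
temporary array is created in the inner loop:

```julia
# Hypothetical devectorized variant of slow(): same double loop, but each
# coordinate is read by scalar indexing rather than via points[:, j].
function sum_first_coords(points)
    n_points = size(points, 2)
    cum = 0.0
    for i in 1:n_points
        for j in (i+1):n_points
            # Scalar indexing: no column copy, no per-iteration allocation
            cum += points[1, j]
        end
    end
    return cum
end
```

You would call it as sum_first_coords(randn(2, 5000)); timing it with @time
should show whether the allocations in the inner loop are gone.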

Best,
--Tim

On Sunday, February 01, 2015 06:35:05 AM Kristoffer Carlsson wrote:
> I have two versions of an example function that calculates a number by
> looping over all pairs of points. In the first one I use a 2D array and
> access points with [:,i] syntax to get the coordinates. In the second
> version of the function I instead create an array of Point types (each
> Point has an x and a y coordinate). I then access the coordinates as point.x,
> point.y, etc.
> 
> These two functions take vastly different amounts of time and memory.
> 
> This is the first function:
> 
> function slow()
>     srand(1234)
>     points = randn(2, 5000)
>     n_points::Int = size(points, 2)
>     cum = 0.0
>     for i in 1:n_points
>         for j in (i+1):n_points
>             point_2 = points[:, j]
>             cum += point_2[1]
>         end
>     end
>     return cum
> end
> 
> This is the fast version with the Point types:
> 
> immutable Point
>   x::Float64
>   y::Float64
> end
> 
> 
> function fast()
>     srand(1234)
>     points = randn(2, 5000)
>     n_points = size(points, 2)
>     cum = 0.0
> 
>     # Create array of points
>     points_vec = Point[]
>     for i in 1:n_points
>         push!(points_vec, Point(points[1, i], points[2, i]))
>     end
> 
>     for i in 1:n_points
>         for j in (i+1):n_points
>             point_2 = points_vec[j]
>             cum += point_2.x
>         end
>     end
>     return cum
> end
> 
> 
> Running
> @time println(slow())
> @time println(fast())
> 
> now gives:
> 
> -23952.535945302105
> elapsed time: 0.954317047 seconds (1055 MB allocated, 3.78% gc time in 48
> pauses with 0 full sweep)
> 
> -23952.535945302105
> elapsed time: 0.025171914 seconds (1 MB allocated)
> 
> The slow version takes roughly 40 times longer and allocates about 1000x the
> memory. Running the functions with the memory tracker gives:
> 
>         -
>         -
>         -
>         - function slow()
>     28688 srand(1234)
>     80048 points = randn(2, 5000)
>         0     n_points::Int = size(points,2)
>         0     cum = 0.0
>         0     for i in 1:n_points
>         0         for j in (i+1):n_points
> 1099780000             point_2 = points[:, j]
>         0             cum += point_2[1]
>         -         end
>         -   end
>         0   return cum
>         - end
>         -
>         -
>         -
>         - immutable Point
>         -   x::Float64
>         -   y::Float64
>         - end
>         -
>         - function fast()
>   2540964     srand(1234)
>     80048     points = randn(2, 5000)
>         0     n_points = size(points, 2)
>         0     cum= 0.0
>         -
>         -     # Create array of points
>        48     points_vec = Point[]
>         0     for i in 1:n_points
>    263112         push!(points_vec, Point( points [1,i], points [2,i]))
>         -     end
>         -
>         0   for i in 1:n_points
>         0     for j in (i+1):n_points
>         0             point_2 =  points_vec[j]
>         0             cum += point_2.x
>         -     end
>         -   end
>         0   return cum
>         - end
>         -
>         -
>         -
>         - @time println(slow())
>         - @time println(fast())
>         -
>         -
> 
> 
> 
> So what seems to take all the memory is
> point_2 = points[:, j]
> 
> Maybe some copying is performed when slicing, but I have tried replacing it
> with sub and slice etc. (which shouldn't copy?) and it just gets worse. Are
> there some alignment issues?
> 
> I have tried both in 0.3.5 and 0.4 with the same results.
> 
> Any help?
> 
> Best regards,
> Kristoffer Carlsson
