Hi Gray thank you very much. Yes, memory allocation is astonishing. This works great.
On Friday, September 5, 2014 6:59:53 AM UTC+2, Gray Calhoun wrote: > > On Wednesday, September 3, 2014 6:46:07 PM UTC-5, Giuseppe Ragusa wrote: >> >> I have been struggling to find a fast and elegant way to implement the >> following algorithm. >> > [...] > >> function block_crossprod!(out, X, cl) >> for j in distinct(cl) >> out += X[find(cl.==j),:]'*X[find(cl.==j),:] >> end >> end >> > > Hi Giuseppe, hope you're doing well! I think that "+=" creates a > temporary array, even though the notation suggests that it's > mutating. The BLAS function "syrk!" does essentially what you > want but without creating the temporary variables[1]. i.e.: > ``` > function block_crossprod2!(out, X, cl) > for j in unique(cl) > Base.BLAS.syrk!('U', 'T', 1., X[cl .== j,:], 1., out) > end > end > ``` > out will only return the upper triangular part, but > Symmetric(out) will make it symmetric. > > If you know more about the indexing scheme, you may be able to > improve things more by replacing X[cl .== j,:] with a subarray: > ``` > function block_crossprod3!(out, X, bstarts, blengths) > for j in 1:length(bstarts) > Base.BLAS.syrk!('U', 'T', 1., > sub(X, range(bstarts[j], blengths[j]),:), 1., out) > end > end > ``` > Other people may have more improvements, or know > how to use subarrays in the general case. Hope this helps! > > A quick profile: > ``` > X = randn(10_000,200) > cl=rep(1:500, 20) > out1 = zeros(200,200) > bstarts = 1:20:10_000 > blengths = rep(20, 500) > > @time block_crossprod!(out1, X, cl) > # elapsed time: 0.611584694 seconds (358190568 bytes allocated, 41.52% gc > time) > > @time block_crossprod2!(out1, X, cl) > # elapsed time: 0.179214852 seconds (19066568 bytes allocated, 20.01% gc > time) > > @time block_crossprod3!(out1, X, bstarts, blengths) > # elapsed time: 0.095740838 seconds (390416 bytes allocated) > ``` > The memory allocation for the last version is kind of astonishing. > > [1]: > http://julia.readthedocs.org/en/latest/stdlib/linalg/#Base.LinAlg.BLAS.syrk > ! >