Their size seems much decreased. I’d imagine to totally avoid allocation in 
this benchmark requires an optimization that really has nothing to do with 
subarrays per se. You’d have to do an escape analysis and see that Aj never 
left sumcols. Not easy in practice, since it’s passed to slice and length, and 
you’d have to make sure they didn’t squirrel it away or pass it on to someone 
else. Then you could stack allocate it, or even destructure it into a bunch of 
scalar mutations on the stack. After eliminating dead code, you’d end up with a 
no-allocation loop much like you’d write by hand. This sort of optimization 
seems to be quite tricky for compilers to pull off, but it’s a common pattern 
in numerical code. 

In Julia is such cleverness left entirely to LLVM, or are there optimization 
passes in Julia itself?
On April 19, 2015 at 6:49:21 AM, Tim Holy (tim.h...@gmail.com) wrote:

Sorry to be slow to chime in here, but the tuple overhaul has landed and they  
are still not zero-cost:  

function sumcols(A)  
s = 0.0  
for j = 1:size(A,2)  
Aj = slice(A, :, j)  
for i = 1:length(Aj)  
s += Aj[i]  
end  
end  
s  
end  

Even in the latest 0.4, this still allocates memory. On the other hand, while  
SubArrays allocate nearly 2x more memory than ArrayViews, the speed of the two  
(replacing `slice` with `view` above) is, for me, nearly identical.  

--Tim  


On Friday, April 17, 2015 08:30:27 PM Sebastian Good wrote:  
> This was discussed a few weeks ago  
>  
> https://groups.google.com/d/msg/julia-users/IxrvV8ABZoQ/uWZu5-IB3McJ  
>  
> I think the bottom line is that the current implementation *should* be  
> 'zero-cost' once a set of planned improvements and optimizations take  
> place. One of the key ones is a tuple overhaul.  
>  
> Fair to say it can never be 'zero' cost since there is different inherent  
> overhead depending on the type of subarray, e.g. offset, slice,  
> re-dimension, etc. however the implementation is quite clever about  
> allowing specialization of those.  
>  
> In a common case (e.g. a constant offset or simple stride) my understanding  
> is that the structure will be type-specialized and likely stack allocated  
> in many cases, reducing to what you'd write by hand. At least this is what  
> they're after.  
>  
> On Friday, April 17, 2015 at 4:24:14 PM UTC-4, Peter Brady wrote:  
> > Thanks for the links. I'll check out ArrayViews as it looks like what I  
> > was going to do manually without wrapping it in a type.  
> >  
> > By semi-dim agnostic I meant that the differencing algorithm itself only  
> > cares about one dimension but that dimension is different for different  
> > directions. Only a few toplevel routines actually need to know about the  
> > dimensionality of the problem.  
> >  
> > On Friday, April 17, 2015 at 2:04:39 PM UTC-6, René Donner wrote:  
> >> As far as I have measured it sub in 0.4 is still not cheap, as it  
> >> provides the flexibility to deal with all kinds of strides and offsets,  
> >> and  
> >> the view object itself thus has a certain size. See  
> >> https://github.com/rened/FunctionalData.jl#efficiency for a simple  
> >> analysis, where the speed is mostly dominated by the speed of the  
> >> "sub-view" mechanism.  
> >>  
> >> To get faster views which require strides etc you can try  
> >> https://github.com/JuliaLang/ArrayViews.jl  
> >>  
> >> What do you mean by semi-dim agnostic? In case you only need indexing  
> >> along the last dimension (like a[:,:,i] and a[:,:,:,i]) you can use  
> >>  
> >> https://github.com/rened/FunctionalData.jl#efficient-views-details  
> >>  
> >> which uses normal DenseArrays and simple pointer updates internally. It  
> >> can also update a view in-place, by just incrementing the pointer.  
> >>  
> >> Am 17.04.2015 um 21:48 schrieb Peter Brady <peter...@gmail.com>:  
> >> > Inorder to write some differencing algorithms in a semi-dimensional  
> >>  
> >> agnostic manner the code I've written makes heavy use of subarrays which  
> >> turn out to be rather costly. I've noticed some posts on the cost of  
> >> subarrays here and that things will be better in 0.4. Can someone  
> >> comment  
> >> on how much better? Would subarray (or anything like it) be on par with  
> >> simply passing an offset and stride (constant) and computing the index  
> >> myself? I'm currently using the 0.3 release branch.  

Reply via email to