On Wed, Nov 11, 2015 at 1:29 AM, 'Greg Plowman' via julia-users <julia-users@googlegroups.com> wrote:
> I have some naïve and probably stupid questions about writing efficient
> code in Julia.
> I almost didn't post this because I'm not sure if this is the right place
> for asking for this type of conceptual help. I do hope it is appropriate. I
> apologise in advance if it's not.

This is exactly the right place.

*Splatting/Slurping*

> I have read that splatting/slurping incurs a penalty.
> Is it the splatting or the slurping or both?
> I have also read that this is not the case if the function is inlined. Is
> this true?

Splatting/slurping is generally free for small, predictably-sized tuples. Writing f(a, (b, c)...) is exactly equivalent to writing out f(a, b, c). For example:

julia> g(a, b, c) = +(a, (b, c)...)
g (generic function with 1 method)

julia> g(1, 2, 3)
6

julia> @code_llvm g(1, 2, 3)

define i64 @julia_g_22684(i64, i64, i64) #0 {
top:
  %3 = add i64 %1, %0
  %4 = add i64 %3, %2
  ret i64 %4
}

Inlining is irrelevant here: if you try this where g calls a function that doesn't get inlined, it will produce the same code whether you do the splatting or change the arguments to (a, b, c). This is an excessively simple example – it's very easy to see that +(a, (b, c)...) is equivalent to +(a, b, c) – but since we specialize function bodies on argument types, which often includes the size and component types of tuple arguments, what type inference actually reasons about is often just this simple.

What you really want to avoid is splatting dynamically sized collections into a function's arguments. For example, you can sum the values in a vector like this:

julia> v = rand(1000);

julia> @time +(v...)
  0.077058 seconds (39.84 k allocations: 1.930 MB)
509.51187339334575

julia> @time +(v...)
  0.000131 seconds (2.01 k allocations: 71.016 KB)
509.51187339334575

But this is really bad. Why? Because Julia dispatches function calls on all of a function's arguments. In this case there are 1000 arguments, all of which, in principle, are being used for dispatch. Not good. So if you ever find yourself splatting a variably sized collection into a function's arguments, stop and look for a reducer that does the same job. In this case, it's the sum function:

julia> @time sum(v)
  0.000005 seconds (5 allocations: 176 bytes)
509.5118733933458

As a bonus, our sum function is also more accurate, since it uses a better summation algorithm than the naive left-to-right accumulation approach.

*Inlining*

> How do I know if a function is inlined?
> When is it necessary to explicitly inline? (say with @inline, or
> Expr(:meta, :inline))
> Does this guarantee inlining, or is it just a hint to the compiler?
> Is the compiler usually smart enough to inline optimally?
> Why wouldn't I explicitly inline all my small functions?

You can tell whether something was inlined by looking at the body of the calling method and seeing whether there are explicit method calls or not. In the output of @code_llvm g(1, 2, 3) above, for example, there are no calls to any + methods – the addition operations were fully inlined. If, on the other hand, you look at @code_llvm g(big(1), big(2), big(3)), you'll see explicit calls to Julia methods with names like @"julia_+_22861". You generally don't want or need to mess with this. The compiler should be, and generally is, good at figuring out what should or shouldn't be inlined. However, there are times when a very small method really ought to always be inlined, in which case you can annotate the method definition with @inline. You can't always force inlining – because it's not always possible – but when it is, this skips the heuristic checks that normally happen and just does it. Beware: this can make things slower and can bloat the generated code.
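To show what the annotation looks like, here's a minimal sketch – halve, triple, and h are made-up names for illustration, not anything from Base:

@inline halve(x) = x/2   # hint: always inline this tiny method
triple(x) = 3x           # small enough that the heuristics inline it anyway

h(x) = halve(x) + triple(x)

If you then look at @code_llvm h(1.0), you shouldn't see any calls to julia_halve or julia_triple in the output – just the arithmetic itself.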
*Overhead of accessing tuples compared to separate variables/arguments*

> Is there overhead in accessing tuples vs separate arguments?
> Is the expression assigned to x slower than the expression assigned to y?
>
> I = (1,2,3)
> i1 = 1; i2 = 2; i3 = 3
> x = I[1]+I[2]+I[3]
> y = i1+i2+i3

In general, no, there is no overhead. Two caveats: the tuples must be reasonably sized (if they are thousands of elements long, this will not be fast), and you should try not to do anything too confusing. Type inference specializes function bodies on arguments, but if it gets called with very long tuples or with lots of different tuple types, it will bail out and just use generic code, which will be slower.

*"Chained" function calls*

> (Not sure if "chained" is the right word.) It seems sometimes I have lots
> of small "convenience" functions, effectively chained until a final
> "working" function is called.
> A calls B calls C calls D calls ... calls Z. If A, B, C, ... are small
> (maybe getters, convenience functions etc), is this still efficient?
> Is there any penalty for this?
> How does this relate to inlining?
> Presumably there is no penalty if and only if functions are inlined? Is
> this true?
> If so, is there a limit to the number of levels (functions in call chain)
> that can be inlined?
> Should I be worried about any of this?

Function chaining is cheap. If inlining occurs, it's free. Even when inlining doesn't occur, function calls are very cheap on modern CPUs. Calling a function can even be cheaper than inlining it (inlining everything bloats the code and hurts the instruction cache), so neither one is uniformly better than the other. Function calls are only expensive when dispatch is unpredictable and generic dispatch ends up happening. You can tell by looking at the LLVM code for calls to jl_apply_generic – if that's being called, generic dispatch is happening, and there may also be type instabilities and other badness at work.
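To make that concrete, here's a minimal sketch with made-up function names:

a(x) = b(x) + 1   # a chain of tiny "convenience" functions
b(x) = 2 * c(x)
c(x) = x - 3

@code_llvm a(1)   # the whole chain should collapse into a few integer ops

If, for contrast, one link in the chain is type-unstable – say u(x) = x > 0 ? x : 0.0, whose return type depends on a runtime value – then calls downstream of it may end up going through jl_apply_generic, which is exactly the pattern to scan for in the @code_llvm output. How well the compiler copes with small type unions varies by Julia version, so treat this as an illustration of what to look for rather than a guaranteed outcome.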