These are definitely not stupid questions. Tuning performance at this level is very hard. I'll just add a few practical notes on micro-benchmarking and introspection. Inspecting and benchmarking splatted/slurped functions can be very tricky. One thing to remember is that `@code_llvm` displays the code for the method returned by `@which`. This means that if you try to look at the generated code for a slurped function definition, you won't see the fully specialized case. Let's modify Stefan's example above slightly:

julia> g2(a, bs...) = +(a, bs...)
g2 (generic function with 1 method)

julia> @code_llvm g2(1,2,3)

define %jl_value_t* @julia_g2_22749(%jl_value_t*, %jl_value_t**, i32) {
  # … long, with a GC frame and tuple unpacking

This is simply because we're *not* looking at the code for three-argument `g2`! We're looking at the general variadic case, where the number of arguments is unknown:

julia> @which g2(1,2,3)
g2(a, bs...) at none:1

If, however, we put this into an outer function with a set number of arguments, we can force Julia to show us the code as it would be compiled when the number of arguments is known:

julia> h(a,b,c) = g2(a,b,c)
h (generic function with 2 methods)

julia> @code_llvm h(1,2,3)

define i64 @julia_h_22752(i64, i64, i64) {
top:
  %3 = add i64 %1, %0
  %4 = add i64 %3, %2
  ret i64 %4
}

Finally, benchmarking such tiny things with `@time` can be tricky, which is why looking at the generated LLVM is so helpful. A few of us are working on a Benchmarks package that should give much more stable and robust results (with confidence intervals) at https://github.com/johnmyleswhite/Benchmarks.jl. I'm delinquent on some documentation on how the `@benchmark` macro works, but here's a quick rundown: it only works on function calls, and, unlike `@time`, it evaluates all the arguments first as a setup step. It then benchmarks the function by calling it repeatedly with the same arguments until it has a good estimate of the performance, taking care to ensure that LLVM doesn't optimize the benchmarking loop away or do unwanted constant folding.
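Usage looks something like this (just a sketch, assuming you have the package installed; I've elided the printed summary, which reports the timing statistics and confidence intervals):

julia> using Benchmarks

julia> v = rand(1000);

julia> @benchmark sum(v)   # `v` is evaluated once as setup; sum is then called repeatedly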
Matt

On Wednesday, November 11, 2015 at 9:40:28 AM UTC-5, Stefan Karpinski wrote:
>
> On Wed, Nov 11, 2015 at 1:29 AM, 'Greg Plowman' via julia-users <
> julia...@googlegroups.com> wrote:
>
>> I have some naïve and probably stupid questions about writing efficient
>> code in Julia.
>> I almost didn't post this because I'm not sure if this is the right place
>> to ask for this kind of conceptual help. I do hope it is appropriate, and
>> I apologise in advance if it's not.
>>
>
> This is exactly the right place.
>
>> *Splatting/Slurping*
>> I have read that splatting/slurping incurs a penalty.
>> Is it the splatting or the slurping or both?
>> I have also read that this is not the case if the function is inlined. Is
>> this true?
>>
>
> Splatting/slurping is generally free for small, predictably-sized tuples.
> For example, f(a, (b,c)...) is exactly equivalent to writing out
> f(a, b, c):
>
> julia> g(a, b, c) = +(a, (b, c)...)
> g (generic function with 1 method)
>
> julia> g(1, 2, 3)
> 6
>
> julia> @code_llvm g(1, 2, 3)
>
> define i64 @julia_g_22684(i64, i64, i64) #0 {
> top:
>   %3 = add i64 %1, %0
>   %4 = add i64 %3, %2
>   ret i64 %4
> }
>
> Inlining is irrelevant here: if you try this where g calls a function that
> doesn't get inlined, it will produce the same code whether you do the
> splatting or change the args to (a, b, c). This is an excessively simple
> example, since it's very easy to see that +(a, (b, c)...) is equivalent to
> +(a, b, c), but because we specialize function bodies on argument types,
> which often include the size and component types of tuple arguments, what
> type inference is actually reasoning about is often just this simple.
>
> What you really want to avoid is splatting dynamically sized collections
> as the arguments. For example, you can sum the values in a vector like
> this:
>
> julia> v = rand(1000);
>
> julia> @time +(v...)
>   0.077058 seconds (39.84 k allocations: 1.930 MB)
> 509.51187339334575
>
> julia> @time +(v...)
>   0.000131 seconds (2.01 k allocations: 71.016 KB)
> 509.51187339334575
>
> But this is really bad. Why? Because Julia dispatches function calls on
> all of a function's arguments. In this case there are 1000 arguments, all
> of which, in principle, are being used for dispatch. Not good. So if you
> ever find yourself splatting a variably sized collection into a function's
> arguments, stop and look for a reducer function that does the same thing.
> In this case, it's the sum function:
>
> julia> @time sum(v)
>   0.000005 seconds (5 allocations: 176 bytes)
> 509.5118733933458
>
> As a bonus, our sum function is also more accurate, since it uses a better
> summation algorithm than the naive left-to-right accumulation approach.
>
>> *Inlining*
>> How do I know if a function is inlined?
>> When is it necessary to explicitly inline? (say with @inline, or
>> Expr(:meta, :inline))
>> Does this guarantee inlining, or is it just a hint to the compiler?
>> Is the compiler usually smart enough to inline optimally?
>> Why wouldn't I explicitly inline all my small functions?
>>
>
> You can tell whether something was inlined by looking at the body of the
> calling method and seeing whether there are explicit method calls or not.
> In the output of @code_llvm g(1, 2, 3) above, there are no calls to any +
> methods – the addition operations were fully inlined. If, on the other
> hand, you look at @code_llvm g(big(1), big(2), big(3)), you'll see that
> there are explicit calls to Julia methods with names like @"julia_+_22861".
>
> You generally don't want or need to mess with this. The compiler should
> be, and generally is, good at figuring out what should or shouldn't be
> inlined. However, there are times when a very small method really ought to
> always be inlined, in which case you can annotate the method definition
> with @inline. You can't always force inlining – because it's not always
> possible – but when it is, this skips the heuristic checks that normally
> happen and just does it. Beware: this can make things slower and can bloat
> the generated code.
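To make the @inline annotation concrete, here's a minimal sketch (the function names are hypothetical, just for illustration):

@inline twice(x) = 2x    # tiny accessor we always want inlined
f(x) = twice(x) + 1

# Inspecting @code_llvm f(1) should show a bare shift/add sequence and no
# explicit call to a julia_twice_* method, confirming the inlining.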
>> *Overhead of accessing tuples compared to separate variables/arguments*
>> Is there overhead in accessing tuples vs separate arguments?
>> Is the expression assigned to x slower than the expression assigned to y?
>> I = (1,2,3)
>> i1 = 1; i2 = 2; i3 = 3
>> x = I[1]+I[2]+I[3]
>> y = i1+i2+i3
>>
>
> In general, no, there is no overhead. Caveats: the tuples must be
> reasonably sized (if they are thousands of elements long, this will not be
> fast), and you should try not to do anything very confusing. Type
> inference specializes function bodies on arguments, but if a function gets
> called with very long tuples or lots of different tuple types, it will
> bail out and just use generic code, which will be slower.
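One way to convince yourself of this (a sketch with hypothetical function names; the specialized code should come out identical in both cases):

xsum(I) = I[1] + I[2] + I[3]       # tuple indexing
ysum(i1, i2, i3) = i1 + i2 + i3    # separate arguments

# Comparing @code_llvm xsum((1,2,3)) against @code_llvm ysum(1,2,3) should
# show the same pair of add instructions: the tuple is unpacked at compile
# time because the method body is specialized on the tuple's type.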
>> *"Chained" function calls* (not sure if "chained" is the right word)
>> It seems I sometimes have lots of small "convenience" functions,
>> effectively chained until a final "working" function is called:
>> A calls B calls C calls D calls ... calls Z. If A, B, C, ... are small
>> (maybe getters, convenience functions etc.), is this still efficient?
>> Is there any penalty for this?
>> How does this relate to inlining?
>> Presumably there is no penalty if and only if the functions are inlined.
>> Is this true?
>> If so, is there a limit to the number of levels (functions in the call
>> chain) that can be inlined?
>> Should I be worried about any of this?
>>
>
> Function chaining is cheap. If inlining occurs, it's free. Even if
> inlining doesn't occur, function calls are very cheap on modern CPUs.
> Calling a function can even be cheaper than inlining it, so one is not
> uniformly better than the other. Function calls are only expensive if
> dispatch is unpredictable and generic dispatch ends up happening. You can
> tell by looking at the LLVM code for jl_apply_generic – if that's being
> called, that's generic dispatch, and if that's happening there may also be
> type instabilities and other badness.
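For example, a chain of trivial accessors is easy to check (a sketch with hypothetical functions):

a(x) = b(x) + 1
b(x) = 2 * c(x)
c(x) = x - 3

# For a concretely typed input, @code_llvm a(10) should collapse to a few
# arithmetic instructions with no jl_apply_generic anywhere; a call to
# jl_apply_generic would indicate type-unstable, generic dispatch.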