Re: [julia-users] General help with concepts: splatting/slurping, inlining, tuple access, function chaining
These are definitely not stupid questions. Tuning performance at this level is very hard. I'll just add a few practical notes on micro-benchmarking and introspection.

Inspecting and benchmarking splatted/slurped functions can be very tricky. One thing to remember is that @code_llvm displays the code for the method returned by @which. This means that if you try to look at the generated code for a slurped function definition, you won't see the fully specialized case. Let's modify Stefan's example above slightly:

julia> g2(a, bs...) = +(a, bs...)
g2 (generic function with 1 method)

julia> @code_llvm g2(1,2,3)

define %jl_value_t* @julia_g2_22749(%jl_value_t*, %jl_value_t**, i32) {
# … long, with a GC frame and tuple unpacking

This is simply because we're *not* looking at the code for three-argument g2! We're looking at the general variadic case, where the number of arguments is unknown:

julia> @which g2(1,2,3)
g2(a, bs...) at none:1

If, however, we put this into an outer function with a fixed number of arguments, we can force Julia to show us the code as it would compile it when the number of arguments is known:

julia> h(a,b,c) = g2(a,b,c)
h (generic function with 2 methods)

julia> @code_llvm h(1,2,3)

define i64 @julia_h_22752(i64, i64, i64) {
top:
  %3 = add i64 %1, %0
  %4 = add i64 %3, %2
  ret i64 %4
}

Finally, benchmarking such tiny things with @time can be tricky, which is why looking at the generated LLVM is so helpful. A few of us are working on a benchmarking package at https://github.com/johnmyleswhite/Benchmarks.jl that should give much more stable and robust results (with confidence intervals). I'm delinquent on some documentation for how the @benchmark macro works, but here's a quick rundown: it only works on function calls, and, unlike @time, it evaluates all the arguments first as a setup step. It then benchmarks the function by calling it repeatedly with the same arguments until it has a good estimate of the performance, taking care to ensure that LLVM doesn't optimize the benchmarking loop away or do unwanted constant folding.
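As a quick sketch of what usage looks like (the interface is still in flux, so treat the details as provisional):

julia> Pkg.clone("https://github.com/johnmyleswhite/Benchmarks.jl")

julia> using Benchmarks

julia> v = rand(1000);

julia> @benchmark sum(v)   # reports a timing estimate with confidence intervals

Because the arguments are evaluated once up front, you're timing sum itself rather than the setup of its inputs.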
Matt
Re: [julia-users] General help with concepts: splatting/slurping, inlining, tuple access, function chaining
On Wed, Nov 11, 2015 at 1:29 AM, 'Greg Plowman' via julia-users <julia-users@googlegroups.com> wrote:

> I have some naïve and probably stupid questions about writing efficient
> code in Julia.
> I almost didn't post this because I'm not sure if this is the right place
> for asking for this type of conceptual help. I do hope it is appropriate. I
> apologise in advance if it's not.

This is exactly the right place.

*Splatting/Slurping*
> I have read that splatting/slurping incurs a penalty.
> Is it the splatting or the slurping or both?
> I have also read that this is not the case if the function is inlined. Is
> this true?

Splatting/slurping is generally free for small, predictably-sized tuples. For example, f(a, (b,c)...) will be exactly equivalent to writing out f(a, b, c):

julia> g(a, b, c) = +(a, (b, c)...)
g (generic function with 1 method)

julia> g(1, 2, 3)
6

julia> @code_llvm g(1, 2, 3)

define i64 @julia_g_22684(i64, i64, i64) #0 {
top:
  %3 = add i64 %1, %0
  %4 = add i64 %3, %2
  ret i64 %4
}

The inlining is irrelevant here: if you try this where g calls a function that doesn't get inlined, it will produce the same code whether you do the splatting or change the args to (a, b, c). This is an excessively simple example, since it's very easy to see that +(a, (b, c)...) is equivalent to +(a, b, c), but since we specialize function bodies on argument types, which often includes the size and component types of tuple arguments, what type inference is actually reasoning about is often just this simple.

What you really want to avoid is splatting dynamically sized collections as the arguments. For example, you can sum the values in a vector like this:

julia> v = rand(1000);

julia> @time +(v...)
  0.077058 seconds (39.84 k allocations: 1.930 MB)
509.51187339334575

julia> @time +(v...)
  0.000131 seconds (2.01 k allocations: 71.016 KB)
509.51187339334575

But this is really bad. Why? Because Julia dispatches function calls on all of a function's arguments. In this case there are 1000 arguments, all of which, in principle, are being used for dispatch. Not good. So if you ever find yourself splatting a variably sized collection into a function's arguments, stop and look for a reducer that does the same thing. In this case, it's the sum function:

julia> @time sum(v)
  0.05 seconds (5 allocations: 176 bytes)
509.5118733933458

As a bonus, our sum function is also more accurate, since it uses a better summation algorithm than the naive left-to-right accumulation approach.
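And if there's no ready-made reducer like sum for your particular operation, reduce (or foldl, for a guaranteed left-to-right order) does the same job with ordinary two-argument calls; a minimal sketch:

v = rand(1000)

# Each of these makes one two-argument call per element instead of a
# single 1000-argument call, so dispatch stays cheap:
reduce(+, v)   # lets Julia choose the reduction strategy
foldl(+, v)    # guarantees naive left-to-right accumulation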
*Inlining*
> How do I know if a function is inlined?
> When is it necessary to explicitly inline? (say with @inline, or
> Expr(:meta, :inline))
> Does this guarantee inlining, or is it just a hint to the compiler?
> Is the compiler usually smart enough to inline optimally?
> Why wouldn't I explicitly inline all my small functions?

You can tell whether something was inlined by looking at the body of the calling method and seeing whether there are explicit method calls or not. For example, in the output of @code_llvm g(1, 2, 3) above, there are no calls to any + methods – the addition operations were fully inlined. If, on the other hand, you look at @code_llvm g(big(1), big(2), big(3)), you'll see that there are explicit calls to Julia methods with names like @"julia_+_22861".

You generally don't want or need to mess with this. The compiler should be, and generally is, good at figuring out what should or shouldn't be inlined. However, there are times when a very small method really ought to always be inlined, in which case you can annotate the method definition with @inline. You can't always force inlining – because it's not always possible – but when it is, this skips the heuristic checks that normally happen and just does it. Beware: this can make things slower and can bloat generated code.

*Overhead of accessing tuples compared to separate variables/arguments*
> Is there overhead in accessing tuples vs separate arguments?
> Is the expression assigned to x slower than the expression assigned to y?
>
> I = (1,2,3)
> i1 = 1; i2 = 2; i3 = 3
> x = I[1]+I[2]+I[3]
> y = i1+i2+i3

In general, no, there is no overhead. Caveats: the tuples must be reasonably sized (if they are thousands of elements long, this will not be fast), and you should try not to do anything too confusing. Type inference specializes function bodies on arguments, but if a function gets called with very long tuples, or with lots of different tuple types, inference will bail out and just use generic code, which will be slower.

*"Chained" function calls* (not sure if chained is the right word)
> It seems sometimes I have lots of small "convenience" functions, effectively
> chained until a final "working" function is called.
> A calls B calls C calls D calls ... calls Z. If A, B, C, ... are small
> (maybe getters, convenience functions etc), is this still efficient?
> Is there any penalty for this?
> How does this relate to inlining?
> Presumably there is no penalty if and only if functions are inlined?
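One way to check this for yourself is the same @code_llvm trick shown above: build a toy chain of trivial functions and see whether the calls collapse. A minimal sketch, with hypothetical functions a, b, c just for illustration:

# Toy chain of trivial "convenience" functions:
c(x) = x + 1
b(x) = c(x)
a(x) = b(x)

# Inspect the fully specialized code; if the whole chain inlined, the
# body is a single integer add with no calls to julia_b or julia_c:
@code_llvm a(1)

# If some link in the chain refuses to inline, annotating its definition,
# e.g. @inline b(x) = c(x), skips the compiler's size heuristics.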
[julia-users] General help with concepts: splatting/slurping, inlining, tuple access, function chaining
I have some naïve and probably stupid questions about writing efficient code in Julia. I almost didn't post this because I'm not sure if this is the right place for asking for this type of conceptual help. I do hope it is appropriate. I apologise in advance if it's not.

I have been trying to benchmark some code (using @time), but I'm coming up against several concepts I would like to understand better, and which will no doubt help to better guide my hitherto "black box" approach. These concepts are related in what I'm trying to understand, but understanding them separately might also help me.

- splatting/slurping
- inlining
- tuple access
- function chaining

Not really sure how to present my questions. Maybe first read through to *Separate Arguments vs Tuple*, where all the concepts seem to come into play and there are more direct questions.

*Splatting/Slurping*
I have read that splatting/slurping incurs a penalty.
Is it the splatting or the slurping or both?
I have also read that this is not the case if the function is inlined. Is this true?

*Inlining*
How do I know if a function is inlined?
When is it necessary to explicitly inline? (say with @inline, or Expr(:meta, :inline))
Does this guarantee inlining, or is it just a hint to the compiler?
Is the compiler usually smart enough to inline optimally?
Why wouldn't I explicitly inline all my small functions?

*Overhead of accessing tuples compared to separate variables/arguments*
Is there overhead in accessing tuples vs separate arguments?
Is the expression assigned to x slower than the expression assigned to y?

I = (1,2,3)
i1 = 1; i2 = 2; i3 = 3
x = I[1]+I[2]+I[3]
y = i1+i2+i3

*"Chained" function calls* (not sure if chained is the right word)
It seems sometimes I have lots of small "convenience" functions, effectively chained until a final "working" function is called. A calls B calls C calls D calls ... calls Z. If A, B, C, ... are small (maybe getters, convenience functions etc), is this still efficient?
Is there any penalty for this?
How does this relate to inlining?
Presumably there is no penalty if and only if functions are inlined? Is this true?
If so, is there a limit to the number of levels (functions in the call chain) that can be inlined?
Should I be worried about any of this?

So I guess all this comes together when trying to understand the following:

*Separate Arguments vs Tuple*
I have seen code where there are two forms of a function: one accepts a tuple, the other accepts separate arguments.

foo(I::Tuple{Vararg{Integer}}) = (some code here)

-OR-

foo{N}(I::NTuple{N, Integer}) = (some code here)

foo(I::Integer...) = foo(I)

Is calling foo((1,2,3)) more efficient than calling foo(1,2,3)?
It seems foo(1,2,3) requires a slurp and then calls foo((1,2,3)) anyway.
Is there overhead in the slurp?
Is there overhead in the extra call, or will inlining remove the overhead? If so, how do I know if it will be inlined automatically? Do I need to explicitly @inline or equivalent?
Does defining foo as a function of a tuple incur tuple access overhead?

Would it be faster to define a "primary" function with separate arguments and a convenience function with a tuple argument?

bar(i1::Integer, i2::Integer, i3::Integer) = (some code here)
bar(I::NTuple{3,Integer}) = bar(I...)

Is bar(i1,i2,i3) faster than foo(I)? (simply because foo requires tuple access I[1], I[2], I[3])
Does calling bar(I) incur splatting overhead?
Is there overhead in the extra call, or will inlining remove the overhead?

- Greg