Re: [julia-users] General help with concepts: splatting/slurping, inlining, tuple access, function chaining

2015-11-11 Thread Matt Bauman
These are definitely not stupid questions.  Tuning performance at this 
level is very hard.  I'll just add a few practical notes on 
micro-benchmarking and introspection.  Inspecting and benchmarking 
splatted/slurped functions can be very tricky.  One thing to remember is 
that `@code_llvm` displays the code for the method returned by `@which`. 
 This means that if you try to look at the generated code for a slurped 
function definition, you won't see the fully specialized case.  Let's 
modify Stefan's example above slightly:

julia> g2(a, bs...) = +(a, bs...)
g2 (generic function with 1 method)

julia> @code_llvm g2(1,2,3)
define %jl_value_t* @julia_g2_22749(%jl_value_t*, %jl_value_t**, i32) {
# … long, with a GC frame and tuple unpacking

This is simply because we're *not* looking at the code for three-argument 
`g2`!  We're looking at the general variadic case where the number of 
arguments is unknown:

julia> @which g2(1,2,3)
g2(a, bs...) at none:1

If, however, we put this into an outer function with a fixed number of 
arguments, we can force Julia to show us the code as it would be compiled 
when the number of arguments is known:

julia> h(a,b,c) = g2(a,b,c)
h (generic function with 1 method)

julia> @code_llvm h(1,2,3)
define i64 @julia_h_22752(i64, i64, i64) {
top:
  %3 = add i64 %1, %0
  %4 = add i64 %3, %2
  ret i64 %4
}

Finally, benchmarking such tiny things with `@time` can be tricky, which is 
why looking at the generated LLVM is very helpful.  A few of us are working 
on a benchmarking package that should give much more stable and robust 
results (with confidence intervals) at 
https://github.com/johnmyleswhite/Benchmarks.jl.  I'm delinquent on some 
documentation on how the `@benchmark` macro works, but here's a quick 
rundown: it only works on function calls, and, unlike `@time`, it evaluates 
all the arguments first as a setup step. It then benchmarks the function by 
calling it repeatedly with the same arguments until it has a good estimate 
of the performance, taking care to ensure that LLVM doesn't optimize the 
benchmarking loop away or do unwanted constant folding.
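
That loop structure can be sketched by hand (a rough illustration of the idea only, not how Benchmarks.jl is actually implemented; `naive_bench` is a made-up name):

```julia
# Sketch of the idea: evaluate the arguments once, then time many calls on
# those same values, keeping the results so the loop can't be optimized away.
function naive_bench(f, args...; evals::Int = 100_000)
    f(args...)              # warm-up call to trigger compilation
    acc = f(args...)        # accumulating a result defeats dead-code elimination
    start = time_ns()
    for _ in 1:evals
        acc += f(args...)
    end
    return (time_ns() - start) / evals, acc   # average ns per call
end

ns_per_call, _ = naive_bench(+, 1, 2, 3)
```

A real benchmarking package additionally estimates timer overhead and keeps running until the estimate stabilizes, which this toy loop does not.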

Matt


Re: [julia-users] General help with concepts: splatting/slurping, inlining, tuple access, function chaining

2015-11-11 Thread Stefan Karpinski
On Wed, Nov 11, 2015 at 1:29 AM, 'Greg Plowman' via julia-users <
julia-users@googlegroups.com> wrote:

> I have some naïve and probably stupid questions about writing efficient
> code in Julia.
> I almost didn't post this because I'm not sure if this is the right place
> for asking for this type of conceptual help. I do hope it is appropriate. I
> apologise in advance if it's not.
>

This is exactly the right place.

*Splatting/Slurping*
> I have read that splatting/slurping incurs a penalty.
> Is it the splatting or the slurping or both?
> I have also read that this is not the case if the function is inlined. Is
> this true?
>

Splatting/slurping is generally free for small, predictably-sized tuples.
For example, doing f(a, (b,c)...) will be exactly equivalent to writing out
f(a, b, c). For example:

julia> g(a, b, c) = +(a, (b, c)...)
g (generic function with 1 method)

julia> g(1, 2, 3)
6

julia> @code_llvm g(1, 2, 3)

define i64 @julia_g_22684(i64, i64, i64) #0 {
top:
  %3 = add i64 %1, %0
  %4 = add i64 %3, %2
  ret i64 %4
}


The inlining is irrelevant here: if you try this where g calls a function
that doesn't get inlined, it will produce the same code whether you do the
splatting or change the args to (a, b, c). This is an excessively simple
example since it's very easy to see that +(a, (b, c)...) is equivalent to
+(a, b, c), but since we specialize function bodies on argument types,
which often include the size and component types of tuple arguments, what
type inference is actually reasoning about is often just this simple.
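
This claim can be checked directly in a recent Julia (using the `@noinline` annotation and made-up function names): the splatted and plain calls lower to identical code even when the callee is not inlined.

```julia
# The splat of a fixed-size tuple is free even when the callee isn't inlined:
# both wrappers compile to a single call to the specialized addmany method.
@noinline addmany(xs...) = sum(xs)

splat_version(a, b, c) = addmany(a, (b, c)...)
plain_version(a, b, c) = addmany(a, b, c)

# Compare: @code_llvm splat_version(1, 2, 3) vs @code_llvm plain_version(1, 2, 3)
splat_version(1, 2, 3) == plain_version(1, 2, 3)
```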

What you really want to avoid is splatting dynamically sized collections
as the arguments. For example, you can sum the values in a vector like this:

julia> v = rand(1000);

julia> @time +(v...)
  0.077058 seconds (39.84 k allocations: 1.930 MB)
509.51187339334575

julia> @time +(v...)
  0.000131 seconds (2.01 k allocations: 71.016 KB)
509.51187339334575


But this is really bad. Why? Because Julia dispatches function calls on all
of a function's arguments. In this case there are 1000 arguments, all of
which, in principle, are being used for dispatch. Not good. So if you ever
find yourself splatting a variably sized collection into a function's
arguments, stop and look for a reducer that does the job. In
this case, it's the sum function:

julia> @time sum(v)
  0.05 seconds (5 allocations: 176 bytes)
509.5118733933458


As a bonus, the sum function is also more accurate, since it uses a better
summation algorithm (pairwise summation) than naive left-to-right
accumulation.
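
The difference is easy to see against a naive left-to-right fold (the exact error magnitudes will vary by machine and Julia version):

```julia
# 0.1 is not exactly representable in Float64, so rounding errors accumulate.
v = fill(0.1, 1_000_000)
exact = 100_000.0

err_pairwise = abs(sum(v) - exact)       # sum uses pairwise summation
err_naive    = abs(foldl(+, v) - exact)  # plain left-to-right accumulation
# err_pairwise typically comes out orders of magnitude smaller than err_naive.
```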

*Inlining*
> How do I know if a function is inlined?
> When is it necessary to explicitly inline? (say with @inline, or
> Expr(:meta, :inline))
> Does this guarantee inlining, or is it just a hint to the compiler?
> Is the compiler usually smart enough to inline optimally?
> Why wouldn't I explicitly inline all my small functions?
>

You can tell if something was inlined by looking at the body of the calling
method and seeing whether there are explicit method calls or not. For
example, in the output of @code_llvm g(1, 2, 3) above, there are no calls to
any + methods – the addition code was fully inlined. If, on the other hand,
you look at @code_llvm g(big(1), big(2), big(3)), you'll see that there are
explicit calls to Julia methods with names like @"julia_+_22861".

You generally don't want or need to mess with this. The compiler should be
and generally is good at figuring out what should or shouldn't be inlined.
However, there are some times when a very small method really ought to
always be inlined, in which case you can annotate the method definition
with @inline. You can't always force inlining – because it's not always
possible – but when it is, this skips the heuristic checks that normally
happen and just does it. Beware, this can make things slower and can bloat
generated code.
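
For reference, the annotation is just a prefix on the definition (a toy example with made-up names; whether forcing inlining pays off depends on the surrounding code):

```julia
# Mark a tiny accessor as a mandatory inlining candidate.
@inline halve(x) = x >> 1

# Calls to halve can then be inlined into callers like f, so f(8)
# compiles down to a shift and an add rather than a function call.
f(x) = halve(x) + 1

f(8)  # == 5
```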


> *Overhead of accessing tuples compared to separate variables/arguments*
> Is there overhead in accessing tuples vs separate arguments?
> Is the expression assigned to x slower than expression assigned to y?
> I = (1,2,3)
> i1 = 1, i2 = 2, i3 = 3
> x = I[1]+I[2]+I[3]
> y = i1+i2+i3
>

In general, no, there is no overhead. Caveats: the tuples must be reasonably
sized (if they are thousands of elements long, this will not be fast), and
you shouldn't do anything too confusing with them. Type inference
specializes function bodies on arguments, but if it gets called with very
long tuples or lots of different tuple types, it will bail out and just use
generic code, which will be slower.
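
A quick way to convince yourself (made-up function names): define both forms and compare the generated code – for a small homogeneous tuple the LLVM should be identical.

```julia
# Indexing a small tuple whose element types are known to inference
# compiles to the same few integer adds as taking separate arguments.
sum3_tuple(I) = I[1] + I[2] + I[3]
sum3_args(i1, i2, i3) = i1 + i2 + i3

# Compare: @code_llvm sum3_tuple((1, 2, 3)) vs @code_llvm sum3_args(1, 2, 3)
sum3_tuple((1, 2, 3)) == sum3_args(1, 2, 3)
```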


> *"Chained" function calls *(not sure if chained is the right word)It
> seems sometimes I have lots of small "convenience" functions, effectively
> chained until a final "working" function is called.
> A calls B calls C calls D calls ... calls Z. If A, B, C, ... are small
> (maybe getters, convenience functions etc), is this still efficient?
> Is there any penalty for this?
> How does this relate to inlining?
> Presumably there is n

[julia-users] General help with concepts: splatting/slurping, inlining, tuple access, function chaining

2015-11-10 Thread 'Greg Plowman' via julia-users
I have some naïve and probably stupid questions about writing efficient 
code in Julia.
I almost didn't post this because I'm not sure if this is the right place 
for asking for this type of conceptual help. I do hope it is appropriate. I 
apologise in advance if it's not.

I have been trying to benchmark some code (using @time) but I'm coming up 
against several concepts I would like to understand better, which will no 
doubt help to guide my hitherto "black box" approach.
These concepts are related in what I'm trying to understand, but 
understanding them separately might also help me.

   - splatting/slurping
   - inlining
   - tuple access
   - function chaining

Not really sure how to present my questions. 
Maybe first read through to *Separate Arguments vs Tuple* where all 
concepts seem to come into play and there are more direct questions.


*Splatting/Slurping*
I have read that splatting/slurping incurs a penalty. 
Is it the splatting or the slurping or both?
I have also read that this is not the case if the function is inlined. Is 
this true?

*Inlining*
How do I know if a function is inlined? 
When is it necessary to explicitly inline? (say with @inline, or 
Expr(:meta, :inline)) 
Does this guarantee inlining, or is it just a hint to the compiler?
Is the compiler usually smart enough to inline optimally?
Why wouldn't I explicitly inline all my small functions?

*Overhead of accessing tuples compared to separate variables/arguments*
Is there overhead in accessing tuples vs separate arguments?
Is the expression assigned to x slower than expression assigned to y?
I = (1,2,3)
i1 = 1, i2 = 2, i3 = 3
x = I[1]+I[2]+I[3]
y = i1+i2+i3


*"Chained" function calls *(not sure if chained is the right word)
It seems sometimes I have lots of small "convenience" functions, 
effectively chained until a final "working" function is called.
A calls B calls C calls D calls ... calls Z. If A, B, C, ... are small 
(maybe getters, convenience functions etc), is this still efficient?
Is there any penalty for this?
How does this relate to inlining?
Presumably there is no penalty if and only if functions are inlined? Is 
this true?
If so, is there a limit to the number of levels (functions in call chain) 
that can be inlined?
Should I be worried about any of this?


So I guess all this comes together when trying to understand the following:

*Separate Arguments vs Tuple*
I have seen code where there are two forms of a function, one accepts a 
tuple, the other accepts separate arguments.

foo(I::Tuple{Vararg{Integer}}) = (some code here) -OR- foo{N}(I::NTuple{N,
Integer}) = (some code here)
foo(I::Integer...) = foo(I)

Is calling foo((1,2,3)) more efficient than calling foo(1,2,3)?
It seems foo(1,2,3) requires a slurp and then calls foo((1,2,3)) anyway. 
Is there overhead in the slurp?
Is there overhead in the extra call, or will inlining remove the overhead?
If so, how do I know if it will be inlined automatically? Do I need to 
explicitly @inline or equivalent? 

Does defining foo as a function of a tuple incur tuple access overhead? 
Would it be faster to define a "primary" function with separate arguments 
and a convenience function with tuple argument?

bar(i1::Integer, i2::Integer, i3::Integer) = (some code here)
bar(I::NTuple{3,Integer}) = bar(I...)

Is bar(i1,i2,i3) faster than foo(I) (simply because foo requires tuple 
access I[1], I[2], I[3])?
Does calling bar(I) incur splatting overhead?
Is there overhead in the extra call, or will inlining remove the overhead?

- Greg