On Wed, Nov 11, 2015 at 1:29 AM, 'Greg Plowman' via julia-users <
julia-users@googlegroups.com> wrote:

> I have some naïve and probably stupid questions about writing efficient
> code in Julia.
> I almost didn't post this because I'm not sure if this is the right place
> for asking for this type of conceptual help. I do hope it is appropriate. I
> apologise in advance if it's not.
>

This is exactly the right place.

*Splatting/Slurping*
> I have read that splatting/slurping incurs a penalty.
> Is it the splatting or the slurping or both?
> I have also read that this is not the case if the function is inlined. Is
> this true?
>

Splatting/slurping is generally free for small, predictably-sized tuples.
Doing f(a, (b,c)...) will be exactly equivalent to writing out f(a, b, c).
For example:

julia> g(a, b, c) = +(a, (b, c)...)
g (generic function with 1 method)

julia> g(1, 2, 3)
6

julia> @code_llvm g(1, 2, 3)

define i64 @julia_g_22684(i64, i64, i64) #0 {
top:
  %3 = add i64 %1, %0
  %4 = add i64 %3, %2
  ret i64 %4
}


The inlining is irrelevant here: if you try this where g calls a function
that doesn't get inlined, it will produce the same code whether you do the
splatting or change the args to (a, b, c). This is an excessively simple
example since it's very easy to see that +(a, (b, c)...) is equivalent to
+(a, b, c), but since we specialize function bodies on argument types,
which often include the size and component types of tuple arguments, what
type inference is actually reasoning about is often just this simple.
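
You can check this yourself with a minimal sketch (hypothetical names;
@noinline, available in later Julia versions, keeps the callee from being
inlined so a real call remains):

@noinline h(a, b, c) = a*b + c   # a callee that won't be inlined

g_splat(a, t) = h(a, t...)       # splat a fixed-size 2-tuple
g_args(a, b, c) = h(a, b, c)     # write the arguments out explicitly

# @code_llvm g_splat(1, (2, 3)) and @code_llvm g_args(1, 2, 3) should
# both show a single call to h and no tuple construction.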

What you really want to avoid is splatting dynamically sized collections
into a function's arguments. For example, you can sum the values in a
vector like this:

julia> v = rand(1000);

julia> @time +(v...)
  0.077058 seconds (39.84 k allocations: 1.930 MB)
509.51187339334575

julia> @time +(v...)
  0.000131 seconds (2.01 k allocations: 71.016 KB)
509.51187339334575


But this is really bad. Why? Because Julia dispatches function calls on all
of a function's arguments. In this case there are 1000 arguments, all of
which, in principle, are being used for dispatch. Not good. So if you ever
find yourself splatting a variably sized collection into a function's
arguments, stop and look for a reducer that does the same job. In this
case, it's the sum function:

julia> @time sum(v)
  0.000005 seconds (5 allocations: 176 bytes)
509.5118733933458


As a bonus, our sum function is also more accurate since it uses a better
summation algorithm than the naive left-to-right accumulation approach.
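
More generally, when there's no dedicated reducer like sum for your
operation, reduce expresses the same computation without splatting. A
minimal sketch, using the same v as above:

reduce(+, v)   # one reduction; each call dispatches on just two arguments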

*Inlining*
> How do I know if a function is inlined?
> When is it necessary to explicitly inline? (say with @inline, or
> Expr(:meta, :inline))
> Does this guarantee inlining, or is it just a hint to the compiler?
> Is the compiler usually smart enough to inline optimally?
> Why wouldn't I explicitly inline all my small functions?
>

You can tell if something was inlined by looking at the body of the calling
method and seeing whether there are explicit method calls or not. For
example, in the output of @code_llvm g(1, 2, 3) above, there are no calls
to any + methods – the addition operations were fully inlined. If, on the
other hand, you look at @code_llvm g(big(1), big(2), big(3)), you'll see
that there are explicit calls to Julia methods with names like
@"julia_+_22861".

You generally don't want or need to mess with this. The compiler should be
and generally is good at figuring out what should or shouldn't be inlined.
However, there are times when a very small method really ought to always
be inlined, in which case you can annotate the method definition
with @inline. You can't always force inlining – because it's not always
possible – but when it is, this skips the heuristic checks that normally
happen and just does it. Beware, this can make things slower and can bloat
generated code.
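
As a minimal sketch (hypothetical names), a tiny method you always want
inlined might be annotated like this:

@inline double(x) = 2x   # skip the inlining heuristics for this method

f(x) = double(x) + 1
# @code_llvm f(1) should show the arithmetic folded into f's body,
# with no call to double.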


> *Overhead of accessing tuples compared to separate variables/arguments*
> Is there overhead in accessing tuples vs separate arguments?
> Is the expression assigned to x slower than expression assigned to y?
> I = (1,2,3)
> i1, i2, i3 = 1, 2, 3
> x = I[1]+I[2]+I[3]
> y = i1+i2+i3
>

In general no, there is no overhead. Caveats: the tuples must be reasonably
sized (if they are thousands of elements long, this will not be fast), and
you shouldn't write code that confuses type inference. Type inference
specializes function bodies on argument types, but if it gets called with
very long tuples or lots of different tuple types, it will bail out and
just use generic code, which will be slower.
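
A minimal sketch (hypothetical names) you can use to check this:

tup_sum(I) = I[1] + I[2] + I[3]      # access via tuple indexing
var_sum(i1, i2, i3) = i1 + i2 + i3   # access via separate variables

# @code_llvm tup_sum((1, 2, 3)) and @code_llvm var_sum(1, 2, 3) should
# compile to the same two integer adds.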


> *"Chained" function calls *(not sure if chained is the right word)It
> seems sometimes I have lots of small "convenience" functions, effectively
> chained until a final "working" function is called.
> A calls B calls C calls D calls ... calls Z. If A, B, C, ... are small
> (maybe getters, convenience functions etc), is this still efficient?
> Is there any penalty for this?
> How does this relate to inlining?
> Presumably there is no penalty if and only if functions are inlined? Is
> this true?
> If so, is there a limit to the number of levels (functions in call chain)
> that can be inlined?
> Should I be worried about any of this?
>

Function chaining is cheap. If inlining occurs, it's free. Even if inlining
doesn't occur, function calls are very cheap on modern CPUs. Calling
functions can be cheaper than inlining, so one is not uniformly better than
the other. Function calls are only expensive if dispatch is unpredictable
and generic dispatch ends up happening. You can spot this in the LLVM
code: if jl_apply_generic is being called, that's generic dispatch, and
where that happens there may also be type instabilities and other badness.
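
As a minimal sketch (hypothetical names) of such a chain:

# A chain of tiny convenience functions: a -> b -> c -> d.
a(x) = b(x)
b(x) = c(x)
c(x) = d(x)
d(x) = x + 1

# With a concrete argument type, @code_llvm a(1) should show a single
# integer add and no calls – in particular, no jl_apply_generic.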
