Re: [julia-users] dispatch slowdown when iterating over array with abstract values

'Greg Plowman' via julia-users Tue, 05 Apr 2016 20:29:24 -0700

Thanks for your relies. I'm starting to understand some of this now.
In particular, I'm learning there are many aspects to dynamic dispatch.


Also worth noting the difference between:

if isa(features[i], A)
    retval += evaluate(features[i]::A)
elseif isa(features[i], B)
    retval += evaluate(features[i]::B)
else
    retval += evaluate(features[i])
end

and 

if isa(features[i], A)
    x = evaluate(features[i]::A)
elseif isa(features[i], B)
    x = evaluate(features[i]::B)
else
    x = evaluate(features[i])
end
retval = x

Execution times depend on:
whether other subtypes exist or not
whether evaluate() for these subtypes return consistent types
whether run-time vector actually contains any of these other subtypes

In nearly all cases, method 1 is much faster.

Thanks again for all your help. It's much appreciated.


On Tuesday, April 5, 2016 at 7:39:41 AM UTC+10, Yichao Yu wrote:

>
> On Apr 4, 2016 1:36 PM, "Cedric St-Jean" <cedric...@gmail.com 
> <javascript:>> wrote:
> >
> > I'm not a compiler dev, but here's how I understand it:
> >
>
> Sorry forgot to reply....
>
> >
> > On Sunday, April 3, 2016 at 6:32:17 PM UTC-4, Greg Plowman wrote:
> >>
> >> It seems to me that slowness of dynamic dispatch and type instability 
> are orthogonal.
> >>
> >> I mean is dynamic dispatch inherently slow, or is it slow because it 
> involves type instability?
> >
> >
> > Type instability (uncertainty) leads to multiple-dispatch, and multiple 
> dispatch leads to more type instability (in general, but not always), as 
> the return type is harder to pin down.
> >  
> >>
> >> @code_warntype slow(features) suggests that the return type from 
> evaluate() is Float64, so I can't see any type instability.
> >> If this is correct, then why is dynamic dispatch so much slower than 
> run-time if statements?
> >
>
> I didn't know we are doing this optimization. I think the actual limit for 
> type inference is how many method actually matches. If you add more subtype 
> of Feature with different implementation of the function, I think you'll 
> see type inference give up at some point (the limit used to be 4 iirc)
>
> Currently, whenever type inference cannot determine a single function to 
> call, it fallback to a full dynamic dispatch at runtime. This is of course 
> not the best way to do it See 
> https://github.com/JuliaLang/julia/issues/10805 and 
> https://github.com/JuliaLang/julia/pull/11862.
>
> >
> > Each object contains an ID which is an integer (or a pointer - I don't 
> know, but it's the same thing).
> >
> > if isa(x,A)
> >
> > checks wheter x's ID is equal to A's ID. That's an integer comparison, 
> and very fast. In contrast, multiple-dispatch involves looking up the 
> method table for which function to call when x is an Int. I don't know how 
> it's implemented, but a dictionary look-up is likely, and that's much more 
> costly.
>
> A "hidden" cost of the dynamic dispatch is that the result needs to be 
> boxed. Since that's the only way one can return an arbitrary type object in 
> C. There are thought on how this can be improved but they involve more 
> complexity and it is not particularly clear if this particular case is 
> worth optimizing for (i.e. if it will regress more important cases for 
> example).
>
> >  
> >>
> >>
> >> For my own understanding, I was almost hoping that slow() was not 
> type-stable,
> >
> >
> > slow can be type-stable if the compiler assumes that no other 
> subtype/method will be added, which it seems to be doing. There's still the 
> multiple-dispatch cost, since it doesn't know which method to call. To 
> demonstrate the point about type-stability, if I define 
> >
> > type C <: Feature end
> > evaluate(f::C) = 1:3
> >
> > at the same time as I defined A and B, then slow becomes twice as slow 
> as before (430 microseconds vs. 210), even though I haven't even 
> instantiated any C object. That's because the compiler no longer assumes 
> that `evaluate` returns a Float (check code_warntype) and thus needs a 
> second multiple-dispatch to make the + call.
> >  
> >>
> >> and annotating with evaluate(features[i])::Float64 would let the 
> compiler produce faster code,
> >> but it makes no difference.
> >>
> >>
> >> On Sunday, April 3, 2016 at 11:07:44 PM UTC+10, Cedric St-Jean wrote:
> >>>
> >>> Good call, it was already pointed out in that thread.
> >>>
> >>> On Sat, Apr 2, 2016 at 11:11 PM, Yichao Yu <yyc...@gmail.com> wrote:
> >>>>
> >>>> On Sat, Apr 2, 2016 at 10:53 PM, Cedric St-Jean <cedric...@gmail.com> 
> wrote:
> >>>> > That's actually a compiler bug, nice!
> >>>> >
> >>>> > abstract Feature
> >>>> >
> >>>> > type A <: Feature end
> >>>> > evaluate(f::A) = 1.0
> >>>> >
> >>>> > foo(features::Vector{Feature}) = isa(features[1], A) ?
> >>>> > evaluate(features[1]::A) : evaluate(features[1])
> >>>> >
> >>>> > @show foo(Feature[A()])
> >>>> >
> >>>> > type C <: Feature end
> >>>> > evaluate(f::C) = 100
> >>>> >
> >>>> > @show foo(Feature[C()])
> >>>> >
> >>>> > yields
> >>>> >
> >>>> > foo(Feature[A()]) = 1.0
> >>>> >
> >>>> > foo(Feature[C()]) = 4.94e-322
> >>>> >
> >>>> >
> >>>> > That explains why performance was the same on your computer: the 
> compiler
> >>>> > was making an incorrect assumption about the return type of 
> `evaluate`. Or
> >>>> > maybe it's an intentional gamble by the Julia devs, for the sake of
> >>>> > performance.
> >>>> >
> >>>> > I couldn't find any issue describing this. Yichao?
> >>>>
> >>>> This is effectively #265. It's not always predictable what assumption
> >>>> the compiler makes now...
> >>>>
> >>>> >
> >>>> >
> >>>> > On Saturday, April 2, 2016 at 10:16:59 PM UTC-4, Greg Plowman wrote:
> >>>> >>
> >>>> >> Thanks Cedric and Yichao.
> >>>> >>
> >>>> >> This makes sense that there might be new subtypes and associated
> >>>> >> specialised methods. I understand that now. Thanks.
> >>>> >>
> >>>> >> On my machine (v0.4.5 Windows), fast() and pretty_fast() seem to 
> run in
> >>>> >> similar time.
> >>>> >> So I looked as @code_warntype as Yichao suggested and get the 
> following.
> >>>> >> I don't fully know how to interpret output, but return type from 
> the final
> >>>> >> "catchall" evaluate() seems to be inferred/asserted as Float64 (see
> >>>> >> highlighted yellow line below)
> >>>> >>
> >>>> >> Would this explain why pretty_fast() seems to be as efficient as 
> fast()?
> >>>> >>
> >>>> >> Why is the return type being inferred asserted as Float64?
> >>>> >>
> >>>> >>
> >>>> >> julia> @code_warntype fast(features)
> >>>> >> Variables:
> >>>> >>   features::Array{Feature,1}
> >>>> >>   retval::Float64
> >>>> >>   #s1::Int64
> >>>> >>   i::Int64
> >>>> >>
> >>>> >> Body:
> >>>> >>   begin  # none, line 2:
> >>>> >>       retval = 0.0 # none, line 3:
> >>>> >>       GenSym(2) = 
> (Base.arraylen)(features::Array{Feature,1})::Int64
> >>>> >>       GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1,
> >>>> >> 
> :(((top(getfield))(Base.Intrinsics,:select_value)::I)((Base.sle_int)(1,GenSym(2))::Bool,GenSym(2),(Base.box)
> >>>> >> (Int64,(Base.sub_int)(1,1)))::Int64)))
> >>>> >>       #s1 = (top(getfield))(GenSym(0),:start)::Int64
> >>>> >>       unless (Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 ===
> >>>> >> 
> (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool))
> >>>> >> goto 1
> >>>> >>       2:
> >>>> >>       GenSym(3) = #s1::Int64
> >>>> >>       GenSym(4) = (Base.box)(Base.Int,(Base.add_int)(#s1::Int64,1))
> >>>> >>       i = GenSym(3)
> >>>> >>       #s1 = GenSym(4) # none, line 4:
> >>>> >>       unless
> >>>> >> 
> (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.A)::Bool
> >>>> >> goto 4 # none, line 5:
> >>>> >>       retval =
> >>>> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,1.0))
> >>>> >>       goto 5
> >>>> >>       4:  # none, line 7:
> >>>> >>       retval =
> >>>> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,0.0))
> >>>> >>       5:
> >>>> >>       3:
> >>>> >>       unless
> >>>> >> 
> (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64
> >>>> >> === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0)
> >>>> >> ,:stop)::Int64,1))::Bool)))) goto 2
> >>>> >>       1:
> >>>> >>       0:  # none, line 10:
> >>>> >>       return retval::Float64
> >>>> >>   end::Float64
> >>>> >>
> >>>> >>
> >>>> >> julia> @code_warntype pretty_fast(features)
> >>>> >> Variables:
> >>>> >>   features::Array{Feature,1}
> >>>> >>   retval::Float64
> >>>> >>   #s1::Int64
> >>>> >>   i::Int64
> >>>> >>
> >>>> >> Body:
> >>>> >>   begin  # none, line 2:
> >>>> >>       retval = 0.0 # none, line 3:
> >>>> >>       GenSym(2) = 
> (Base.arraylen)(features::Array{Feature,1})::Int64
> >>>> >>       GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1,
> >>>> >> 
> :(((top(getfield))(Base.Intrinsics,:select_value)::I)((Base.sle_int)(1,GenSym(2))::Bool,GenSym(2),(Base.box)
> >>>> >> (Int64,(Base.sub_int)(1,1)))::Int64)))
> >>>> >>       #s1 = (top(getfield))(GenSym(0),:start)::Int64
> >>>> >>       unless (Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 ===
> >>>> >> 
> (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool))
> >>>> >> goto 1
> >>>> >>       2:
> >>>> >>       GenSym(4) = #s1::Int64
> >>>> >>       GenSym(5) = (Base.box)(Base.Int,(Base.add_int)(#s1::Int64,1))
> >>>> >>       i = GenSym(4)
> >>>> >>       #s1 = GenSym(5) # none, line 4:
> >>>> >>       unless
> >>>> >> 
> (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.A)::Bool
> >>>> >> goto 4 # none, line 5:
> >>>> >>       retval =
> >>>> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,1.0))
> >>>> >>       goto 6
> >>>> >>       4:  # none, line 6:
> >>>> >>       unless
> >>>> >> 
> (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.B)::Bool
> >>>> >> goto 5 # none, line 7:
> >>>> >>       retval =
> >>>> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,0.0))
> >>>> >>       goto 6
> >>>> >>       5:  # none, line 9:
> >>>> >>       GenSym(3) =
> >>>> >> 
> (Main.evaluate)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature)::Float64
> >>>> >>       retval =
> >>>> >> 
> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,GenSym(3)))
> >>>> >>       6:
> >>>> >>       3:
> >>>> >>       unless
> >>>> >> 
> (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64
> >>>> >> === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0)
> >>>> >> ,:stop)::Int64,1))::Bool)))) goto 2
> >>>> >>       1:
> >>>> >>       0:  # none, line 12:
> >>>> >>       return retval::Float64
> >>>> >>   end::Float64
> >>>> >>
> >>>> >> Julia>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >> On Sunday, April 3, 2016 at 8:04:35 AM UTC+10, Cedric St-Jean 
> wrote:
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>> On Saturday, April 2, 2016 at 5:39:45 PM UTC-4, Greg Plowman 
> wrote:
> >>>> >>>>
> >>>> >>>> Cedric,
> >>>> >>>> On my machine fast() and pretty_fast() run in the roughly the 
> same time.
> >>>> >>>> Are you sure pre-compiled first?
> >>>> >>>
> >>>> >>>
> >>>> >>> Yep. I'm using @benchmark on Julia 0.4.5 on OSX, and the time 
> difference
> >>>> >>> is definitely > 2X.
> >>>> >>>
> >>>> >>>>
> >>>> >>>>
> >>>> >>>> 1. If you add a default fallback method, say, 
> evaluate(f::Feature) =
> >>>> >>>> -1.0
> >>>> >>>> Theoretically, would that inform the compiler that evaluate() 
> always
> >>>> >>>> returns a Float64 and therefore is type stable?
> >>>> >>>
> >>>> >>>
> >>>> >>> No, because I might write
> >>>> >>>
> >>>> >>> type C <: Feature
> >>>> >>> end
> >>>> >>> evaluate(f::C) = 100
> >>>> >>>
> >>>> >>> and a Vector{Feature} might contain a C. Abstract types are 
> open-ended,
> >>>> >>> new types can be created a runtime, so the compiler can't assume 
> anything
> >>>> >>> about them, which is why they're not useful for performance. See 
> here.
> >>>> >>>
> >>>> >>>
> >>>> >>>>
> >>>> >>>> Does dynamic dispatch mean compiler has to look up method at run 
> time
> >>>> >>>> (because it can't know type ahead of time).
> >>>> >>>> This is equivalent to explicitly coding "static dispatch" with if
> >>>> >>>> statements?
> >>>> >>>> If so then why is it so much slower?
> >>>> >>>> Providing a fallback method as in 1. or explicitly returning 
> Float64
> >>>> >>>> doesn't seem to improve speed, which naively suggests that 
> slowness is from
> >>>> >>>> dynamic dispatch not from type instability???
> >>>> >>>>
> >>>> >>>> 3. Also, on my machine pretty_fast() is as fast as fast(). Why 
> is this
> >>>> >>>> so, if pretty_fast() is supposedly type unstable?
> >>>> >>>>
> >>>> >>>> As you can probably tell, I'm pretty confused on this.
> >>>> >>>>
> >>>> >>>>
> >>>> >>>> On Sunday, April 3, 2016 at 6:34:09 AM UTC+10, Cedric St-Jean 
> wrote:
> >>>> >>>>>
> >>>> >>>>> Thank you for the detailed explanation. I tried it out:
> >>>> >>>>>
> >>>> >>>>> function pretty_fast(features::Vector{Feature})
> >>>> >>>>>     retval = 0.0
> >>>> >>>>>     for i in 1 : length(features)
> >>>> >>>>>         if isa(features[i], A)
> >>>> >>>>>             x = evaluate(features[i]::A)
> >>>> >>>>>         elseif isa(features[i], B)
> >>>> >>>>>             x = evaluate(features[i]::B)
> >>>> >>>>>         else
> >>>> >>>>>             x = evaluate(features[i])
> >>>> >>>>>         end
> >>>> >>>>>         retval += x
> >>>> >>>>>     end
> >>>> >>>>>     retval
> >>>> >>>>> end
> >>>> >>>>>
> >>>> >>>>> On my laptop, fast runs in 10 microseconds, pretty_fast in 30, 
> and slow
> >>>> >>>>> in 210.
> >>>> >>>>>
> >>>> >>>>> On Saturday, April 2, 2016 at 12:24:18 PM UTC-4, Yichao Yu 
> wrote:
> >>>> >>>>>>
> >>>> >>>>>> On Sat, Apr 2, 2016 at 12:16 PM, Tim Wheeler <
> timwheel...@gmail.com>
> >>>> >>>>>> wrote:
> >>>> >>>>>> > Thank you for the comments. In my original code it means the
> >>>> >>>>>> > difference
> >>>> >>>>>> > between a 30 min execution with memory allocation in the 
> Gigabytes
> >>>> >>>>>> > and a few
> >>>> >>>>>> > seconds of execution with only 800 bytes using the second 
> version.
> >>>> >>>>>> > I thought under-the-hood Julia basically runs those if 
> statements
> >>>> >>>>>> > anyway for
> >>>> >>>>>> > its dispatch, and don't know why it needs to allocate any 
> memory.
> >>>> >>>>>> > Having the if-statement workaround will be fine though.
> >>>> >>>>>>
> >>>> >>>>>> Well, if you have a lot of these cheap functions being 
> dynamically
> >>>> >>>>>> dispatched I think it is not a good way to use the type. 
> Depending on
> >>>> >>>>>> your problem, you may be better off using a enum/flags/dict to
> >>>> >>>>>> represent the type/get the values.
> >>>> >>>>>>
> >>>> >>>>>> The reason for the allocation is that the return type is 
> unknown. It
> >>>> >>>>>> should be obvious to see if you check your code with 
> code_warntype.
> >>>> >>>>>>
> >>>> >>>>>> >
> >>>> >>>>>> > On Saturday, April 2, 2016 at 7:26:11 AM UTC-7, Cedric 
> St-Jean
> >>>> >>>>>> > wrote:
> >>>> >>>>>> >>
> >>>> >>>>>> >>
> >>>> >>>>>> >>> Therefore there's no way the compiler can rewrite the slow 
> version
> >>>> >>>>>> >>> to the
> >>>> >>>>>> >>> fast version.
> >>>> >>>>>> >>
> >>>> >>>>>> >>
> >>>> >>>>>> >> It knows that the element type is a Feature, so it could 
> produce:
> >>>> >>>>>> >>
> >>>> >>>>>> >> if isa(features[i], A)
> >>>> >>>>>> >>     retval += evaluate(features[i]::A)
> >>>> >>>>>> >> elseif isa(features[i], B)
> >>>> >>>>>> >>     retval += evaluate(features[i]::B)
> >>>> >>>>>> >> else
> >>>> >>>>>> >>     retval += evaluate(features[i])
> >>>> >>>>>> >> end
> >>>> >>>>>> >>
> >>>> >>>>>> >> and it would make sense for abstract types that have few 
> subtypes.
> >>>> >>>>>> >> I
> >>>> >>>>>> >> didn't realize that dispatch was an order of magnitude 
> slower than
> >>>> >>>>>> >> type
> >>>> >>>>>> >> checking. It's easy enough to write a macro generating this
> >>>> >>>>>> >> expansion, too.
> >>>> >>>>>> >>
> >>>> >>>>>> >> On Saturday, April 2, 2016 at 2:05:20 AM UTC-4, Yichao Yu 
> wrote:
> >>>> >>>>>> >>>
> >>>> >>>>>> >>> On Fri, Apr 1, 2016 at 9:56 PM, Tim Wheeler
> >>>> >>>>>> >>> <timwheel...@gmail.com>
> >>>> >>>>>> >>> wrote:
> >>>> >>>>>> >>> > Hello Julia Users.
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> > I ran into a weird slowdown issue and reproduced a 
> minimal
> >>>> >>>>>> >>> > working
> >>>> >>>>>> >>> > example.
> >>>> >>>>>> >>> > Maybe someone can help shed some light.
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> > abstract Feature
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> > type A <: Feature end
> >>>> >>>>>> >>> > evaluate(f::A) = 1.0
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> > type B <: Feature end
> >>>> >>>>>> >>> > evaluate(f::B) = 0.0
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> > function slow(features::Vector{Feature})
> >>>> >>>>>> >>> >     retval = 0.0
> >>>> >>>>>> >>> >     for i in 1 : length(features)
> >>>> >>>>>> >>> >         retval += evaluate(features[i])
> >>>> >>>>>> >>> >     end
> >>>> >>>>>> >>> >     retval
> >>>> >>>>>> >>> > end
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> > function fast(features::Vector{Feature})
> >>>> >>>>>> >>> >     retval = 0.0
> >>>> >>>>>> >>> >     for i in 1 : length(features)
> >>>> >>>>>> >>> >         if isa(features[i], A)
> >>>> >>>>>> >>> >             retval += evaluate(features[i]::A)
> >>>> >>>>>> >>> >         else
> >>>> >>>>>> >>> >             retval += evaluate(features[i]::B)
> >>>> >>>>>> >>> >         end
> >>>> >>>>>> >>> >     end
> >>>> >>>>>> >>> >     retval
> >>>> >>>>>> >>> > end
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> > using ProfileView
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> > features = Feature[]
> >>>> >>>>>> >>> > for i in 1 : 10000
> >>>> >>>>>> >>> >     push!(features, A())
> >>>> >>>>>> >>> > end
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> > slow(features)
> >>>> >>>>>> >>> > @time slow(features)
> >>>> >>>>>> >>> > fast(features)
> >>>> >>>>>> >>> > @time fast(features)
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> > The output is:
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> > 0.000136 seconds (10.15 k allocations: 166.417 KB)
> >>>> >>>>>> >>> > 0.000012 seconds (5 allocations: 176 bytes)
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> > This is a HUGE difference! Am I missing something big? 
> Is there
> >>>> >>>>>> >>> > a good
> >>>> >>>>>> >>> > way
> >>>> >>>>>> >>> > to inspect code to figure out where I am going wrong?
> >>>> >>>>>> >>>
> >>>> >>>>>> >>> This is because of type instability as you will find in the
> >>>> >>>>>> >>> performance
> >>>> >>>>>> >>> tips.
> >>>> >>>>>> >>> Note that slow and fast are not equivalent since the fast 
> version
> >>>> >>>>>> >>> only
> >>>> >>>>>> >>> accept `A` or `B` but the slow version accepts any subtype 
> of
> >>>> >>>>>> >>> feature
> >>>> >>>>>> >>> that you may ever define. Therefore there's no way the 
> compiler
> >>>> >>>>>> >>> can
> >>>> >>>>>> >>> rewrite the slow version to the fast version.
> >>>> >>>>>> >>> There are optimizations that can be applied to bring down 
> the gap
> >>>> >>>>>> >>> but
> >>>> >>>>>> >>> there'll always be a large difference between the two.
> >>>> >>>>>> >>>
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> > Thank you in advance for any guidance.
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> > -Tim
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> >
> >>>> >>>>>> >>> >
> >>>
> >>>
>

Re: [julia-users] dispatch slowdown when iterating over array with abstract values

Reply via email to