Thank you for the detailed explanation. I tried it out:

function pretty_fast(features::Vector{Feature})
    retval = 0.0
    for i in 1 : length(features)
        if isa(features[i], A)
            x = evaluate(features[i]::A)
        elseif isa(features[i], B)
            x = evaluate(features[i]::B)
        else
            x = evaluate(features[i])
        end
        retval += x
    end
    retval
end

On my laptop, fast runs in 10 microseconds, pretty_fast in 30, and slow in 
210.

On Saturday, April 2, 2016 at 12:24:18 PM UTC-4, Yichao Yu wrote:
>
> On Sat, Apr 2, 2016 at 12:16 PM, Tim Wheeler <timwheel...@gmail.com 
> <javascript:>> wrote: 
> > Thank you for the comments. In my original code it means the difference 
> > between a 30 min execution with memory allocation in the Gigabytes and a 
> few 
> > seconds of execution with only 800 bytes using the second version. 
> > I thought under-the-hood Julia basically runs those if statements anyway 
> for 
> > its dispatch, and don't know why it needs to allocate any memory. 
> > Having the if-statement workaround will be fine though. 
>
> Well, if you have a lot of these cheap functions being dynamically 
> dispatched I think it is not a good way to use the type. Depending on 
> your problem, you may be better off using a enum/flags/dict to 
> represent the type/get the values. 
>
> The reason for the allocation is that the return type is unknown. It 
> should be obvious to see if you check your code with code_warntype. 
>
> > 
> > On Saturday, April 2, 2016 at 7:26:11 AM UTC-7, Cedric St-Jean wrote: 
> >> 
> >> 
> >>> Therefore there's no way the compiler can rewrite the slow version to 
> the 
> >>> fast version. 
> >> 
> >> 
> >> It knows that the element type is a Feature, so it could produce: 
> >> 
> >> if isa(features[i], A) 
> >>     retval += evaluate(features[i]::A) 
> >> elseif isa(features[i], B) 
> >>     retval += evaluate(features[i]::B) 
> >> else 
> >>     retval += evaluate(features[i]) 
> >> end 
> >> 
> >> and it would make sense for abstract types that have few subtypes. I 
> >> didn't realize that dispatch was an order of magnitude slower than type 
> >> checking. It's easy enough to write a macro generating this expansion, 
> too. 
> >> 
> >> On Saturday, April 2, 2016 at 2:05:20 AM UTC-4, Yichao Yu wrote: 
> >>> 
> >>> On Fri, Apr 1, 2016 at 9:56 PM, Tim Wheeler <timwheel...@gmail.com> 
> >>> wrote: 
> >>> > Hello Julia Users. 
> >>> > 
> >>> > I ran into a weird slowdown issue and reproduced a minimal working 
> >>> > example. 
> >>> > Maybe someone can help shed some light. 
> >>> > 
> >>> > abstract Feature 
> >>> > 
> >>> > type A <: Feature end 
> >>> > evaluate(f::A) = 1.0 
> >>> > 
> >>> > type B <: Feature end 
> >>> > evaluate(f::B) = 0.0 
> >>> > 
> >>> > function slow(features::Vector{Feature}) 
> >>> >     retval = 0.0 
> >>> >     for i in 1 : length(features) 
> >>> >         retval += evaluate(features[i]) 
> >>> >     end 
> >>> >     retval 
> >>> > end 
> >>> > 
> >>> > function fast(features::Vector{Feature}) 
> >>> >     retval = 0.0 
> >>> >     for i in 1 : length(features) 
> >>> >         if isa(features[i], A) 
> >>> >             retval += evaluate(features[i]::A) 
> >>> >         else 
> >>> >             retval += evaluate(features[i]::B) 
> >>> >         end 
> >>> >     end 
> >>> >     retval 
> >>> > end 
> >>> > 
> >>> > using ProfileView 
> >>> > 
> >>> > features = Feature[] 
> >>> > for i in 1 : 10000 
> >>> >     push!(features, A()) 
> >>> > end 
> >>> > 
> >>> > slow(features) 
> >>> > @time slow(features) 
> >>> > fast(features) 
> >>> > @time fast(features) 
> >>> > 
> >>> > The output is: 
> >>> > 
> >>> > 0.000136 seconds (10.15 k allocations: 166.417 KB) 
> >>> > 0.000012 seconds (5 allocations: 176 bytes) 
> >>> > 
> >>> > 
> >>> > This is a HUGE difference! Am I missing something big? Is there a 
> good 
> >>> > way 
> >>> > to inspect code to figure out where I am going wrong? 
> >>> 
> >>> This is because of type instability as you will find in the 
> performance 
> >>> tips. 
> >>> Note that slow and fast are not equivalent since the fast version only 
> >>> accept `A` or `B` but the slow version accepts any subtype of feature 
> >>> that you may ever define. Therefore there's no way the compiler can 
> >>> rewrite the slow version to the fast version. 
> >>> There are optimizations that can be applied to bring down the gap but 
> >>> there'll always be a large difference between the two. 
> >>> 
> >>> > 
> >>> > 
> >>> > Thank you in advance for any guidance. 
> >>> > 
> >>> > 
> >>> > -Tim 
> >>> > 
> >>> > 
> >>> > 
> >>> > 
> >>> > 
>

Reply via email to