On Saturday, May 31, 2014 3:11:40 PM UTC+2, mike c wrote:

>
>
> On Saturday, 31 May 2014 21:07:58 UTC+10, Simon Kornblith wrote:
>>
>> It may not be desirable, but I think it's expected. The relevant comment 
>> in inference.jl says:
>> In this case, performance is 20% worse because the compiler can't infer 
>> the return type of intersect if there are >4 possible matching methods. 
>> You can restore it to the same level if you add a typeassert to the call to 
>> intersect.
>>
> Thanks.  So return types are inferred when there's up to 4 functions with 
> similar signatures. After that there's the performance drop off. 
>
>>
>> OTOH, the performance you're losing to failed type inference is nothing 
>> compared to the performance you're losing to dynamic dispatch. You don't 
>> know the type of object at compile time (it could be Sphere, Cube, or 
>> Plane), so the compiler can't optimize the call to intersect into a 
>> direct call. The following code does not have this problem, and is a little 
>> over 150x faster on my system.
>>
>
> Unfortunately, my example to demonstrate the performance drop off, doesn't 
> really encapsulate what the intersect methods are doing. The intersect() 
> functions are returning values dependant upon the inputs.  It's more like.
>
> function intersect(self::Sphere, i)
>   return self.a + i
> end
>
> where the same "i" is being sent to each intersect().  And instead of 
> summing, I'm really finding the minimum i.e.   min( intersect(object, i) ) 
>     
>
> In reality, it's even more complex than that, with the input "i" really 
> being a random-ish vector representing a light ray.
>

Yes, this is kind of a toy solution to a toy example. It sounds like what 
you describe is implementable as:

f(i, object, objects..) = min(intersect(object, i), f(i, objects...))
f(i) = 0.0

However, there is a caveat here. The compiler is generating specialized 
code for each potential set of object types, which is fine as long as you 
don't expect too many. Performance will fall for more than a few (7?) 
objects, since the compiler will cease to generate specialized code.

(It's also possible that dynamic dispatch isn't actually a bottleneck in 
your code, depending on how much is happening in the intersect function.)


>> f(object, objects...) = intersect(object) + f(objects...)
>> f() = 0.0
>> ...
>>         ss += f(objects...)
>>
>  
> In this simple case, is this just pre-calculating the sum of intersect()s 
> once and then using it repeatedly?
>

Yes, it looks like the optimizer is smart enough to do this, but not quite 
smart enough to elide the loop entirely. Still, even if there were 3 adds 
in the inner loop instead of 1, this code would still be much faster than 
the original.
 

> Thanks for the info & code. Changing objects to a tuple speeds up that 
> original code I posted by a factor of 3!
>

Yes, I noticed that too. Type inference doesn't appear to know that objects 
= [Sphere(1.0), Cube(2.0), Plane(3.0)] will be an array at compile time, so 
iteration becomes slow. You will get the same speedup if you pass that 
array into the function, or if you define it as objects = 
Object[Sphere(1.0), Cube(2.0), Plane(3.0)].

Simon

Reply via email to