Re: Shocking difference in performance between RB and C when copying float data between memoryblocks

Frank Condello Wed, 16 May 2007 10:44:41 -0700

On 16-May-07, at 10:44 AM, [EMAIL PROTECTED] wrote:

> On May 16, 2007, at 09:14 UTC, Frank Condello wrote:
>
>>> So - here is the 40 million dollar question - WHY is RB nearly 10
>>> times slower than C?
>>
>> Typically the blame is laid on the lack of (or lax) compiler
>> optimizations.
>
> That blame would be misplaced in this case, though.


In this case, I think you're right, but...

>> RB loops are simply much slower than optimized C
>> loops, even with all the "speedy" pragmas in place.
>
> No, I don't think they are.  Looping is pretty fast.  But this
> particular loop is doing two virtual method calls per inner iteration,
> compared to the C code which is doing no function calls of any sort.
> That is almost certainly where all the extra time is going.
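
To make the quoted point concrete, here is a small C sketch of the two loop shapes being compared. The accessor names (get_single, set_single) are hypothetical, and calling them through function pointers only roughly models the indirection of a virtual method call. It is not RB code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accessors standing in for the two method calls per inner
   iteration (one read, one write). Calling them through function
   pointers roughly models the indirection of a virtual call. */
typedef float (*get_fn)(const float *buf, size_t i);
typedef void (*set_fn)(float *buf, size_t i, float v);

static float get_single(const float *buf, size_t i) { return buf[i]; }
static void set_single(float *buf, size_t i, float v) { buf[i] = v; }

/* Two indirect calls per element. */
static void copy_via_calls(float *dst, const float *src, size_t n,
                           get_fn get, set_fn set) {
    assert(get != NULL && set != NULL);
    for (size_t i = 0; i < n; i++)
        set(dst, i, get(src, i));
}

/* No calls at all: plain loads and stores the compiler can optimize freely. */
static void copy_direct(float *dst, const float *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Both functions produce the same result; only the per-element overhead differs. An optimizing C compiler may still inline the calls when it can see the targets, so a fair timing test should keep them opaque (e.g. in a separate translation unit).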

Daniel's profiling a loop that uses direct Ptr access - there are no
function calls. His loops may not be entirely equivalent, but he's
using a reasonable approach in both cases. Unfortunately, RB forces
you to write uglier code due to the lack of strongly typed pointers,
but that's a different problem...
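
The difference a strongly typed pointer makes can be sketched in C. Without one, every access spells out a byte offset and an element type, much as one does with an untyped pointer; with one, the element size is folded into indexing. The helper names below are hypothetical, and the untyped version only loosely imitates RB-style Ptr access.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Untyped access: the caller computes byte offsets by hand.
   memcpy keeps the float reinterpretation well defined in C. */
static float load_single_at(const unsigned char *base, size_t byte_off) {
    float v;
    memcpy(&v, base + byte_off, sizeof v);
    return v;
}

static void store_single_at(unsigned char *base, size_t byte_off, float v) {
    memcpy(base + byte_off, &v, sizeof v);
}

/* Copy `count` floats, stepping through the buffers 4 bytes at a time. */
static void copy_untyped(void *dst, const void *src, size_t count) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t off = 0; off < count * sizeof(float); off += sizeof(float))
        store_single_at(d, off, load_single_at(s, off));
}

/* Typed access: the pointer type carries the element size, so the
   loop body is just dst[i] = src[i]. */
static void copy_typed(float *dst, const float *src, size_t count) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
}
```

Both versions copy the same data; the typed one is simply shorter and leaves the offset arithmetic to the compiler.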

The report evaluation suggests he wasn't testing in release builds,
which is why he was seeing such a big discrepancy. However, the report
makes some incorrect assumptions as well - basically, this test is
flawed. Daniel's code makes a function call in the dylib case and
no function calls in the RB case, so if all things were equal, RB
should've been _way_ faster, not just marginally faster.
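
A sketch of the two ways a dylib could be called, assuming hypothetical exports copy_row and copy_block and assuming rows with separate source and destination strides (both are illustrations, not Daniel's actual code). With copy_row, the host runs the outer loop and pays one cross-library call per row. With copy_block, C owns both loops and the host makes a single call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical export that copies a single row: the host must call it
   once per row from its own outer loop. */
void copy_row(float *dst, const float *src, size_t cols) {
    for (size_t c = 0; c < cols; c++)
        dst[c] = src[c];
}

/* Hypothetical export that owns the outer loop too: one call copies
   `rows` rows of `cols` floats between buffers with the given strides
   (measured in floats). */
void copy_block(float *dst, const float *src, size_t rows, size_t cols,
                size_t dst_stride, size_t src_stride) {
    assert(dst_stride >= cols && src_stride >= cols);
    for (size_t r = 0; r < rows; r++) {
        const float *s = src + r * src_stride;
        float *d = dst + r * dst_stride;
        for (size_t c = 0; c < cols; c++)
            d[c] = s[c];
    }
}
```

Timing copy_block against the same nested loop written in RB would compare loop against loop, rather than a C call against no call.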

For a proper comparison, the C loop would need to contain the outer
loop as well. Trust me, I've done a lot of testing here myself, and
even when using Ptr and all the appropriate pragmas, C loops simply
spank RB loops in real-world applications. I typically get around a
5x speedup with C code, more if using trig functions, since RB
insists on wrapping those in an extra function call. C also gives you
an opportunity to optimize further in some situations. E.g. a vector
array Normalize function can use an inlined single-precision sqrt
approximation that's impossible to implement reasonably in RB code.
My C vec3.normalize function is over 30 times faster than anything I
could come up with in RB, and it's not for lack of trying.
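
The post doesn't say which sqrt approximation is used, so the sketch below swaps in the widely known bit-level inverse square root estimate (magic constant 0x5f3759df) followed by one Newton-Raphson step. This is an assumption for illustration only. The vec3 layout and the vec3_normalize_array name are also hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { float x, y, z; } vec3; /* hypothetical layout */

/* Approximate 1/sqrt(x) for x > 0: a bit-level initial guess followed by
   one Newton-Raphson step (relative error on the order of 0.2%). */
static inline float approx_rsqrt(float x) {
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);       /* well-defined type pun */
    bits = 0x5f3759dfu - (bits >> 1);     /* initial estimate */
    memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y); /* one Newton-Raphson refinement */
}

/* Normalize an array of vectors in place; zero-length vectors are left
   untouched to avoid dividing by zero. */
static void vec3_normalize_array(vec3 *v, size_t n) {
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        float len2 = v[i].x * v[i].x + v[i].y * v[i].y + v[i].z * v[i].z;
        if (len2 > 0.0f) {
            float inv = approx_rsqrt(len2);
            v[i].x *= inv;
            v[i].y *= inv;
            v[i].z *= inv;
        }
    }
}
```

The result is only approximately unit length, which is often acceptable for graphics work but not for everything. The bit manipulation and the inlining are exactly the parts that are hard to express cheaply in RB.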

You can gain a lot of speed in RB by manually unrolling loops, but that's
not always appropriate (or pretty). I think we simply need a more
reasonable test case to pinpoint the problem areas.
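
Manual unrolling, sketched in C for brevity (an RB version would have the same shape): handle four elements per iteration, then finish the leftovers in a short tail loop. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n floats, four per iteration, so the counter update and branch
   run roughly a quarter as often; a tail loop copies the 0-3 leftover
   elements. */
static void copy_unrolled4(float *dst, const float *src, size_t n) {
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < n; i++)   /* remainder */
        dst[i] = src[i];
}
```

The tail loop is what makes this "not always pretty": every unrolled loop needs one. Optimizing C compilers can often apply this transformation themselves (e.g. clang at -O2, or GCC with -funroll-loops).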

Frank.
<http://developer.chaoticbox.com/>


