Re: Shocking difference in performance between RB and C when copying float data between memoryblocks

Daniel Stenning Wed, 16 May 2007 12:27:47 -0700

On 16/5/07 18:44, "Frank Condello" <[EMAIL PROTECTED]> wrote:


> On 16-May-07, at 10:44 AM, [EMAIL PROTECTED] wrote:
> 
>> On May 16, 2007, at 09:14 UTC, Frank Condello wrote:
>> 
>>>> So - here is the 40 million dollar question - WHY is RB nearly 10
>>>> times slower than C ?
>>> 
>>> Typically the blame is laid on the lack of (or lax) compiler
>>> optimizations.
>> 
>> That blame would be misplaced in this case, though.
> 
> In this case, I think you're right, but...
> 
>>> RB loops are simply much slower than optimized C
>>> loops, even with all the "speedy" pragmas in place.
>> 
>> No, I don't think they are.  Looping is pretty fast.  But this
>> particular loop is doing two virtual method calls per inner iteration,
>> compared to the C code which is doing no function calls of any sort.
>> That is almost certainly where all the extra time is going.
> 
> Daniel's profiling a loop that uses direct Ptr access - there are no
> function calls. 

Actually, this is something I'd like clarified: when I use
myPtr.Single(offset), is RB calling a function, or does the compiler merely
interpret this syntax and generate machine code to retrieve/set the
dereferenced data directly (a la C)?

In any case, using the Ptr syntax was definitely quicker than the
memoryblock.SingleValue() version, but of course nowhere near as fast as the
C-assisted version.
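
For illustration, the question comes down to whether each element access
is a plain load/store or a call. Below is a minimal C sketch of the two
shapes. All names (FloatBlock, block_get, block_set, copy_via_accessors,
copy_direct) are hypothetical, and the sketch says nothing about what the
RB compiler actually emits for Ptr.Single(offset) or
MemoryBlock.SingleValue(); that remains an open question in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an object whose element access goes through
   function pointers, roughly how a virtual method call is dispatched. */
typedef struct FloatBlock {
    float *data;
    float (*get)(const struct FloatBlock *self, size_t index);
    void  (*set)(struct FloatBlock *self, size_t index, float value);
} FloatBlock;

static float block_get(const FloatBlock *self, size_t index) {
    return self->data[index];
}

static void block_set(FloatBlock *self, size_t index, float value) {
    self->data[index] = value;
}

/* Two indirect calls per element: one to read, one to write. */
static void copy_via_accessors(FloatBlock *dst, const FloatBlock *src,
                               size_t count) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        dst->set(dst, i, src->get(src, i));
    }
}

/* No calls: each iteration is a plain load and store. */
static void copy_direct(float *dst, const float *src, size_t count) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        dst[i] = src[i];
    }
}
```

Both functions produce the same result; only the per-element cost
differs. An optimizing C compiler may inline or vectorize copy_direct,
which it generally cannot do across the indirect calls.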

> His loops may not be entirely equivalent but he's
> using a reasonable approach in both cases. Unfortunately RB forces
> you to write uglier code due to the lack of strongly typed pointers,
> but that's a different problem...
> 
> The report evaluation suggests he wasn't testing in release builds
> which is why he was seeing such a big discrepancy. However the report
> makes some incorrect assumptions as well - basically this test is
> flawed. Daniel's code is making a function call in the dylib case and
> no function calls in the RB case so if all things were equal RB
> should've been _way_ faster, not just marginally faster.

I don't quite get your point here. I simply wrote the most efficient code to
do the task in either case. It's not a matter of like for like, although I
do understand such a test might be useful in finding out where the slowdown
occurs.

> For a proper comparison the C loop would need to contain the outer
> loop as well - Trust me, I've done a lot of testing here myself, and
> even when using Ptr and all the appropriate pragmas C loops simply
> spank RB loops in real world applications.

I guess we are basically in agreement then. It remains for RS to find out
why looping is so slow in RB and improve things.
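
To make the "like for like" point concrete, here is a minimal C sketch of
the two ways the C side could be structured: a helper that the host
language calls once per inner loop, and a version that contains the outer
loop as well. The function names, the row/stride layout, and the
parameters are assumptions for illustration only; they are not taken from
the actual test project.

```c
#include <assert.h>
#include <stddef.h>

/* Inner loop only: the host (for example an RB outer loop) would call
   this once per row, paying one cross-library call per row. */
void copy_row(float *dst, const float *src, size_t row_len) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < row_len; i++) {
        dst[i] = src[i];
    }
}

/* Both loops in C: the host makes a single call for the whole copy.
   Strides are measured in elements, not bytes. */
void copy_rows(float *dst, const float *src,
               size_t rows, size_t row_len,
               size_t dst_stride, size_t src_stride) {
    assert(dst != NULL && src != NULL);
    for (size_t r = 0; r < rows; r++) {
        const float *s = src + r * src_stride;
        float *d = dst + r * dst_stride;
        for (size_t i = 0; i < row_len; i++) {
            d[i] = s[i];
        }
    }
}
```

Timing an RB outer loop that calls copy_row against a single call to
copy_rows would help separate the cost of the call from the cost of the
loop itself.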

> I typically get around a
> 5x increase with C code, more if using trig functions since RB
> insists on wrapping those in an extra function call. C also gives you
> an opportunity to optimize further in some situations - E.g. a vector
> array Normalize function can use an inlined single-precision sqrt
> approximation that's impossible to reasonably implement in RB code.
> My C vec3.normalize function is over 30 times faster than anything I
> could come up with in RB, and it's not for the lack of trying.
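
As an illustration of the kind of trick meant here, below is a minimal C
sketch of a normalize routine built on the widely known bit-level
reciprocal square root approximation with one Newton-Raphson step. This
is not Frank's vec3.normalize, whose implementation is not shown in the
thread; the names fast_rsqrt and vec3_normalize are hypothetical, and the
sketch assumes 32-bit IEEE 754 floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for x > 0, assuming float is a 32-bit IEEE 754
   value. memcpy is used to reinterpret the bits without violating
   aliasing rules. */
static inline float fast_rsqrt(float x) {
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);     /* initial guess from the bit pattern */
    memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y); /* one Newton-Raphson refinement */
}

/* Scale a 3-component vector to approximately unit length, in place.
   Zero-length vectors are not special-cased in this sketch. */
static inline void vec3_normalize(float v[3]) {
    assert(v != NULL);
    float len_sq = v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
    float inv_len = fast_rsqrt(len_sq);
    v[0] *= inv_len;
    v[1] *= inv_len;
    v[2] *= inv_len;
}
```

The result is only approximately unit length (the error is a fraction of
a percent after one refinement step), which is the trade-off such
approximations make for speed.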
> 
> You can gain a lot speed in RB by manually unrolling loops but that's
> not always appropriate (or pretty). I think we simply need a more
> reasonable test case to pinpoint the problem areas.
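
For readers unfamiliar with manual unrolling, the idea is sketched below
in C rather than RB: the loop body handles several elements per
iteration, so the loop counter and branch are paid less often. The
function name copy_unrolled4 and the factor of four are arbitrary choices
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Copy count floats, four per iteration, with a cleanup loop for the
   remaining zero to three elements. */
void copy_unrolled4(float *dst, const float *src, size_t count) {
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < count; i++) {
        dst[i] = src[i];
    }
}
```

An optimizing C compiler will often do this on its own, which is part of
why hand-unrolling tends to pay off more in a language whose compiler
does not.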



> 
> Frank.
> <http://developer.chaoticbox.com/>
> 
> 
> 
> _______________________________________________
> Unsubscribe or switch delivery mode:
> <http://www.realsoftware.com/support/listmanager/>
> 
> Search the archives:
> <http://support.realsoftware.com/listarchives/lists.html>
> 

Regards,

Dan



