On 16/5/07 18:44, "Frank Condello" <[EMAIL PROTECTED]> wrote:
> On 16-May-07, at 10:44 AM, [EMAIL PROTECTED] wrote:
>
>> On May 16, 2007, at 09:14 UTC, Frank Condello wrote:
>>
>>>> So - here is the 40 million dollar question - WHY is RB nearly 10
>>>> times slower than C ?
>>>
>>> Typically the blame is laid on the lack of (or lax) compiler
>>> optimizations.
>>
>> That blame would be misplaced in this case, though.
>
> In this case, I think you're right, but...
>
>>> RB loops are simply much slower than optimized C
>>> loops, even with all the "speedy" pragmas in place.
>>
>> No, I don't think they are. Looping is pretty fast. But this
>> particular loop is doing two virtual method calls per inner iteration,
>> compared to the C code which is doing no function calls of any sort.
>> That is almost certainly where all the extra time is going.
>
> Daniel's profiling a loop that uses direct Ptr access - there are no
> function calls.

Actually, this is something I'd like clarified: when I use
myPtr.Single(offset), is RB calling a function, or does the compiler
merely interpret this syntax and generate machine code to retrieve/set
the dereferenced data directly (a la C)? In any case, using the Ptr
syntax was definitely quicker than the memoryblock.SingleValue()
version, but of course nowhere near as fast as the C-assisted version.

> His loops may not be entirely equivalent but he's
> using a reasonable approach in both cases. Unfortunately RB forces
> you to write uglier code due to the lack of strongly typed pointers,
> but that's a different problem...
>
> The report evaluation suggests he wasn't testing in release builds
> which is why he was seeing such a big discrepancy. However the report
> makes some incorrect assumptions as well - basically this test is
> flawed. Daniel's code is making a function call in the dylib case and
> no function calls in the RB case so if all things were equal RB
> should've been _way_ faster, not just marginally faster.

I don't quite get your point here. I simply wrote the most efficient
code I could to do the task in each case. It's not a matter of like for
like, although I do understand that a like-for-like test might be
useful in finding out where the slowdown occurs.

> For a proper comparison the C loop would need to contain the outer
> loop as well - Trust me, I've done a lot of testing here myself, and
> even when using Ptr and all the appropriate pragmas C loops simply
> spank RB loops in real world applications.

I guess we are basically in agreement then. It remains for RS to find
out why looping is so slow in RB and improve things.

> I typically get around a
> 5x increase with C code, more if using trig functions since RB
> insists on wrapping those in an extra function call. C also gives you
> an opportunity to optimize further in some situations - E.g. a vector
> array Normalize function can use an inlined single-precision sqrt
> approximation that's impossible to reasonably implement in RB code.
> My C vec3.normalize function is over 30 times faster than anything I
> could come up with in RB, and it's not for the lack of trying.
>
> You can gain a lot speed in RB by manually unrolling loops but that's
> not always appropriate (or pretty). I think we simply need a more
> reasonable test case to pinpoint the problem areas.
>
> Frank.
> <http://developer.chaoticbox.com/>

Regards,

Dan

_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>

Search the archives:
<http://support.realsoftware.com/listarchives/lists.html>
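
One way to picture the "proper comparison" point raised in the thread:
if the C library exposes only a per-element function, the caller pays
for a function call on every iteration, whereas a C function that
contains the whole loop costs a single call in total. The sketch below
is only an illustration under assumptions: the names scale_one and
scale_all are hypothetical, and the scaling operation is a stand-in
that is not taken from Daniel's actual test code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element entry point: the caller has to write the
   loop itself and pays the call overhead once per element. */
float scale_one(float value, float factor)
{
    return value * factor;
}

/* Hypothetical whole-loop entry point: the loop lives inside the C
   code, so the caller pays the call overhead once for the buffer. */
void scale_all(float *data, size_t count, float factor)
{
    assert(data != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        data[i] *= factor;
    }
}
```

Timing the caller's own loop against scale_all (rather than against
repeated calls to scale_one) keeps call overhead out of the measurement,
so any remaining gap is more plausibly down to the loop code itself.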
