Which config.h are you referring to in the armasm_memcpy.S?

Craig
On Wednesday 25 March 2009 12:55:07 pm Craig Matsuura wrote:
> I took the new armasm memcpy patch and applied it to my DirectFB-1.1.1 and
> ran it on my davinci based system using gcc-3.4 and real libc. And it is
> now faster than the libc. Nice job, the original patch actually slowed
> things down.
>
> Thanks,
> Craig
>
> On Wednesday 25 March 2009 2:50:35 am vince wrote:
> > Niels,
> >
> > Here is a new version of the patch with the second version of memcpy and
> > a conditional to remove big-endian.
> >
> > Let me know if you have any trouble with it.
> >
> > Regards,
> >
> > Vince
> >
> > On Tue, 2009-03-24 at 16:36 +0100, Niels Roest wrote:
> > > Hi John,
> > > thanks for the comments,
> > > just want to mention 1 or 2 things too.
> > >
> > > The testing routines do have a single cold, unmeasured, run first to
> > > rule out previous cache state influence.
> > >
> > > The test itself is in fact really simple - a continuous copy of a large
> > > region. So no repeats. This does focus on the use case that is most
> > > obvious for DirectFB, namely copying chunks and lines of graphics
> > > between surfaces, which will normally lead to cache misses anyway. I am
> > > most concerned about alignment, since this is really unpredictable.
> > >
> > > I am not sure if we will benefit much from shuffling the code or using
> > > different memory regions; you have to remember that the testing
> > > routines produce a single score only, so these will need to be fine
> > > tuned a lot, and we may even need to revert to multiple memcpy routines
> > > which are optimised for multiple use cases. This might be an
> > > interesting approach, it is one I will follow if performance
> > > measurements show that we can expect a proper benefit from this -
> > > forgetting that DirectFB is mainly about hardware acceleration anyway.
> > >
> > > For me I am very happy with the changes that Vince made, thanks Vince,
> > > and if I have a BE/LE lock, I will include the patch.
> > >
> > > Greets
> > > Niels
> > >
> > > John Williams wrote:
> > > > Hi Vince,
> > > >
> > > > On Wed, Mar 25, 2009 at 12:57 AM, vince <vi...@bluush.com> wrote:
> > > >> I've changed my benchmark to invalidate the cache before every test.
> > > >> My results are the same. Attached is my test program.
> > > >
> > > > No worries - just wanted to make sure we weren't missing the obvious!
> > > >
> > > > Might also be worth shuffling the sequencing of the tests (armasm,
> > > > armasm2, libc), see if that has any impact. I'm not intimate with
> > > > ARM cache details, but with a write-back cache you could be stalling
> > > > on cacheline evictions later in the test.
> > > >
> > > > Another safety would be to perform the tests in different memory
> > > > regions, with a complete cache flush and invalidate between each run.
> > > >
> > > > Not saying there's anything wrong with your code, just know it's easy
> > > > to get false results from simple benchmark code. Memory tests are
> > > > another one where the obvious approach is often wrong.
> > > >
> > > > Cheers,
> > > >
> > > > John
> > > > _______________________________________________
> > > > directfb-dev mailing list
> > > > directfb-dev@directfb.org
> > > > http://mail.directfb.org/cgi-bin/mailman/listinfo/directfb-dev

-- 
Craig Matsuura - Principal Engineer
Control4
11734 South Election Road - Suite 200
Salt Lake City, UT 84020-6432
PH: 801-523-3161
FX: 801-523-3199
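
For readers following the benchmark discussion quoted above, here is a rough sketch of the kind of test Niels describes: one cold, unmeasured run, then a single timed continuous copy of a large region, repeated over a few source and destination alignments. This is not the test program Vince attached. The names `copy_fn`, `time_copy`, `run_benchmark`, `REGION_SIZE` and `MAX_OFFSET` are made up for illustration, the 4 MiB region size is an assumption, and the explicit cache flush John suggests is left out because it is platform specific.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Assumed values, not taken from the thread. */
#define REGION_SIZE (4u * 1024u * 1024u) /* large region, meant to exceed the cache */
#define MAX_OFFSET  4u                   /* try byte offsets 0..3 on each side */

/* Hypothetical signature so any memcpy-compatible routine can be plugged in. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* One cold, unmeasured run, then one timed continuous copy of len bytes.
 * Uses clock(), so the result is CPU time in seconds. */
static double time_copy(copy_fn fn, unsigned char *dst,
                        const unsigned char *src, size_t len)
{
    clock_t t0, t1;

    assert(len > 0);
    fn(dst, src, len);              /* unmeasured first run */

    t0 = clock();
    fn(dst, src, len);              /* measured run */
    t1 = clock();

    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Runs the copy for every source/destination offset pair and checks the
 * result. Returns 0 on success, -1 on allocation failure or a bad copy. */
int run_benchmark(copy_fn fn, const char *name)
{
    unsigned char *src = malloc(REGION_SIZE + MAX_OFFSET);
    unsigned char *dst = malloc(REGION_SIZE + MAX_OFFSET);
    unsigned s_off, d_off;
    size_t i;
    int rc = 0;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }

    /* Non-uniform pattern so a misplaced copy is caught by memcmp. */
    for (i = 0; i < REGION_SIZE + MAX_OFFSET; i++)
        src[i] = (unsigned char)(i * 31u + 7u);
    memset(dst, 0, REGION_SIZE + MAX_OFFSET);

    for (s_off = 0; s_off < MAX_OFFSET; s_off++) {
        for (d_off = 0; d_off < MAX_OFFSET; d_off++) {
            double secs = time_copy(fn, dst + d_off, src + s_off, REGION_SIZE);

            if (memcmp(dst + d_off, src + s_off, REGION_SIZE) != 0)
                rc = -1;
            printf("%s src+%u dst+%u: %.6f s\n", name, s_off, d_off, secs);
        }
    }

    free(src);
    free(dst);
    return rc;
}
```

Calling `run_benchmark(memcpy, "libc memcpy")` from a small `main` gives the libc baseline; an assembly routine with the same prototype could be passed the same way. Shuffling the order in which routines are tested, as John suggests, would be done by the caller.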