I applied the new armasm memcpy patch to my DirectFB-1.1.1 tree and ran it on my DaVinci-based system, built with gcc-3.4 against a real libc. It is now faster than the libc memcpy. Nice job; the original patch actually slowed things down.
Thanks,
Craig

On Wednesday 25 March 2009 2:50:35 am vince wrote:
> Niels,
>
> Here is a new version of the patch with the second version of memcpy and
> a conditional to remove big-endian.
>
> Let me know if you have any trouble with it.
>
> Regards,
>
> Vince
>
> On Tue, 2009-03-24 at 16:36 +0100, Niels Roest wrote:
> > Hi John,
> > thanks for the comments,
> > just want to mention 1 or 2 things too.
> >
> > The testing routines do have a single cold, unmeasured, run first to
> > rule out previous cache state influence.
> >
> > The test itself is in fact really simple - a continuous copy of a large
> > region. So no repeats. This does focus on the use case that is most
> > obvious for DirectFB, namely copying chunks and lines of graphics
> > between surfaces, which will normally lead to cache misses anyway. I am
> > most concerned about alignment, since this is really unpredictable.
> >
> > I am not sure if we will benefit much from shuffling the code or using
> > different memory regions; you have to remember that the testing routines
> > produce a single score only, so these will need to be fine tuned a lot,
> > and we may even need to revert to multiple memcpy routines which are
> > optimised for multiple use cases. This might be an interesting approach,
> > it is one I will follow if performance measurements show that we can
> > expect a proper benefit from this - forgetting that DirectFB is mainly
> > about hardware acceleration anyway.
> >
> > For me I am very happy with the changes that Vince made, thanks Vince,
> > and if I have a BE/LE lock, I will include the patch.
> >
> > Greets
> > Niels
> >
> > John Williams wrote:
> > > Hi Vince,
> > >
> > > On Wed, Mar 25, 2009 at 12:57 AM, vince <vi...@bluush.com> wrote:
> > >> I've changed my benchmark to invalidate the cache before every test. My
> > >> results are the same. Attached is my test program.
> > >
> > > No worries - just wanted to make sure we weren't missing the obvious!
> > >
> > > Might also be worth shuffling the sequencing of the tests (armasm,
> > > armasm2, libc), see if that has any impact. I'm not intimate with ARM
> > > cache details, but with a write-back cache you could be stalling on
> > > cacheline evictions later in the test.
> > >
> > > Another safety would be to perform the tests in different memory
> > > regions, with a complete cache flush and invalidate between each run.
> > >
> > > Not saying there's anything wrong with your code, just know it's easy
> > > to get false results from simple benchmark code. Memory tests are
> > > another one where the obvious approach is often wrong.
> > >
> > > Cheers,
> > >
> > > John
> > > _______________________________________________
> > > directfb-dev mailing list
> > > directfb-dev@directfb.org
> > > http://mail.directfb.org/cgi-bin/mailman/listinfo/directfb-dev

--
Craig Matsuura - Principal Engineer
Control4
11734 South Election Road - Suite 200
Salt Lake City, UT 84020-6432
PH: 801-523-3161
FX: 801-523-3199
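For anyone wanting to reproduce this kind of measurement, the approach described in the thread (one cold, unmeasured run, then a single timed copy of a large region, repeated at different alignments) could be sketched roughly as below. This is not Vince's attached test program, which is not reproduced here. The names `copy_fn`, `bench_copy` and `report_offsets` are made up for illustration, the offsets 0 to 3 are an arbitrary choice, and the sketch assumes a POSIX system with `clock_gettime`. It does not flush or invalidate the cache between runs, so it covers only part of what John suggested.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Any routine with the memcpy signature (libc memcpy, an armasm variant, ...). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Elapsed seconds between two CLOCK_MONOTONIC samples. */
static double elapsed_s(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Time one copy of len bytes with the given byte offsets applied to the
 * malloc'd buffers. Returns MB/s, or -1.0 if allocation fails. */
static double bench_copy(copy_fn fn, size_t len, size_t src_off, size_t dst_off)
{
    unsigned char *src = malloc(len + src_off);
    unsigned char *dst = malloc(len + dst_off);
    copy_fn volatile call = fn;   /* volatile: keep the compiler from folding the calls */
    struct timespec t0, t1;
    double secs;

    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, len + src_off);
    memset(dst, 0, len + dst_off);

    call(dst + dst_off, src + src_off, len);   /* cold, unmeasured run */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    call(dst + dst_off, src + src_off, len);   /* the single timed copy */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Sanity check: a fast but wrong copy routine is not a win. */
    assert(memcmp(dst + dst_off, src + src_off, len) == 0);

    secs = elapsed_s(t0, t1);
    if (secs <= 0.0)
        secs = 1e-9;              /* guard against a coarse clock */
    free(src);
    free(dst);
    return (double)len / (1024.0 * 1024.0) / secs;
}

/* Print throughput for a few source/destination misalignments. */
static void report_offsets(const char *name, copy_fn fn, size_t len)
{
    size_t s, d;

    for (s = 0; s < 4; s++)
        for (d = 0; d < 4; d++)
            printf("%s src+%zu dst+%zu: %.1f MB/s\n",
                   name, s, d, bench_copy(fn, len, s, d));
}
```

Calling `report_offsets("libc", memcpy, 8 * 1024 * 1024)` and then the same with each replacement routine gives one figure per alignment pair rather than a single score. Swapping the order of those calls between runs is a cheap way to check for the ordering effect John mentions.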