Simon Pickering wrote:

I don't know what version flash image that c760 was running, but my c750
produces better results. I compiled your test program in two versions - arm4
and arm5 using the default toolchains built by bitbake + OpenEmbedded for
the Zaurus sl-5500 and c7x0 machines respectively.

I'll ask for this information.

Note that OpenZaurus uses significantly more up-to-date versions of
libraries, etc. than the standard Sharp images, so this probably accounts
for the difference in speed that you see below. If you want more info on the
actual lib versions and patches then let me know.

I ran these two binaries on my Zaurus sl-5500 and Zaurus sl-c750 (both
running the latest OpenZaurus 3.5.4 flash image), and on my Nokia-770. The
arm5 binary ran fine on the arm4 arch sl-5500, obviously there were no arm5
instructions included, but the times are slightly different - I don't know
whether this is to do with background processes or something to do with the
compiler (though I don't suppose the compiler really has much to do in this
case.)

Results as follows:

[results skipped]


Compiling for arm4 or arm5 should not make any difference, as none of
the benchmarked code is generated by the compiler anyway (the standard
functions come from the standard libraries, and the tested functions are
implemented in inline assembler). Slight differences in the timings
should be ignored, as they are just random deviations (+-1MB/s does not
change the overall picture much).
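
To make the "+-1MB/s is noise" point concrete, here is a rough sketch of
how such a figure can be derived and compared between two runs. This is
not the actual test program: the names mb_per_second, within_noise and
bench_memcpy are made up for illustration, 1 MB is assumed to be 10^6
bytes, and clock() is used only because it is portable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Convert a byte count and an elapsed time into MB/s.
   Assumption: 1 MB = 1e6 bytes (the original test may use 2^20). */
static double mb_per_second(size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / 1e6 / seconds;
}

/* Treat two throughput figures as equal if they differ by no more
   than `tolerance` MB/s (hypothetical helper, e.g. tolerance = 1.0). */
static int within_noise(double a, double b, double tolerance)
{
    double diff = a > b ? a - b : b - a;
    return diff <= tolerance;
}

/* Time `rounds` copies of `len` bytes and return the throughput in MB/s.
   Returns 0.0 if the run was too short for clock() to measure. */
static double bench_memcpy(void *dst, const void *src, size_t len, int rounds)
{
    clock_t start = clock();
    for (int i = 0; i < rounds; i++) {
        memcpy(dst, src, len);
    }
    double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (elapsed <= 0.0) {
        return 0.0;
    }
    return mb_per_second((size_t)rounds * len, elapsed);
}
```

With something like this, two binaries would only count as different if
within_noise() fails for figures averaged over several runs. Note that
an optimizing compiler may drop copies whose result is never read, so a
real benchmark has to use the destination buffer afterwards.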

From these results it looks like:

Optimized memset works well on all tested platforms, giving the same or
much better results. Memset performance is critical for clearing bitmaps
to some color and for drawing rectangles, so optimizing it makes sense.
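
As an illustration of why memset speed matters for drawing, a rectangle
fill can be little more than one memset call per row. The sketch below
assumes an 8-bit-per-pixel bitmap; fill_rect8 and its parameters are
hypothetical names, not code from any existing library. With 16-bit
pixels plain memset only works for colors whose two bytes are equal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill a w x h rectangle whose top-left corner is (x, y) in an
   8-bit-per-pixel bitmap. `stride` is the number of bytes per row.
   The caller must make sure the rectangle fits inside the bitmap. */
static void fill_rect8(uint8_t *bitmap, size_t stride,
                       size_t x, size_t y, size_t w, size_t h,
                       uint8_t color)
{
    assert(x + w <= stride);
    for (size_t row = 0; row < h; row++) {
        /* One memset per row: all the time is spent inside memset. */
        memset(bitmap + (y + row) * stride + x, color, w);
    }
}
```

Clearing the whole bitmap is the special case x = 0, y = 0, w = stride,
which could even be done with a single memset over stride * h bytes.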

The same code for memcpy works well on Nokia 770 and StrongARM, but
XScale needs different optimizations. It seems that memory read speed
is what matters there, so maybe prefetch (the PLD instruction) could
improve performance. I tried using prefetch when writing the optimized
memcpy for Nokia 770, but it did not have any effect at all. Prefetch
also requires armv5, so it affects portability.
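
For anyone who wants to experiment with prefetch without writing
assembler, here is a rough C sketch of the idea. It is not the optimized
memcpy discussed above. It relies on the GCC builtin __builtin_prefetch,
which GCC documents as generating no code on targets without a prefetch
instruction. The name copy_with_prefetch, the 32-byte chunk and the
64-byte prefetch distance are my assumptions and would need tuning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK             32  /* assumed cache line size in bytes */
#define PREFETCH_DISTANCE 64  /* how far ahead to prefetch; a guess */

/* Copy `len` bytes in CHUNK-sized pieces, hinting the CPU to start
   loading source data that will be needed a little later. */
static void *copy_with_prefetch(void *dst, const void *src, size_t len)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    assert(dst != NULL && src != NULL);
    while (len >= CHUNK) {
        if (len > PREFETCH_DISTANCE) {
            /* Read access (0), low temporal locality (0). */
            __builtin_prefetch(s + PREFETCH_DISTANCE, 0, 0);
        }
        memcpy(d, s, CHUNK);
        d += CHUNK;
        s += CHUNK;
        len -= CHUNK;
    }
    memcpy(d, s, len);  /* remaining tail, possibly 0 bytes */
    return dst;
}
```

Whether this helps at all depends on the core: as said above, it made no
difference on Nokia 770, and I have no numbers for XScale.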

An interesting observation is the standard memset vs. memcpy performance
on StrongARM. In spite of doing more work, memcpy is even faster :)

It would be interesting to run some tests on PXA270 too.

_______________________________________________
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers
