Siarhei Siamashka wrote:

> ...
It is strange that such a 16-byte alignment trick has not been used in
either uclibc or glibc until now. Another possibility is that this
improvement is specific to the Nokia 770 and nobody else has ever
encountered it or had to use it. Well, do we really care anyway? ;)
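
For readers who have not seen the idea, here is a minimal sketch in portable
C of what aligning the destination to a 16-byte boundary before the bulk
loop can look like. It is not the implementation benchmarked in this thread;
the names bytes_to_align16 and memset_align16_sketch are hypothetical, and
the 16-byte inner step is delegated to memcpy purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bytes needed to reach the next 16-byte boundary. */
static size_t bytes_to_align16(const void *p)
{
    return (size_t)((16u - ((uintptr_t)p & 15u)) & 15u);
}

/* Hypothetical memset variant: unaligned head, 16-byte aligned body, tail. */
static void *memset_align16_sketch(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    unsigned char pattern[16];
    size_t head = bytes_to_align16(d);

    /* Fill byte by byte until the pointer is 16-byte aligned. */
    if (head > n)
        head = n;
    n -= head;
    while (head--)
        *d++ = (unsigned char)c;

    /* Either the pointer is aligned now, or nothing is left to fill. */
    assert(((uintptr_t)d & 15u) == 0 || n == 0);

    /* Bulk part: every store starts on a 16-byte boundary. */
    memset(pattern, c, sizeof pattern);
    while (n >= 16) {
        memcpy(d, pattern, 16);
        d += 16;
        n -= 16;
    }

    /* Remaining tail of fewer than 16 bytes. */
    while (n--)
        *d++ = (unsigned char)c;

    return dst;
}
```

Whether this kind of alignment helps at all depends on the CPU and memory
system, so it should be measured on the target rather than assumed.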

Now I really want to see benchmark results from some other CPU,
preferably an Intel XScale :)

Just got a report from running my test on a Sharp Zaurus SL-C760:

--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 80.35MB/s
memset8() memory bandwidth: 83.55MB/s
memcpy() memory bandwidth (perfectly aligned): 45.29MB/s
memcpy16() memory bandwidth (perfectly aligned): 45.20MB/s
memcpy() memory bandwidth (16-bit aligned): 43.15MB/s
memcpy16() memory bandwidth (16-bit aligned): 38.27MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.960
memset8 time: 0.880
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 3.840
memset8 time: 3.670
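
For anyone who wants to collect comparable numbers on other hardware, a
bandwidth figure of this kind can be measured roughly as sketched below.
This is not the test program that produced the output above. The helper
names mb_per_sec and memcpy_bandwidth are hypothetical, 1 MB is assumed to
mean 2^20 bytes, and clock() is used only because it is standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: MB/s for a byte count and elapsed seconds.
   Assumes 1 MB = 1024 * 1024 bytes. */
static double mb_per_sec(size_t bytes, double seconds)
{
    return (double)bytes / (1024.0 * 1024.0) / seconds;
}

/* Hypothetical benchmark: time `iters` memcpy calls of `size` bytes.
   Returns MB/s, or -1.0 on allocation failure or if the run was too
   short for clock() to measure. */
static double memcpy_bandwidth(size_t size, int iters)
{
    unsigned char *src, *dst;
    volatile unsigned char sink = 0;
    clock_t t0, t1;
    double seconds;
    int i;

    assert(size > 0 && iters > 0);

    src = malloc(size);
    dst = malloc(size);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not timed. */
    memset(src, 0x5A, size);
    memset(dst, 0, size);

    t0 = clock();
    for (i = 0; i < iters; i++) {
        memcpy(dst, src, size);
        /* Read one byte back so the copy is less likely to be elided. */
        sink ^= dst[(size_t)i % size];
    }
    t1 = clock();
    (void)sink;

    free(src);
    free(dst);

    seconds = (double)(t1 - t0) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return -1.0;
    return mb_per_sec(size * (size_t)iters, seconds);
}
```

A buffer much larger than the data cache and enough iterations to run for at
least a second or so give more stable results; swapping memcpy for the
function under test allows a side-by-side comparison.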

So the memory copying functions on this Zaurus are already optimal for
it, and my implementation only causes performance degradation :)

There are two possibilities now:
1. This particular Zaurus has a much better memcpy implementation worth
   looking at
2. Optimizations for memcpy are very CPU dependent, and code that is good
   for the Nokia does not necessarily work well on the Zaurus, and vice versa.

PS. The Nokia seems to have much faster memory than the Zaurus, by the way :)


