> > Now I just really badly want to see the benchmark results from some 
> > other cpu, preferably intel xscale :)
> 
> Just got report from running my test on Sharp Zaurus SL-C760:
> 
> --- running correctness tests ---
> all the correctness tests passed
> --- running performance tests (memory bandwidth benchmark) ---:
> memset() memory bandwidth: 80.35MB/s
> memset8() memory bandwidth: 83.55MB/s
> memcpy() memory bandwidth (perfectly aligned): 45.29MB/s
> memcpy16() memory bandwidth (perfectly aligned): 45.20MB/s
> memcpy() memory bandwidth (16-bit aligned): 43.15MB/s
> memcpy16() memory bandwidth (16-bit aligned): 38.27MB/s
> --- testing performance for random blocks (size 0-15 bytes) ---
> memset time: 0.960
> memset8 time: 0.880
> --- testing performance for random blocks (size 0-511 bytes) ---
> memset time: 3.840
> memset8 time: 3.670
> 
> So memory copying functions on Zaurus are already optimal for 
> this Zaurus and my implementation only causes performance 
> degradation :)
> 

I don't know which flash image version that c760 was running, but my c750
produces better results. I compiled your test program in two versions, arm4
and arm5, using the default toolchains built by bitbake + OpenEmbedded for
the Zaurus sl-5500 and c7x0 machines respectively.

Note that OpenZaurus uses significantly more up-to-date versions of
libraries, etc. than the standard Sharp images, so this probably accounts
for the difference in speed that you see below. If you want more info on the
actual lib versions and patches then let me know.

I ran these two binaries on my Zaurus sl-5500 and Zaurus sl-c750 (both
running the latest OpenZaurus 3.5.4 flash image), and on my Nokia-770. The
arm5 binary ran fine on the arm4 arch sl-5500, so evidently it contained no
arm5 instructions, but its times are slightly different from the arm4
binary's. I don't know whether this is down to background processes or to
the compiler (though I don't suppose the compiler really has much to do in
this case).
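
One way to tell background noise from a real difference would be to repeat
each measurement several times and keep the best run. The following is only
a rough sketch of that idea and is not taken from the fastmem test program:
the function name best_memset_bandwidth() is made up for illustration, the
timing uses the standard clock() call (the test program may well time things
differently), and 1 MB is assumed to mean 1,000,000 bytes.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, not part of the fastmem test program.
 * Fills a buffer of `bytes` bytes `runs` times and returns the best
 * (highest) bandwidth seen, in MB/s (assuming 1 MB = 1,000,000 bytes).
 * Returns 0.0 on allocation failure or if no run could be timed. */
double best_memset_bandwidth(size_t bytes, int runs)
{
    /* Call through a volatile pointer so the compiler cannot drop the fill. */
    void *(*volatile fill)(void *, int, size_t) = memset;
    unsigned char *buf = malloc(bytes);
    double best = 0.0;

    if (buf == NULL)
        return 0.0;

    fill(buf, 0, bytes);  /* warm-up pass: touch every page before timing */

    for (int i = 0; i < runs; i++) {
        clock_t start = clock();
        fill(buf, i & 0xff, bytes);
        clock_t end = clock();

        double seconds = (double)(end - start) / CLOCKS_PER_SEC;
        if (seconds > 0.0) {
            double mb_per_s = (double)bytes / 1e6 / seconds;
            if (mb_per_s > best)
                best = mb_per_s;  /* keep the least-disturbed run */
        }
    }

    free(buf);
    return best;
}
```

Calling something like best_memset_bandwidth(16 * 1024 * 1024, 5) from both
binaries and comparing the results should show whether the small differences
survive once the slowest, most disturbed runs are discarded.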

Results as follows:

================================================================
================================================================
Sharp Zaurus sl-C750 (c7x0/Shepherd)
XScale-PXA255 rev 6 (v5l), 400MHz
================================================================
================================================================

[EMAIL PROTECTED]:/media/cf/other# ./arm5-fastmem-arm-test 
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 182.36MB/s
memset8() memory bandwidth: 182.36MB/s
memcpy() memory bandwidth (perfectly aligned): 80.04MB/s
memcpy16() memory bandwidth (perfectly aligned): 34.49MB/s
memcpy() memory bandwidth (16-bit aligned): 73.07MB/s
memcpy16() memory bandwidth (16-bit aligned): 31.02MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.820
memset8 time: 0.750
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.080
memset8 time: 2.060

================================================================

[EMAIL PROTECTED]:/media/cf/other# ./arm4-fastmem-arm-test 
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 183.96MB/s
memset8() memory bandwidth: 182.36MB/s
memcpy() memory bandwidth (perfectly aligned): 81.92MB/s
memcpy16() memory bandwidth (perfectly aligned): 34.89MB/s
memcpy() memory bandwidth (16-bit aligned): 74.63MB/s
memcpy16() memory bandwidth (16-bit aligned): 31.35MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.790
memset8 time: 0.720
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.060
memset8 time: 2.060




================================================================
================================================================
Sharp Zaurus sl-5500 (Collie)
StrongARM-1110 rev 9 (v4l) 206MHz
================================================================
================================================================

[EMAIL PROTECTED]:/media/cf/other# ./arm5-fastmem-arm-test 
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 35.67MB/s
memset8() memory bandwidth: 101.80MB/s
memcpy() memory bandwidth (perfectly aligned): 59.07MB/s
memcpy16() memory bandwidth (perfectly aligned): 59.24MB/s
memcpy() memory bandwidth (16-bit aligned): 48.88MB/s
memcpy16() memory bandwidth (16-bit aligned): 59.24MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.740
memset8 time: 0.540
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 7.840
memset8 time: 3.090

================================================================

[EMAIL PROTECTED]:/media/cf/other# ./arm4-fastmem-arm-test 
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 35.67MB/s
memset8() memory bandwidth: 101.80MB/s
memcpy() memory bandwidth (perfectly aligned): 59.07MB/s
memcpy16() memory bandwidth (perfectly aligned): 58.91MB/s
memcpy() memory bandwidth (16-bit aligned): 49.00MB/s
memcpy16() memory bandwidth (16-bit aligned): 59.07MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.730
memset8 time: 0.540
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 7.850
memset8 time: 3.100




================================================================
================================================================
Nokia N770
ARM926EJ-Sid(wb) rev 3 (v5l) 200MHz
OMAP1710 ? 
================================================================
================================================================

./arm5-fastmem-arm-test 
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 117.16MB/s
memset8() memory bandwidth: 262.14MB/s
memcpy() memory bandwidth (perfectly aligned): 102.30MB/s
memcpy16() memory bandwidth (perfectly aligned): 110.96MB/s
memcpy() memory bandwidth (16-bit aligned): 69.21MB/s
memcpy16() memory bandwidth (16-bit aligned): 99.39MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.400
memset8 time: 0.280
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.430
memset8 time: 1.190

================================================================

./arm4-fastmem-arm-test 
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 119.16MB/s
memset8() memory bandwidth: 265.46MB/s
memcpy() memory bandwidth (perfectly aligned): 100.82MB/s
memcpy16() memory bandwidth (perfectly aligned): 109.80MB/s
memcpy() memory bandwidth (16-bit aligned): 68.53MB/s
memcpy16() memory bandwidth (16-bit aligned): 98.46MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.400
memset8 time: 0.280
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.430
memset8 time: 1.170

================================================================
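
To make the numbers above easier to compare, here is a small sketch that
works out how much faster memset8() is than memset() on each device, using
the bandwidth figures from the arm5 binary runs above. The struct and
function names are made up for this example and are not part of the test
program.

```c
#include <stddef.h>
#include <stdio.h>

/* Ratio of the candidate routine's bandwidth to the baseline's. */
double speedup(double baseline_mb_s, double candidate_mb_s)
{
    return candidate_mb_s / baseline_mb_s;
}

/* One row per device: bandwidths in MB/s as reported by the arm5 binary. */
struct result {
    const char *device;
    double memset_mb_s;
    double memset8_mb_s;
};

static const struct result arm5_results[] = {
    { "sl-C750 (XScale-PXA255)",  182.36, 182.36 },
    { "sl-5500 (StrongARM-1110)",  35.67, 101.80 },
    { "N770 (ARM926EJ-S)",        117.16, 262.14 },
};

/* Print the memset8()/memset() bandwidth ratio for every device. */
void print_speedups(void)
{
    size_t n = sizeof arm5_results / sizeof arm5_results[0];
    for (size_t i = 0; i < n; i++) {
        const struct result *r = &arm5_results[i];
        printf("%-26s memset8/memset = %.2fx\n",
               r->device, speedup(r->memset_mb_s, r->memset8_mb_s));
    }
}
```

Calling print_speedups() from main() prints one line per device. By these
figures memset8() makes no difference on the c750, but is roughly 2.85 times
faster on the sl-5500 and roughly 2.24 times faster on the N770.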

Cheers,


Simon

_______________________________________________
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers
