Below I'm posting the results for running:
python measure_gpuarray_speed_random.py
on a Windows XP desktop with a 9800GT card vs a MacBook with a 9400M card.

Whilst any real speed-up is likely to depend on the type of problem you're
solving, I figure that sharing these numbers might help others decide what
kind of card they need to experiment with.

I think I'm right in saying that this particular test just fills
increasingly large arrays with random numbers on the GPU and on the CPU, so
mostly we're looking at memory operations rather than raw processing power?
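As a rough sketch of the kind of loop I assume the script runs (I haven't checked its source, so the function names, the best-of-three timing and the pure-Python CPU fill below are my own stand-ins), the CPU side can be timed for doubling sizes like this. On the GPU side the real script presumably uses PyCUDA's random-number generator; because CUDA kernel launches are asynchronous, the clock should only be stopped once the device has finished (for example after pycuda.driver.Context.synchronize()).

```python
import random
import time


def time_fill(fill, size, repeats=3):
    """Best wall-clock time in seconds of fill(size) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fill(size)
        best = min(best, time.perf_counter() - start)
    return best


def cpu_fill(size):
    """Hypothetical CPU stand-in: a list of `size` uniform floats in [0, 1)."""
    return [random.random() for _ in range(size)]


def run(fill, start=1024, stop=16777216):
    """Yield (size, seconds, size/seconds) for sizes doubling from start to stop."""
    size = start
    while size <= stop:
        seconds = time_fill(fill, size)
        yield size, seconds, size / seconds
        size *= 2


if __name__ == "__main__":
    # Keep the demo small: pure-Python fills of 16M elements are slow.
    for size, seconds, rate in run(cpu_fill, stop=65536):
        print("%-8d|%-16s|%s" % (size, seconds, rate))
```

Taking the best of several runs is a design choice to damp noise from other processes; a single timing per size would match the table layout just as well.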

I'm using:
Python 2.6
PyCUDA 0.94beta (as of the first week of January)
numpy 1.4
Boost 1.38 (Windows) and 1.41 (Mac)
CUDA 2.3

"python measure_gpuarray_speed_random.py" on Windows XP using a 9800GT
with an Intel Core 2 Duo CPU at 2.66GHz (though only one core seems to be
used) and 1GB RAM

====
1024
kernel.cu
tmpxft_00000d5c_00000000-3_kernel.cudafe1.gpu
tmpxft_00000d5c_00000000-8_kernel.cudafe2.gpu
kernel.cu
tmpxft_00000b98_00000000-3_kernel.cudafe1.gpu
tmpxft_00000b98_00000000-8_kernel.cudafe2.gpu
2048
<snip>
16777216
Size    |Time GPU        |Size/Time GPU|Time CPU         |Size/Time CPU|GPU vs CPU speedup
--------+----------------+-------------+-----------------+-------------+------------------
1024    |0.003523625     |290609.812338|4.28882865906e-05|23875982.9642|0.0121716376148
2048    |0.00218711889648|936391.708422|5.3062335968e-05 |38596114.6007|0.0242612946435
4096    |0.0021967746582 |1864551.73484|0.000110264831543|37146930.1924|0.0501939655627
8192    |0.00221661279297|3695728.91846|0.000205230010986|39916189.4531|0.0925872175949
16384   |0.00223460498047|7331944.63594|0.000410583557129|39904179.5891|0.183738763995
32768   |0.00231461450195|14157001.0783|0.00081089276123 |40409782.3617|0.350335989231
65536   |0.00240575585938|27241334.4624|0.00189025       |34670546.224 |0.785719794731
131072  |0.00242032714844|54154662.5565|0.00420794775391 |31148675.7121|1.73858635459
262144  |0.00271729394531|96472448.4269|0.00867364941406 |30223033.868 |3.19201734837
524288  |0.00303069091797|172992896.402|0.0177443378906  |29546777.3005|5.85488206185
1048576 |0.00344940429687|303987561.258|0.035467828125   |29564144.6187|10.282305312
2097152 |0.00441471038818|475037276.65 |0.0707175488281  |29655326.5031|16.0186156305
4194304 |0.0064045324707 |654896203.46 |0.150012763672   |27959647.5482|23.4229062555
8388608 |0.010213684082  |821310697.749|0.278191074219   |30154123.4691|27.2370940774
16777216|0.0179603649902 |934124446.197|0.556249179688   |30161331.6705|30.9709284856
====
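The last three columns are derived from the raw timings: Size/Time is elements per second, and the speedup is Time CPU divided by Time GPU. A small check (the helper name derived_columns is hypothetical, not from the script) that reproduces two rows of the 9800GT table above:

```python
def derived_columns(size, t_gpu, t_cpu):
    """Return (Size/Time GPU, Size/Time CPU, GPU vs CPU speedup) for one row."""
    return size / t_gpu, size / t_cpu, t_cpu / t_gpu


# Raw (Size, Time GPU, Time CPU) values copied from two rows of the 9800GT table.
rows = [
    (1024, 0.003523625, 4.28882865906e-05),
    (16777216, 0.0179603649902, 0.556249179688),
]

for size, t_gpu, t_cpu in rows:
    gpu_rate, cpu_rate, speedup = derived_columns(size, t_gpu, t_cpu)
    print("%-8d|%.6g|%.6g|%.4g" % (size, gpu_rate, cpu_rate, speedup))
```

A speedup below 1 therefore means the CPU was faster for that size.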


"python measure_gpuarray_speed_random.py" on Mac OS X (Leopard) using a 9400M
with an Intel Core 2 Duo at 2GHz (again only one core seems to be used) and
2GB RAM

====
1024
<snip>
16777216
Size    |Time GPU        |Size/Time GPU|Time CPU         |Size/Time CPU|GPU vs CPU speedup
--------+----------------+-------------+-----------------+-------------+------------------
1024    |0.00362544628906|282447.985256|4.3817150116e-05 |23369844.8505|0.0120860017284
2048    |0.00307543896484|665921.198051|0.000121678268433|16831271.7331|0.0395645206501
4096    |0.00287928857422|1422573.6304 |0.000244288223267|16767079.2526|0.0848432579679
8192    |0.00307496777344|2664092.96083|0.000451083526611|18160716.4011|0.146695367187
16384   |0.00307557250977|5327138.26384|0.000869509094238|18842816.146 |0.282714548747
32768   |0.0033080065918 |9905663.45341|0.00175915466309 |18627128.5224|0.531786927949
65536   |0.00350910644531|18675979.4897|0.00342910791016 |19111676.1902|0.977202590915
131072  |0.00442680712891|29608699.0427|0.00722674365234 |18137076.1584|1.63249571122
262144  |0.00605847070312|43269005.141 |0.0149418457031  |17544285.0374|2.46627349298
524288  |0.00753005224609|69626077.3319|0.0304311484375  |17228662.9628|4.04129313356
1048576 |0.012277375     |85407181.9098|0.0603436679688  |17376736.2061|4.91503012401
2097152 |0.0224226953125 |93528096.0104|0.133947558594   |15656515.2961|5.97374921823
4194304 |0.0410210205078 |102247675.657|0.242842695312   |17271691.0204|5.91995743417
8388608 |0.0780941503906 |107416598.529|0.482362148437   |17390684.6281|6.17667451435
16777216|0.153209472656  |109505082.872|0.949726328125   |17665316.316 |6.19887472791
====
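Reading the two speedup columns side by side, both cards only overtake the CPU somewhere between 65536 and 131072 elements; after that the 9400M levels off at about 6x while the 9800GT is still climbing at about 31x. A small helper (hypothetical, not part of the script) that picks out the first size at which the GPU wins, using the speedup values copied from the two tables:

```python
SIZES = [1024 * 2 ** i for i in range(15)]  # 1024, 2048, ..., 16777216

# "GPU vs CPU speedup" column from each table, in the same order as SIZES.
SPEEDUP_9800GT = [
    0.0121716376148, 0.0242612946435, 0.0501939655627, 0.0925872175949,
    0.183738763995, 0.350335989231, 0.785719794731, 1.73858635459,
    3.19201734837, 5.85488206185, 10.282305312, 16.0186156305,
    23.4229062555, 27.2370940774, 30.9709284856,
]
SPEEDUP_9400M = [
    0.0120860017284, 0.0395645206501, 0.0848432579679, 0.146695367187,
    0.282714548747, 0.531786927949, 0.977202590915, 1.63249571122,
    2.46627349298, 4.04129313356, 4.91503012401, 5.97374921823,
    5.91995743417, 6.17667451435, 6.19887472791,
]


def break_even_size(sizes, speedups):
    """First size (in table order) whose speedup exceeds 1, or None if none does."""
    for size, speedup in zip(sizes, speedups):
        if speedup > 1.0:
            return size
    return None


print("9800GT:", break_even_size(SIZES, SPEEDUP_9800GT))
print("9400M: ", break_even_size(SIZES, SPEEDUP_9400M))
```

Both lines report 131072, so for this test the card matters far less for small arrays than for large ones.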

Conclusion?

Is it OK to say that the 9800GT GPU is roughly 8.5 times (nearly an order of
magnitude) faster than the 9400M on this test at the 16777216 size (0.018 s
vs 0.153 s)? Is the timing dependent on the bus speed at all, or is the GPU
time purely down to the GPU's speed?
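A quick check of the arithmetic for the 16777216 case, plus the throughput those times imply. The 4 bytes per element assumes float32 arrays, and the "bandwidth" is just bytes divided by elapsed time, so it includes any fixed per-call overhead inside the timed region (on the 9800GT the GPU time stays between about 2.2 and 2.4 ms from 2048 up to 131072 elements, which looks like such an overhead). It is not a direct measurement of either PCIe or on-card memory bandwidth. Note also that the 9400M is an integrated GPU sharing system RAM, whereas the 9800GT is a discrete PCIe card with its own memory.

```python
ELEMENT_BYTES = 4  # assumption: float32 elements; use 8 if the arrays are float64
N = 16777216
T_9800GT = 0.0179603649902  # "Time GPU" at 16777216, Windows table
T_9400M = 0.153209472656    # "Time GPU" at 16777216, Mac table


def implied_gb_per_s(size, seconds, element_bytes=ELEMENT_BYTES):
    """Bytes produced per second of elapsed time, in GB/s (1 GB = 1e9 bytes)."""
    return size * element_bytes / seconds / 1e9


print("9400M / 9800GT time ratio: %.2f" % (T_9400M / T_9800GT))
print("9800GT: %.2f GB/s" % implied_gb_per_s(N, T_9800GT))
print("9400M:  %.2f GB/s" % implied_gb_per_s(N, T_9400M))
```

With these numbers it prints a ratio of 8.53 and implied rates of about 3.74 GB/s vs 0.44 GB/s.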

Cheers,
Ian.

-- 
Ian Ozsvald (Professional Screencaster)

http://ProCasts.co.uk/examples.html
http://TheScreencastingHandbook.com
http://IanOzsvald.com + http://ShowMeDo.com
http://twitter.com/ianozsvald
_______________________________________________
PyCUDA mailing list
http://tiker.net/mailman/listinfo/pycuda_tiker.net
