I am trying to learn from the
http://wiki.tiker.net/PyCuda/Examples/MatrixmulSimple example. It works
so far, but only for small matrices. When I increase the size of
the matrix, the CPU and GPU results diverge by as much as 5.9e+01.

I suspect it's due to the block and grid parameters I need to pass to
matrixmul(). Is that correct? How can I pick optimal values?
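A common way to size a launch is to fix a 2D block (16x16 here, an assumed choice) and derive the grid by ceiling division, so that one thread covers each output element. The sketch below assumes a kernel that computes its row from `blockIdx.y * blockDim.y + threadIdx.y` and its column from the x components, and that returns early for threads outside the matrix. If the wiki example is launched as a single block of MATRIX_SIZE x MATRIX_SIZE threads with no grid, it cannot scale past the per-block thread limit, so the kernel itself would also need that change. `launch_config` is a hypothetical helper, not part of PyCUDA.

```python
def launch_config(rows, cols, tile=16):
    """Hypothetical helper: pick block and grid sizes for a (rows, cols) output.

    Assumes one thread per output element and a kernel that bounds-checks
    its (row, col) so threads in partial edge tiles do nothing.
    """
    block = (tile, tile, 1)  # tile*tile threads per block (256 for tile=16)
    # Ceiling division: x covers columns, y covers rows, edge tiles included.
    grid = ((cols + tile - 1) // tile, (rows + tile - 1) // tile)
    return block, grid


block, grid = launch_config(10000, 3)
```

The resulting tuples would then be passed as the `block=` and `grid=` keyword arguments of the PyCUDA kernel call.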

Or is there something else I should be considering?

My matrix size is 10000x3.
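An absolute difference like 5.9e+01 is hard to judge without the magnitude of the entries. Comparing the error against the largest reference value helps separate a launch bug (entries wrong or left at zero, relative error near 1) from ordinary single-precision rounding (tiny relative error). The sketch below assumes the GPU computes in float32, as the wiki example appears to, and uses `a.T @ a` on random 10000x3 data only as a hypothetical stand-in for the actual product. `compare_results` is an illustrative helper.

```python
import numpy as np


def compare_results(c_test, c_ref):
    """Return (max absolute error, that error divided by the largest |reference| entry)."""
    c_test = np.asarray(c_test, dtype=np.float64)
    c_ref = np.asarray(c_ref, dtype=np.float64)
    abs_err = np.abs(c_test - c_ref).max()
    return abs_err, abs_err / np.abs(c_ref).max()


# Hypothetical data with the same 10000x3 shape.
rng = np.random.default_rng(0)
a = rng.standard_normal((10000, 3)).astype(np.float32)

# a.T @ a is 3x3; each entry sums 10000 products, so rounding can accumulate.
c32 = a.T @ a                                        # single precision, like a float32 kernel
c64 = a.astype(np.float64).T @ a.astype(np.float64)  # double-precision reference
abs_err, rel_err = compare_results(c32, c64)
```

Here `abs_err` can be well above 1e-3 while `rel_err` stays small, which would point to precision rather than a wrong launch configuration.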

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
