hi andreas, thanks for the info and the pointer to scikits.cuda.
regards,
bryan

On Sun, Oct 7, 2012 at 3:58 PM, Andreas Kloeckner <[email protected]> wrote:
> Hi Bryan,
>
> "W. Bryan Smith" <[email protected]> writes:
> > i am just getting started with pycuda, and wanted to check the
> > performance on matrix multiplication.
> >
> > i copied the demo at
> > http://wiki.tiker.net/PyCuda/Examples/DemoMetaMatrixmulCheetah, and i
> > can get it to run just fine. but the performance, as measured by the
> > returned gputime, is consistently a little slower than using numpy's
> > builtin linalg.dot() function. i have left all the default settings as
> > defined in the demo file, and on a test matrix of size (10000,250) the
> > pycuda version of the inner product takes about 10-15% longer than the
> > numpy version. are there default settings i can tweak to make this
> > faster? or, alternatively, is there something else i should be doing
> > to test this?
> >
> > I am running the CUDA-5.0 libraries, pycuda 2012.1, and Cheetah 2.4.4
> > on OS X 10.7.4
>
> First, DemoMetaMatrixmulCheetah is (GT200-generation, IIRC) a
> demo. PyCUDA per se does not come with an optimized matmul
> implementation, but scikits.cuda wraps CUBLAS and should give you
> competitive performance.
>
> http://lebedov.github.com/scikits.cuda/generated/scikits.cuda.linalg.dot.html
>
> Also, the memory error that you saw was likely due to a refcounting bug
> that was recently fixed in git.
>
> Andreas
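For anyone landing on this thread later, a rough sketch of how the scikits.cuda dot that Andreas points to might be used (shapes here are illustrative, not the (10000,250) case from the thread; it needs pycuda and scikits.cuda on a CUDA-capable machine — the numpy fallback is only so the sketch runs anywhere):

```python
import numpy as np

# Illustrative sizes; the CUBLAS path pays off on larger matrices.
a = np.random.rand(1024, 250).astype(np.float32)
b = np.random.rand(250, 512).astype(np.float32)

try:
    # GPU path: requires pycuda + scikits.cuda + a CUDA device.
    import pycuda.autoinit          # creates a CUDA context
    import pycuda.gpuarray as gpuarray
    import scikits.cuda.linalg as culinalg

    culinalg.init()                 # initialize CUBLAS
    a_gpu = gpuarray.to_gpu(a)      # host -> device copies
    b_gpu = gpuarray.to_gpu(b)
    c_gpu = culinalg.dot(a_gpu, b_gpu)  # CUBLAS-backed matrix multiply
    c = c_gpu.get()                 # device -> host copy
except ImportError:
    # No GPU stack available: fall back to numpy so the sketch still runs.
    c = np.dot(a, b)

print(c.shape)
```

Note that a fair timing comparison should exclude the one-time context and CUBLAS setup, and ideally the host/device transfers, since those dominate at small sizes.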
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
