hi andreas,

thanks for the info and the pointer to scikits.cuda.

regards,
bryan

On Sun, Oct 7, 2012 at 3:58 PM, Andreas Kloeckner
<[email protected]> wrote:

> Hi Bryan,
>
> "W. Bryan Smith" <[email protected]> writes:
> > i am just getting started with pycuda, and wanted to check the
> > performance on matrix multiplication.
> >
> > i copied the demo at
> > http://wiki.tiker.net/PyCuda/Examples/DemoMetaMatrixmulCheetah, and i
> > can get it to run just fine.  but the performance, as measured by the
> > returned gputime, is consistently a little slower than using numpy's
> > builtin linalg.dot() function.  i have left all the default settings
> > as defined in the demo file, and on a test matrix of size (10000,250)
> > the pycuda version of the inner product takes about 10-15% longer than
> > the numpy version.  are there default settings i can tweak to make
> > this faster?  or, alternatively, is there something else i should be
> > doing to test this?
> >
> > I am running the CUDA-5.0 libraries, pycuda 2012.1, and Cheetah 2.4.4
> > on OS X 10.7.4
>
> First, DemoMetaMatrixmulCheetah is a (GT200-generation, IIRC)
> demo. PyCUDA per se does not come with an optimized matmul
> implementation, but scikits.cuda wraps CUBLAS and should give you
> competitive performance.
>
>
> http://lebedov.github.com/scikits.cuda/generated/scikits.cuda.linalg.dot.html
>
> Also, the memory error that you saw was likely due to a refcounting bug
> that was recently fixed in git.
>
> Andreas
>
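As a rough illustration of the suggestion above, here is a minimal sketch of calling CUBLAS through scikits.cuda.linalg.dot and checking it against numpy's dot. It assumes pycuda and scikits.cuda are installed and a CUDA device is available; the matrix sizes are made up for the example, and the code falls back to the numpy-only path when the GPU stack is missing.

```python
import numpy as np

# Test matrices (sizes chosen arbitrarily for this sketch).
a = np.random.rand(10000, 250).astype(np.float32)
b = np.random.rand(250, 300).astype(np.float32)

# CPU reference result via numpy's built-in dot.
ref = np.dot(a, b)

try:
    import pycuda.autoinit          # creates a CUDA context on import
    import pycuda.gpuarray as gpuarray
    import scikits.cuda.linalg as culinalg

    culinalg.init()                 # initializes the CUBLAS handle

    a_gpu = gpuarray.to_gpu(a)      # copy operands to the device
    b_gpu = gpuarray.to_gpu(b)
    c_gpu = culinalg.dot(a_gpu, b_gpu)   # CUBLAS sgemm under the hood

    # Compare the GPU result against the numpy reference.
    assert np.allclose(ref, c_gpu.get(), atol=1e-2)
except ImportError:
    pass  # no GPU stack available; the numpy reference was still computed

print(ref.shape)
```

Unlike the Cheetah demo kernel, this routes the multiply through the vendor-tuned CUBLAS library, which is where the competitive performance comes from.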
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
