On Tue, 2010-09-28 at 00:29 -0700, jmcarval wrote:
> Thanks for your reply.
> I've read the first thread you mention, which ends without a solution:
> http://pycuda.2962900.n2.nabble.com/PyCUDA-pycuda-test-failures-tp5320194p5320194.html
> 
> Maybe I'm making a huge mistake, but it does not seem to be a precision
> issue.
> The following code (a simplification of test_gpuarray) returns 30 from the
> CPU and 14 from the GTX480, whether with integer, float32 or float64.
> I don't get it. Can anybody explain to me what I'm doing wrong, please?
> Thanks
> 
> import pycuda.autoinit
> import numpy
> import pycuda.gpuarray as gpuarray
> from pycuda.curandom import rand as curand
> 
> a = numpy.array([1,2,3,4])#.astype(numpy.float32)
> a_gpu = gpuarray.to_gpu(a)
> b = a
> b_gpu = gpuarray.to_gpu(b)
> 
> dot_ab = numpy.dot(a, b)
> 
> dot_ab_gpu = gpuarray.dot(a_gpu, b_gpu).get()
> 
> print "CPU dot product:", dot_ab
> print "GPU dot product:", dot_ab_gpu
> 
> 

I have an idea for checking whether the problem lies with PyCUDA,
the CUDA toolkit, or the driver.
Can you force PyCUDA to generate code for compute capability 1.x
instead of sm_20?
I found that the target architecture is set on line 190 of
pycuda/compiler.py:
arch = "sm_%d%d" % Context.get_device().compute_capability()
Try changing it to
arch = "sm_10"
and likewise for the other 1.x targets, and check whether you still
get the incorrect 14 in each case.
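
To make the change concrete, here is a rough sketch of what that line
computes and what the manual edit amounts to. The helper name arch_flag
and its override parameter are made up for illustration and are not part
of PyCUDA; the (2, 0) tuple is what I assume compute_capability()
reports on a GTX480.

```python
def arch_flag(compute_capability, override=None):
    """Build an nvcc architecture name such as "sm_20".

    compute_capability is a (major, minor) tuple, the shape returned by
    Context.get_device().compute_capability().
    override is a hypothetical knob standing in for the manual edit of
    pycuda/compiler.py; PyCUDA itself has no such parameter here.
    """
    if override is not None:
        return override
    return "sm_%d%d" % compute_capability


# Assumed capability of the GTX480: (2, 0).
default_arch = arch_flag((2, 0))                   # what the line yields today
forced_arch = arch_flag((2, 0), override="sm_10")  # what the edit forces
print(default_arch, forced_arch)
```

If the result is correct with the forced 1.x target and wrong with the
default one, that would point at the sm_20 code path rather than at the
Python side.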

If there is a simpler way of changing the architecture for which
PyCUDA generates code, feel free to use it and share it with
the list.
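
One more observation about the numbers themselves, in case it helps
whoever digs into this. The sketch below is plain Python and does not
touch the GPU; it only asks which of the products a[i]*b[i] from the
quoted script can add up to 30 and to 14. The function name
subsets_with_sum is my own. The outcome is merely consistent with one
element being left out of the sum; it does not prove where the error is.

```python
from itertools import combinations

a = [1, 2, 3, 4]                 # the vector from the quoted script (b = a)
products = [x * x for x in a]    # element-wise products a[i] * b[i]


def subsets_with_sum(values, target):
    """Return every tuple of indices whose values add up to target."""
    hits = []
    for size in range(1, len(values) + 1):
        for idx in combinations(range(len(values)), size):
            if sum(values[i] for i in idx) == target:
                hits.append(idx)
    return hits


cpu_match = subsets_with_sum(products, 30)  # indices that explain the CPU result
gpu_match = subsets_with_sum(products, 14)  # indices that explain the GPU result
print(cpu_match)
print(gpu_match)
```

The only combination that gives 14 uses the first three elements, so
the GPU result looks like the dot product with the last element missing.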

Regards.

-- 
Tomasz Rybak <bogom...@post.pl> GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
