On Sunday, 01 March 2009, you wrote:
> > What's the failure? If it's something non-intuitive, we should catch it
> > in PyCuda and give a nicer warning.
>
> The failure is the wrong data is transferred to the kernel; it appeared to
> be something like the array transposed (which, needless to say, can be very
> bad, particularly if loop bounds are taken from corrupted memory).

numpy supports arbitrary strides in its arrays, which, among other things, can 
make them column- or row-major (i.e. have Fortran or C order). GPUArray 
currently has no stride support whatsoever. In the long run, having stride 
support in GPUArray would likely be desirable, and introducing strides would 
also let us support indexing in the same way that numpy does.
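For illustration, here is what those strides look like on the numpy side (a host-only sketch; GPUArray currently has no equivalent of any of this):

```python
import numpy

# A 3x4 float32 array in C (row-major) order: stepping along a row
# advances 4 bytes (one float), stepping down a column advances 16 bytes
# (one full row).
c_order = numpy.zeros((3, 4), dtype=numpy.float32)
print(c_order.strides)  # (16, 4)

# The same shape in Fortran (column-major) order lays the data out
# column by column, so the strides come out swapped.
f_order = numpy.asfortranarray(c_order)
print(f_order.strides)  # (4, 12)
```

Both arrays hold identical values; only the byte layout (and hence the stride tuple) differs, which is exactly the information a GPUArray copy would need to preserve.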

Further, numpy allows many kinds of funky arrays (non-contiguous ones, for 
example). PyCuda currently does very little to support these, but at least it 
fails loudly rather than behaving incorrectly:

>>> import pycuda.autoinit
>>> import pycuda.gpuarray as ga
>>> import numpy
>>> z = numpy.zeros((10,10), dtype=numpy.float32)
>>> ga.to_gpu(z[:,2:3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kloeckner/src/env/lib/python2.5/site-packages/pycuda-0.93beta-py2.5-linux-x86_64.egg/pycuda/gpuarray.py", line 401, in to_gpu
    result.set(ary, stream)
  File "/home/kloeckner/src/env/lib/python2.5/site-packages/pycuda-0.93beta-py2.5-linux-x86_64.egg/pycuda/gpuarray.py", line 91, in set
    drv.memcpy_htod(self.gpudata, ary, stream)
TypeError: expected a single-segment buffer object

This is easy to work around for now: a simple .copy() makes things work.
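Concretely, the workaround looks like this (the to_gpu call itself is left as a comment so the snippet runs without a GPU):

```python
import numpy

z = numpy.zeros((10, 10), dtype=numpy.float32)
col = z[:, 2:3]  # a view that skips over 9 elements between rows

# This non-contiguous layout is what memcpy_htod chokes on: the view
# is not a single-segment buffer.
print(col.flags['C_CONTIGUOUS'])  # False

# .copy() packs the data into a fresh C-contiguous buffer.
col_copy = col.copy()
print(col_copy.flags['C_CONTIGUOUS'])  # True

# ga.to_gpu(col_copy)  # now succeeds, since the buffer is single-segment
```

numpy.ascontiguousarray(col) would work equally well and is a no-op if the input already happens to be contiguous.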

> Looks like C_CONTIGUOUS is what we're looking for. The numpy documentation
> mentions this and a possibly applicable function call:
> http://numpy.scipy.org/numpydoc/numpy-13.html#marker-59740

In a sense, PyCuda merely did what it was asked to do, which is transfer the 
numpy array in the exact layout it had on the host. On the one hand, I 
intentionally transfer Fortran-layout arrays onto the GPU in some of my code, 
and I think that's perfectly fine behavior.

On the other hand, you have a point: at present, none of the stride 
information in the numpy array is preserved in a GPUArray copy, which means 
that gpuarray.to_gpu(a).get() may produce surprising results; only for C-
contiguous arrays will you get back out what you put in. This is a bug and 
needs to be fixed, but the fix would likely be part of the stride 
implementation mentioned above.

If, in the meantime, you want to phrase a warning for the documentation, I'd 
be happy to merge that.

Andreas


_______________________________________________
PyCuda mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net
