Hi, sorry for the delay; I was working on many other things at the same time and I was sick. As the conference is May 7-9, I need to finish a draft proposal next week, so things should advance faster now.
2011/4/22 Andreas Kloeckner <[email protected]>:
> Hi Frédéric,
>
> On Fri, 22 Apr 2011 08:35:03 -0400, Frédéric Bastien <[email protected]> wrote:
>> Thanks for your modifications. Here are some questions/comments:
>>
>> I'm not sure I understand the idea in the to_gpu() function that puts
>> the same strides on the GpuArray as on the host, while we copy the data
>> to make it C-contiguous before we transfer it. That way the strides in
>> the GpuArray won't represent what is on the GPU. In that case, I think
>> we should set the strides to those of the contiguous version of the
>> data, so that the strides on the GpuArray represent what is on the GPU,
>> not the original strides.
>
> I disagree. As long as the host data is contiguous (C or Fortran, or in
> principle any other way), a straight copy is the fastest and most
> logical thing to do, IMO.

I agree with what you said, but I was trying to say something else. Here is an example that should make it clearer:

import pycuda.autoinit
import pycuda.gpuarray
import numpy

x = numpy.random.rand(5, 4)
print x.shape    # (5, 4)
print x.strides  # (32, 8)

x_strided = numpy.random.rand(5, 4)[::2]
print x_strided.shape    # (3, 4)
print x_strided.strides  # (64, 8)

x_fortran = numpy.asfortranarray(x)
print x_fortran.shape    # (5, 4)
print x_fortran.strides  # (8, 40)

gx = pycuda.gpuarray.to_gpu(x)
assert gx.shape == x.shape
assert gx.strides == x.strides
assert (gx.get() == x).all()

gx_fortran = pycuda.gpuarray.to_gpu(x_fortran)
assert gx_fortran.shape == x_fortran.shape
assert gx_fortran.strides == x_fortran.strides
assert (gx_fortran.get() == x_fortran).all()

gx_NOT_strided = pycuda.gpuarray.to_gpu(x_strided)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/u/bastienf/repos/pycuda.git/build/lib.linux-x86_64-2.5/pycuda/gpuarray.py", line 784, in to_gpu
    result.set(ary)
  File "/u/bastienf/repos/pycuda.git/build/lib.linux-x86_64-2.5/pycuda/gpuarray.py", line 222, in set
    assert self.flags.forc
AssertionError
assert gx_NOT_strided.shape == x_strided.shape
assert gx_NOT_strided.strides != x_strided.strides
assert (gx_NOT_strided.get() == x_strided).all()

As you can see, the last case doesn't work. I think the attached patch strided_transfer.diff should fix that.

>> When I talked about having a strided GpuArray, I was thinking about
>> having a new object with a different attribute name for the gpudata
>> attribute. The goal is to make it incompatible with current code
>> unless that code is modified.
>
> Are you talking about package code or user code? As far as user code is
> concerned, the new code accepts the same arrays that the old code did,
> and it doesn't change semantics on any one. Thus no need to switch to a
> different type, IMO. In general, nothing generates non-contiguous GPU
> arrays for now.

Suppose that today I implement slicing (gpu_array[::2]) that generates strided GPUArrays, and modify all PyCUDA code to check that its inputs are contiguous before working on them. Then everything inside PyCUDA will be consistent: no code in PyCUDA will produce such arrays yet, but an error will be raised if one gets used.

Even so, I still see a problem that will give bad results without any error or warning, and I think this problem WILL happen (which is why I keep discussing it). If a user has written their own GPU code and forgets to modify it to check the strides, then starts using the new strided GPUArray with that code, they will get bad results without any error or warning. It is easy to say it is the user's fault (and it is...), but in a setting where a few people write GPU code and many people use it (like in our lab), if the person who wrote the GPU code doesn't have time to update all of it and some other user starts using strided GPU arrays, it gets harder to say it is the user's fault. Also think of people using strided GPU arrays with PyCULA or other libraries that say they support pycuda.gpuarray.
In that case everybody needs to update their code at the same time to make this work...

I have a simple workaround for this: make a class that inherits from GPUArray. In its __init__, call GPUArray.__init__ and then rename self.gpudata to self.gpunddata. Also, don't allow GPUArray itself to have strides that are not contiguous. That way, if someone uses old code with a strided GPU array, it will raise an error (without a good message, but we can document it well). To make the interface a little easier, we can keep self.gpudata when the array is contiguous, so people will be able to use the new class with old code as long as they don't use the stride feature. I prefer having two array classes and deprecating one over silently computing wrong results. As I said, I don't think it is hard or long to do; I will try to write one next week.

> Great--please do keep me (and the lists!) in the loop, and if possible,
> consider that similar changes are needed in PyOpenCL and PyCUDA.

I will keep you and the list updated. I have always thought that PyOpenCL had a Python interface similar to PyCUDA's, so the changes to PyCUDA could probably be redone in PyOpenCL without too much difficulty. But I may be wrong, as I haven't used it.

>> I think we can compare the current Python GPU base array situation to
>> the one before numpy existed, i.e. there are many variants that are
>> not directly compatible, but very close.
>
> I disagree--GPUArray was explicitly conceived as a numpy array
> workalike, and it'll slowly grow to get closer to that goal.

I was referring to the fact that Theano currently uses its own CudaNdarray object, which is not directly compatible with your GPUArray. There is also a different base type in CudaMat. At that level, I still think we need to find a common base object that could be used by everybody in the Python GPU community. That would make code more portable and help keep the community closer together (instead of fragmented, as it is now).

thanks

Frédéric
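[Editorial sketch] To make the workaround above concrete, here is a minimal sketch of the idea in plain Python. The mock classes, the gpunddata name, and the fake device pointers are illustrative only; a real version would subclass pycuda.gpuarray.GPUArray and reuse its own contiguity flags rather than recomputing them.

```python
class GPUArray(object):
    """Mock of the current contiguous-only array: old kernel code reads .gpudata."""
    def __init__(self, shape, strides, gpudata):
        self.shape = shape
        self.strides = strides
        self.gpudata = gpudata  # device pointer in the real class


def is_c_contiguous(shape, strides, itemsize):
    """Compute the byte strides a C-contiguous array of this shape would
    have, and compare them with the actual strides."""
    expected = []
    acc = itemsize
    for dim in reversed(shape):
        expected.insert(0, acc)
        acc *= dim
    return tuple(strides) == tuple(expected)


class GPUNdArray(GPUArray):
    """Strided variant: exposes the pointer as .gpunddata so that old code
    touching .gpudata on a non-contiguous array fails loudly
    (AttributeError) instead of silently computing garbage. When the data
    happens to be contiguous, .gpudata is kept for backward compatibility."""
    def __init__(self, shape, strides, gpudata, itemsize):
        GPUArray.__init__(self, shape, strides, gpudata)
        self.gpunddata = self.gpudata
        if not is_c_contiguous(shape, strides, itemsize):
            del self.gpudata  # old code now raises instead of misreading memory


# Contiguous case: old-style access keeps working.
contig = GPUNdArray((5, 4), (32, 8), 0xdead, itemsize=8)
assert contig.gpudata == contig.gpunddata

# Strided case (the [::2] view from the example above): old-style access fails.
strided = GPUNdArray((3, 4), (64, 8), 0xbeef, itemsize=8)
assert not hasattr(strided, "gpudata")
```

The point of the sketch is just the failure mode: unmodified kernels get an AttributeError, which is documentable, instead of wrong numbers.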
commit 55f218ca4aa9d7491dff65158b06642bf18083a9
Author: Frederic <[email protected]>
Date:   Fri Apr 29 15:11:10 2011 -0400

    Fix transfer of strided ndarray to gpu.

diff --git a/pycuda/gpuarray.py b/pycuda/gpuarray.py
index c4814a9..3553515 100644
--- a/pycuda/gpuarray.py
+++ b/pycuda/gpuarray.py
@@ -780,6 +780,8 @@ class GPUArray(object):
 
 def to_gpu(ary, allocator=drv.mem_alloc):
     """converts a numpy array to a GPUArray"""
+    if not ary.flags.forc:
+        ary = ary.copy()
     result = GPUArray(ary.shape, ary.dtype, allocator, strides=ary.strides)
     result.set(ary)
     return result
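[Editorial sketch] The copy this patch triggers is an ordinary gather into a fresh contiguous buffer. A minimal sketch in plain Python (no pycuda or numpy; contiguous_copy_2d is a made-up helper, and element strides are used instead of byte strides) of what ary.copy() does for a strided 2-D view:

```python
def contiguous_copy_2d(buf, shape, strides):
    """Gather a (rows, cols) view with element strides (s0, s1) out of the
    flat list `buf` into a new C-contiguous flat list."""
    rows, cols = shape
    s0, s1 = strides
    return [buf[i * s0 + j * s1] for i in range(rows) for j in range(cols)]


# A 5x4 "array" stored contiguously: element (i, j) lives at index 4*i + j.
base = list(range(20))

# The view base_array[::2] has shape (3, 4) and element strides (8, 1):
# every second row, columns unchanged.
view = contiguous_copy_2d(base, (3, 4), (8, 1))
# view is rows 0, 2 and 4 of the original, now packed contiguously:
# [0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19]
```

After this gather, shape and (recomputed, contiguous) strides do describe the bytes actually sitting on the device, which is exactly the invariant the to_gpu() discussion above is about.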
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
