Hi, sorry for the delay; I was working on many other things at the same time and I was sick. As the conference is May 7-9, I need to finish a draft proposal next week, so things should advance faster now.
2011/4/22 Andreas Kloeckner <[email protected]>:
> Hi Frédéric,
>
> On Fri, 22 Apr 2011 08:35:03 -0400, Frédéric Bastien <[email protected]> wrote:
>> Thanks for your modifications. Here are some questions/comments:
>>
>> I'm not sure I understand the idea in the to_gpu() function that puts
>> the same strides on the GpuArray as on the host, while we copy the data
>> to make it C-contiguous before we transfer it. That way the strides in
>> the GpuArray won't represent what is on the GPU. In that case, I think
>> we should set the strides to those of the contiguous version of the
>> data, so that the strides on the GpuArray represent what is on the GPU,
>> not the original strides.
>
> I disagree. As long as the host data is contiguous (C or Fortran, or in
> principle any other way), a straight copy is the fastest and most
> logical thing to do, IMO.

I agree with what you said, but I was trying to say something else. Here is an example that should make it clearer:

import pycuda.autoinit
import pycuda.gpuarray
import numpy

x = numpy.random.rand(5, 4)
print x.shape    # (5, 4)
print x.strides  # (32, 8)

x_strided = numpy.random.rand(5, 4)[::2]
print x_strided.shape    # (3, 4)
print x_strided.strides  # (64, 8)

x_fortran = numpy.asfortranarray(x)
print x_fortran.shape    # (5, 4)
print x_fortran.strides  # (8, 40)

gx = pycuda.gpuarray.to_gpu(x)
assert gx.shape == x.shape
assert gx.strides == x.strides
assert (gx.get() == x).all()

gx_fortran = pycuda.gpuarray.to_gpu(x_fortran)
assert gx_fortran.shape == x_fortran.shape
assert gx_fortran.strides == x_fortran.strides
assert (gx_fortran.get() == x_fortran).all()

gx_NOT_strided = pycuda.gpuarray.to_gpu(x_strided)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/u/bastienf/repos/pycuda.git/build/lib.linux-x86_64-2.5/pycuda/gpuarray.py", line 784, in to_gpu
    result.set(ary)
  File "/u/bastienf/repos/pycuda.git/build/lib.linux-x86_64-2.5/pycuda/gpuarray.py", line 222, in set
    assert self.flags.forc
AssertionError
assert gx_NOT_strided.shape == x_strided.shape
assert gx_NOT_strided.strides != x_strided.strides
assert (gx_NOT_strided.get() == x_strided).all()

As you can see, the last case doesn't work. I think the attached patch strided_transfer.diff should fix that.

>> When I talked about having a strided GpuArray, I was thinking about
>> having a new object with a different attribute name for the gpudata
>> attribute. The goal is to make it incompatible with current code
>> unless that code is modified.
>
> Are you talking about package code or user code? As far as user code is
> concerned, the new code accepts the same arrays that the old code did,
> and it doesn't change semantics on any one. Thus no need to switch to a
> different type, IMO. In general, nothing generates non-contiguous GPU
> arrays for now.

Suppose that today I implement slicing (gpu_array[::2]) that generates strided GPUArrays, and modify all PyCUDA code to check that its inputs are contiguous before working on them. Then everything inside PyCUDA will be consistent: no code in PyCUDA will produce such arrays yet, but an error will be raised if one gets used.

Even so, I still see a problem that will give bad results without any error or warning, and I think this problem WILL happen (which is why I keep discussing it). If a user has written their own GPU code and forgets to modify it to check the strides, then starts using the new strided GPUArray with that code, they will get bad results without any error or warning. It is easy to say it is the user's fault (and it is...), but in a setting where a few people write GPU code and many people use it (like in our lab), if the person who wrote the GPU code doesn't have time to update all of it and some other user starts using strided GPU arrays, it gets harder to say it is the user's fault. Also think of people using strided GPU arrays with PyCULA or other libraries that say they support pycuda.gpuarray.
In that case everybody needs to update their code at the same time to make this work...

I have a simple workaround for this: make a class that inherits from GPUArray. In its __init__, call GPUArray.__init__ and then rename self.gpudata to self.gpunddata. Also, don't allow GPUArray itself to have strides that are not contiguous. That way, if someone uses old code with a strided GPU array, it will raise an error (without a good message, but we can document it well). To make the interface a little easier, we can keep self.gpudata when the array is contiguous, so people will be able to use the new class with old code as long as they don't use the stride feature. I prefer having two array classes and deprecating one over silently computing wrong results. As I said, I don't think it is hard or long to do; I will try to write one next week.

> Great--please do keep me (and the lists!) in the loop, and if possible,
> consider that similar changes are needed in PyOpenCL and PyCUDA.

I will keep you and the list updated. I have always thought that PyOpenCL had a Python interface similar to PyCUDA's, so the changes to PyCUDA could probably be redone in PyOpenCL without too much difficulty. But I may be wrong, as I haven't used it.

>> I think we can compare the current Python GPU base array situation to
>> the one before numpy existed, i.e. there are many variants that are
>> not directly compatible, but very close.
>
> I disagree--GPUArray was explicitly conceived as a numpy array
> workalike, and it'll slowly grow to get closer to that goal.

I was referring to the fact that Theano currently uses its own CudaNdarray object, which is not directly compatible with your GPUArray. There is also a different base type in CudaMat. At that level, I still think we need to find a common base object that could be used by everybody in the Python GPU community. That would make code more portable and help keep the community closer together (instead of fragmented, as it is now).

thanks

Frédéric
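[Editorial sketch] To make the workaround above concrete, here is a minimal sketch of the idea in plain Python. The mock classes, the gpunddata name, and the fake device pointers are illustrative only; a real version would subclass pycuda.gpuarray.GPUArray and reuse its own contiguity flags rather than recomputing them.

```python
class GPUArray(object):
    """Mock of the current contiguous-only array: old kernel code reads .gpudata."""
    def __init__(self, shape, strides, gpudata):
        self.shape = shape
        self.strides = strides
        self.gpudata = gpudata  # device pointer in the real class


def is_c_contiguous(shape, strides, itemsize):
    """Compute the byte strides a C-contiguous array of this shape would
    have, and compare them with the actual strides."""
    expected = []
    acc = itemsize
    for dim in reversed(shape):
        expected.insert(0, acc)
        acc *= dim
    return tuple(strides) == tuple(expected)


class GPUNdArray(GPUArray):
    """Strided variant: exposes the pointer as .gpunddata so that old code
    touching .gpudata on a non-contiguous array fails loudly
    (AttributeError) instead of silently computing garbage. When the data
    happens to be contiguous, .gpudata is kept for backward compatibility."""
    def __init__(self, shape, strides, gpudata, itemsize):
        GPUArray.__init__(self, shape, strides, gpudata)
        self.gpunddata = self.gpudata
        if not is_c_contiguous(shape, strides, itemsize):
            del self.gpudata  # old code now raises instead of misreading memory


# Contiguous case: old-style access keeps working.
contig = GPUNdArray((5, 4), (32, 8), 0xdead, itemsize=8)
assert contig.gpudata == contig.gpunddata

# Strided case (the [::2] view from the example above): old-style access fails.
strided = GPUNdArray((3, 4), (64, 8), 0xbeef, itemsize=8)
assert not hasattr(strided, "gpudata")
```

The point of the sketch is just the failure mode: unmodified kernels get an AttributeError, which is documentable, instead of wrong numbers.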
commit 55f218ca4aa9d7491dff65158b06642bf18083a9
Author: Frederic <[email protected]>
Date:   Fri Apr 29 15:11:10 2011 -0400

    Fix transfer of strided ndarray to gpu.

diff --git a/pycuda/gpuarray.py b/pycuda/gpuarray.py
index c4814a9..3553515 100644
--- a/pycuda/gpuarray.py
+++ b/pycuda/gpuarray.py
@@ -780,6 +780,8 @@ class GPUArray(object):
 
 def to_gpu(ary, allocator=drv.mem_alloc):
     """converts a numpy array to a GPUArray"""
+    if not ary.flags.forc:
+        ary = ary.copy()
     result = GPUArray(ary.shape, ary.dtype, allocator, strides=ary.strides)
     result.set(ary)
     return result
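[Editorial sketch] The copy this patch triggers is an ordinary gather into a fresh contiguous buffer. A minimal sketch in plain Python (no pycuda or numpy; contiguous_copy_2d is a made-up helper, and element strides are used instead of byte strides) of what ary.copy() does for a strided 2-D view:

```python
def contiguous_copy_2d(buf, shape, strides):
    """Gather a (rows, cols) view with element strides (s0, s1) out of the
    flat list `buf` into a new C-contiguous flat list."""
    rows, cols = shape
    s0, s1 = strides
    return [buf[i * s0 + j * s1] for i in range(rows) for j in range(cols)]


# A 5x4 "array" stored contiguously: element (i, j) lives at index 4*i + j.
base = list(range(20))

# The view base_array[::2] has shape (3, 4) and element strides (8, 1):
# every second row, columns unchanged.
view = contiguous_copy_2d(base, (3, 4), (8, 1))
# view is rows 0, 2 and 4 of the original, now packed contiguously:
# [0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19]
```

After this gather, shape and (recomputed, contiguous) strides do describe the bytes actually sitting on the device, which is exactly the invariant the to_gpu() discussion above is about.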
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
