No problem. It seems I didn't explain myself clearly enough--my problem comes up when using slices of GPUArray objects in C++ code outside of PyCUDA. After some looking around, I discovered the PointerHolderBase class, which solves this particular problem: I just have to ensure that the .gpudata member of a GPUArray is always either a DeviceAllocation object or something derived from PointerHolderBase. Then my C++ code, which calls extract&lt;CUdeviceptr&gt; on it, is happy. So I'm satisfied, and my earlier patch is unnecessary.
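For the archives, here's a minimal sketch of the kind of wrapper I mean. The class name, constructor arguments, and fake pointer value are made up for illustration; get_pointer() is the one method a PointerHolderBase subclass needs to supply, and the try/except fallback is just so the sketch can be read and run without pycuda installed:

```python
try:
    from pycuda.driver import PointerHolderBase
except ImportError:
    # Stand-in base class so this sketch runs without pycuda installed.
    class PointerHolderBase:
        pass

class ExternalMemory(PointerHolderBase):
    """Hypothetical wrapper for a device pointer owned by external C++ code."""

    def __init__(self, dev_ptr, owner=None):
        super().__init__()
        self.dev_ptr = dev_ptr
        self.owner = owner  # reference that keeps the external allocation alive

    def get_pointer(self):
        # PyCUDA calls this whenever it needs the raw device pointer,
        # e.g. when an instance is stored in a GPUArray's .gpudata.
        return self.dev_ptr
```

Storing an instance of such a class in .gpudata should then let the extract&lt;CUdeviceptr&gt; on the C++ side succeed.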
Thanks,
bryan

On Aug 2, 2010, at 6:21 PM, Andreas Kloeckner wrote:

> Hey Bryan,
>
> first of all, sorry for the late reply. I just returned from a crazy
> sequence of trips and am now working my way through the backlog...
>
> On Wed, 21 Jul 2010 20:54:14 -0700, Bryan Catanzaro
> <catan...@eecs.berkeley.edu> wrote:
>> Using numpy.intp works for kernels launched by PyCUDA. But it doesn't
>> work for PyCUDA's memcpy, which complains that numpy.intp (which seems
>> to be aliased to numpy.int32 on my machine) doesn't match the C++
>> signature:
>>
>> Boost.Python.ArgumentError: Python argument types in
>>     pycuda._driver.memcpy_dtod(numpy.int32, DeviceAllocation, int)
>> did not match C++ signature:
>>     memcpy_dtod(unsigned int dest, unsigned int src, unsigned int size)
>
> Ok, I'm beginning to understand what's at work here--Boost.Python simply
> doesn't like numpy array scalars as integer arguments. My PyUblas module
> fixes that to some extent, but that's really not viable here. Kernel
> arguments go through the buffer interface, so they are an entirely
> different affair.
>
> At first, I was leaning towards believing that's a bug, but the more I
> thought about it, the less I could actually believe it. If you use the
> following guidelines, I don't think you should run into trouble:
>
> - When storing GPU pointers in your code, use either DeviceAllocation
>   objects or bare Python int (or long) objects. These two should be
>   interchangeable in all things PyCUDA where a device pointer is
>   requested. If you need arithmetic to work, add int() casts, but be
>   aware that killing the DeviceAllocation makes your memory go away. Do
>   not use numpy scalars to store pointers.
>
> - IF you are using the unprepared kernel invocation syntax (and only
>   then), you need to convert pointers to numpy.intp to pass them as
>   arguments.
>
> I believe that if you use these guidelines, you should be ok.
> Anything I'm overlooking?
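A small host-only sketch of the two guidelines above (the variable names and the fake pointer value are mine, and the commented-out kernel call is hypothetical; the checks only exercise the int()/numpy.intp conversions, not a real GPU):

```python
import struct

import numpy as np

# Guideline 1: store device pointers as plain Python ints (here a made-up
# value stands in for int(some_device_allocation)).
dev_ptr = 0x200000
offset_ptr = dev_ptr + 16  # pointer arithmetic works on plain ints

# Guideline 2: only for the unprepared invocation syntax, wrap the pointer
# in numpy.intp so it can travel through the buffer interface, e.g.:
#   kernel(np.intp(offset_ptr), block=(256, 1, 1))   # hypothetical call
arg = np.intp(offset_ptr)

# numpy.intp is defined to be pointer-sized on the host, which is why it
# is the right scalar type for passing addresses as kernel arguments.
assert np.dtype(np.intp).itemsize == struct.calcsize("P")
assert int(arg) == offset_ptr  # round-trips back to the plain int
```

On a 32-bit host numpy.intp is the same size as numpy.int32, which is why it showed up that way in the traceback above.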
>
>> And it doesn't work for my own C++ functions (which call CUDA
>> functions), which I'm interoperating with PyCUDA. They also expect to
>> be able to extract a pointer out of gpudata, and when they get a
>> numpy.int32, they die.
>
> See above--is anything requiring that you use numpy scalars?
>
>> The patch I attached yesterday goes a step in that direction, but
>> ultimately what I really want is a C++ implementation of GPUArray. =)
>
> With all the code generation going on in GPUArray, I actually highly
> doubt you want that, but ok. :)
>
> Andreas

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda