On Fri, Oct 7, 2011 at 4:00 PM, Andreas Kloeckner
<li...@informa.tiker.net> wrote:
> On Fri, 7 Oct 2011 15:38:59 +0200, Nick Rayrider 
> <nick.rayri...@googlemail.com> wrote:
>> On Thu, Oct 6, 2011 at 2:47 PM, Andreas Kloeckner
>> <li...@informa.tiker.net> wrote:
>> > On Thu, 6 Oct 2011 12:25:12 +0200, Nick Rayrider 
>> > <nick.rayri...@googlemail.com> wrote:
>> >> Hi,
>> >>
>> >> First, thanks for this fine piece of software.
>> >>
>> >> While I was optimizing my kernels, NVIDIA's Visual Profiler recommended
>> >> that I use more pinned memory. I read the PyCUDA documentation [1]
>> >> and tried to understand the sparse solve example [3], but I could not
>> >> work out how to turn an existing numpy array into pinned memory. I did
>> >> not find any further examples of PageLockedMemoryPool [2].
>> >
>> > pool = PageLockedMemoryPool()
>> > empty_pinned_array = pool.allocate((300, 300), np.float64)
>> >
>> > Now empty_pinned_array is backed by pinned storage, and when you memcpy
>> > to/from it, it'll go faster.
>>
>> Thanks for the fast answer. I read that you shouldn't use in() as it
>> performs a copy [1], so I tried the following version, to no avail.
>> What am I missing? Probably something about how numpy and PyCUDA
>> handle pointers...
>>
>> from pycuda.tools import PageLockedMemoryPool
>> pool = PageLockedMemoryPool()
>> empty_pinned_array = pool.allocate(data.shape,np.float32)
>> empty_pinned_array = gpuarray.to_gpu(data)
>> my_kernel(empty_pinned_array,... )
>>
>> [1] http://lists.tiker.net/pipermail/pycuda/2009-August/001784.html
>
> If you'd like to pass a GPUArray to a kernel, you need to pass
> empty_pinned_array.gpudata.

I think I am on entirely the wrong path. I have attached what I have so
far. The example runs and produces the correct results, but the
Visual Profiler still reports the "host mem transfer" as
pageable.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy as np
from pycuda import driver, compiler, gpuarray
from pycuda.tools import PageLockedMemoryPool
import pycuda.autoinit

pool = PageLockedMemoryPool()

data = np.random.randn(5).astype(np.float32)
odata = gpuarray.zeros(5, dtype=np.float32)
print(data)

empty_pinned_array = pool.allocate(data.shape,np.float32)
empty_pinned_array = gpuarray.to_gpu(data)

kernel_code = """
// the kernel definition
  __global__ void kernel(float *data, float *odata) {
  int index = blockIdx.x * blockDim.x + threadIdx.x ;
  odata[index] = data[index]+1.0f;

  }
  """

mod = compiler.SourceModule(kernel_code)
kernel = mod.get_function("kernel")

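kernel(
  empty_pinned_array.gpudata, odata,
  # one single-thread block per element: the kernel has no bounds check,
  # so the grid must not exceed the 5-element arrays
  grid = (5, 1),
  block = (1, 1, 1),
)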
print(odata.get())
pool.free_held()
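
One likely reason the transfer stays pageable: in the script above, the second assignment rebinds empty_pinned_array to the result of gpuarray.to_gpu(data), so the upload reads from the ordinary (pageable) array data and the buffer from pool.allocate() is never used. Below is a minimal sketch of one way to route both transfers through pinned buffers. The helper names copy_to_pinned and run_example are made up for illustration, and the sketch assumes that gpuarray.to_gpu() and GPUArray.get() copy directly from and into the host array they are given.

```python
import numpy as np


def copy_to_pinned(pool, host_array):
    """Copy host_array into a buffer obtained from pool.allocate().

    `pool` is assumed to behave like pycuda.tools.PageLockedMemoryPool:
    allocate(shape, dtype) returns a NumPy array backed by pinned memory.
    """
    pinned = pool.allocate(host_array.shape, host_array.dtype)
    # Assign into the existing buffer instead of rebinding the name,
    # so the pinned storage is the one that ends up holding the data.
    pinned[...] = host_array
    return pinned


def run_example():
    # PyCUDA is imported here so that copy_to_pinned() stays usable
    # on a machine without a GPU.
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    from pycuda import gpuarray
    from pycuda.tools import PageLockedMemoryPool

    pool = PageLockedMemoryPool()
    data = np.random.randn(5).astype(np.float32)

    # Upload: the source of the host-to-device copy is the pinned buffer.
    pinned_in = copy_to_pinned(pool, data)
    data_gpu = gpuarray.to_gpu(pinned_in)

    # Download: pass a pinned buffer as the destination of the copy.
    pinned_out = pool.allocate(data.shape, data.dtype)
    data_gpu.get(pinned_out)
    return data, pinned_out
```

On a machine with a CUDA device, calling run_example() should return the input and a pinned array holding the same values. Whether the profiler then reports the transfers as pinned is something to confirm on the actual setup.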

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
