On Fri, 2011-05-06 at 08:30 -0400, Andreas Kloeckner wrote:
> On Fri, 06 May 2011 11:00:08 +0200, Tomasz Rybak <bogom...@post.pl> wrote:
> > On Sun, 2011-04-24 at 17:50 +0200, Tomasz Rybak wrote:
> > > On Sun, 2011-04-24 at 01:22 -0400, Andreas Kloeckner wrote:
> > > > > On Sun, 17 Apr 2011 17:10:30 +0200, Tomasz Rybak
> > > > <bogom...@post.pl> wrote:
> > > > > On 2011-04-16 00:48, Andreas Kloeckner wrote:
> > > > > 
> > > > > >Hi Tomasz,
> > > > > >
> > > > > >any progress on the things below? Has maybe another email gone 
> > > > > >missing?
> > > > > >I'd like to release PyCUDA 2011.1 soon.
> > > > > 
> > > > > 
> > > > > I have sent the patch to the mailing list on 2011-03-18 and
> > > > > 2011-03-24.
> > > > > Please check the archives.
> > > > 
> > > > Weird. I'm really not sure what happened there--I could've sworn I never
> > > > saw these emails.
> > > > 
> > > > In any case, I've merged your patch into the
> > > > curand-wrapper-v2-from-tomasz branch, which I've also brought up to
> > > > current master.
> > > 
> > > Thanks!
> > > 
> > > > 
> > > > Looks like we're almost done here--only initialization is missing.
> > > 
> > > I hope to have it done by the end of next week.
> > 
> > I am having trouble transferring data between C and Python.
> > I want to:
> > 1. call curandGetDirectionVectors32, which returns a pointer
> > to 32-element int32 direction vectors
> > 2. transfer this data to the device memory
> > 3. create GPUArray from it
> > 4. call the prepare() kernel, which will pass the appropriate
> > direction vectors to curand_init()
> > 
> > I have tried two approaches (curand-hostptr.diff
> > and curand-memcpy.diff). The former, when compiled,
> > causes the _curand module to misbehave - e.g. get_curand_version()
> > returns None instead of (3, 2, 0) or (0, 0, 0).
> > The latter compiles, but the cuMemcpy call fails because of
> > a parameter type mismatch.
> > 
> > Can someone give me some direction on how to transfer a raw
> > pointer from C to Python?
> 
> What's wrong with creating a numpy array for the data?

Thanks for the tip.
Here is the patch. It works and generates
quasi-random numbers. The only problem is
with calling curandGetDirectionVectors32 - it provides
only 20000 direction vectors, so we will have repeated
generators when we have GPUs with more than 20000 cores ;-)
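
For reference, a minimal usage sketch of the interface added by this patch
(the vector count of 1024, the behaviour of dir_vector=None, and the
fill_uniform() call are illustrative assumptions, not things the patch
itself guarantees):

    # Sketch against the patched pycuda.curandom; counts are illustrative.
    import numpy as np
    import pycuda.autoinit            # creates a CUDA context
    import pycuda.gpuarray as gpuarray
    from pycuda import curandom

    if curandom.get_curand_version() >= (3, 2, 0):
        # The wrapper fills a host numpy buffer from
        # curandGetDirectionVectors32 and copies it to the GPU.
        vectors = curandom.generate_direction_vectors(1024)  # shape (1024, 32)

        # Assumption: with dir_vector=None the class fetches a matching set of
        # direction vectors itself; an explicit GPUArray may be passed instead.
        rng = curandom.Sobol32RandomNumberGenerator()

        out = gpuarray.empty(100000, dtype=np.float32)
        rng.fill_uniform(out)         # quasirandom uniforms in [0, 1)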

BTW - there is a difference in wrap_cudadrv.cpp between
master and the curand branch. I had to fix it, but
did not include that change in the patch.

Please apply this patch - and then I believe
we can think about merging the curand branch
into master.

Best regards.


-- 
Tomasz Rybak <bogom...@post.pl> GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
diff --git a/doc/source/array.rst b/doc/source/array.rst
index 4dabe59..431aa37 100644
--- a/doc/source/array.rst
+++ b/doc/source/array.rst
@@ -399,6 +399,21 @@ algorithm designed to fill an n-dimensional space evenly.
 
 Quasirandom numbers are more expensive to generate.
 
+.. function:: get_curand_version()
+
+    Obtain the version of CURAND against which PyCUDA was compiled. Returns a
+    3-tuple of integers as *(major, minor, revision)*.
+
+.. function:: seed_getter_uniform(N)
+
+    Return a :class:`GPUArray` filled with one random `int32` repeated `N`
+    times, which can be used as a seed for the XORWOW generator.
+
+.. function:: seed_getter_unique(N)
+
+    Return a :class:`GPUArray` filled with `N` random `int32` values, which
+    can be used as seeds for the XORWOW generator.
+
 .. class:: XORWOWRandomNumberGenerator(seed_getter=None, offset=0)
 
     :arg seed_getter: a function that, given an integer count, will yield an
@@ -432,7 +447,18 @@ Quasirandom numbers are more expensive to generate.
         Accepts array i of integer values, telling each generator how many
         values to skip.
 
-.. class:: Sobol32RandomNumberGenerator(vector, offset)
+.. function:: generate_direction_vectors(count, direction=direction_vector_set.VECTOR_32)
+
+    Return a :class:`GPUArray` of `count` direction vectors, which can be
+    used to initialize Sobol32 generators.
+
+.. class:: Sobol32RandomNumberGenerator(dir_vector=None, offset=0)
+
+    :arg dir_vector: a :class:`GPUArray` of 32-element `int32` vectors which
+      are used to initialize the quasirandom generator; it must contain one
+      vector for each generator being initialized
+    :arg offset: Starting index into the Sobol32 sequence for the given
+      direction vector.
 
     Provides quasirandom numbers. Generates
     sequences with period of :math:`2^32`.
diff --git a/pycuda/curandom.py b/pycuda/curandom.py
index 73ca18d..5982b33 100644
--- a/pycuda/curandom.py
+++ b/pycuda/curandom.py
@@ -255,7 +255,7 @@ else:
 
 if get_curand_version() >= (3, 2, 0):
     direction_vector_set = _curand.direction_vector_set
-    get_direction_vectors32 = _curand.get_direction_vectors32
+    _get_direction_vectors = _curand._get_direction_vectors
 
 # {{{ Base class
 
@@ -526,9 +526,14 @@ if get_curand_version() >= (3, 2, 0):
 
 # {{{ Sobol32 RNG
 
+def generate_direction_vectors(count, direction=direction_vector_set.VECTOR_32):
+    result = np.empty((count, 32), dtype=np.int32)
+    _get_direction_vectors(direction, result, count)
+    return pycuda.gpuarray.to_gpu(result)
+
 sobol32_random_source = """
 extern "C" {
-__global__ void prepare(curandStateSobol32 *s, const int n, unsigned int **v,
+__global__ void prepare(curandStateSobol32 *s, const int n, curandDirectionVectors32_t *v,
     const unsigned int o)
 {
   const int id = blockIdx.x*blockDim.x+threadIdx.x;
@@ -578,10 +583,10 @@ if get_curand_version() >= (3, 2, 0):
                     dev = drv.Context.get_device()
                     if dev.compute_capability() >= (2, 0):
                         p.prepared_call((self.block_count, 1), self.state,
-                            self.block_count * self.generators_per_block, vector, offset)
+                            self.block_count * self.generators_per_block, dir_vector.gpudata, offset)
                     else:
                         p.prepared_call((2 * self.block_count, 1), self.state,
-                            self.block_count * self.generators_per_block // 2, vector, offset)
+                            self.block_count * self.generators_per_block // 2, dir_vector.gpudata, offset)
                 except drv.LaunchError:
                     raise ValueError("Initialisation failed. Decrease number of threads.")
 
diff --git a/src/cpp/curand.hpp b/src/cpp/curand.hpp
index e025b0c..9783239 100644
--- a/src/cpp/curand.hpp
+++ b/src/cpp/curand.hpp
@@ -3,7 +3,28 @@
 
 
 #if CUDAPP_CUDA_VERSION >= 3020
-#include <curand.h>
+  #include <curand.h>
+
+  #ifdef CUDAPP_TRACE_CUDA
+    #define CURAND_PRINT_ERROR_TRACE(NAME, CODE) \
+      if (CODE != CURAND_STATUS_SUCCESS) \
+        std::cerr << NAME << " failed with code " << CODE << std::endl;
+  #else
+    #define CURAND_PRINT_ERROR_TRACE(NAME, CODE) /*nothing*/
+  #endif
+
+  #define CURAND_CALL_GUARDED(NAME, ARGLIST) \
+    { \
+      CUDAPP_PRINT_CALL_TRACE(#NAME); \
+      curandStatus_t cu_status_code; \
+      cu_status_code = NAME ARGLIST; \
+      CURAND_PRINT_ERROR_TRACE(#NAME, cu_status_code); \
+      if (cu_status_code != CURAND_STATUS_SUCCESS) \
+        throw pycuda::error(#NAME, CUDA_SUCCESS);\
+    }
+#else
+  #define CURAND_PRINT_ERROR_TRACE(NAME, CODE) /*nothing*/
+  #define CURAND_CALL_GUARDED(NAME, ARGLIST) /*nothing*/
 #endif
 
 
@@ -22,10 +43,26 @@ namespace pycuda { namespace curandom {
   }
 
 #if CUDAPP_CUDA_VERSION >= 3020
-  void py_curand_get_direction_vectors32(curandDirectionVectors32_t *vectors[],
-      curandDirectionVectorSet_t set)
-// TODO: checking; cannot use CUDAPP_CALL_GUARDED because function returns CURAND enum
-  { curandGetDirectionVectors32(vectors, set); }
+  void py_curand_get_direction_vectors(
+      curandDirectionVectorSet_t set, py::object dst, int count)
+  {
+    void *buf;
+    PYCUDA_BUFFER_SIZE_T len;
+    int n = 0;
+
+    if (PyObject_AsWriteBuffer(dst.ptr(), &buf, &len))
+      throw py::error_already_set();
+    if (CURAND_DIRECTION_VECTORS_32_JOEKUO6 == set) {
+      curandDirectionVectors32_t *vectors;
+      CURAND_CALL_GUARDED(curandGetDirectionVectors32, (&vectors, set));
+      while (count > 0) {
+        int size = ((count > 20000) ? 20000 : count)*sizeof(curandDirectionVectors32_t);
+        memcpy((int *)buf+n*20000*sizeof(curandDirectionVectors32_t)/sizeof(unsigned int), vectors, size);
+        count -= size/sizeof(curandDirectionVectors32_t);
+        n++;
+      }
+    }
+  }
 #endif
 
 } }
diff --git a/src/wrapper/wrap_curand.cpp b/src/wrapper/wrap_curand.cpp
index 71adf57..2c1f2e3 100644
--- a/src/wrapper/wrap_curand.cpp
+++ b/src/wrapper/wrap_curand.cpp
@@ -25,7 +25,8 @@ void pycuda_expose_curand()
   py::def("get_curand_version", py_curand_version);
 
 #if CUDAPP_CUDA_VERSION >= 3020
-  py::def("get_direction_vectors32", py_curand_get_direction_vectors32, (arg("vectors"), arg("set")));
+  py::def("_get_direction_vectors", py_curand_get_direction_vectors,
+      (arg("set"), arg("dst"), arg("count")));
 #endif
 }
 

