On Fri, 2011-09-09 at 12:12 +1000, Bogdan Opanchuk wrote:
> Hello Tomasz,
> 
> Sorry for taking so long to check your code. I think I found the error
> that causes ScrambledSobol64 to fail. In curandom.py:
> 
> class 
> ScrambledSobol64RandomNumberGenerator(_ScrambledSobolRandomNumberGeneratorBase):
> ...
>         def __init__(self, dir_vector=None, scramble_vector=None, offset=0):
> ...
>                 'curandStateScrambledSobol64',
> sobol64_random_source+random_skip_ahead64_source)
> 
> The last line should be
>                 'curandStateScrambledSobol64',
> scrambledsobol64_random_source+random_skip_ahead64_source)
> (artifact of copy-paste, I assume ;)
> 
> With this change test_gpuarray.py passes on my Fermi box.

Thanks!
I had been staring at this code for so long, asking myself
"Where is the error? Those classes are almost the same."
It turns out the problem was precisely that they were a bit too much
"the same" :-)
I can confirm that it works on a GeForce 460.

Andreas - please apply the attached patch so we can finally have
CURAND 4 support in PyCUDA.
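
For reference, here is a rough usage sketch of the new CUDA 4.0 generators
(just an illustration based on the API documented in the patch below, not
part of the patch itself):

    # Minimal sketch; assumes the attached patch is applied and CURAND >= 4.0.
    import numpy as np
    import pycuda.autoinit  # creates a CUDA context
    import pycuda.gpuarray as gpuarray
    from pycuda.curandom import (
        get_curand_version, ScrambledSobol64RandomNumberGenerator)

    if get_curand_version() >= (4, 0, 0):
        # Direction vectors and scramble constants are generated
        # automatically when the arguments are left at their defaults.
        gen = ScrambledSobol64RandomNumberGenerator()

        data = gpuarray.empty((100000,), dtype=np.float32)
        gen.fill_uniform(data)     # uniformly distributed quasirandom values
        gen.call_skip_ahead(1000)  # skip 1000 values in each generator
        gen.fill_normal(data)      # normally distributed quasirandom values
        print(data.get()[:5])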


Best regards, and once again thanks for spotting this error.

-- 
Tomasz Rybak <bogom...@post.pl> GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
diff --git a/doc/source/array.rst b/doc/source/array.rst
index 431aa37..e2a3427 100644
--- a/doc/source/array.rst
+++ b/doc/source/array.rst
@@ -447,10 +447,31 @@ Quasirandom numbers are more expensive to generate.
         Accepts array i of integer values, telling each generator how many
         values to skip.
 
+    .. method:: call_skip_ahead_sequence(i, stream=None)
+
+        Forces all generators to skip i subsequences. Is equivalent to
+        generating i * :math:`2^{67}` values and discarding results,
+        but is much faster.
+
+    .. method:: call_skip_ahead_sequence_array(i, stream=None)
+
+        Accepts array i of integer values, telling each generator how many
+        subsequences to skip.
+
 .. function:: generate_direction_vectors(count, direction=direction_vector_set.VECTOR_32)
 
     Return an :class:`GPUArray` `count` filled with direction vectors
-    used to initialize Sobol32 generators.
+    used to initialize Sobol generators.
+
+.. function:: generate_scramble_constants32(count)
+
+    Return a :class:`GPUArray` filled with `count` 32-bit unsigned integers
+    used to initialize :class:`ScrambledSobol32RandomNumberGenerator`.
+
+.. function:: generate_scramble_constants64(count)
+
+    Return a :class:`GPUArray` filled with `count` 64-bit unsigned integers
+    used to initialize :class:`ScrambledSobol64RandomNumberGenerator`.
 
 .. class:: Sobol32RandomNumberGenerator(dir_vector=None, offset=0)
 
@@ -487,6 +508,116 @@ Quasirandom numbers are more expensive to generate.
         Accepts array i of integer values, telling each generator how many
         values to skip.
 
+.. class:: ScrambledSobol32RandomNumberGenerator(dir_vector=None, scramble_vector=None, offset=0)
+
+    :arg dir_vector: a :class:`GPUArray` of 32-element `uint32` vectors which
+      are used to initialize quasirandom generator; it must contain one vector
+      for each initialized generator
+    :arg scramble_vector: a :class:`GPUArray` of `uint32` elements which
+      are used to initialize quasirandom generator; it must contain one number
+      for each initialized generator
+    :arg offset: Starting index into the ScrambledSobol32 sequence,
+      given direction vector.
+
+    Provides quasirandom numbers. Generates
+    sequences with period of :math:`2^{32}`.
+
+    CUDA 4.0 and above.
+
+    .. versionadded:: 2011.1
+
+    .. method:: fill_uniform(data, stream=None)
+
+        Fills in :class:`GPUArray` *data* with uniformly distributed
+        quasirandom values.
+
+    .. method:: fill_normal(data, stream=None)
+
+        Fills in :class:`GPUArray` *data* with normally distributed
+        quasirandom values.
+
+    .. method:: call_skip_ahead(i, stream=None)
+
+        Forces all generators to skip i values. Is equivalent to generating
+        i values and discarding results, but is much faster.
+
+    .. method:: call_skip_ahead_array(i, stream=None)
+
+        Accepts array i of integer values, telling each generator how many
+        values to skip.
+
+.. class:: Sobol64RandomNumberGenerator(dir_vector=None, offset=0)
+
+    :arg dir_vector: a :class:`GPUArray` of 64-element `uint64` vectors which
+      are used to initialize quasirandom generator; it must contain one vector
+      for each initialized generator
+    :arg offset: Starting index into the Sobol64 sequence, given direction
+      vector.
+
+    Provides quasirandom numbers. Generates
+    sequences with period of :math:`2^{64}`.
+
+    CUDA 4.0 and above.
+
+    .. versionadded:: 2011.1
+
+    .. method:: fill_uniform(data, stream=None)
+
+        Fills in :class:`GPUArray` *data* with uniformly distributed
+        quasirandom values.
+
+    .. method:: fill_normal(data, stream=None)
+
+        Fills in :class:`GPUArray` *data* with normally distributed
+        quasirandom values.
+
+    .. method:: call_skip_ahead(i, stream=None)
+
+        Forces all generators to skip i values. Is equivalent to generating
+        i values and discarding results, but is much faster.
+
+    .. method:: call_skip_ahead_array(i, stream=None)
+
+        Accepts array i of integer values, telling each generator how many
+        values to skip.
+
+.. class:: ScrambledSobol64RandomNumberGenerator(dir_vector=None, scramble_vector=None, offset=0)
+
+    :arg dir_vector: a :class:`GPUArray` of 64-element `uint64` vectors which
+      are used to initialize quasirandom generator; it must contain one vector
+      for each initialized generator
+    :arg scramble_vector: a :class:`GPUArray` of `uint64` elements which
+      are used to initialize quasirandom generator; it must contain one number
+      for each initialized generator
+    :arg offset: Starting index into the ScrambledSobol64 sequence,
+      given direction vector.
+
+    Provides quasirandom numbers. Generates
+    sequences with period of :math:`2^{64}`.
+
+    CUDA 4.0 and above.
+
+    .. versionadded:: 2011.1
+
+    .. method:: fill_uniform(data, stream=None)
+
+        Fills in :class:`GPUArray` *data* with uniformly distributed
+        quasirandom values.
+
+    .. method:: fill_normal(data, stream=None)
+
+        Fills in :class:`GPUArray` *data* with normally distributed
+        quasirandom values.
+
+    .. method:: call_skip_ahead(i, stream=None)
+
+        Forces all generators to skip i values. Is equivalent to generating
+        i values and discarding results, but is much faster.
+
+    .. method:: call_skip_ahead_array(i, stream=None)
+
+        Accepts array i of integer values, telling each generator how many
+        values to skip.
 
 Single-pass Custom Expression Evaluation
 ----------------------------------------
diff --git a/pycuda/curandom.py b/pycuda/curandom.py
index ed7b8dc..329ff4f 100644
--- a/pycuda/curandom.py
+++ b/pycuda/curandom.py
@@ -258,6 +258,10 @@ if get_curand_version() >= (3, 2, 0):
     direction_vector_set = _curand.direction_vector_set
     _get_direction_vectors = _curand._get_direction_vectors
 
+if get_curand_version() >= (4, 0, 0):
+    _get_scramble_constants32 = _curand._get_scramble_constants32
+    _get_scramble_constants64 = _curand._get_scramble_constants64
+
 # {{{ Base class
 
 gen_template = """
@@ -279,14 +283,19 @@ extern "C"
 
 %(generators)s
 
-__global__ void skip_ahead(%(state_type)s *s, const int n, const int skip)
+}
+"""
+
+random_skip_ahead32_source = """
+extern "C" {
+__global__ void skip_ahead(%(state_type)s *s, const int n, const unsigned int skip)
 {
   const int idx = blockIdx.x*blockDim.x+threadIdx.x;
   if (idx < n)
     skipahead(skip, &s[idx]);
 }
 
-__global__ void skip_ahead_array(%(state_type)s *s, const int n, const int *skip)
+__global__ void skip_ahead_array(%(state_type)s *s, const int n, const unsigned int *skip)
 {
   const int idx = blockIdx.x*blockDim.x+threadIdx.x;
   if (idx < n)
@@ -295,6 +304,23 @@ __global__ void skip_ahead_array(%(state_type)s *s, const int n, const int *skip
 }
 """
 
+random_skip_ahead64_source = """
+extern "C" {
+__global__ void skip_ahead(%(state_type)s *s, const int n, const unsigned long long skip)
+{
+  const int idx = blockIdx.x*blockDim.x+threadIdx.x;
+  if (idx < n)
+    skipahead(skip, &s[idx]);
+}
+
+__global__ void skip_ahead_array(%(state_type)s *s, const int n, const unsigned long long *skip)
+{
+  const int idx = blockIdx.x*blockDim.x+threadIdx.x;
+  if (idx < n)
+      skipahead(skip[idx], &s[idx]);
+}
+}
+"""
 
 
 
@@ -460,7 +486,7 @@ def seed_getter_unique(N):
 
 xorwow_random_source = """
 extern "C" {
-__global__ void prepare_with_seeds(curandState *s, const int n,
+__global__ void prepare_with_seeds(curandStateXORWOW *s, const int n,
   const int *seed, const int offset)
 {
   const int id = blockIdx.x*blockDim.x+threadIdx.x;
@@ -468,6 +494,19 @@ __global__ void prepare_with_seeds(curandState *s, const int n,
     curand_init(seed[id], threadIdx.x, offset, &s[id]);
 }
 
+__global__ void skip_ahead_sequence(%(state_type)s *s, const int n, const unsigned int skip)
+{
+  const int idx = blockIdx.x*blockDim.x+threadIdx.x;
+  if (idx < n)
+    skipahead_sequence(skip, &s[idx]);
+}
+
+__global__ void skip_ahead_sequence_array(%(state_type)s *s, const int n, const unsigned int *skip)
+{
+  const int idx = blockIdx.x*blockDim.x+threadIdx.x;
+  if (idx < n)
+      skipahead_sequence(skip[idx], &s[idx]);
+}
 }
 """
 
@@ -483,7 +522,7 @@ if get_curand_version() >= (3, 2, 0):
             """
 
             super(XORWOWRandomNumberGenerator, self).__init__(
-                'curandStateXORWOW', xorwow_random_source)
+                'curandStateXORWOW', xorwow_random_source+random_skip_ahead64_source)
 
             if seed_getter is None:
                 seed = array.to_gpu(
@@ -501,6 +540,10 @@ if get_curand_version() >= (3, 2, 0):
 
             p = self.module.get_function("prepare_with_seeds")
             p.prepare("PiPi")
+            self.skip_ahead_sequence = self.module.get_function("skip_ahead_sequence")
+            self.skip_ahead_sequence.prepare("Pii")
+            self.skip_ahead_sequence_array = self.module.get_function("skip_ahead_sequence_array")
+            self.skip_ahead_sequence_array.prepare("PiP")
 
             from pycuda.characterize import has_stack
             has_stack = has_stack()
@@ -523,26 +566,164 @@ if get_curand_version() >= (3, 2, 0):
                 if has_stack:
                     drv.Context.set_limit(drv.limit.STACK_SIZE, prev_stack_size)
 
+        def call_skip_ahead_sequence(self, i, stream=None):
+            self.skip_ahead_sequence.prepared_async_call(
+                    (self.block_count, 1), (self.generators_per_block, 1, 1), stream,
+                    self.state, self.generators_per_block, i)
+
+        def call_skip_ahead_sequence_array(self, i, stream=None):
+            self.skip_ahead_sequence_array.prepared_async_call(
+                    (self.block_count, 1), (self.generators_per_block, 1, 1), stream,
+                    self.state, self.generators_per_block, i.gpudata)
+
         def _kernels(self):
             return (_RandomNumberGeneratorBase._kernels(self)
-                    + [self.module.get_function("prepare_with_seeds")])
+                    + [self.module.get_function("prepare_with_seeds")]
+                    + [self.module.get_function("skip_ahead_sequence"),
+                       self.module.get_function("skip_ahead_sequence_array")])
 
 # }}}
 
-# {{{ Sobol32 RNG
+# {{{ Sobol RNG
 
 def generate_direction_vectors(count, direction=None):
-    if direction is None:
-        direction = direction_vector_set.VECTOR_32
-
-    result = np.empty((count, 32), dtype=np.int32)
+    if direction is None:
+        direction = direction_vector_set.VECTOR_32
+
+    if get_curand_version() >= (4, 0, 0):
+        if direction == direction_vector_set.VECTOR_64 or \
+            direction == direction_vector_set.SCRAMBLED_VECTOR_64:
+            result = np.empty((count, 64), dtype=np.uint64)
+        else:
+            result = np.empty((count, 32), dtype=np.uint32)
+    else:
+        result = np.empty((count, 32), dtype=np.uint32)
     _get_direction_vectors(direction, result, count)
     return pycuda.gpuarray.to_gpu(result)
 
+if get_curand_version() >= (4, 0, 0):
+    def generate_scramble_constants32(count):
+        result = np.empty((count, ), dtype=np.uint32)
+        _get_scramble_constants32(result, count)
+        return pycuda.gpuarray.to_gpu(result)
+
+    def generate_scramble_constants64(count):
+        result = np.empty((count, ), dtype=np.uint64)
+        _get_scramble_constants64(result, count)
+        return pycuda.gpuarray.to_gpu(result)
+
+class _SobolRandomNumberGeneratorBase(_RandomNumberGeneratorBase):
+    """
+    Class surrounding CURAND kernels from CUDA 3.2.
+    It allows for generating quasi-random numbers with uniform
+    and normal probability function of type int, float, and double.
+    """
+
+    has_box_muller = False
+
+    def __init__(self, dir_vector, dir_vector_dtype, dir_vector_size,
+        dir_vector_set, offset, state_type, sobol_random_source):
+        super(_SobolRandomNumberGeneratorBase, self).__init__(state_type,
+            sobol_random_source)
+
+        if dir_vector is None:
+            dir_vector = generate_direction_vectors(
+                self.block_count * self.generators_per_block, dir_vector_set)
+
+        if not (isinstance(dir_vector, pycuda.gpuarray.GPUArray)
+                and dir_vector.dtype == dir_vector_dtype
+                and dir_vector.shape == (self.block_count * self.generators_per_block, dir_vector_size)):
+            raise TypeError("seed must be GPUArray of integers of right length")
+
+        p = self.module.get_function("prepare")
+        p.prepare("PiPi")
+
+        from pycuda.characterize import has_stack
+        has_stack = has_stack()
+
+        if has_stack:
+            prev_stack_size = drv.Context.get_limit(drv.limit.STACK_SIZE)
+
+        try:
+            if has_stack:
+                drv.Context.set_limit(drv.limit.STACK_SIZE, 1<<14) # 16k
+            try:
+                p.prepared_call((2 * self.block_count, 1), (self.generators_per_block // 2, 1, 1),
+                    self.state, self.block_count * self.generators_per_block,
+                    dir_vector.gpudata, offset)
+            except drv.LaunchError:
+                raise ValueError("Initialisation failed. Decrease number of threads.")
+
+        finally:
+            if has_stack:
+                drv.Context.set_limit(drv.limit.STACK_SIZE, prev_stack_size)
+
+    def _kernels(self):
+        return (_RandomNumberGeneratorBase._kernels(self)
+                + [self.module.get_function("prepare")])
+
+class _ScrambledSobolRandomNumberGeneratorBase(_RandomNumberGeneratorBase):
+    """
+    Class surrounding CURAND kernels from CUDA 4.0.
+    It allows for generating quasi-random numbers with uniform
+    and normal probability function of type int, float, and double.
+    """
+
+    has_box_muller = False
+
+    def __init__(self, dir_vector, dir_vector_dtype, dir_vector_size,
+        dir_vector_set, scramble_vector, scramble_vector_function,
+        offset, state_type, sobol_random_source):
+        super(_ScrambledSobolRandomNumberGeneratorBase, self).__init__(state_type,
+            sobol_random_source)
+
+        if dir_vector is None:
+            dir_vector = generate_direction_vectors(
+                self.block_count * self.generators_per_block,
+                dir_vector_set)
+
+        if scramble_vector is None:
+            scramble_vector = scramble_vector_function(
+                self.block_count * self.generators_per_block)
+
+        if not (isinstance(dir_vector, pycuda.gpuarray.GPUArray)
+                and dir_vector.dtype == dir_vector_dtype
+                and dir_vector.shape == (self.block_count * self.generators_per_block, dir_vector_size)):
+            raise TypeError("seed must be GPUArray of integers of right length")
+
+        if not (isinstance(scramble_vector, pycuda.gpuarray.GPUArray)
+                and scramble_vector.dtype == dir_vector_dtype
+                and scramble_vector.shape == (self.block_count * self.generators_per_block, )):
+            raise TypeError("scramble must be GPUArray of integers of right length")
+
+        p = self.module.get_function("prepare")
+        p.prepare("PiPPi")
+
+        from pycuda.characterize import has_stack
+        has_stack = has_stack()
+
+        if has_stack:
+            prev_stack_size = drv.Context.get_limit(drv.limit.STACK_SIZE)
+
+        try:
+            if has_stack:
+                drv.Context.set_limit(drv.limit.STACK_SIZE, 1<<14) # 16k
+            try:
+                p.prepared_call((2 * self.block_count, 1), (self.generators_per_block // 2, 1, 1),
+                    self.state, self.block_count * self.generators_per_block,
+                    dir_vector.gpudata, scramble_vector.gpudata, offset)
+            except drv.LaunchError:
+                raise ValueError("Initialisation failed. Decrease number of threads.")
+
+        finally:
+            if has_stack:
+                drv.Context.set_limit(drv.limit.STACK_SIZE, prev_stack_size)
+
+    def _kernels(self):
+        return (_RandomNumberGeneratorBase._kernels(self)
+                + [self.module.get_function("prepare")])
+
 sobol32_random_source = """
 extern "C" {
-__global__ void prepare(curandStateSobol32 *s, const int n, curandDirectionVectors32_t *v,
-    const unsigned int o)
+__global__ void prepare(curandStateSobol32 *s, const int n,
+    curandDirectionVectors32_t *v, const unsigned int o)
 {
   const int id = blockIdx.x*blockDim.x+threadIdx.x;
   if (id < n)
@@ -552,59 +733,95 @@ __global__ void prepare(curandStateSobol32 *s, const int n, curandDirectionVecto
 """
 
 if get_curand_version() >= (3, 2, 0):
-    class Sobol32RandomNumberGenerator(_RandomNumberGeneratorBase):
+    class Sobol32RandomNumberGenerator(_SobolRandomNumberGeneratorBase):
         """
         Class surrounding CURAND kernels from CUDA 3.2.
         It allows for generating quasi-random numbers with uniform
         and normal probability function of type int, float, and double.
         """
 
-        has_box_muller = False
-
         def __init__(self, dir_vector=None, offset=0):
-            super(Sobol32RandomNumberGenerator, self).__init__('curandStateSobol32',
-                sobol32_random_source)
+            super(Sobol32RandomNumberGenerator, self).__init__(dir_vector,
+                np.uint32, 32, direction_vector_set.VECTOR_32, offset,
+                'curandStateSobol32', sobol32_random_source+random_skip_ahead32_source)
 
-            if dir_vector is None:
-                dir_vector = generate_direction_vectors(
-                    self.block_count * self.generators_per_block)
+scrambledsobol32_random_source = """
+extern "C" {
+__global__ void prepare(curandStateScrambledSobol32 *s, const int n,
+    curandDirectionVectors32_t *v, unsigned int *scramble, const unsigned int o)
+{
+  const int id = blockIdx.x*blockDim.x+threadIdx.x;
+  if (id < n)
+    curand_init(v[id], scramble[id], o, &s[id]);
+}
+}
+"""
 
-            if not (isinstance(dir_vector, pycuda.gpuarray.GPUArray)
-                    and dir_vector.dtype == np.int32
-                    and dir_vector.shape == (self.block_count * self.generators_per_block, 32)):
-                raise TypeError("seed must be GPUArray of integers of right length")
+if get_curand_version() >= (4, 0, 0):
+    class ScrambledSobol32RandomNumberGenerator(_ScrambledSobolRandomNumberGeneratorBase):
+        """
+        Class surrounding CURAND kernels from CUDA 4.0.
+        It allows for generating quasi-random numbers with uniform
+        and normal probability function of type int, float, and double.
+        """
 
-            p = self.module.get_function("prepare")
-            p.prepare("PiPi", block=(self.generators_per_block, 1, 1))
+        def __init__(self, dir_vector=None, scramble_vector=None, offset=0):
+            super(ScrambledSobol32RandomNumberGenerator, self).__init__(dir_vector,
+                np.uint32, 32, direction_vector_set.SCRAMBLED_VECTOR_32,
+                scramble_vector, generate_scramble_constants32, offset,
+                'curandStateScrambledSobol32', scrambledsobol32_random_source+random_skip_ahead32_source)
 
-            from pycuda.characterize import has_stack
-            has_stack = has_stack()
+sobol64_random_source = """
+extern "C" {
+__global__ void prepare(curandStateSobol64 *s, const int n, curandDirectionVectors64_t *v,
+    const unsigned int o)
+{
+  const int id = blockIdx.x*blockDim.x+threadIdx.x;
+  if (id < n)
+    curand_init(v[id], o, &s[id]);
+}
+}
+"""
 
-            if has_stack:
-                prev_stack_size = drv.Context.get_limit(drv.limit.STACK_SIZE)
 
-            try:
-                if has_stack:
-                    drv.Context.set_limit(drv.limit.STACK_SIZE, 1<<14) # 16k
-                try:
+if get_curand_version() >= (4, 0, 0):
+    class Sobol64RandomNumberGenerator(_SobolRandomNumberGeneratorBase):
+        """
+        Class surrounding CURAND kernels from CUDA 4.0.
+        It allows for generating quasi-random numbers with uniform
+        and normal probability function of type int, float, and double.
+        """
 
-                    dev = drv.Context.get_device()
-                    if dev.compute_capability() >= (2, 0):
-                        p.prepared_call((self.block_count, 1), self.state,
-                            self.block_count * self.generators_per_block, dir_vector.gpudata, offset)
-                    else:
-                        p.prepared_call((2 * self.block_count, 1), self.state,
-                            self.block_count * self.generators_per_block // 2, dir_vector.gpudata, offset)
-                except drv.LaunchError:
-                    raise ValueError("Initialisation failed. Decrease number of threads.")
+        def __init__(self, dir_vector=None, offset=0):
+            super(Sobol64RandomNumberGenerator, self).__init__(dir_vector,
+                np.uint64, 64, direction_vector_set.VECTOR_64, offset,
+                'curandStateSobol64', sobol64_random_source+random_skip_ahead64_source)
 
-            finally:
-                if has_stack:
-                    drv.Context.set_limit(drv.limit.STACK_SIZE, prev_stack_size)
+scrambledsobol64_random_source = """
+extern "C" {
+__global__ void prepare(curandStateScrambledSobol64 *s, const int n,
+    curandDirectionVectors64_t *v, unsigned long long *scramble, const unsigned int o)
+{
+  const int id = blockIdx.x*blockDim.x+threadIdx.x;
+  if (id < n)
+    curand_init(v[id], scramble[id], o, &s[id]);
+}
+}
+"""
 
-        def _kernels(self):
-            return (_RandomNumberGeneratorBase._kernels(self)
-                    + [self.module.get_function("prepare")])
+if get_curand_version() >= (4, 0, 0):
+    class ScrambledSobol64RandomNumberGenerator(_ScrambledSobolRandomNumberGeneratorBase):
+        """
+        Class surrounding CURAND kernels from CUDA 4.0.
+        It allows for generating quasi-random numbers with uniform
+        and normal probability function of type int, float, and double.
+        """
+
+        def __init__(self, dir_vector=None, scramble_vector=None, offset=0):
+            super(ScrambledSobol64RandomNumberGenerator, self).__init__(dir_vector,
+                np.uint64, 64, direction_vector_set.SCRAMBLED_VECTOR_64,
+                scramble_vector, generate_scramble_constants64, offset,
+                'curandStateScrambledSobol64', scrambledsobol64_random_source+random_skip_ahead64_source)
 
 # }}}
 
diff --git a/src/cpp/curand.hpp b/src/cpp/curand.hpp
index 9783239..b3d015d 100644
--- a/src/cpp/curand.hpp
+++ b/src/cpp/curand.hpp
@@ -52,19 +52,94 @@ namespace pycuda { namespace curandom {
 
     if (PyObject_AsWriteBuffer(dst.ptr(), &buf, &len))
       throw py::error_already_set();
-    if (CURAND_DIRECTION_VECTORS_32_JOEKUO6 == set) {
+    if (CURAND_DIRECTION_VECTORS_32_JOEKUO6 == set
+#if CUDAPP_CUDA_VERSION >= 4000
+      || CURAND_SCRAMBLED_DIRECTION_VECTORS_32_JOEKUO6 == set
+#endif
+    ) {
       curandDirectionVectors32_t *vectors;
       CURAND_CALL_GUARDED(curandGetDirectionVectors32, (&vectors, set));
       while (count > 0) {
         int size = ((count > 20000) ? 20000 : count)*sizeof(curandDirectionVectors32_t);
-        memcpy((int *)buf+n*20000*sizeof(curandDirectionVectors32_t)/sizeof(unsigned int), vectors, size);
+        memcpy((unsigned int *)buf+n*20000*sizeof(curandDirectionVectors32_t)/sizeof(unsigned int), vectors, size);
 	count -= size/sizeof(curandDirectionVectors32_t);
         n++;
       }
     }
+#if CUDAPP_CUDA_VERSION >= 4000
+    if (CURAND_DIRECTION_VECTORS_64_JOEKUO6 == set
+      || CURAND_SCRAMBLED_DIRECTION_VECTORS_64_JOEKUO6 == set) {
+      curandDirectionVectors64_t *vectors;
+      CURAND_CALL_GUARDED(curandGetDirectionVectors64, (&vectors, set));
+      while (count > 0) {
+        int size = ((count > 20000) ? 20000 : count)*sizeof(curandDirectionVectors64_t);
+        memcpy((unsigned long long *)buf+n*20000*sizeof(curandDirectionVectors64_t)/sizeof(unsigned long long), vectors, size);
+	count -= size/sizeof(curandDirectionVectors64_t);
+        n++;
+      }
+    }
+#endif
+  }
+#endif
+
+#if CUDAPP_CUDA_VERSION >= 4000
+  void py_curand_get_scramble_constants32(py::object dst, int count)
+  {
+    void *buf;
+    PYCUDA_BUFFER_SIZE_T len;
+    int n = 0;
+
+    if (PyObject_AsWriteBuffer(dst.ptr(), &buf, &len))
+      throw py::error_already_set();
+    unsigned int *vectors;
+    CURAND_CALL_GUARDED(curandGetScrambleConstants32, (&vectors));
+// Documentation does not mention number of dimensions
+// Assuming the same as in getDirectionVectors*
+    while (count > 0) {
+      int size = ((count > 20000) ? 20000 : count)*sizeof(unsigned int);
+      memcpy((unsigned int *)buf+n*20000, vectors, size);
+      count -= size/sizeof(unsigned int);
+      n++;
+    }
+  }
+
+  void py_curand_get_scramble_constants64(py::object dst, int count)
+  {
+    void *buf;
+    PYCUDA_BUFFER_SIZE_T len;
+    int n = 0;
+
+    if (PyObject_AsWriteBuffer(dst.ptr(), &buf, &len))
+      throw py::error_already_set();
+    unsigned long long *vectors;
+    CURAND_CALL_GUARDED(curandGetScrambleConstants64, (&vectors));
+// Documentation does not mention number of dimensions
+// Assuming the same as in getDirectionVectors*
+    while (count > 0) {
+      int size = ((count > 20000) ? 20000 : count)*sizeof(unsigned long long);
+      memcpy((unsigned long long *)buf+n*20000, vectors, size);
+      count -= size/sizeof(unsigned long long);
+      n++;
+    }
   }
 #endif
 
+// TODO: add more methods in classes (log)
+// log_normal(state, mean, stddev)
+// log_normal(Scrambled64)
+// log_normal_double(Scrambled64)
+// log_normal(Sobol64)
+// log_normal_double(Sobol64)
+// log_normal(Scrambled32)
+// log_normal_double(Scrambled32)
+// log_normal(Sobol32)
+// log_normal_double(Sobol32)
+//
+// log_normal(XORWOW)
+// log_normal_double(XORWOW)
+// log_normal2(XORWOW)
+// log_normal2_double(XORWOW)
+//
 } }
 
 #endif
diff --git a/src/wrapper/wrap_curand.cpp b/src/wrapper/wrap_curand.cpp
index 4ca1e42..8bb2b7e 100644
--- a/src/wrapper/wrap_curand.cpp
+++ b/src/wrapper/wrap_curand.cpp
@@ -19,6 +19,11 @@ void pycuda_expose_curand()
 #if CUDAPP_CUDA_VERSION >= 3020
   py::enum_<curandDirectionVectorSet_t>("direction_vector_set")
     .value("VECTOR_32", CURAND_DIRECTION_VECTORS_32_JOEKUO6)
+#if CUDAPP_CUDA_VERSION >= 4000
+    .value("SCRAMBLED_VECTOR_32", CURAND_SCRAMBLED_DIRECTION_VECTORS_32_JOEKUO6)
+    .value("VECTOR_64", CURAND_DIRECTION_VECTORS_64_JOEKUO6)
+    .value("SCRAMBLED_VECTOR_64", CURAND_SCRAMBLED_DIRECTION_VECTORS_64_JOEKUO6)
+#endif
   ;
 #endif
 
@@ -28,4 +33,11 @@ void pycuda_expose_curand()
   py::def("_get_direction_vectors", py_curand_get_direction_vectors,
       (arg("set"), arg("dst"), arg("count")));
 #endif
+
+#if CUDAPP_CUDA_VERSION >= 4000
+  py::def("_get_scramble_constants32", py_curand_get_scramble_constants32,
+      (arg("dst"), arg("count")));
+  py::def("_get_scramble_constants64", py_curand_get_scramble_constants64,
+      (arg("dst"), arg("count")));
+#endif
 }
diff --git a/test/test_gpuarray.py b/test/test_gpuarray.py
index 5149ed8..8672fcf 100644
--- a/test/test_gpuarray.py
+++ b/test/test_gpuarray.py
@@ -259,20 +259,30 @@ class TestGPUArray:
             from pytest import skip
             skip("curand not installed")
 
-
-        from pycuda.curandom import (
-                XORWOWRandomNumberGenerator,
-                Sobol32RandomNumberGenerator)
+        generator_types = []
+        if get_curand_version() >= (3, 2, 0):
+            from pycuda.curandom import (
+                    XORWOWRandomNumberGenerator,
+                    Sobol32RandomNumberGenerator)
+            generator_types.extend([
+                    XORWOWRandomNumberGenerator,
+                    Sobol32RandomNumberGenerator])
+        if get_curand_version() >= (4, 0, 0):
+            from pycuda.curandom import (
+                    ScrambledSobol32RandomNumberGenerator,
+                    Sobol64RandomNumberGenerator,
+                    ScrambledSobol64RandomNumberGenerator)
+            generator_types.extend([
+                    ScrambledSobol32RandomNumberGenerator,
+                    Sobol64RandomNumberGenerator,
+                    ScrambledSobol64RandomNumberGenerator])
 
         if has_double_support():
             dtypes = [np.float32, np.float64]
         else:
             dtypes = [np.float32]
 
-        for gen_type in [
-                XORWOWRandomNumberGenerator,
-                #Sobol32RandomNumberGenerator
-                ]:
+        for gen_type in generator_types:
             gen = gen_type()
 
             for dtype in dtypes:
