Hello.
Could anyone with a Fermi card (GTX 460, 470, 480; are there other
Fermi cards?) tell me whether the attached patch solves the problems
with GPUArray on Fermi?
There was a discussion on this list (started on 2010-09-27 by
jmcarval) about problems with GPUArray. In summary,
test/test_gpuarray.py failed four times on Fermi.
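
For anyone curious what the patch changes: section 1.2.2 of the Fermi
compatibility guide says that warp-synchronous code (code that skips
__syncthreads() inside one warp) must access shared memory through a
volatile pointer, because otherwise the compiler is free to keep
sdata[tid] in a register and optimize away the intermediate stores.
Below is a rough pure-Python model of that effect, not PyCUDA code.
The function name warp_reduce and its volatile flag are made up for
illustration, and the "stores deferred to the end" behaviour is only
my assumption about one thing a compiler might do.

```python
import operator


def warp_reduce(values, reduce_op=operator.add, volatile=True):
    """Toy model of the unrolled `if (tid < 32)` tail of the kernel.

    `values` plays the role of sdata and must hold at least 64 items
    (the BLOCK_SIZE >= 64 case). Hypothetical helper, not PyCUDA API.
    """
    sdata = list(values)
    n_threads = 32
    # Each thread's private register copy of sdata[tid].
    regs = sdata[:n_threads]

    for offset in (32, 16, 8, 4, 2, 1):
        # The warp runs in lockstep: every thread loads its partner
        # value from shared memory before any thread stores.
        loaded = [sdata[tid + offset] for tid in range(n_threads)]
        for tid in range(n_threads):
            regs[tid] = reduce_op(regs[tid], loaded[tid])
            if volatile:
                # volatile: the store really reaches shared memory,
                # so the next step sees the updated partial result.
                sdata[tid] = regs[tid]

    if not volatile:
        # Assumed non-volatile behaviour: partial results stayed in
        # registers and are written back only once, at the end.
        for tid in range(n_threads):
            sdata[tid] = regs[tid]

    return sdata[0]


data = list(range(64))
print("volatile:    ", warp_reduce(data, volatile=True))
print("non-volatile:", warp_reduce(data, volatile=False))
print("expected:    ", sum(data))
```

In this model the volatile variant returns the full sum, while the
non-volatile one only combines the original values at offsets 32, 16,
8, 4, 2 and 1, which is the kind of wrong result I would expect the
failing tests to be catching.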

I sent this patch to the mailing list on 2010-10-01, but got no
reply on whether it works.
I would like to know whether it works so that I can decide how to
proceed with PyCUDA packaging for Debian. The CUDA toolkit is waiting
to be included, and as soon as it is accepted into Debian I intend
to ask for sponsorship for the PyCUDA packages.
I am not sure, however, whether I should leave PyCUDA as is (and
risk bug reports being filed by users with Fermi GPUs) or apply an
untested patch and risk that it does not work fully or has side effects.

So please, if you have a Fermi card:
1. apply the attached patch to PyCUDA from git
2. build PyCUDA
3. run test/test_gpuarray.py from the source directory
4. report (preferably to the list, or to me directly) whether there
are still any errors with GPUArray after applying this patch

Thanks in advance.
Best regards

PS. I am waiting impatiently for responses ;-)

-- 
Tomasz Rybak <bogom...@post.pl> GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
diff --git a/pycuda/reduction.py b/pycuda/reduction.py
index 0a17508..5fb8802 100644
--- a/pycuda/reduction.py
+++ b/pycuda/reduction.py
@@ -122,12 +122,14 @@ def get_reduction_module(out_type, block_size,
 
           if (tid < 32) 
           {
-            if (BLOCK_SIZE >= 64) sdata[tid] = REDUCE(sdata[tid], sdata[tid + 32]);
-            if (BLOCK_SIZE >= 32) sdata[tid] = REDUCE(sdata[tid], sdata[tid + 16]);
-            if (BLOCK_SIZE >= 16) sdata[tid] = REDUCE(sdata[tid], sdata[tid + 8]);
-            if (BLOCK_SIZE >= 8)  sdata[tid] = REDUCE(sdata[tid], sdata[tid + 4]);
-            if (BLOCK_SIZE >= 4)  sdata[tid] = REDUCE(sdata[tid], sdata[tid + 2]);
-            if (BLOCK_SIZE >= 2)  sdata[tid] = REDUCE(sdata[tid], sdata[tid + 1]);
+// Fermi compatibility guide 1.2.2
+            volatile out_type * smem = sdata;
+            if (BLOCK_SIZE >= 64) smem[tid] = REDUCE(smem[tid], smem[tid + 32]);
+            if (BLOCK_SIZE >= 32) smem[tid] = REDUCE(smem[tid], smem[tid + 16]);
+            if (BLOCK_SIZE >= 16) smem[tid] = REDUCE(smem[tid], smem[tid + 8]);
+            if (BLOCK_SIZE >= 8)  smem[tid] = REDUCE(smem[tid], smem[tid + 4]);
+            if (BLOCK_SIZE >= 4)  smem[tid] = REDUCE(smem[tid], smem[tid + 2]);
+            if (BLOCK_SIZE >= 2)  smem[tid] = REDUCE(smem[tid], smem[tid + 1]);
           }
 
           if (tid == 0) out[blockIdx.x] = sdata[0];

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda