We've managed to reproduce the error on a second machine. On this machine, 
which has a slightly less powerful graphics card, the 'illegal memory access' 
error appeared when N was approx. 17000 (previously: 37000). Below that, no 
errors or unusual behaviour were recorded, and the results were correct. The 
combined array sizes were still only a fraction of the capacity of the card 
(2GB).
Perhaps someone has an explanation? Have I conceptually misunderstood 
something? Why does the error occur only when N is larger than a given number? 
And how could I determine from what N the code will fail?
Many thanks,
Daniel
Here's the sample code once more. It produces the pairwise cross-products of 
two lists of 3d vectors:
from __future__ import print_function
from __future__ import absolute_import
import pycuda.autoinit
import numpy
from pycuda.compiler import SourceModule
from pycuda import gpuarray

mod = SourceModule("""
__global__ void cross_products(float3* vCs, float3* vAs, float3* vBs, int w, int h)
{
  const int r = blockIdx.x * blockDim.x + threadIdx.x;
  const int c = blockIdx.y * blockDim.y + threadIdx.y;
  int i = r * w + c;

  if ((c >= w) || (r >= h) || i > w*h)
  {
    return;
  }

  float3 vA = vAs[i];
  float3 vB = vBs[i];

  float3 vC = make_float3(vA.y*vB.z - vA.z*vB.y,
                          vA.z*vB.x - vA.x*vB.z,
                          vA.x*vB.y - vA.y*vB.x);

  vCs[i] = vC;
}
""")

cross_products = mod.get_function("cross_products")
N = 35000
M = 3
a = numpy.ndarray((N,M), dtype = numpy.float32)
b = numpy.ndarray((N,M), dtype = numpy.float32)
for i in range(0,N):
    a[i] = [1,0,0]
    b[i] = [0,1,0]

c = numpy.zeros((N,M), dtype = numpy.float32)

print("a x b")
print(numpy.cross(a,b))
print(numpy.cross(a,b).nbytes)
M_gpu = numpy.int32(M)
N_gpu = numpy.int32(N)
a_gpu = gpuarray.to_gpu(a)
b_gpu = gpuarray.to_gpu(b)
c_gpu = gpuarray.to_gpu(c)

bx = 32 #256
by = 3 #1
gdimX = (int)((N + bx-1) / bx)
gdimY = (int)((M + by-1) / by)
print("grid")
print(gdimX)
print(gdimY)
cross_products(c_gpu, a_gpu, b_gpu, M_gpu, N_gpu, block=(bx,by,1), grid = (gdimX, gdimY))
dest = c_gpu.get()
print(a_gpu.mem_size)
print("dest")
print(dest)
print("diff")
print(numpy.sum(dest-numpy.cross(a,b)))
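
One way to approach the question of where the code starts to fail is to check the kernel's index arithmetic on the host. The following is a minimal sketch in plain Python, not a confirmed diagnosis. The helper name index_report is made up for illustration, and it assumes that a CUDA float3 occupies 12 bytes (three packed 32-bit floats). It compares the largest flat index i = r*w + c that passes the kernel's bounds check with the number of float3 elements that fit in a buffer created from an (N, 3) float32 array.

```python
# Hypothetical helper (not part of PyCUDA): compares the flat indices the
# kernel can reach with the number of float3 elements actually allocated.
FLOAT32_BYTES = 4
FLOAT3_BYTES = 12  # assumption: CUDA float3 is three packed 32-bit floats


def index_report(n_vectors, w=3):
    """Return (float3 elements in the buffer, largest index i the kernel uses)."""
    h = n_vectors
    buffer_bytes = n_vectors * w * FLOAT32_BYTES    # numpy array of shape (N, 3)
    float3_elements = buffer_bytes // FLOAT3_BYTES  # how many float3 fit in it
    # The bounds check lets through r in [0, h) and c in [0, w), with i = r*w + c.
    max_index = (h - 1) * w + (w - 1)
    return float3_elements, max_index


if __name__ == "__main__":
    for n in (1000, 17000, 35000):
        elements, max_i = index_report(n)
        status = "in bounds" if max_i < elements else "out of bounds"
        print(n, elements, max_i, status)
```

If this reading is right, the kernel indexes roughly three times past the end of each buffer for any N, because w and h count floats while the pointers are float3*. Whether the driver reports the overrun may then depend on what happens to lie behind the buffers, which would be consistent with the threshold differing between the two cards.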

From: dgebrei...@hotmail.com
To: pycuda@tiker.net
Date: Sun, 16 Oct 2016 22:22:11 +0200
Subject: [PyCUDA] an illegal memory access was encountered




Hello all,
I get "pycuda._driver.LogicError: cuMemcpyDtoH failed: an illegal memory access 
was encountered" errors when I use pycuda with matrices over certain sizes. 
Only a restart of Spyder remedies the issue. The matrix sizes are still well 
below what I believe my graphics card should be able to handle (a GeForce GTX 
1060, 3GB). Is there a pycuda-driven limit? 
I've created a fairly simple example which computes the pairwise cross 
products of two lists of 3d vectors.
The code works fine for up to approx. N = 35000 vectors. Above that, I get the 
following error:
Traceback (most recent call last):
  File "C:\owncloud\Python\float3_example.py", line 68, in <module>
    dest = c_gpu.get()
  File "C:\WinPython-64bit-3.5.2.2Qt5\python-3.5.2.amd64\lib\site-packages\pycuda-2016.1.2-py3.5-win-amd64.egg\pycuda\gpuarray.py", line 271, in get
    _memcpy_discontig(ary, self, async=async, stream=stream)
  File "C:\WinPython-64bit-3.5.2.2Qt5\python-3.5.2.amd64\lib\site-packages\pycuda-2016.1.2-py3.5-win-amd64.egg\pycuda\gpuarray.py", line 1190, in _memcpy_discontig
    drv.memcpy_dtoh(dst, src.gpudata)
pycuda._driver.LogicError: cuMemcpyDtoH failed: an illegal memory access was encountered
Assuming the problem lies with my code rather than pyCuda - is there a problem 
with my usage of the float3 vector types inside but not outside the CUDA 
kernel? (The results are correct for small matrices.) I couldn't find a 
succinct example of a best practice case of passing lists of 3d vectors (or 
float3s) to a kernel using pyCuda. Or is the problem the way I have set up 
blocks and grids (I tried many configurations)?
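
For reference, the layout assumption behind passing an (N, 3) float32 array as a float3* can be sketched on the host without a GPU. This is only an illustration. It assumes that float3 is three packed 32-bit floats with a 12-byte stride, and the helper names pack_vectors and read_float3 are hypothetical. Under that assumption, a C-contiguous (N, 3) float32 array holds exactly N float3 elements, addressed with one index per vector.

```python
import struct

FLOAT3_FORMAT = "<3f"  # assumption: float3 = three packed little-endian float32


def pack_vectors(vectors):
    """Pack (x, y, z) tuples back to back, like a C-contiguous (N, 3) float32 array."""
    return b"".join(struct.pack(FLOAT3_FORMAT, *v) for v in vectors)


def read_float3(buf, i):
    """Read element i the way a float3* would: one index per vector, 12-byte stride."""
    return struct.unpack_from(FLOAT3_FORMAT, buf, i * struct.calcsize(FLOAT3_FORMAT))


if __name__ == "__main__":
    buf = pack_vectors([(1, 0, 0), (0, 1, 0)])
    print(len(buf))             # 24 bytes for two vectors
    print(read_float3(buf, 1))  # (0.0, 1.0, 0.0)
```

Read this way, valid float3 indices for a list of N vectors run from 0 to N-1, not 0 to 3N-1.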
Many thanks!
Here's the very simple example:
from __future__ import print_function
from __future__ import absolute_import
import pycuda.autoinit
import numpy
from pycuda.compiler import SourceModule
from pycuda import gpuarray

mod = SourceModule("""
__global__ void cross_products(float3* vCs, float3* vAs, float3* vBs, int w, int h)
{
  const int c = blockIdx.x * blockDim.x + threadIdx.x;
  const int r = blockIdx.y * blockDim.y + threadIdx.y;
  int i = r * w + c; // 1D flat index

  // Check if within array bounds.
  if ((c >= w) || (r >= h))
  {
    return;
  }

  float3 vA = vAs[i];
  float3 vB = vBs[i];

  float3 vC = make_float3(vA.y*vB.z - vA.z*vB.y,
                          vA.z*vB.x - vA.x*vB.z,
                          vA.x*vB.y - vA.y*vB.x);

  vCs[i] = vC;
}
""")

cross_products = mod.get_function("cross_products")
N = 32000 #on my machine, this fails if N > 36000
M = 3
a = numpy.ndarray((N,M), dtype = numpy.float32)
b = numpy.ndarray((N,M), dtype = numpy.float32)
for i in range(0,N):
    a[i] = [1,0,0]
    b[i] = [0,1,0]

c = numpy.zeros((N,M), dtype = numpy.float32)

print("a x b")
print(numpy.cross(a,b))

M_gpu = numpy.int32(M)
N_gpu = numpy.int32(N)
a_gpu = gpuarray.to_gpu(a)
b_gpu = gpuarray.to_gpu(b)
c_gpu = gpuarray.to_gpu(c)

bx = 32 #256
by = 32 #1
gdimX = (int)((M + bx-1) / bx)
gdimY = (int)((N + by-1) / by)
print("grid")
print(gdimX)
print(gdimY)
cross_products(c_gpu, a_gpu, b_gpu, M_gpu, N_gpu, block=(bx,by,1), grid = (gdimX, gdimY))
dest = c_gpu.get()

print("dest")
print(dest)
print("diff")
print(dest-numpy.cross(a,b))
                                          

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda                                         
  