Dear Matthew, first of all: which GPU are you using? A GeForce, a Tesla, or something like that?
I see a quadruple loop in your kernel. That is fine in itself if it executes quickly, but be careful: on a GeForce (or, more precisely, on any GPU that is also driving a display) there is a watchdog time limit on kernel execution, about 5 seconds I think, so a single kernel launch cannot run for long. As far as I know that limit does not apply to dedicated compute cards such as Teslas. The "launch timeout" in your traceback is exactly that watchdog killing the kernel. Try executing something shorter first and check that it works; then add the sub-loops back one at a time and you will see the point at which it becomes too slow.

++ Peter

On 06/21/2011 08:05 AM, Matthew Graham wrote:
> Hi,
>
> I'm having problems with big arrays; this crashes horribly:
>
> from pycuda import driver, compiler, gpuarray, tools
> from pycuda.curandom import rand as curand
> import numpy as np
> import pycuda.autoinit
>
> NV = 5632         # No of vectors
> DIM = 2031        # No of dimensions per vector
> BLOCK_SIZE = 16   # 512 max threads per mp
> GRID_SIZE = 352
> HEIGHT = 8
> WIDTH = 8
>
> gj = curand((NV, DIM))
> nodes = curand((HEIGHT, WIDTH, DIM))   # Each row is a vector
> dissim = curand((DIM, DIM))
> nearest = gpuarray.zeros((NV), np.float32)
>
> kernel_code = '''
> __global__ void NearestPrototypeKernel(float *gj, float *nodes, float *dissim, float *nearest)
> {
>     // Element
>     int idx = blockIdx.x * blockDim.x + threadIdx.x;
>     float value = 0.;
>
>     for (int i = 0; i < 8; ++i) {
>         for (int j = 0; j < 8; ++j) {
>             for (int l = 0; l < 2031; ++l) {
>                 float wp_l = gj[idx * 2031 + l];
>                 if (wp_l > 0) {
>                     for (int k = 0; k < 2031; ++k) {
>                         value += 0.01;
>                     }
>                 }
>             }
>         }
>     }
>
>     nearest[idx] = value;
> }
> '''
> mod = compiler.SourceModule(kernel_code)
> npker = mod.get_function("NearestPrototypeKernel")
> npker(
>     # inputs
>     gj, nodes, dissim,
>     # output
>     nearest,
>     # block of multiple threads
>     block = (BLOCK_SIZE, 1, 1),
>     # grid of blocks
>     grid = (GRID_SIZE, 1)
> )
> a = nearest.get()
>
> ---
>
> with this:
>
> /Library/Frameworks/Python.framework/Versions/6.0.0/lib/python2.6/site-packages/pycuda-2011.1-py2.6-macosx-10.5-i386.egg/pycuda/compiler.py:122:
> UserWarning: The CUDA compiler suceeded, but said the following:
> ptxas /tmp/tmpxft_000003e5_00000000-2_kernel.ptx, line 93; warning : Double is not supported. Demoting to float
>
>   +stdout+stderr)
> Traceback (most recent call last):
>   File "kernel_test.py", line 54, in <module>
>     a = nearest.get()
>   File "/Library/Frameworks/Python.framework/Versions/6.0.0/lib/python2.6/site-packages/pycuda-2011.1-py2.6-macosx-10.5-i386.egg/pycuda/gpuarray.py", line 177, in get
>     drv.memcpy_dtoh(ary, self.gpudata)
> pycuda._driver.LaunchError: cuMemcpyDtoH failed: launch timeout
> Error in atexit._run_exitfuncs:
> Traceback (most recent call last):
>   File "/Library/Frameworks/Python.framework/Versions/6.0.0/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
>     func(*targs, **kargs)
>   File "/Library/Frameworks/Python.framework/Versions/6.0.0/lib/python2.6/site-packages/pycuda-2011.1-py2.6-macosx-10.5-i386.egg/pycuda/autoinit.py", line 13, in _finish_up
>     context.pop()
> LaunchError: cuCtxPopCurrent failed: launch timeout
> Error in sys.exitfunc:
> Traceback (most recent call last):
>   File "/Library/Frameworks/Python.framework/Versions/6.0.0/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
>     func(*targs, **kargs)
>   File "/Library/Frameworks/Python.framework/Versions/6.0.0/lib/python2.6/site-packages/pycuda-2011.1-py2.6-macosx-10.5-i386.egg/pycuda/autoinit.py", line 13, in _finish_up
>     context.pop()
> pycuda._driver.LaunchError: cuCtxPopCurrent failed: launch timeout
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: invalid context
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: invalid context
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: invalid context
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuModuleUnload failed: invalid context
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: invalid context
> -------------------------------------------------------------------
> PyCUDA ERROR: The context stack was not empty upon module cleanup.
> -------------------------------------------------------------------
> A context was still active when the context stack was being
> cleaned up. At this point in our execution, CUDA may already
> have been deinitialized, so there is no way we can finish
> cleanly. The program will be aborted now.
> Use Context.pop() to avoid this problem.
> -------------------------------------------------------------------
> Abort
>
> ----
>
> Does anybody have any idea to get around this?
>
> Cheers,
>
> Matthew

--
Peter Schmidtke

PhD Student
Dept. Physical Chemistry
Faculty of Pharmacy
University of Barcelona

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
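P.S. For a rough sense of scale, here is a plain-Python sketch (no GPU needed) of the loop trip counts, using the bounds from the posted kernel and assuming the worst case where every wp_l > 0 so the innermost k-loop always runs:

```python
# Worst-case trip count of the innermost loop body in the posted kernel,
# assuming every wp_l > 0 so the k-loop executes on every l-iteration.
HEIGHT, WIDTH, DIM = 8, 8, 2031   # loop bounds hard-coded in the kernel
NV = 5632                         # threads launched: BLOCK_SIZE 16 * GRID_SIZE 352

per_thread = HEIGHT * WIDTH * DIM * DIM   # innermost iterations per thread
total = per_thread * NV                   # iterations across the whole grid

print(per_thread)   # 263997504  (~2.6e8 per thread)
print(total)        # 1486833942528  (~1.5e12 in a single launch)
```

Since threads run in parallel, the per-thread number is the one the watchdog cares about, and because far more threads are resident than there are cores, each thread's ~2.6e8 serial iterations take much longer than the raw instruction latency would suggest. On 2011-era hardware that can easily exceed a ~5 second limit, which fits the "launch timeout" in the traceback.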