Hi, I've stripped my code down a bit, so now it just multiplies a 2-D matrix by a vector. The problem is very reproducible on my Mac: with DIM <= 1500 it works, but with DIM = 1800 it hangs the GPU and I have to power-cycle the machine. This is on a GeForce 9400M.
Cheers,
Matthew

DIM = 1800
BLOCK_SIZE = 16  # 512 max threads per mp

gj = np.random.randn(DIM).astype(np.float32)
dissim = np.random.randn(DIM, DIM).astype(np.float32)
ans = gpuarray.empty((DIM,), np.float32)

""" Each thread calculates for one term (axis) """
kernel_code_template = '''
__global__ void GOPKernel(float *gj, float *d, float *ans)
{
    // Element
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    float val = 0;
    for (int k = 0; k < %(DIM)s; ++k) {
        float d_elem = d[row * %(DIM)s + k];
        float gj_elem = gj[k];
        val += d_elem * gj_elem;
    }
    ans[row] = val;
}
'''

# Get the kernel code from the template
kernel_code = kernel_code_template % {'DIM': DIM}

# Compile the kernel code
mod = compiler.SourceModule(kernel_code)

# Get the kernel function from the compiled module
gopker = mod.get_function("GOPKernel")

gj = gpuarray.to_gpu(gj)
dissim = gpuarray.to_gpu(dissim)

gridx = DIM / BLOCK_SIZE if DIM % BLOCK_SIZE == 1 else DIM / BLOCK_SIZE + 1

# Call the function on the card
gopker(
    # inputs
    gj, dissim,
    # output
    ans,
    # block of multiple threads
    block=(BLOCK_SIZE, BLOCK_SIZE, 1),
    # grid of blocks
    grid=(gridx, 1)
)

# Get result
z = ans.get()

On Apr 5, 2011, at 3:21 AM, Andreas Kloeckner wrote:

> Hi Matthew,
>
> On Mon, 4 Apr 2011 18:13:14 +0100, Matthew Graham <m...@cacr.caltech.edu> wrote:
>> I'm trying to run the following code on my Mac laptop to multiply a 1D
>> vector by a square matrix and then dot the result with another
>> vector. It works fine when the dimension of the matrix and vector
>> (DIM) is small, but at test values close to the operating level (DIM
>> ~ 1500 - 2000), it computes for a few seconds and then crashes the
>> laptop with an unresponsive keyboard and screen.
>>
>> I've run the wiki-examples MatrixmulTiled.py with the same dimensions
>> and that works fine. I would be grateful if someone could point out
>> what I am doing wrong.
>
> Your code does not crash my Linux box, but it does lock up the GPU for a
> few seconds, after which PyCUDA reports a launch failure, and the syslog
> shows the typical 'GPU segfault' line:
>
> [15532.668056] NVRM: Xid (0000:01:00): 13, 0003 00000000 000050c0
> 00000368 00000000 00000100
>
> Maybe start by changing various indices to zero for debugging, just to
> see which access is causing the issue.
>
> Andreas

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
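Following Andreas's debugging direction, one hypothesis worth testing (an assumption, not a confirmed diagnosis from the thread): DIM = 1800 is not a multiple of BLOCK_SIZE = 16, so the rounded-up grid launches threads with row >= DIM, and those threads read past the end of d. A minimal GPU-free sketch of the launch geometry and a guarded version of the kernel template, assuming that is the cause:

```python
DIM = 1800
BLOCK_SIZE = 16

# Launch geometry as computed in the first message (note the condition
# tests for a remainder of 1, so an extra block is added even when DIM
# divides BLOCK_SIZE exactly):
gridx = DIM // BLOCK_SIZE if DIM % BLOCK_SIZE == 1 else DIM // BLOCK_SIZE + 1

# 1800 % 16 == 8, so gridx == 113 and the grid spans 113 * 16 == 1808
# rows in x -- eight threads index past the last row of the matrix.
excess_rows = gridx * BLOCK_SIZE - DIM
print(gridx, excess_rows)  # 113 8

# Conventional ceiling division adds the extra block only when needed:
gridx_ceil = (DIM + BLOCK_SIZE - 1) // BLOCK_SIZE

# A bounds-guarded variant of the kernel template, so the excess
# threads do nothing instead of reading out of range (a hypothetical
# fix, not taken from the thread; not compiled here -- no GPU needed):
kernel_code_template = '''
__global__ void GOPKernel(float *gj, float *d, float *ans)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < %(DIM)s) {
        float val = 0;
        for (int k = 0; k < %(DIM)s; ++k)
            val += d[row * %(DIM)s + k] * gj[k];
        ans[row] = val;
    }
}
'''
kernel_code = kernel_code_template % {'DIM': DIM}
print('if (row < 1800)' in kernel_code)  # True
```

Note also that block = (BLOCK_SIZE, BLOCK_SIZE, 1) launches 256 threads per block while the kernel only ever reads threadIdx.x, so the 16 y-slices redundantly compute and store the same rows; a one-dimensional block such as block = (BLOCK_SIZE, 1, 1) with grid = (gridx_ceil, 1) would match the indexing.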