Hi,

I've stripped my code down a bit so now it's just doing multiplication of a 2-D 
matrix and a vector. The problem is very reproducible on my Mac - when DIM <= 
1500, it works but when DIM = 1800, it hangs the GPU and I have to power cycle 
the machine. This is using a GeForce 9400M.

        Cheers,

        Matthew

DIM = 1800
BLOCK_SIZE = 16 # 512 max threads per mp
gj = np.random.randn(DIM).astype(np.float32)
dissim = np.random.randn(DIM, DIM).astype(np.float32)
ans = gpuarray.empty((DIM,), np.float32)

"""
Each thread calculates for one term (axis)
"""
kernel_code_template = '''
__global__ void GOPKernel(float *gj, float *d, float *ans)
{

  // Element
  int row = blockIdx.x * blockDim.x + threadIdx.x;
  float val = 0;

  for (int k = 0; k < %(DIM)s; ++k) {
    float d_elem = d[row * %(DIM)s + k];
    float gj_elem = gj[k];
    val += d_elem * gj_elem;
  }

    ans[row] = val;
}
'''

# Get the kernel code from the template
kernel_code = kernel_code_template % {
  'DIM': DIM
  }

# Compile the kernel code
mod = compiler.SourceModule(kernel_code)

# Get the kernel function from the compiled module
gopker = mod.get_function("GOPKernel")
gj = gpuarray.to_gpu(gj)
dissim = gpuarray.to_gpu(dissim)

gridx = DIM / BLOCK_SIZE if DIM%BLOCK_SIZE == 1 else DIM / BLOCK_SIZE + 1

# Call the function on the card
gopker(
  # inputs
  gj, dissim,
  # output
  ans,
  # block of multiple threads
  block = (BLOCK_SIZE, BLOCK_SIZE, 1),
  # grid of blocks
  grid = (gridx, 1)
  )

# Get result
z = ans.get()





On Apr 5, 2011, at 3:21 AM, Andreas Kloeckner wrote:

> Hi Matthew,
> 
> On Mon, 4 Apr 2011 18:13:14 +0100, Matthew Graham <m...@cacr.caltech.edu> 
> wrote:
>> I'm trying to run the following code on my Mac laptop to multiply a 1D
>> vector by a square matrix and then dot the result with another
>> vector. It works fine when the dimension of the matrix and vector
>> (DIM) is small but when at test values close the operating level (DIM
>> ~ 1500 - 2000), it computes for a few seconds and then crashes the
>> laptop with an unresponsive keyboard and screen.
>> 
>> I've run the wiki-examples MatrixmulTIled.py with the same dimensions
>> and that works fine. I would be grateful if someone could point out
>> what I am doing wrong.
> 
> Your code does not crash my Linux box, but it does lock up the GPU for a
> few seconds, after which PyCUDA reports a launch failure, and the syslog
> shows the typical 'GPU segfault' line:
> 
> [15532.668056] NVRM: Xid (0000:01:00): 13, 0003 00000000 000050c0
> 00000368 00000000 00000100
> 
> Maybe start by changing various indices to zero for debugging, just to
> see which access is causing the issue.
> 
> Andreas
> 
> _______________________________________________
> PyCUDA mailing list
> PyCUDA@tiker.net
> http://lists.tiker.net/listinfo/pycuda


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda

Reply via email to