Hi,
I've stripped my code down so it now just multiplies a 2-D matrix by a vector.
The problem is very reproducible on my Mac: when DIM <= 1500 it works, but when
DIM = 1800 it hangs the GPU and I have to power cycle the machine. This is on a
GeForce 9400M.
Cheers,
Matthew
import numpy as np
import pycuda.autoinit
from pycuda import compiler, gpuarray

DIM = 1800
BLOCK_SIZE = 16 # 512 max threads per mp
gj = np.random.randn(DIM).astype(np.float32)
dissim = np.random.randn(DIM, DIM).astype(np.float32)
ans = gpuarray.empty((DIM,), np.float32)
"""
Each thread calculates for one term (axis)
"""
kernel_code_template = '''
__global__ void GOPKernel(float *gj, float *d, float *ans)
{
    // Each thread computes one output element (one row of d)
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    float val = 0;
    for (int k = 0; k < %(DIM)s; ++k) {
        float d_elem = d[row * %(DIM)s + k];
        float gj_elem = gj[k];
        val += d_elem * gj_elem;
    }
    ans[row] = val;
}
'''
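For checking results at sizes where the kernel does run, the per-row loop above is just a matrix-vector product, so it can be compared against NumPy on the CPU. A minimal sketch (DIM = 256 here is my own choice of a small test size):

```python
import numpy as np

# Small size at which the kernel is reported to work
DIM = 256

gj = np.random.randn(DIM).astype(np.float32)
dissim = np.random.randn(DIM, DIM).astype(np.float32)

# What each GPU thread computes: ans[row] = sum_k d[row*DIM + k] * gj[k]
ans = np.empty(DIM, dtype=np.float32)
for row in range(DIM):
    ans[row] = (dissim[row] * gj).sum()

# NumPy's matrix-vector product gives the same result
ref = dissim.dot(gj)
assert np.allclose(ans, ref, rtol=1e-3)
```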
# Get the kernel code from the template
kernel_code = kernel_code_template % {
'DIM': DIM
}
# Compile the kernel code
mod = compiler.SourceModule(kernel_code)
# Get the kernel function from the compiled module
gopker = mod.get_function("GOPKernel")
gj = gpuarray.to_gpu(gj)
dissim = gpuarray.to_gpu(dissim)
gridx = DIM // BLOCK_SIZE if DIM % BLOCK_SIZE == 0 else DIM // BLOCK_SIZE + 1
# Call the function on the card
gopker(
    # inputs
    gj, dissim,
    # output
    ans,
    # block of multiple threads
    block=(BLOCK_SIZE, BLOCK_SIZE, 1),
    # grid of blocks
    grid=(gridx, 1)
)
# Get result
z = ans.get()
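One thing I noticed while stripping the code down (a hypothesis only, not verified on the 9400M): since 1800 is not a multiple of 16, the grid rounds up, so the last few threads compute row indices past the end of the matrix. The arithmetic:

```python
DIM = 1800
BLOCK_SIZE = 16

# Grid size, rounded up when DIM is not a multiple of BLOCK_SIZE
gridx = DIM // BLOCK_SIZE if DIM % BLOCK_SIZE == 0 else DIM // BLOCK_SIZE + 1
assert gridx == 113

# Highest row index any thread computes -- past the last valid row (1799)
max_row = gridx * BLOCK_SIZE - 1
assert max_row == 1807
assert max_row > DIM - 1

# Those threads read d[row * DIM + k] beyond the DIM*DIM matrix and write
# ans[row] beyond the output; a guard like `if (row < DIM)` in the kernel
# would skip them.  (Also: block=(16,16,1) launches 16 threads per row
# that all redundantly compute the same element, since the kernel never
# uses threadIdx.y.)
```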
On Apr 5, 2011, at 3:21 AM, Andreas Kloeckner wrote:
> Hi Matthew,
>
> On Mon, 4 Apr 2011 18:13:14 +0100, Matthew Graham <[email protected]>
> wrote:
>> I'm trying to run the following code on my Mac laptop to multiply a 1D
>> vector by a square matrix and then dot the result with another
>> vector. It works fine when the dimension of the matrix and vector
>> (DIM) is small, but at test values close to the operating level (DIM
>> ~ 1500 - 2000), it computes for a few seconds and then crashes the
>> laptop with an unresponsive keyboard and screen.
>>
>> I've run the wiki-examples MatrixmulTiled.py with the same dimensions
>> and that works fine. I would be grateful if someone could point out
>> what I am doing wrong.
>
> Your code does not crash my Linux box, but it does lock up the GPU for a
> few seconds, after which PyCUDA reports a launch failure, and the syslog
> shows the typical 'GPU segfault' line:
>
> [15532.668056] NVRM: Xid (0000:01:00): 13, 0003 00000000 000050c0
> 00000368 00000000 00000100
>
> Maybe start by changing various indices to zero for debugging, just to
> see which access is causing the issue.
>
> Andreas
>
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda