Thank you, Marmaduke! For points 1 and 2, that's what I've noticed too. -Orestis
Date: Fri, 6 Jul 2012 13:42:32 +0200
Subject: Re: [PyCUDA] Do tasks run in background?
From: [email protected]
To: [email protected]
CC: [email protected]

Hi,

You may want to hold out for a more authoritative response from someone else, but I have noticed, and write my code assuming, that:

- func() will launch the kernel and return (almost) immediately;
- attempts to access gpuarrays involved in a launched kernel will block until the launched kernel has completed;
- pycuda.driver.Context.synchronize() can be called to explicitly wait for a kernel launch to complete (which is useful if you have two kernels operating on the same data, as they could otherwise run simultaneously).

cheers,
Marmaduke

On Fri, Jul 6, 2012 at 11:39 AM, Orestis K <[email protected]> wrote:

Hello everyone!

I'm new to PyCUDA and GPU programming, but my initial experiences have been very pleasant. I started out with some simple tasks, and they seem blazingly fast compared to running on a CPU. However, I would like to confirm that it is indeed as fast as it seems. My main question is: after 'func' is called and control of the prompt is regained, are any tasks still running on the GPU? If so, is there a way to block the next tasks from starting until the GPU has finished? I've posted the code below for reference. You can reduce the value of N so that it runs faster; I set it very close to the limit so that I might witness a delay in regaining control of the command prompt.

Thank you in advance, and please keep up the excellent work!
-Orestis

=================================================================

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import pycuda.gpuarray as gpuarray
import numpy, random, string

# create random input data
N = 33500000
buf = ''.join(random.choice(string.ascii_uppercase
                            + string.ascii_lowercase
                            + string.digits)
              for x in xrange(N))

mod = SourceModule("""
__global__ void get_words(int N, char *a, unsigned int *b)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N - 3) {
        b[idx] = (a[idx] << 24) + (a[idx+3]);
    }
}
""")
func = mod.get_function("get_words")

# copy buffer to GPU
bufArray = cuda.mem_alloc(N)
cuda.memcpy_htod(bufArray, buf)

# create results array on GPU (uint32 to match the kernel's unsigned int)
resArray = gpuarray.to_gpu(numpy.zeros((N-3, 1), dtype=numpy.uint32))

# set up launch parameters and execute the kernel
threadsPerBlock = 512
blocksPerGrid = (N + threadsPerBlock - 1) // threadsPerBlock
func(numpy.int32(len(buf)), bufArray, resArray,
     grid=(blocksPerGrid, 1), block=(threadsPerBlock, 1, 1))

# get back results
a = numpy.zeros((N-3, 1), dtype=numpy.uint32)
resArray.get(a)
a = a.reshape(-1).tolist()

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
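One detail in the listing worth calling out: the blocksPerGrid line is ceiling division, so the final partial block still gets launched, and the kernel's `if (idx < N-3)` guard masks off the surplus threads. A quick plain-Python check of that pattern (`blocks_per_grid` is a helper name made up for illustration, not part of PyCUDA):

```python
def blocks_per_grid(n, threads_per_block):
    # Ceiling division: round up so the last n % threads_per_block
    # elements still get a (partial) block. Using // keeps the result
    # an integer in both Python 2 and Python 3.
    return (n + threads_per_block - 1) // threads_per_block

print(blocks_per_grid(33500000, 512))  # 65430: enough blocks to cover all N
print(blocks_per_grid(512, 512))       # 1: exact fit, no extra block
print(blocks_per_grid(513, 512))       # 2: one surplus element forces a block
```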
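Marmaduke's description of launch-and-return behavior can also be checked empirically with two timestamps: one right after the launch call, one after the explicit wait. The sketch below uses plain Python threads as a stand-in so it runs without a GPU: executor.submit() plays the role of the non-blocking func() launch, and future.result() plays the role of pycuda.driver.Context.synchronize(). This is an analogy only; none of the names below are PyCUDA API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Plain-Python analogy (not CUDA): submit() returns immediately, like a
# kernel launch; result() blocks until the work finishes, like
# pycuda.driver.Context.synchronize(). The same two timestamps are how
# you would check whether a real PyCUDA kernel launch is asynchronous.
executor = ThreadPoolExecutor(max_workers=1)

t0 = time.time()
future = executor.submit(time.sleep, 0.5)  # "launch": hands off the work
launch_time = time.time() - t0             # time until control returns

future.result()                            # "synchronize": wait for completion
total_time = time.time() - t0

print(launch_time < 0.1)   # True: control returned almost immediately
print(total_time >= 0.5)   # True: the wait blocked until the work finished
executor.shutdown()
```

Timing func() alone would therefore understate nothing and overstate everything: to benchmark the kernel honestly, take the second timestamp only after synchronize() (or after a blocking gpuarray access such as resArray.get()).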
