Hi all,

While optimizing some host-device data transfers, I ended up with the small piece of code given below. My questions are: why does launching the non-blocking copy take so long? And what can I do to get a ‘real’ non-blocking call, so that I can do some computations on the host before waiting for the copy to complete?

In the example, launch time ~ profile time, where launch time is the time spent in the cl.enqueue_copy call and profile time comes from the event profiling information. I was expecting wait time ~ profile time instead. The result on a K20m is:

In [15]: print "Launch time=", t_wait - t_start
Launch time= 0.373787879944

In [16]: print "Wait time", t_end - t_wait
Wait time 0.0372970104218

In [17]: print "Profile time", 1e-9 * (evt.profile.end - evt.profile.start)
Profile time 0.338622592

Thanks,
Jean-Matthieu.

import time

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

data = np.zeros((512, 512, 512), dtype=np.float64)
data_cl = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=data.nbytes)

# Untimed first copy (presumably to keep one-time setup costs out of the measurement)
cl.enqueue_copy(queue, data_cl, data)
queue.finish()

t_start = time.time()
evt = cl.enqueue_copy(queue, data_cl, data, is_blocking=False)
t_wait = time.time()
evt.wait()
t_end = time.time()

print("Launch time=", t_wait - t_start)
print("Wait time", t_end - t_wait)
print("Profile time", 1e-9 * (evt.profile.end - evt.profile.start))
