Hi all, 

While optimizing some host-device data transfers, I came up with the small 
piece of code given below.
My questions are: why is so much time spent launching the non-blocking copy? 
What can I do to get a ‘real’ non-blocking call, so that I can do some 
computation on the host before waiting for the copy to complete? 
In the example, launch time ~ profile time, where launch time is the time 
spent in the cl.enqueue_copy call and profile time comes from the event 
profiling information. I was expecting wait time ~ profile time.
The results on a K20m are:
In [15]: print "Launch time=", t_wait - t_start
Launch time= 0.373787879944

In [16]: print "Wait time", t_end - t_wait
Wait time 0.0372970104218

In [17]: print "Profile time", 1e-9 * (evt.profile.end - evt.profile.start)
Profile time 0.338622592

Thanks, 
Jean-Matthieu.


import time
import pyopencl as cl
import numpy as np

ctx = cl.create_some_context()
queue = cl.CommandQueue(
    ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)
data = np.zeros((512, 512, 512), dtype=np.float64)
data_cl = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=data.nbytes)

# First (untimed) copy, followed by a finish, before the timed run
cl.enqueue_copy(queue, data_cl, data)
queue.finish()

t_start = time.time()
evt = cl.enqueue_copy(queue, data_cl, data, is_blocking=False)
t_wait = time.time()  # enqueue_copy has returned
evt.wait()
t_end = time.time()  # the copy has completed
print "Launch time=", t_wait - t_start
print "Wait time", t_end - t_wait
print "Profile time", 1e-9 * (evt.profile.end - evt.profile.start)
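
To make the expectation concrete, here is a minimal sketch of the timing pattern I had in mind for a truly non-blocking call. It does not use OpenCL at all: the "transfer" is a hypothetical stand-in (a worker thread that sleeps), and the name timed_async_call and the 0.2 s duration are invented for illustration only. It says nothing about why the real enqueue_copy behaves differently.

```python
import threading
import time


def timed_async_call(duration):
    """Time the launch and the wait of a stand-in asynchronous operation.

    The worker thread only sleeps for `duration` seconds. It is a
    hypothetical placeholder for the device copy, not an OpenCL call.
    """
    worker = threading.Thread(target=time.sleep, args=(duration,))
    t_start = time.time()
    worker.start()   # plays the role of enqueue_copy(..., is_blocking=False)
    t_wait = time.time()
    # Host-side computation could overlap with the operation here.
    worker.join()    # plays the role of evt.wait()
    t_end = time.time()
    return t_wait - t_start, t_end - t_wait


launch_time, wait_time = timed_async_call(0.2)
print("Launch time=%f" % launch_time)
print("Wait time=%f" % wait_time)
```

In this sketch the launch returns almost immediately and nearly all of the duration shows up in the wait, which is the opposite of what I measure with the copy above.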

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
