Hi, 

Yes, that's perfect! When the Buffer is mapped, the async copy is really fast.
That raises another question:
- What are the limitations of using the Buffer in kernels while it is mapped?
Is it necessary to unmap/release the Buffer first?

Best regards,
Jean-Matthieu 

On 10 Oct 2014, at 17:13, Andreas Kloeckner <[email protected]> wrote:

> Hi Jean-Matthieu,
> Jean-Matthieu Etancelin <[email protected]> writes:
>> While optimizing some host-device data transfers, I came to the little piece 
>> of code given below.
>> My questions are: why is so much time spent launching the non-blocking copy? 
>> What can I do to get a 'real' non-blocking call, so that I can do some 
>> computations on the host before waiting for the copy to complete? 
>> In the example, launch time ~ profile time, where launch time is the time 
>> spent in the cl.enqueue_copy call and profile time comes from the event 
>> profiling information. I was expecting wait time ~ profile time.
>> The result on a K20m is :
>> In [15]: print "Launch time=", t_wait - t_start
>> Launch time= 0.373787879944
>> 
>> In [16]: print "Wait time", t_end - t_wait
>> Wait time 0.0372970104218
>> 
>> In [17]: print "Profile time", 1e-9 * (evt.profile.end - evt.profile.start)
>> Profile time 0.338622592
> 
> On Nvidia implementations, the host memory from which you want to do
> async copies has to be "page-locked", which in terms of their OpenCL
> implementation means that it has to be allocated as a buffer with the
> ALLOC_HOST_PTR flag.
> 
> Hope that helps,
> Andreas
> 

--
Jean-Matthieu Etancelin
PhD student
Laboratoire Jean Kuntzmann
Université de Grenoble-Alpes


_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
