Hi,

Yes, that's perfect! When the buffer is mapped, the async copy is really fast. This raises another question: what are the limitations of using the buffer in kernels while it is mapped? Is it necessary to unmap the buffer first?
Best regards,
Jean-Matthieu

On 10 Oct 2014, at 17:13, Andreas Kloeckner <[email protected]> wrote:

> Hi Jean-Matthieu,
>
> Jean-Matthieu Etancelin <[email protected]> writes:
>> While optimizing some host-device data transfers, I came to the little
>> piece of code given below.
>> My questions are: why is such a long time spent launching the
>> non-blocking copy? What can I do to get a 'real' non-blocking call, in
>> order to do some computations on the host before waiting for the copy to
>> complete?
>> In the example, launch time ~ profile time, where launch time is the
>> cl.enqueue_copy calling time and profile time comes from the event
>> profiling information. I was expecting wait time ~ profile time.
>> The result on a K20m is:
>>
>> In [15]: print "Launch time=", t_wait - t_start
>> Launch time= 0.373787879944
>>
>> In [16]: print "Wait time", t_end - t_wait
>> Wait time 0.0372970104218
>>
>> In [17]: print "Profile time", 1e-9 * (evt.profile.end - evt.profile.start)
>> Profile time 0.338622592
>
> On Nvidia implementations, the host memory from which you want to do
> async copies has to be "page-locked", which in terms of their OpenCL
> implementation means that it has to be allocated as a buffer with the
> ALLOC_HOST_PTR flag.
>
> Hope that helps,
> Andreas

--
Jean-Matthieu Etancelin
Doctorant
Laboratoire Jean Kuntzmann
Université de Grenoble-Alpes

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
