Hi Nicolas,

On Tuesday, 02 February 2010, Bonnel wrote:
> I was just playing with the profiler from Nvidia and I'm wondering why
> all data from the graphics card are read back. I thought memory was read
> back only when using cl.enqueue_read_buffer. Here is the result I get
> from profiling matrix-multiply.py:
>
> method              memory transfer size
> memcpyHtoDasync     5.12e+06
> memcpyHtoDasync     5.12e+06
> memcpyDtoHasync     2.56e+06
> memcpyDtoHasync     5.12e+06
> memcpyDtoHasync     2.56e+06
> memcpyDtoHasync     5.12e+06
>
> As there is only one cl.enqueue_read_buffer call, there should be only
> one memcpyDtoHasync call.
I recently had an informative conversation with someone on the Nvidia driver team, and they indicated that CL may 'transparently' issue transfers after kernel launches, depending on the flags with which the buffer was created.

Now I'm faced with two problems. First, all the Nvidia profiler does for me is crash. I've figured out that I can invoke it from the command line by specifying

    export OPENCL_PROFILE=1
    export OPENCL_PROFILE_CONFIG='temp_cl_profiler.conf'

and then find the data in "opencl_profile_0.log". However, no matter what I put in temp_cl_profiler.conf, I can't reproduce the extra transfers you are seeing. Can you grab and post the generated config file, perhaps by

    import os; print open(os.environ["OPENCL_PROFILE_CONFIG"], "r").read()

That would be very helpful. (If you could generate a survey of what the file can look like, that would of course help even more!)

As far as flags are concerned, COPY_HOST_PTR was a natural suspect, but removing it didn't change the timings. It would really help if I could observe the extra transfers myself.

Thanks for posting your observations!

Andreas
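In case it helps, here is a self-contained sketch of the kind of round-trip I have in mind: write a candidate config file, point OPENCL_PROFILE_CONFIG at it, and read it back the way the one-liner above does. The counter names in sample_config are assumptions on my part (borrowed from what the CUDA profiler accepts), not a verified list of what the OpenCL profiler understands, and the script does not invoke the profiler itself.

```python
# Hypothetical sketch: stage a profiler config file and read it back via
# the OPENCL_PROFILE_CONFIG environment variable, as in the one-liner above.
import os
import tempfile

# Assumed counter names; the real set accepted by Nvidia's OpenCL profiler
# may differ -- this is just a placeholder config to exercise the round-trip.
sample_config = "gpustarttimestamp\ngpuendtimestamp\nmemtransfersize\n"

with tempfile.TemporaryDirectory() as tmpdir:
    conf_path = os.path.join(tmpdir, "temp_cl_profiler.conf")
    with open(conf_path, "w") as f:
        f.write(sample_config)

    # Point the profiler env var at the file, then read it back exactly
    # as the one-liner in the mail does (modernized to Python 3).
    os.environ["OPENCL_PROFILE_CONFIG"] = conf_path
    with open(os.environ["OPENCL_PROFILE_CONFIG"]) as f:
        contents = f.read()

print(contents)
```

Running this (no GPU or driver needed) just confirms that the env-var plumbing and file contents are what you expect before handing them to the profiler.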
_______________________________________________
PyOpenCL mailing list
[email protected]
http://host304.hostmonster.com/mailman/listinfo/pyopencl_tiker.net
