Hi Nicolas,

On Tuesday 02 February 2010, Bonnel wrote:
> I was just playing with the profiler from Nvidia and I'm wondering why
> all data from the graphics card are read back. I thought memory was read
> back only when using cl.enqueue_read_buffer. Here is the result I get
> from profiling matrix-multiply.py:
> 
> method                        memory transfer size
> memcpyHtoDasync      5.12e+06
> memcpyHtoDasync      5.12e+06
> memcpyDtoHasync      2.56e+06
> memcpyDtoHasync      5.12e+06
> memcpyDtoHasync      2.56e+06
> memcpyDtoHasync      5.12e+06
> 
> As there is only one cl.enqueue_read_buffer call, there should be only
> one memcpyDtoHasync call.

I recently had an informative conversation with someone on the Nvidia
driver team, and they indicated that CL may 'transparently' issue
transfers after kernel launches based on the flags with which the buffer
was created.

Now I'm faced with two problems. First, all the Nvidia profiler does for
me is crash. I've figured out that I can invoke it from the command line
by specifying

export OPENCL_PROFILE=1
export OPENCL_PROFILE_CONFIG='temp_cl_profiler.conf'

and then find data in "opencl_profile_0.log". However, no matter what I
put in temp_cl_profiler.conf, I can't see the extra transfers you are
seeing. Could you grab and post the generated config file, perhaps via

import os; print open(os.environ["OPENCL_PROFILE_CONFIG"], "r").read()

That would be very helpful. (If you could generate a survey of what the
file can look like, that would of course help even more!)
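For reference, my own attempts at the config file look like the following. The option names are an assumption on my part, carried over from the CUDA profiler's config format, which Nvidia's OpenCL profiler appears to share:

```
# temp_cl_profiler.conf -- option names assumed from the CUDA profiler
gpustarttimestamp
gpuendtimestamp
memtransfersize
```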

As far as flags are concerned, COPY_HOST_PTR was a natural suspect, but
removing it didn't change the timings. It would really help if I could
observe the extra transfers myself.
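Once the extra transfers do show up in a log, tallying the bytes moved per direction makes runs easier to compare. A minimal sketch, assuming the two-column format (method name, transfer size in bytes) from the output you quoted:

```python
# Hypothetical helper: sum the profiler log's transfer sizes per method.
# The two-column format is an assumption based on the quoted output.
def tally_transfers(lines):
    totals = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("memcpy"):
            totals[parts[0]] = totals.get(parts[0], 0.0) + float(parts[1])
    return totals

sample = """\
memcpyHtoDasync      5.12e+06
memcpyHtoDasync      5.12e+06
memcpyDtoHasync      2.56e+06
memcpyDtoHasync      5.12e+06
memcpyDtoHasync      2.56e+06
memcpyDtoHasync      5.12e+06
"""

totals = tally_transfers(sample.splitlines())
for method, nbytes in sorted(totals.items()):
    print(method, nbytes)
```

For your log above, this reports one host-to-device total and one device-to-host total, which makes the "extra" DtoH volume obvious at a glance.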

Thanks for posting your observations!

Andreas

_______________________________________________
PyOpenCL mailing list
[email protected]
http://host304.hostmonster.com/mailman/listinfo/pyopencl_tiker.net