Hi,

I am writing some timing tests with PyCUDA for batch-loading an image
sequence onto the GPU. As a baseline, I first timed an ordinary synchronous
transfer from (pageable) host memory into device global memory.
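Here is a minimal sketch of how such a baseline can be timed with CUDA events. It assumes PyCUDA with a default context from pycuda.autoinit. The function names (bandwidth_gbps, time_pageable_htod) are hypothetical. The PyCUDA and NumPy imports sit inside the timing function so that the pure arithmetic helper can be used on a machine without a GPU.

```python
def bandwidth_gbps(nbytes, elapsed_ms):
    """Convert a transfer size in bytes and a duration in ms to GB/s (1 GB = 1e9 bytes)."""
    return nbytes / (elapsed_ms * 1e-3) / 1e9


def time_pageable_htod(frames):
    """Time one synchronous host-to-device copy from ordinary (pageable) memory.

    Assumes PyCUDA and a CUDA device are available; the imports live here
    so bandwidth_gbps() does not require a GPU.
    """
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a context on device 0)
    import pycuda.driver as cuda

    frames = np.ascontiguousarray(frames)
    dest = cuda.mem_alloc(frames.nbytes)

    # Events recorded around the copy measure GPU-side elapsed time.
    start, end = cuda.Event(), cuda.Event()
    start.record()
    cuda.memcpy_htod(dest, frames)  # blocking copy from pageable memory
    end.record()
    end.synchronize()
    elapsed_ms = start.time_till(end)
    return bandwidth_gbps(frames.nbytes, elapsed_ms)
```

For example, calling time_pageable_htod(batch) with batch being a hypothetical (64, 512, 512) uint8 array returns the effective host-to-device rate in GB/s.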

Now I would like to test page-locked (pinned) host memory, specifically:

1) single-stream, synchronous transfers from page-locked memory,
2) multi-stream, asynchronous transfers from page-locked memory, and
3) zero-copy access through device-mapped page-locked memory.

For the first, do I simply call pycuda.driver.memcpy_htod/memcpy_dtoh on the
page-locked array? (I create it with mem_flags=0, which I assume corresponds
to cudaHostAllocDefault.) For the second, I would use the
memcpy_htod_async/memcpy_dtoh_async calls with more than one stream (my
laptop's GPU supports concurrent kernels). For the last, I would create my
own context with pycuda.driver.Device.make_context and the ctx_flags.MAP_HOST
flag, allocate the page-locked memory with host_alloc_flags.DEVICEMAP, and
then call my kernel with the device pointer. Am I on the right track?
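Under those assumptions, the three variants could be sketched as follows. This is a sketch, not a confirmed answer. The helper names (split_batch, make_mapped_context, pinned_sync_htod, pinned_async_htod, zero_copy_buffer) are hypothetical. A context is assumed to be current already: pycuda.autoinit for the first two variants, make_mapped_context() for the zero-copy case. frames is assumed to be a non-empty, C-contiguous NumPy array with the frame index on the first axis. PyCUDA is imported inside each function so that split_batch can be used without a GPU.

```python
def split_batch(n_frames, n_streams):
    """Split n_frames into n_streams contiguous (start, stop) ranges."""
    base, extra = divmod(n_frames, n_streams)
    bounds, start = [], 0
    for i in range(n_streams):
        # The first `extra` chunks take one additional frame.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def make_mapped_context(device_index=0):
    """Create a context that allows mapped (zero-copy) host allocations."""
    import pycuda.driver as cuda

    cuda.init()
    # MAP_HOST must be set when the context is created; call ctx.pop() when done.
    return cuda.Device(device_index).make_context(flags=cuda.ctx_flags.MAP_HOST)


def pinned_sync_htod(frames):
    """Test 1: one stream, synchronous copy from page-locked memory."""
    import pycuda.driver as cuda

    # mem_flags=0 passes no extra allocation flags.
    host = cuda.pagelocked_empty(frames.shape, frames.dtype, mem_flags=0)
    host[...] = frames
    dest = cuda.mem_alloc(host.nbytes)
    cuda.memcpy_htod(dest, host)  # blocks until the copy is complete
    return dest


def pinned_async_htod(frames, n_streams=2):
    """Test 2: several streams, asynchronous copies from page-locked memory."""
    import pycuda.driver as cuda

    host = cuda.pagelocked_empty(frames.shape, frames.dtype)
    host[...] = frames
    dest = cuda.mem_alloc(host.nbytes)
    frame_bytes = host[0].nbytes
    streams = [cuda.Stream() for _ in range(n_streams)]
    for stream, (lo, hi) in zip(streams, split_batch(len(host), n_streams)):
        if hi > lo:
            # Copy this chunk to the matching offset of the device buffer;
            # a kernel on this chunk would also be launched with stream=stream.
            cuda.memcpy_htod_async(int(dest) + lo * frame_bytes, host[lo:hi], stream)
    # Wait for every stream before the page-locked buffer goes out of scope.
    for stream in streams:
        stream.synchronize()
    return dest


def zero_copy_buffer(shape, dtype):
    """Test 3: mapped page-locked memory that a kernel reads directly."""
    import pycuda.driver as cuda

    host = cuda.pagelocked_empty(shape, dtype,
                                 mem_flags=cuda.host_alloc_flags.DEVICEMAP)
    # Device address of the same memory; wrap it in numpy.intp for a kernel call.
    dev_ptr = host.base.get_device_pointer()
    return host, dev_ptr
```

On most GPUs, host-to-device copies issued to different streams are serialized on a single copy engine. The multi-stream variant is therefore expected to pay off only once each stream's kernel runs while another stream's chunk is still copying.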

I had a hard time finding good tutorials or example source code (even in the
PyCUDA examples section), so I plan to contribute some examples if I have
time :) Also, excellent library!

Best regards,
Alexander
_______________________________________________
PyCUDA mailing list
http://lists.tiker.net/listinfo/pycuda