Hi Leszek,

Leszek Flis <[email protected]> writes:
> May I ask about showing the place where
> I can find a Python example code written in PyOpenCL
> that shows how to run test code on more than one NVIDIA card.

This is a thorny subject, with data sharing being the key issue. Since
an OpenCL command queue is attached to a single device, it is not
possible to submit a kernel to multiple devices and expect them to
somehow "share the load". Instead, you will have to think about how to
partition the work yourself and submit a part of the work to each
device.
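To make the partitioning concrete, here is a minimal sketch (names and the even-split strategy are my own choices, not anything prescribed by PyOpenCL): a helper that splits a global index range into one contiguous slice per device, which you would then pair with one command queue per device.

```python
import numpy as np

def partition(n, num_devices):
    """Split range(n) into contiguous (start, stop) slices, one per device."""
    bounds = np.linspace(0, n, num_devices + 1).astype(int)
    return [(int(a), int(b)) for a, b in zip(bounds[:-1], bounds[1:])]

# e.g. 10_000 work items across 3 GPUs:
# partition(10_000, 3) -> [(0, 3333), (3333, 6666), (6666, 10000)]
#
# You would then launch the kernel once per device, roughly:
#   for (start, stop), queue in zip(partition(n, len(queues)), queues):
#       knl(queue, (stop - start,), None, ..., global_offset=(start,))
```

An even split is only sensible if your devices are comparable in speed; with a mixed set of GPUs you would weight the slices accordingly.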

At this point, it is usually most expedient to adopt a
distributed-memory model and assume that there are arbitrarily many GPUs
communicating over something like MPI.

If you are willing to limit yourself to however many GPUs fit inside one
machine, *and* if all these GPUs come from one manufacturer (i.e. live
in the same CL "platform"), then you can make use of the fact that CL
buffers attach to CL contexts, and a context can span many devices. You
can therefore submit work to fill "buffer A" to device 1 and then submit
work to use "buffer A" to device 2. It becomes the implementation's
problem to make sure the data is resident where it is needed--i.e. the
act of moving the data is hidden from you, and the implementation has
the means to decide whether to do an efficient, hardware-level
transfer, or to somehow map device 1's memory into the address space of
device 2.
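A rough sketch of that pattern (untested, and assuming a platform with at least two devices; the trivial kernel is just for illustration): one context over both devices, one queue per device, and a single buffer that is filled via queue 1 and consumed via queue 2, with an event dependency so the implementation knows the ordering.

```python
import numpy as np

def shared_buffer_demo():
    import pyopencl as cl

    devs = cl.get_platforms()[0].get_devices()
    assert len(devs) >= 2, "need two devices in one CL platform"

    # One context spanning both devices; buffers attach to the context.
    ctx = cl.Context(devices=devs[:2])
    q1 = cl.CommandQueue(ctx, devs[0])
    q2 = cl.CommandQueue(ctx, devs[1])

    a = np.arange(1024, dtype=np.float32)
    mf = cl.mem_flags
    buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=a)

    prg = cl.Program(ctx, """
        __kernel void dbl(__global float *x)
        { int i = get_global_id(0); x[i] *= 2; }
        """).build()

    # Fill "buffer A" on device 1 ...
    evt = prg.dbl(q1, a.shape, None, buf)
    # ... then use it on device 2; the wait_for dependency lets the
    # implementation move the data however it sees fit.
    evt2 = prg.dbl(q2, a.shape, None, buf, wait_for=[evt])

    out = np.empty_like(a)
    cl.enqueue_copy(q2, out, buf, wait_for=[evt2])
    return out
```

Note that the cross-device event dependency is what makes the migration well-defined; without it the two kernels could race on the buffer.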

Hope that helps,
Andreas

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl