Hi Leszek,

Leszek Flis <[email protected]> writes:
> May I ask about showing the place where I can find a Python example
> code written in PyOpenCL that shows how to run test code on more than
> one NVIDIA card.
This is a thorny subject, with data sharing being the key issue. Since an
OpenCL command queue is attached to a single device, it is not possible to
submit a kernel to multiple devices and expect them to somehow "share the
load". Instead, you will have to think about how to partition the work
yourself and submit a part of the work to each device.

At that point, it is usually most expedient to adopt a distributed-memory
model and assume that there are arbitrarily many GPUs communicating over
something like MPI.

If you are willing to limit yourself to however many GPUs fit inside one
machine, *and* if all of these GPUs come from one manufacturer (i.e. live
in the same CL "platform"), then you can make use of the fact that CL
buffers attach to CL contexts, and a context can span many devices. You
can therefore submit work to fill "buffer A" to device 1 and then submit
work that uses "buffer A" to device 2. It becomes the implementation's
problem to make sure the data is resident where it is needed--i.e. the act
of moving the data is hidden from you, and the implementation has the
means to decide whether to do an efficient, hardware-level transfer or to
somehow map device 1's memory into the address space of device 2.

Hope that helps,
Andreas

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
