Hi all,
I'm looking into setting up a cluster of GPGPU nodes. The nodes would be
Linux based and communicate with each other over Ethernet. Each node
would have multiple GPUs.

I need to run a problem that, for 99% of it, can be described as y[i] =
f(x1[i], x2[i], ..., xn[i]), running on 1D vectors of data. In other words,
I have n input vectors and 1 output vector, all of the same size, and the
i-th worker exclusively needs to access the i-th element of every vector.
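To make the access pattern concrete, here is a single-node sketch in NumPy (f and the two inputs are placeholders; the real f takes n vectors):

```python
import numpy as np

# Placeholder for the real per-element function f
def f(a, b):
    return a * a + b

n_elems = 1_000_000
rng = np.random.default_rng(0)
x1 = rng.random(n_elems)
x2 = rng.random(n_elems)

# The whole computation is an independent per-element map:
# y[i] depends only on x1[i], x2[i], ..., xn[i], so any
# partition of the index range can run on a different GPU/node.
y = f(x1, x2)
```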

Are there any frameworks, preferably in Python and with direct access to
OpenCL, that can transparently split the input data into segments, send
them over the network, handle caching, feed executor queues, and so on?
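The splitting step I'd like handled for me looks roughly like this (the node count is made up for illustration):

```python
import numpy as np

n_nodes = 4
x1 = np.arange(12, dtype=np.float64)

# Split each input vector into one contiguous segment per node;
# segment k would be shipped to node k over the network.
segments = np.array_split(x1, n_nodes)

# Concatenating the per-node results restores the full output vector,
# since element i of the output depends only on element i of each input.
roundtrip = np.concatenate(segments)
```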

Data reuse is very heavy, so if a vector is already in VRAM I don't want
to load it twice.

Also, are there PyOpenCL bolt-ons that allow for virtual VRAM? That is, to
have more buffers than can fit in VRAM, and transparently swap out to
system RAM those that are not immediately needed?

Thanks
Guido
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl