Hi all, I'm looking into setting up a cluster of GPGPU nodes. The nodes would be Linux based and communicate with each other over Ethernet. Each node would have multiple GPUs.
I need to run a problem that is, for 99% of it, an elementwise map: y[i] = f(x1[i], x2[i], ..., xn[i]), running on 1D vectors of data. In other words, I have n input vectors and 1 output vector, all of the same size, and the i-th worker exclusively needs to access the i-th element of every vector.

Are there any frameworks, preferably in Python and with direct access to OpenCL, that transparently split the input data into segments, send them over the network, handle caching, feed executor queues, etc.? Data reuse is very heavy, so if a vector is already in VRAM I don't want to load it twice.

Also, are there PyOpenCL bolt-ons that allow for virtual VRAM? That is, to have more buffers than fit in VRAM, and transparently swap to system RAM those that are not immediately needed?

Thanks,
Guido
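For what it's worth, the access pattern described above is exactly what makes the problem easy to partition: since the i-th output only depends on the i-th element of every input, a framework just has to cut all vectors at the same offsets, hand segment j of each input to worker j, and concatenate the results. A minimal plain-Python sketch of that pattern (all names here are hypothetical, and serial loops stand in for what would really be network transfers and PyOpenCL kernel launches):

```python
# Sketch of the elementwise-map pattern y[i] = f(x1[i], ..., xn[i]).
# A distribution framework would cut every input vector at the same
# offsets, ship segment j of each vector to worker j, and concatenate
# the partial results. Names (f, split_into_segments, run_elementwise)
# are illustrative, not from any real library.

def split_into_segments(vec, num_segments):
    """Cut a vector into num_segments nearly equal contiguous chunks."""
    n = len(vec)
    bounds = [n * k // num_segments for k in range(num_segments + 1)]
    return [vec[bounds[k]:bounds[k + 1]] for k in range(num_segments)]

def run_elementwise(f, inputs, num_workers):
    """Apply f element by element across the input vectors, chunk by chunk.

    In a real cluster, each chunk would be dispatched to a GPU worker
    (e.g. as a PyOpenCL kernel launch); here the 'workers' run serially.
    """
    segmented = [split_into_segments(vec, num_workers) for vec in inputs]
    out = []
    for j in range(num_workers):            # one pass per (simulated) worker
        chunks = [seg[j] for seg in segmented]  # segment j of every input
        out.extend(f(*elems) for elems in zip(*chunks))
    return out

# Example: y[i] = x1[i] * x2[i] + x3[i]
x1, x2, x3 = [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]
y = run_elementwise(lambda a, b, c: a * b + c, [x1, x2, x3], num_workers=2)
```

The point of the sketch is only that the segment boundaries must be identical across all n inputs, which is also what makes per-segment VRAM caching feasible: a segment's identity is just (vector, offset range), so a worker can check whether that slice is already resident before transferring it again.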
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
