*sigh* So all that exists is an academic publication that you need to pay even to read? Also, from the abstract I understand it targets multiple GPUs on a single host and introduces memory management (not sure if it's virtual VRAM); I can't see anything related to running a problem on multiple hosts in parallel...

On 20 Oct 2015 02:37, "Andreas Kloeckner" <[email protected]> wrote:
> "CRV§ADER//KY" <[email protected]> writes:
>
> > Hi all,
> > I'm looking into setting up a cluster of GPGPU nodes. The nodes would be
> > Linux based, and communicate with each other via Ethernet. Each node
> > would have multiple GPUs.
> >
> > I need to run a problem that for 99% can be described as y[i] = f(x1[i],
> > x2[i], ..., xn[i]), running on 1D vectors of data. In other words, I have n
> > input vectors and 1 output vector, all of the same size, and the i-th worker
> > will exclusively need to access the i-th element of every vector.
> >
> > Are there any frameworks, preferably in Python and with direct access to
> > OpenCL, that allow transparently splitting the input data into segments,
> > sending them over the network, caching, feeding executor queues, etc.?
> >
> > Data reuse is very heavy, so if a vector is already in VRAM I don't want to
> > load it twice.
> >
> > Also, are there PyOpenCL bolt-ons that allow for virtual VRAM? That is, to
> > have more buffers than you can fit in VRAM, and transparently swap to
> > system RAM those that are not immediately needed?
>
> VirtCL is one. There was another, but I forgot what it was called.
>
> https://dl.acm.org/citation.cfm?id=2688505
>
> Andreas
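For what it's worth, the elementwise pattern described above partitions cleanly by index range, since worker i only touches element i of each vector. A minimal pure-Python sketch of the segment split (function names are hypothetical; a real setup would ship each segment to a node and run f as a PyOpenCL elementwise kernel rather than on the host):

```python
# Hypothetical sketch: split n equally-sized input vectors into contiguous
# segments, one per worker/node, so each worker only ever touches its own
# slice. f is applied on the host here purely for illustration; in practice
# it would be an OpenCL kernel executed on the node holding the segment.

def split_segments(length, workers):
    """Return (start, stop) index ranges covering 0..length, one per worker."""
    base, extra = divmod(length, workers)
    bounds, start = [], 0
    for k in range(workers):
        stop = start + base + (1 if k < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def run_elementwise(f, inputs, workers=4):
    """Compute y[i] = f(x1[i], ..., xn[i]) segment by segment."""
    length = len(inputs[0])
    y = [None] * length
    for start, stop in split_segments(length, workers):
        segs = [x[start:stop] for x in inputs]   # the data one node would receive
        for i, args in enumerate(zip(*segs)):
            y[start + i] = f(*args)              # the per-element "kernel"
    return y

x1 = list(range(10))
x2 = [10 * v for v in x1]
print(run_elementwise(lambda a, b: a + b, [x1, x2], workers=3))
# → [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
```

The caching question then reduces to keeping a per-node map from vector name to the buffer already resident in that node's VRAM, and only transferring segments that miss.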
_______________________________________________ PyOpenCL mailing list [email protected] http://lists.tiker.net/listinfo/pyopencl
