*sigh* So all that exists is an academic publication that you need to pay
to even read? Also, from the abstract I understand it targets multiple GPUs
on a single host and introduces memory management (not sure if it's virtual
VRAM); I can't see anything related to running a problem on multiple hosts
in parallel...
On 20 Oct 2015 02:37, "Andreas Kloeckner" <[email protected]> wrote:

> "CRV§ADER//KY" <[email protected]> writes:
>
> > Hi all,
> > I'm looking into setting up a cluster of GPGPU nodes. The nodes would be
> > Linux based, and communicate between each other via ethernet. Each node
> > would have multiple GPUs.
> >
> > I need to run a problem that for 99% can be described as
> > y[i] = f(x1[i], x2[i], ... xn[i]), running on 1D vectors of data. In
> > other words, I have n input vectors and 1 output vector, all of the
> > same size, and the i-th worker will exclusively need to access the
> > i-th element of every vector.
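That access pattern is a pure elementwise map, so the data can be cut into
contiguous segments and each segment processed independently on a different
node. A minimal sketch in plain NumPy (the function `f` and the
`split_for_nodes` helper are hypothetical, not from any framework mentioned
in this thread):

```python
import numpy as np

def f(x1, x2, x3):
    # Hypothetical elementwise function; stands in for the real f.
    return x1 * x2 + x3

def split_for_nodes(vectors, num_nodes):
    # Split every input vector into num_nodes contiguous segments, so that
    # node k receives segment k of each vector. Because y[i] depends only
    # on element i of every input, the segments never overlap and can be
    # computed independently, then concatenated.
    return [
        [np.array_split(v, num_nodes)[k] for v in vectors]
        for k in range(num_nodes)
    ]

# n = 3 input vectors, all the same length
x1 = np.arange(8.0)
x2 = np.ones(8)
x3 = np.full(8, 2.0)

# Each "node" computes its own segment; concatenation restores the full y.
parts = split_for_nodes([x1, x2, x3], num_nodes=2)
y = np.concatenate([f(*segment) for segment in parts])
assert np.array_equal(y, f(x1, x2, x3))
```

On a real cluster the segments would be shipped over the network and each
node would run f as an OpenCL kernel on its GPUs, but the partitioning
logic is just this.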
> >
> > Are there any frameworks, preferably in Python and with direct access
> > to OpenCL, that allow transparently splitting the input data into
> > segments, sending them over the network, caching, feeding executor
> > queues, etc. etc.?
> >
> > Data reuse is very heavy, so if a vector is already in VRAM I don't
> > want to load it twice.
> >
> > Also, are there PyOpenCL bolt-ons that allow for virtual VRAM? That
> > is, to have more buffers than you can fit in VRAM, and transparently
> > swap to system RAM those that are not immediately needed?
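I'm not aware of a stock PyOpenCL mechanism for this, but the idea boils
down to an LRU cache of device buffers that spills to host RAM. A toy
sketch (all names here are hypothetical; a real implementation would hold
pyopencl.Buffer objects and copy their contents to host memory before
releasing the device allocation, whereas this models buffers as plain
bytes):

```python
from collections import OrderedDict

class VirtualVRAM:
    """Toy LRU cache: keeps at most `capacity` buffers "resident" (VRAM),
    spilling the least recently used ones to a host-side dict (system
    RAM) and swapping them back in on access."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()  # name -> data, "in VRAM"
        self.spilled = {}              # name -> data, "in system RAM"

    def put(self, name, data):
        # Insert (or refresh) a buffer as most recently used, then evict
        # the least recently used buffers if over capacity.
        self.resident[name] = data
        self.resident.move_to_end(name)
        self._evict()

    def get(self, name):
        # Swap a spilled buffer back into "VRAM" on demand.
        if name in self.spilled:
            self.put(name, self.spilled.pop(name))
        self.resident.move_to_end(name)
        return self.resident[name]

    def _evict(self):
        while len(self.resident) > self.capacity:
            victim, data = self.resident.popitem(last=False)
            self.spilled[victim] = data

vram = VirtualVRAM(capacity=2)
vram.put("x1", b"aaaa")
vram.put("x2", b"bbbb")
vram.put("x3", b"cccc")           # evicts x1 to "system RAM"
assert "x1" in vram.spilled
assert vram.get("x1") == b"aaaa"  # swapped back in; x2 evicted instead
assert "x2" in vram.spilled
```

The tricky part in practice isn't the cache policy but knowing which
buffers a queued kernel is about to touch, so that they are resident
before it launches.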
>
> VirtCL is one. There was another, but I forgot what it was called.
>
> https://dl.acm.org/citation.cfm?id=2688505
>
> Andreas
>
>
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
