Hi Pekka,

On Mon, Mar 18, 2013 at 10:07:26PM +0200, Pekka Jääskeläinen wrote:
> It should be doable with the CUDA API and the LLVM NVPTX backend.
> I took a look at the CUDA API some time ago with this exact idea in
> mind, but didn't have the time to move forward with it.
>
> How much work it is, I'm not sure, as I haven't tested the LLVM NVPTX
> backend nor the API. But my guess is it shouldn't be too hard to get
> something running because we have the previous drivers for
> heterogeneous device setups as examples.
>
> If you are up for the task, take a look at the pocl device
> drivers for cellspu, TCE (ttasim), or the Tom Stellard's
> unfinished Gallium compute / AMD R600 driver.
Thanks for your quick response! I took a tour of the code in lib/CL and lib/CL/devices, and adding a CUDA device driver seems feasible.

If I read the clFinish code correctly, the queue is processed synchronously: items in the queue are handled one at a time, and the host thread blocks while each one runs. Have you already thought about asynchronous/non-blocking processing of the queue? That would be useful for overlapping computation with memory transfers, and for CPU+GPU or multi-GPU computation.
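For illustration, here is a minimal host-side sketch (plain OpenCL C, not pocl-internal code; the kernel, buffers and chunking parameters are made-up placeholders, and error checking is omitted) of the double-buffering pattern that asynchronous queue processing would enable: the transfer for one chunk overlaps the kernel for the previous one.

/* Hypothetical sketch: overlap the host->device transfer of chunk i+1
 * with the kernel execution for chunk i, using two in-order queues,
 * non-blocking writes and events. */
#include <CL/cl.h>

void process_chunks(cl_context ctx, cl_device_id dev, cl_kernel kernel,
                    cl_mem dev_buf[2], const float *host_in,
                    size_t chunk_elems, int nchunks)
{
  size_t chunk_bytes = chunk_elems * sizeof(float);
  cl_command_queue xfer_q = clCreateCommandQueue(ctx, dev, 0, NULL);
  cl_command_queue exec_q = clCreateCommandQueue(ctx, dev, 0, NULL);
  cl_event write_done[2];
  cl_event kernel_done[2] = {NULL, NULL};

  for (int i = 0; i < nchunks; ++i) {
    int b = i % 2;

    /* Before overwriting this buffer, wait for the kernel that last used it. */
    cl_uint nwait = kernel_done[b] ? 1 : 0;
    clEnqueueWriteBuffer(xfer_q, dev_buf[b], CL_FALSE /* non-blocking */, 0,
                         chunk_bytes, host_in + (size_t)i * chunk_elems,
                         nwait, nwait ? &kernel_done[b] : NULL,
                         &write_done[b]);
    if (kernel_done[b])
      clReleaseEvent(kernel_done[b]);

    /* The kernel waits only on its own transfer, so the next chunk's
       transfer can proceed concurrently on the other queue. */
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &dev_buf[b]);
    clEnqueueNDRangeKernel(exec_q, kernel, 1, NULL, &chunk_elems, NULL,
                           1, &write_done[b], &kernel_done[b]);
    clReleaseEvent(write_done[b]);
  }

  clFinish(exec_q);
  clFinish(xfer_q);
  for (int b = 0; b < 2; ++b)
    if (kernel_done[b])
      clReleaseEvent(kernel_done[b]);
  clReleaseCommandQueue(xfer_q);
  clReleaseCommandQueue(exec_q);
}

If I understand the current implementation correctly, with the synchronous clFinish both queues would simply drain one command at a time on the host thread, so something like a per-queue worker thread in the device driver would be needed to make the overlap actually happen.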
Regards,
Peter