David Garcia wrote:

> 
> These two restrictions, taken together, mean that there's a significant
> overhead associated with doing any brief computation on the GPU. You
> need to consider the amount of data being transferred from the CPU's
> RAM into the GPU's RAM and compare it with the time that the
> computation itself is going to take. If all you are doing is a
> component-wise vector addition, the cost of moving data around is going
> to be greater than the cost of the actual ALU instructions, which is why
> you are seeing some disappointing performance.
> 
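David's point can be sketched with a back-of-the-envelope estimate in plain Python. The bandwidth and throughput figures below are illustrative assumptions (roughly PCIe-class transfer speed and a modern GPU's ALU rate), not measurements of any particular device:

```python
def vector_add_cost(n, bytes_per_elem=4,
                    bus_bw=8e9,       # assumed host<->device bandwidth, bytes/s
                    gpu_flops=1e12):  # assumed GPU ALU throughput, FLOP/s
    """Estimate transfer vs. compute time for c = a + b on n float elements."""
    # Two input arrays go host->device, one result comes back: 3 arrays total.
    transfer_s = 3 * n * bytes_per_elem / bus_bw
    # One addition per element on the GPU.
    compute_s = n / gpu_flops
    return transfer_s, compute_s

transfer_s, compute_s = vector_add_cost(10**7)
print(f"transfer: {transfer_s:.4f} s, compute: {compute_s:.6f} s")
```

Under these assumptions the transfer takes on the order of a thousand times longer than the addition itself, which is why a bare vector-add benchmark mostly measures the bus, not the GPU.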

David, I'm aware of the issues you mention, and I wasn't disappointed
by the timings. I simply took the benchmark case distributed with
pyopencl as given; I didn't cook it up myself. Your comments seem to
imply that a different benchmark case might be more informative.

thanks for your feedback,
sven


_______________________________________________
PyOpenCL mailing list
[email protected]
http://host304.hostmonster.com/mailman/listinfo/pyopencl_tiker.net