David Garcia wrote:
> These two restrictions put together mean that there's a significant
> overhead associated with doing any brief computation on the GPU. You
> need to consider the amount of data that is being transferred from the
> CPU's RAM into the GPU's RAM and compare it with the time that the
> computation itself is going to take. If all you are doing is a
> component-wise vector addition, the cost of moving data around is going
> to be greater than the cost of the actual ALU instructions, which is why
> you are seeing some disappointing performance.
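The trade-off David describes can be sketched with a back-of-envelope cost model. The bandwidth and throughput figures below are illustrative assumptions, not measurements of any particular GPU or bus; the point is only the ratio between transfer time and arithmetic time for a memory-bound kernel like vector addition.

```python
# Back-of-envelope model: for a component-wise vector addition c = a + b,
# the host<->device transfer cost dwarfs the arithmetic cost.
# All hardware numbers below are assumed placeholders, not benchmarks.

N = 10_000_000            # number of float32 elements per vector
BYTES = 4                 # sizeof(float32)

PCIE_BANDWIDTH = 8e9      # assumed host<->device bandwidth, bytes/s
GPU_FLOPS = 200e9         # assumed sustained GPU throughput, FLOP/s

# two input vectors copied in, one result vector copied back out
bytes_moved = 3 * N * BYTES
flops = N                 # one addition per element

transfer_time = bytes_moved / PCIE_BANDWIDTH
compute_time = flops / GPU_FLOPS

print(f"transfer: {transfer_time * 1e3:.2f} ms")
print(f"compute:  {compute_time * 1e3:.3f} ms")
print(f"ratio:    {transfer_time / compute_time:.0f}x")
```

Under these assumed numbers the transfer takes a few hundred times longer than the additions themselves, which is why a benchmark that only times a single vector addition mostly measures the bus, not the ALUs.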
David, I'm aware of the issues you mention, and I wasn't disappointed by the timings. I just took the benchmark case distributed with pyopencl as given; I didn't cook it up myself. Your comments seem to imply that using another benchmark case may be more informative.

Thanks for your feedback,
Sven

_______________________________________________
PyOpenCL mailing list
[email protected]
http://host304.hostmonster.com/mailman/listinfo/pyopencl_tiker.net
