On Friday, 16 August 2013 at 20:07:32 UTC, John Colvin wrote:
We have a[] = b[] * c[] - 5; etc. which could work very neatly perhaps?
While this could in fact work, given the nature of GPGPU it would not be very effective. In a setup without shared memory or cache coherency, all three arrays have to be moved in their entirety: the inputs copied into GPU memory before the statement runs as GPU bytecode, and the result copied back to complete the operation. GPGPU doesn't make sense for small code blocks, both because of the low instruction count and because of how memory bound such a statement would be. The compiler needs to either be told explicitly what can or should be run as a GPU function, or have some intelligence about what to run as a GPU function and what not to. This will get better in the future: APUs using the full HSA implementation will drastically reduce the "buy-in" latency/cycle cost of using a GPGPU function, and make GPU functions more practical for smaller operations (in instruction count or memory-boundedness).
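To put rough numbers on that, here is a minimal sketch in D that counts how much data would have to cross the bus for the quoted statement versus how much arithmetic it performs. The helper names `bytesTransferred` and `flops`, the element type `float`, and the array length are all assumptions chosen for illustration. Nothing here talks to a GPU; the statement itself runs on the CPU.

```d
import std.stdio;

/// Hypothetical helper: bytes crossing the bus on a discrete GPU for
/// a[] = b[] * c[] - 5, assuming b and c are copied in and a is copied back.
ulong bytesTransferred(ulong n, ulong elemSize)
{
    return 3 * n * elemSize;
}

/// Hypothetical helper: one multiply and one subtract per element.
ulong flops(ulong n)
{
    return 2 * n;
}

void main()
{
    enum ulong n = 1_000_000; // assumed array length, for illustration only

    auto a = new float[n];
    auto b = new float[n];
    auto c = new float[n];
    b[] = 2.0f;
    c[] = 3.0f;

    // The array operation from the quoted post, run on the CPU.
    a[] = b[] * c[] - 5.0f;

    immutable bytes = bytesTransferred(n, float.sizeof);
    immutable ops = flops(n);

    // Arithmetic performed per byte moved; a low ratio means the
    // statement is dominated by memory traffic rather than computation.
    writefln("a[0] = %s", a[0]);
    writefln("bytes moved: %s, flops: %s, flops per byte: %.3f",
             bytes, ops, cast(double) ops / bytes);
}
```

With only two floating-point operations for every twelve bytes moved, the copy cost dominates on a non-shared-memory setup, which is the point made above about small, memory-bound statements.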