The core (!) point here is that processor chips are rapidly becoming a collection of heterogeneous cores. Any programming language that assumes
a single CPU or a collection of homogeneous CPUs has built-in
obsolescence.

So the question I am interested in is whether D is the language that will allow me to express, in a single codebase, a program in which parts are executed on one or more GPGPUs and parts on multiple CPUs. D already has
support for the latter via std.parallelism and std.concurrency.
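To make the CPU side concrete, here is a minimal std.concurrency sketch (the worker function and the values passed are mine, chosen just for illustration): one thread is spawned, a message is sent to it, and a reply comes back.

```d
// Sketch: the CPU-side concurrency D already provides.
// std.concurrency spawns a worker thread and passes messages to it.
import std.concurrency;
import std.stdio;

void worker()
{
    // receive a value from the owner thread and reply with its square
    auto x = receiveOnly!int();
    ownerTid.send(x * x);
}

void main()
{
    auto tid = spawn(&worker);
    tid.send(7);
    writeln(receiveOnly!int());  // prints 49
}
```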

I guess my question is whether people are interested in std.gpgpu (or
some more sane name).

CUDA works as a preprocessor pass that generates C files from .cu files.

In effect, to create a sensible environment for microthreaded programming, they extend the language.

A basic CUDA function looks something like...

__global__ void add(float *a, float *b, float *c) {
   int i = threadIdx.x;   // per-thread index supplied by the device
   c[i] = a[i] + b[i];
}

add<<<1, 10>>>(ptrA, ptrB, ptrC);   // launch 1 block of 10 threads

There are built-in variables to handle the index location, such as threadIdx.x in the above example; these are supplied by the thread scheduler on the video card/APU device.

Generally, calls to this setup have very high latency, so using it for a small handful of items as in the above example makes no sense. That example would end up using a single execution cluster, leaving you prey to the latency of the PCIe bus, the execution time, and the latency costs of the video memory.

It only becomes effective when you are working with large data sets that can take advantage of a massive number of threads, where the latency problems become secondary to the sheer volume of calculations done.

As far as D goes, we really only have one built-in microthreading-capable language construct: foreach.
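As a point of comparison, here is what the CUDA example above looks like today with std.parallelism's parallel foreach on the CPU (array sizes and fill values are mine):

```d
// Sketch: std.parallelism distributes foreach iterations across CPU
// cores, the closest D analogue today to the CUDA kernel above.
import std.parallelism;

void main()
{
    auto a = new float[1_000_000];
    auto b = new float[1_000_000];
    auto c = new float[1_000_000];
    a[] = 1.0f;
    b[] = 2.0f;

    // each iteration may run on a different worker thread;
    // i plays the role that threadIdx.x plays in the CUDA example
    foreach (i, ref x; parallel(c))
        x = a[i] + b[i];
}
```

The difference is that here the iteration space is carved up over a handful of OS threads, whereas a GPU version would need to map it onto thousands of device threads.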

However, I don't think a library extension similar to std.parallelism would work for GPU-based microthreading.

foreach would need something to tell the compiler to generate GPU bytecode for the code block it uses, and would need instructions on when to use that code block based on data set size.

While functions themselves could get away with very little change (just a new property, @microthreaded, plus the built-in variables for the index positions), the calling syntax would need changes to support a work range, or a multidimensional range of some sort.

Perhaps looking something like...

add$(1 .. 10)(ptrA, ptrB, ptrC);

and a templated function would look similar:

add!(float)$(1 .. 10)(ptrA, ptrB, ptrC);
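The declaration side of such a function might then look like the following. This is entirely hypothetical syntax mirroring the calls above; @microthreaded and a built-in threadIdx variable are assumptions from this proposal, not existing D:

```d
// Hypothetical, not valid D today: @microthreaded marks the function
// for GPU bytecode generation, and threadIdx would be a built-in
// index variable supplied by the device scheduler, as in CUDA.
@microthreaded void add(T)(T* a, T* b, T* c)
{
    auto i = threadIdx.x;
    c[i] = a[i] + b[i];
}

// invoked over a work range of 1 .. 10:
add!(float)$(1 .. 10)(ptrA, ptrB, ptrC);
```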
