The core (!) point here is that processor chips are rapidly becoming a collection of heterogeneous cores. Any programming language that assumes
a single CPU or a collection of homogeneous CPUs has built-in
obsolescence.

So the question I am interested in is whether D is the language that will allow me to express, in a single codebase, a program in which parts are executed on one or more GPGPUs and parts on multiple CPUs. D already has
support for the latter via std.parallelism and std.concurrency.
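To make the CPU side concrete, here is a minimal std.concurrency sketch (the worker function and the values passed are mine, chosen just for illustration): one thread is spawned, a message is sent to it, and a reply comes back.

```d
// Sketch: the CPU-side concurrency D already provides.
// std.concurrency spawns a worker thread and passes messages to it.
import std.concurrency;
import std.stdio;

void worker()
{
    // receive a value from the owner thread and reply with its square
    auto x = receiveOnly!int();
    ownerTid.send(x * x);
}

void main()
{
    auto tid = spawn(&worker);
    tid.send(7);
    writeln(receiveOnly!int());  // prints 49
}
```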

I guess my question is whether people are interested in std.gpgpu (or
some more sane name).

CUDA works as a preprocessor pass that generates C files from .cu files.

In effect, to create a sensible environment for microthreaded programming, they extend the language.

A basic CUDA function looks something like...

__global__ void add(float *a, float *b, float *c) {
   int i = threadIdx.x;   // per-thread index supplied by the device
   c[i] = a[i] + b[i];
}

add<<<1, 10>>>(ptrA, ptrB, ptrC);   // launch 1 block of 10 threads

There are built-in variables to handle the index location, such as threadIdx.x in the above example; these are supplied by the thread scheduler on the video card/APU device.

Generally, calls to this setup have very high latency, so using it for a small handful of items as in the above example makes no sense. That example would end up using a single execution cluster, leaving you prey to the latency of the PCIe bus, the execution time, and the latency costs of the video memory.

It only becomes effective when you are working with large data sets that can take advantage of a massive number of threads, where the latency problems become secondary to the sheer volume of calculations done.

As far as D goes, we really only have one built-in microthreading-capable language construct: foreach.
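As a point of comparison, here is what the CUDA example above looks like today with std.parallelism's parallel foreach on the CPU (array sizes and fill values are mine):

```d
// Sketch: std.parallelism distributes foreach iterations across CPU
// cores, the closest D analogue today to the CUDA kernel above.
import std.parallelism;

void main()
{
    auto a = new float[1_000_000];
    auto b = new float[1_000_000];
    auto c = new float[1_000_000];
    a[] = 1.0f;
    b[] = 2.0f;

    // each iteration may run on a different worker thread;
    // i plays the role that threadIdx.x plays in the CUDA example
    foreach (i, ref x; parallel(c))
        x = a[i] + b[i];
}
```

The difference is that here the iteration space is carved up over a handful of OS threads, whereas a GPU version would need to map it onto thousands of device threads.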

However, I don't think a library extension similar to std.parallelism would work for GPU-based microthreading.

foreach would need something to tell the compiler to generate GPU bytecode for the code block it uses, and would need instructions on when to use that code block based on data set size.

While functions themselves could get away with very little change (just a new property, @microthreaded, plus the built-in variables for the index positions), the calling syntax would need changes to support a work range, or a multidimensional range of some sort.

Perhaps looking something like...

add$(1 .. 10)(ptrA, ptrB, ptrC);

and a templated function would look similar:

add!(float)$(1 .. 10)(ptrA, ptrB, ptrC);
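The declaration side of such a function might then look like the following. This is entirely hypothetical syntax mirroring the calls above; @microthreaded and a built-in threadIdx variable are assumptions from this proposal, not existing D:

```d
// Hypothetical, not valid D today: @microthreaded marks the function
// for GPU bytecode generation, and threadIdx would be a built-in
// index variable supplied by the device scheduler, as in CUDA.
@microthreaded void add(T)(T* a, T* b, T* c)
{
    auto i = threadIdx.x;
    c[i] = a[i] + b[i];
}

// invoked over a work range of 1 .. 10:
add!(float)$(1 .. 10)(ptrA, ptrB, ptrC);
```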
