I'm pleased to announce that I have run the first dcompute kernel and it was a success!

The driver still needs a fair bit of polish to make the API sane and more complete, not to mention more consistent with the (untested) OpenCL driver API. But it works!
(Contributions are of course greatly welcomed)

The kernel:
```
@compute(CompileFor.deviceOnly)
module dcompute.tests.dummykernels;

import ldc.dcompute;
import dcompute.std.index;

@kernel void saxpy(GlobalPointer!(float) res,
                   float alpha, GlobalPointer!(float) x,
                   GlobalPointer!(float) y,
                   size_t N)
{
    auto i = GlobalIndex.x;     // this work-item's global index
    if (i >= N) return;         // guard against excess work-items
    res[i] = alpha*x[i] + y[i];
}
```
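
For comparison, here is what the kernel computes, written as an ordinary single-threaded D loop; each GPU work-item effectively runs one iteration of this loop, selected by `GlobalIndex.x` (the `saxpyReference` name is just for illustration):
```
// CPU reference for the saxpy kernel above: res = alpha*x + y, element-wise.
void saxpyReference(float[] res, float alpha, const float[] x, const float[] y)
{
    foreach (i; 0 .. res.length)
        res[i] = alpha*x[i] + y[i];
}
```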

The host code:
```
import dcompute.driver.cuda;
import dcompute.tests.dummykernels : saxpy;

import std.experimental.allocator : theAllocator;
import std.exception : enforce;
import std.stdio : writeln;

Platform.initialise();

auto devs   = Platform.getDevices(theAllocator);
auto ctx    = Context(devs[0]); scope(exit) ctx.detach();

// Change the file to match your GPU.
Program.globalProgram = Program.fromFile("./.dub/obj/kernels_cuda210_64.ptx");
auto q = Queue(false);

enum size_t N = 128;
float alpha = 5.0;
float[N] res, x, y;
foreach (i; 0 .. N)
{
    x[i] = N - i;
    y[i] = i * i;
}
Buffer!(float) b_res, b_x, b_y;
b_res      =  Buffer!(float)(res[]); scope(exit) b_res.release();
b_x        =  Buffer!(float)(x[]);   scope(exit) b_x.release();
b_y        =  Buffer!(float)(y[]);   scope(exit) b_y.release();

b_x.copy!(Copy.hostToDevice); // not quite sold on this interface yet.
b_y.copy!(Copy.hostToDevice);

q.enqueue!(saxpy)                 // <-- the main magic happens here
    ([N, 1, 1], [1, 1, 1])        // the launch configuration (grid and block sizes)
    (b_res, alpha, b_x, b_y, N);  // the kernel arguments

b_res.copy!(Copy.deviceToHost);
foreach(i; 0 .. N)
    enforce(res[i] == alpha * x[i] + y[i]);
writeln(res[]); // [640, 636, ... 16134]
```
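
The launch above uses one thread per block just to keep the example minimal. A sketch of a more typical configuration, assuming the two triples are the block count and the threads per block respectively, might look like:
```
enum size_t blockSize = 64;
// Launch enough blocks of blockSize threads to cover N elements; the
// `if (i >= N) return;` check in the kernel handles the excess threads
// when N is not a multiple of blockSize.
q.enqueue!(saxpy)
    ([(N + blockSize - 1) / blockSize, 1, 1], [blockSize, 1, 1])
    (b_res, alpha, b_x, b_y, N);
```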

Simple as that!

Dcompute, as always, is at https://github.com/libmir/dcompute and on dub.

To successfully run the dcompute CUDA test you will need a very recent LDC (less than two days old) built with the NVPTX backend* enabled, along with a CUDA environment and an NVIDIA GPU.

*Or wait for the LDC 1.4 release, coming real soon(™).

Thanks to the LDC folks for putting up with me ;)

Have fun GPU programming,
Nic
