On Sunday, 26 February 2017 at 08:37:29 UTC, Nicholas Wilson
wrote:
DCompute is an extension to LDC capable of generating code
(with no language changes*) for NVIDIA's NVPTX for use with
CUDA, SPIR-V for use with the OpenCL runtime, and of course the
host, all at the same time! It is also possible to share the
implementation of algorithms across the host and device.
This will enable writing kernels in D utilising all of D's
metaprogramming goodness across the device divide and will allow
launching those kernels with a level of ease on par with CUDA's
<<<...>>> syntax. I hope to be giving a talk at DConf 2017 about
this ;), covering what it enables us to do, what still needs to
be done and future plans.
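To illustrate what sharing an algorithm between host and device could look like, here is a minimal host-only sketch in plain D. The names `addAt` and `addAll` are hypothetical and not part of DCompute; the sketch only assumes that device pointer types can be indexed like ordinary slices or pointers, so the same template body could in principle be instantiated for either side.

```d
// Hypothetical helper, not part of DCompute: adds b[i] into a[i].
// A and B may be any types that support indexing (slices, raw pointers or,
// by assumption, device pointer types).
void addAt(A, B)(A a, B b, size_t i)
{
    a[i] = a[i] + b[i];
}

// Host-side driver: loops over every index. A device kernel would instead
// take a single index from its global index and call addAt once.
void addAll(float[] a, const(float)[] b)
{
    assert(a.length == b.length, "inputs must have equal length");
    foreach (i; 0 .. a.length)
        addAt(a, b, i);
}

void main()
{
    float[] a = [1.0f, 2.0f, 3.0f];
    float[] b = [10.0f, 20.0f, 30.0f];
    addAll(a, b);
    assert(a == [11.0f, 22.0f, 33.0f]);
}
```

Keeping the per-element logic in a template means only the way the index is obtained differs between host and device code.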
DCompute supports all of OpenCL except Images and Pipes
(support is planned though).
I haven't done any testing for CUDA, so I'm not sure about the
extent of support for it: all of the math stuff works, but I'm
not so sure about images/textures.
Many thanks to the ldc team (especially Johan) for their
guidance and patience, Ilya for reminding me that I should
upstream my work, and John Colvin, whose DConf 2016 talk made
me think 'surely compiler support can't be too hard'. 10
months later: here it is!
The DCompute compiler is available on the dcompute branch of
ldc [0]. You will need my fork of LLVM [1] and the SPIR-V
submodule that comes with it [2] as the LLVM to link against.
There is also a tool for interconversion [3] (I've mucked up
the submodules a bit, sorry, just clone it into
'tools/llvm-spirv', it's not necessary anyway). The device
standard library and drivers (both WIP) are available here[4].
Please send bug reports to their respective components,
although I'm sure I'll see them regardless of where they go.
[0]: https://github.com/ldc-developers/ldc/tree/dcompute
[1]: https://github.com/thewilsonator/llvm/tree/compute
[2]: https://github.com/thewilsonator/llvm-target-spirv
[3]: https://github.com/thewilsonator/llvm-tool-spirv
[4]: https://github.com/libmir/dcompute
* modulo one hack related to resolving intrinsics because there
is no static context (i.e. static if) for the device(s).
Basically a 'codegen time if'.
A simple example, because I forgot to include one:
```
@compute(CompileFor.deviceOnly) module example; // device code only
import ldc.attributes;
import ldc.dcomputetypes;
import dcompute.std.index;

// Element-wise a[i] = a[i] + b[i], one work item per element.
@kernel void test(GlobalPointer!float a, GlobalPointer!float b)
{
    auto idx = GlobalIndex.x; // this work item's global index
    a[idx] = a[idx] + b[idx];
}
```
Then compile with
`ldc -mdcompute-targets=ocl-220,cuda-500 example.d -I/path/to/dcompute`.
It will produce two files: kernels_ocl220_64.spv and
kernels_cuda500_64.ptx when built in 64-bit mode, or
kernels_ocl220_32.spv and kernels_cuda500_32.ptx in 32-bit mode.
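A host program that loads these files needs to know their names. As a rough sketch, the naming pattern described above can be reproduced in plain D. The function `kernelFileName` is hypothetical and not part of LDC or DCompute; it only assumes the pattern seen in the four file names listed here (target name with the dash removed, pointer width, `.spv` for OpenCL targets and `.ptx` for CUDA targets).

```d
import std.algorithm.searching : startsWith;
import std.array : replace;
import std.format : format;

// Hypothetical helper: derives the output file name from a target string
// such as "ocl-220" or "cuda-500" and the pointer width in bits.
string kernelFileName(string target, int bits)
{
    // Assumption: OpenCL targets produce SPIR-V, everything else PTX.
    immutable ext = target.startsWith("ocl") ? "spv" : "ptx";
    return format("kernels_%s_%d.%s", target.replace("-", ""), bits, ext);
}

void main()
{
    assert(kernelFileName("ocl-220", 64) == "kernels_ocl220_64.spv");
    assert(kernelFileName("cuda-500", 32) == "kernels_cuda500_32.ptx");
}
```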