On 11.04.2012 0:31, Josh Klontz wrote:
IIRC, doesn't OpenCL support jit-ing ASCII source files? Then, there
wouldn't be a need for any language changes.

Correct, and that's the underlying power I'm proposing to
leverage.

IMO, writing OpenCL code involves (at least) the following
nuisances:
1) The kernel code needs to be written as a text string within
the native code base.
2) Various function calls to the OpenCL library need to be made
to manage the runtime, compile kernels, connect arguments to
kernels, execute the kernels, and retrieve the results.
3) If you want to build an application both with and without
OpenCL as the backend then you have to maintain two versions of
every algorithm, one as an OpenCL string and the other in the
native language of your program.

To me there seems to be a huge opportunity to obviate the above
issues and entice new developers to D via some careful
engineering at either the compiler or the standard library level
to support heterogeneous computing. Certainly technologies like
C++ AMP are a step in the right direction, but to my knowledge
there currently doesn't exist anything with the following
desirable principles:
1) Write the algorithm once, compile for either serial execution on
the CPU or massively parallel execution on an OpenCL-enabled
device.
2) FOSS
3) Runs everywhere the underlying language runs.
4) The underlying language has a robust compiler, active and
growing community, solid standard library, elegant language
features, etc...

Perhaps I was wrong to suggest that this has to be solved at the
compiler level. The EPGPU library seems to tackle some of the
problems of mixing OpenCL kernels within C++, though the syntax
is far from ideal.

Thoughts?

From the looks of it, this kind of thing should be easy with token strings ( q{ code } ) + mixins + some "auto-magic" helpers that drive OpenCL behind the scenes. The problematic part is checking that the fragment stays within the subset that is valid in both languages.
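To make the token-string idea concrete, here is a minimal sketch, assuming nothing beyond the standard language. The names fragment, addOnCpu and kernelSource are made up for illustration, and nothing here actually calls OpenCL; the kernel text is only assembled as a string.

```d
import std.stdio;

// The contents of q{ ... } must lex as valid D tokens, but the value
// is an ordinary compile-time string.
enum fragment = q{ dst[i] += src[i]; };

// Use 1: paste the fragment into D code with a string mixin (CPU path).
void addOnCpu(float[] dst, const(float)[] src)
{
    foreach (i; 0 .. dst.length)
    {
        mixin(fragment);
    }
}

// Use 2: splice the same text into OpenCL C source. The kernel name and
// signature are hypothetical; a real binding layer would pass this
// string to the OpenCL runtime for JIT compilation.
enum kernelSource =
    "__kernel void add(__global float* dst, __global const float* src)\n" ~
    "{\n" ~
    "    size_t i = get_global_id(0);\n" ~
    "   " ~ fragment ~ "\n" ~
    "}\n";

void main()
{
    float[] d = [1.0f, 2.0f];
    addOnCpu(d, [3.0f, 4.0f]);
    assert(d == [4.0f, 6.0f]);
    write(kernelSource);
}
```

The fragment compiles as D only because `dst[i] += src[i];` happens to be valid in both languages; anything outside that common subset would fail on one side or the other, which is the checking problem mentioned above.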

Ideally the API should work along these lines:

float[] arr1, arr2;
//init arr1 & arr2
assert(arr1.length == arr2.length);
auto length = arr1.length;
compute!q{
        for(int i=0;i<length; i++)
                arr1[i] += arr2[i];
}(arr1, arr2);

where compute works on a plain CPU, even without OpenCL available (by simply mixing the code in), and targets OpenCL with a bit of extra binding magic inside the compute template.

(compute is an eponymous template that is aliased to a static function inside it, which in turn is generated by a mixin; for a concrete example, take a look at how the ctRegex template in std.regex does it)
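A rough sketch of the CPU-only half of such a template might look like the following. This compute is hypothetical, not an existing library template. It assumes the parameters are fixed as two float arrays named arr1 and arr2, and that length is supplied by the generated function rather than captured from the caller. The OpenCL path is left out entirely.

```d
// Hypothetical: an eponymous template whose name aliases a static
// function generated by a string mixin around the kernel text.
template compute(string kernel)
{
    mixin("static void impl(float[] arr1, const(float)[] arr2)\n" ~
          "{\n" ~
          "    immutable length = arr1.length;\n" ~
          kernel ~ "\n" ~
          "}");

    // Eponymous member: compute!(...) resolves to impl.
    alias compute = impl;
}

void main()
{
    float[] a = [1.0f, 2.0f, 3.0f];
    float[] b = [10.0f, 20.0f, 30.0f];
    assert(a.length == b.length);

    compute!q{
        for (size_t i = 0; i < length; i++)
            arr1[i] += arr2[i];
    }(a, b);

    assert(a == [11.0f, 22.0f, 33.0f]);
}
```

An OpenCL-backed variant would presumably keep the same call site and hand the kernel string to the runtime instead of mixing it in. A general version would also have to derive the parameter list from the call rather than hard-coding it.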

Of course, there are some painful details once you go for deeper things, and with error messages, but it should be perfectly doable in normal D, even without, say, a CTFE parser.


--
Dmitry Olshansky
