Re: D and Heterogeneous Computing

2012-04-10 Thread Josh Klontz
IIRC, doesn't OpenCL support jit-ing ASCII source files? Then, 
there wouldn't be a need for any language changes.


Correct, and that's the underlying power I'm proposing to
leverage.

IMO, writing OpenCL code involves (at least) the following
nuisances:
1) The kernel code needs to be written as a text string within
the native code base.
2) Various function calls to the OpenCL library need to be made
to manage the runtime, compile kernels, connect arguments to
kernels, execute the kernels, and retrieve the results.
3) If you want to build an application both with and without
OpenCL as the backend then you have to maintain two versions of
every algorithm, one as an OpenCL string and the other in the
native language of your program.
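
As a rough illustration of nuisances 1 and 3 (a sketch written for this summary, not code from the thread), the same element-wise add ends up written twice. The names addKernelSource and addNative are hypothetical. Nuisance 2 is left out on purpose: it is the host-side sequence of OpenCL C API calls such as clCreateProgramWithSource, clBuildProgram, clSetKernelArg and clEnqueueNDRangeKernel.

```d
// Nuisance 1: the kernel lives in the host program as an opaque string.
// This is OpenCL C source; nothing checks it until the OpenCL runtime
// compiles it at run time. (addKernelSource is a hypothetical name.)
enum string addKernelSource = q"CL
__kernel void add(__global float* a, __global const float* b)
{
    size_t i = get_global_id(0);
    a[i] += b[i];
}
CL";

// Nuisance 3: the same algorithm written again in the host language,
// for builds that have no OpenCL backend. (addNative is hypothetical.)
void addNative(float[] a, const(float)[] b)
{
    assert(a.length == b.length);
    foreach (i; 0 .. a.length)
        a[i] += b[i];
}

unittest
{
    float[] a = [1.0f, 2.0f, 3.0f];
    addNative(a, [10.0f, 20.0f, 30.0f]);
    assert(a == [11.0f, 22.0f, 33.0f]);
}
```

Any change to the algorithm has to be made in both places by hand, which is the maintenance cost point 3 describes.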

To me there seems to be a huge opportunity to obviate the above
issues and entice new developers to D via some careful
engineering at either the compiler or the standard library level
to support heterogeneous computing. Certainly technologies like
C++ AMP are a step in the right direction, but to my knowledge
there currently doesn't exist anything with the following
desirable principles:
1) Write the algorithm once, compile for either serial execution on
the CPU or massively parallel execution on an OpenCL-enabled
device.
2) FOSS
3) Runs everywhere the underlying language runs.
4) The underlying language has a robust compiler, active and
growing community, solid standard library, elegant language
features, etc...

Perhaps I was wrong to suggest that this has to be solved at the
compiler level. The EPGPU library seems to tackle some of the
problems of mixing OpenCL kernels within C++, though the syntax
is far from ideal.

Thoughts?


Re: D and Heterogeneous Computing

2012-04-10 Thread Dmitry Olshansky

On 11.04.2012 0:31, Josh Klontz wrote:

IIRC, doesn't OpenCL support jit-ing ASCII source files? Then, there
wouldn't be a need for any language changes.


Correct, and that's the underlying power I'm proposing to
leverage.

IMO, writing OpenCL code involves (at least) the following
nuisances:
1) The kernel code needs to be written as a text string within
the native code base.
2) Various function calls to the OpenCL library need to be made
to manage the runtime, compile kernels, connect arguments to
kernels, execute the kernels, and retrieve the results.
3) If you want to build an application both with and without
OpenCL as the backend then you have to maintain two versions of
every algorithm, one as an OpenCL string and the other in the
native language of your program.

To me there seems to be a huge opportunity to obviate the above
issues and entice new developers to D via some careful
engineering at either the compiler or the standard library level
to support heterogeneous computing. Certainly technologies like
C++ AMP are a step in the right direction, but to my knowledge
there currently doesn't exist anything with the following
desirable principles:
1) Write the algorithm once, compile for either serial execution on
the CPU or massively parallel execution on an OpenCL-enabled
device.
2) FOSS
3) Runs everywhere the underlying language runs.
4) The underlying language has a robust compiler, active and
growing community, solid standard library, elegant language
features, etc...

Perhaps I was wrong to suggest that this has to be solved at the
compiler level. The EPGPU library seems to tackle some of the
problems of mixing OpenCL kernels within C++, though the syntax
is far from ideal.

Thoughts?


From the looks of it, this kind of stuff should be easy with tokenized
strings ( q{ code } ) + mixins + some auto-magic helpers being run for
OpenCL behind the covers. The problematic part is checking that the
fragment is using the correct subset of both languages.


Ideally API should work along the lines of this:

float[] arr1, arr2;
// init arr1 & arr2
assert(arr1.length == arr2.length);
auto length = arr1.length;
compute!q{
    for (int i = 0; i < length; i++)
        arr1[i] += arr2[i];
}(arr1, arr2);

where compute works both with a plain CPU and even without OpenCL (by
simply mixing the code in), and for OpenCL with a bit of extra binding
magic inside the compute template.


(compute is an eponymous template that is aliased to a static function
inside it, which in turn is generated by a mixin; for a concrete example,
take a look at how the ctRegex template in std.regex does it)
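
A minimal sketch of the CPU-only half of this idea, under stated assumptions: the parameter names arr1 and arr2 and the local length are hard-coded inside the template so that the mixed-in fragment can see them, whereas a real helper would have to derive them. The OpenCL path (handing the same string to the OpenCL runtime) is not shown. This is an illustration, not the implementation proposed in the thread.

```d
// Hypothetical CPU-only version of `compute`. The token string is mixed
// into the body of a generated function, so it is compiled as ordinary D.
template compute(string code)
{
    // Eponymous member: `compute!code` resolves to this function.
    void compute(float[] arr1, float[] arr2)
    {
        assert(arr1.length == arr2.length);
        immutable length = arr1.length; // visible to the mixed-in fragment
        mixin(code);                    // paste the fragment in as statements
    }
}

unittest
{
    float[] a = [1.0f, 2.0f, 3.0f];
    float[] b = [10.0f, 20.0f, 30.0f];
    compute!q{
        for (size_t i = 0; i < length; i++)
            arr1[i] += arr2[i];
    }(a, b);
    assert(a == [11.0f, 22.0f, 33.0f]);
}
```

Because the fragment is a q{ } token string, it must at least lex as D, but nothing here verifies that it would also be valid OpenCL C.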


Of course, there are some painful details when you go for deeper things
and error messages, but it should be perfectly doable in normal D even
without, say, a CTFE parser.
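
On the earlier point about checking that a fragment stays within the subset shared by both languages, here is a deliberately naive sketch that runs at compile time without any parser. The helper names containsText and looksPortable and the keyword list are assumptions made for illustration; plain substring matching will misfire on identifiers that merely contain a keyword.

```d
// Hypothetical helper: naive substring search that also works in CTFE.
bool containsText(string haystack, string needle)
{
    if (needle.length > haystack.length)
        return false;
    foreach (i; 0 .. haystack.length - needle.length + 1)
        if (haystack[i .. i + needle.length] == needle)
            return true;
    return false;
}

// Hypothetical check: reject fragments mentioning a few D-only keywords.
// The list is illustrative only, not a real definition of the subset.
bool looksPortable(string code)
{
    foreach (keyword; ["foreach", "mixin", "delegate", "import"])
        if (containsText(code, keyword))
            return false;
    return true;
}

// Evaluated during compilation, so a rejected fragment fails the build.
static assert(looksPortable(q{ for (int i = 0; i < n; i++) a[i] += b[i]; }));
static assert(!looksPortable(q{ foreach (i; 0 .. n) a[i] += b[i]; }));
```

A check like this could sit in front of the mixin inside a compute-style template, turning some mistakes into a static assert message instead of a runtime OpenCL build error.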



--
Dmitry Olshansky


Re: D and Heterogeneous Computing

2012-04-10 Thread Josh Klontz
From the looks of it, this kind of stuff should be easy with
tokenized strings ( q{ code } ) + mixins + some auto-magic
helpers being run for OpenCL behind the covers. The problematic
part is checking that the fragment is using the correct subset
of both languages.


Ideally API should work along the lines of this:

float[] arr1, arr2;
// init arr1 & arr2
assert(arr1.length == arr2.length);
auto length = arr1.length;
compute!q{
    for (int i = 0; i < length; i++)
        arr1[i] += arr2[i];
}(arr1, arr2);

where compute works both with a plain CPU and even without OpenCL
(by simply mixing the code in), and for OpenCL with a bit of extra
binding magic inside the compute template.


(compute is an eponymous template that is aliased to a static
function inside it, which in turn is generated by a mixin; for a
concrete example, take a look at how the ctRegex template in
std.regex does it)


Of course, there are some painful details when you go for
deeper things and error messages, but it should be perfectly
doable in normal D even without, say, a CTFE parser.


Awesome, thanks! Will chew on this for a while :)


Re: D and Heterogeneous Computing

2012-04-10 Thread proxy



Awesome, thanks! Will chew on this for a while :)


Looking forward to it!! :)



Re: D and Heterogeneous Computing

2012-04-09 Thread Robert Jacques

On Sun, 08 Apr 2012 21:49:48 -0500, Josh Klontz josh.klo...@gmail.com wrote:


On Saturday, 7 April 2012 at 18:47:21 UTC, Robert Jacques wrote:

On Sat, 07 Apr 2012 11:38:15 -0500, Josh Klontz
josh.klo...@gmail.com wrote:


Greetings! As someone with a research interest in software
abstractions for image processing, the D programming language
appears to offer unsurpassed language features for constructing
beautiful and efficient programs. With that said, what would
really get me to abandon C++ is if D supported a heterogeneous
programming model.

My personal inclination would be something closer to OpenACC than
anything else I've seen available. Though only in the sense that
I like the idea of writing code once and being able to
compile/run/debug it with or without automatic
vectorization/kernelization. Presumably we could achieve more
elegant syntax with tighter integration into the language. Has
anyone been working on anything like this? Is this something the
community would be interested in seeing? What should the solution
look like?

One path forward could be a patch to the compiler to generate and
execute OpenCL kernels for appropriately marked-up D code. While
I'm new to the D language, I'd be happy to work on a proof of
concept of this if it is something the community thinks would be
valuable and I could get specific feedback about the right way to
approach it.



I've been using D with CUDA via a high-level wrapper around the
driver API. It works very nicely, but it doesn't address the
language integration issues. Might I recommend looking into
hooking up LDC to the PTX LLVM back-end. That would seem much
faster than writing your own back-end.


Yes, I certainly don't want to be in the business of writing
back-ends. Another idea that came to mind recently was
implementing a keyword similar in spirit to asm:

opencl {
  // Valid opencl code here
}

And have the compiler automatically handle memory copying of D
variables referenced in the kernel code. Would be entirely
back-end independent and perhaps pleasant to implement?



IIRC, doesn't OpenCL support jit-ing ASCII source files? Then, there wouldn't 
be a need for any language changes.


Re: D and Heterogeneous Computing

2012-04-08 Thread Josh Klontz

On Saturday, 7 April 2012 at 18:47:21 UTC, Robert Jacques wrote:
On Sat, 07 Apr 2012 11:38:15 -0500, Josh Klontz 
josh.klo...@gmail.com wrote:



Greetings! As someone with a research interest in software
abstractions for image processing, the D programming language
appears to offer unsurpassed language features for constructing
beautiful and efficient programs. With that said, what would
really get me to abandon C++ is if D supported a heterogeneous
programming model.

My personal inclination would be something closer to OpenACC than
anything else I've seen available. Though only in the sense that
I like the idea of writing code once and being able to
compile/run/debug it with or without automatic
vectorization/kernelization. Presumably we could achieve more
elegant syntax with tighter integration into the language. Has
anyone been working on anything like this? Is this something the
community would be interested in seeing? What should the solution
look like?

One path forward could be a patch to the compiler to generate and
execute OpenCL kernels for appropriately marked-up D code. While
I'm new to the D language, I'd be happy to work on a proof of
concept of this if it is something the community thinks would be
valuable and I could get specific feedback about the right way to
approach it.



I've been using D with CUDA via a high-level wrapper around the 
driver API. It works very nicely, but it doesn't address the 
language integration issues. Might I recommend looking into 
hooking up LDC to the PTX LLVM back-end. That would seem much 
faster than writing your own back-end.


Yes, I certainly don't want to be in the business of writing 
back-ends. Another idea that came to mind recently was 
implementing a keyword similar in spirit to asm:


opencl {
 // Valid opencl code here
}

And have the compiler automatically handle memory copying of D 
variables referenced in the kernel code. Would be entirely 
back-end independent and perhaps pleasant to implement?


Re: D and Heterogeneous Computing

2012-04-08 Thread Dmitry Olshansky

On 09.04.2012 6:49, Josh Klontz wrote:

On Saturday, 7 April 2012 at 18:47:21 UTC, Robert Jacques wrote:

On Sat, 07 Apr 2012 11:38:15 -0500, Josh Klontz
josh.klo...@gmail.com wrote:


Greetings! As someone with a research interest in software
abstractions for image processing, the D programming language
appears to offer unsurpassed language features for constructing
beautiful and efficient programs. With that said, what would
really get me to abandon C++ is if D supported a heterogeneous
programming model.

My personal inclination would be something closer to OpenACC than
anything else I've seen available. Though only in the sense that
I like the idea of writing code once and being able to
compile/run/debug it with or without automatic
vectorization/kernelization. Presumably we could achieve more
elegant syntax with tighter integration into the language. Has
anyone been working on anything like this? Is this something the
community would be interested in seeing? What should the solution
look like?

One path forward could be a patch to the compiler to generate and
execute OpenCL kernels for appropriately marked-up D code. While
I'm new to the D language, I'd be happy to work on a proof of
concept of this if it is something the community thinks would be
valuable and I could get specific feedback about the right way to
approach it.



I've been using D with CUDA via a high-level wrapper around the driver
API. It works very nicely, but it doesn't address the language
integration issues. Might I recommend looking into hooking up LDC to
the PTX LLVM back-end. That would seem much faster than writing your
own back-end.


Yes, I certainly don't want to be in the business of writing back-ends.
Another idea that came to mind recently was implementing a keyword
similar in spirit to asm:

opencl {
  // Valid opencl code here
}

And have the compiler automatically handle memory copying of D variables
referenced in the kernel code. Would be entirely back-end independent
and perhaps pleasant to implement?


Take a look at C++ AMP; it's almost exactly this thing added to Visual
C++ (but of course, for now, it's DirectCompute only):

http://msdn.microsoft.com/en-us/library/hh265136(v=vs.110).aspx

--
Dmitry Olshansky


D and Heterogeneous Computing

2012-04-07 Thread Josh Klontz
Greetings! As someone with a research interest in software 
abstractions for image processing, the D programming language 
appears to offer unsurpassed language features for constructing 
beautiful and efficient programs. With that said, what would 
really get me to abandon C++ is if D supported a heterogeneous
programming model.


My personal inclination would be something closer to OpenACC than 
anything else I've seen available. Though only in the sense that 
I like the idea of writing code once and being able to 
compile/run/debug it with or without automatic 
vectorization/kernelization. Presumably we could achieve more 
elegant syntax with tighter integration into the language. Has 
anyone been working on anything like this? Is this something the 
community would be interested in seeing? What should the solution 
look like?


One path forward could be a patch to the compiler to generate and 
execute OpenCL kernels for appropriately marked-up D code. While 
I'm new to the D language, I'd be happy to work on a proof of
concept of this if it is something the community thinks would be 
valuable and I could get specific feedback about the right way to 
approach it.


Re: D and Heterogeneous Computing

2012-04-07 Thread Robert Jacques

On Sat, 07 Apr 2012 11:38:15 -0500, Josh Klontz josh.klo...@gmail.com wrote:


Greetings! As someone with a research interest in software
abstractions for image processing, the D programming language
appears to offer unsurpassed language features for constructing
beautiful and efficient programs. With that said, what would
really get me to abandon C++ is if D supported a heterogeneous
programming model.

My personal inclination would be something closer to OpenACC than
anything else I've seen available. Though only in the sense that
I like the idea of writing code once and being able to
compile/run/debug it with or without automatic
vectorization/kernelization. Presumably we could achieve more
elegant syntax with tighter integration into the language. Has
anyone been working on anything like this? Is this something the
community would be interested in seeing? What should the solution
look like?

One path forward could be a patch to the compiler to generate and
execute OpenCL kernels for appropriately marked-up D code. While
I'm new to the D language, I'd be happy to work on a proof of
concept of this if it is something the community thinks would be
valuable and I could get specific feedback about the right way to
approach it.



I've been using D with CUDA via a high-level wrapper around the driver API. It 
works very nicely, but it doesn't address the language integration issues. 
Might I recommend looking into hooking up LDC to the PTX LLVM back-end. That 
would seem much faster than writing your own back-end.