On Tue, May 19, 2015 at 11:30 AM, Ronie Salgado <ronies...@gmail.com> wrote:
> Hi all,
>
> Sorry for answering so late; I am busy at ICSE, where tomorrow I have to
> defend my poster for the Student Research Competition. The topic is
> profiling over the OpenCL bindings.

No problem Ronie. Thank you for your reply.

Do you have a copy of your poster somewhere?

>> We are looking at the code of OpenCL and VirtualGPU written by Ronie.
>> What we have understood so far:
>> - OpenCL package: low-level stuff to interface OpenCL
>> kernels with Pharo
>> - VirtualGPU: high-level API on top of OpenCL to ease the
>> task of people who want to use OpenCL. VirtualGPU provides high-level
>> operations on matrices and images at the moment.
>
> This is correct.
>
>> @Ronie: What is not clear to us at the moment: when you build a
>> VirtualGPU program with the DSL, do you have the overhead of
>> communication every time you execute a VirtualGPU instruction, or are all
>> the instructions sent at the same time and run on the GPU?
>
> The DSL is actually an abstraction over the OpenCL API. Each operation is
> stored in a simple intermediate representation that is used to call a
> single OpenCL kernel.
>
> There is no overhead in terms of memory transfers between intermediate
> operations, because the data is kept on the GPU the whole time. There is an
> overhead in terms of kernel dispatching. For example, the expression
> a + b * 0.5 in the VGPU DSL is interpreted as the following pseudo code:
>
>  temp := opencl invokeKernel: 'add' with: a with: b.
>  temp2 := opencl invokeKernel: 'mulScalar' with: temp with: 0.5.
>
> Custom-crafted code would do something like this:
>
> temp := opencl invokeKernel: 'addAndMulScalar' with: a with: b with: 0.5
>
> The VGPU does not do the latter, for simplicity. Currently, it does not
> generate any kind of OpenCL C code; it works by composing simple functions.
> Perhaps in the future I will add a code generation step for optimization.
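To make the dispatch overhead concrete, here is a minimal CPU-side sketch in C. It is not VGPU code. It assumes Smalltalk's left-to-right evaluation of binary operators, so a + b * 0.5 means (a + b) * 0.5, which matches the 'add' then 'mulScalar' sequence above. The functions kernel_add, kernel_mul_scalar and kernel_add_and_mul_scalar are hypothetical stand-ins for OpenCL kernels, and each call counts as one simulated dispatch.

```c
#include <assert.h>
#include <stddef.h>

/* Number of simulated kernel dispatches. On a real device each call below
   would be one kernel launch (for example via clEnqueueNDRangeKernel). */
static int dispatch_count = 0;

/* Hypothetical 'add' kernel: result[i] = a[i] + b[i]. */
void kernel_add(const float *a, const float *b, float *result, size_t n) {
    dispatch_count++;
    for (size_t i = 0; i < n; i++)
        result[i] = a[i] + b[i];
}

/* Hypothetical 'mulScalar' kernel: result[i] = x[i] * s. */
void kernel_mul_scalar(const float *x, float s, float *result, size_t n) {
    dispatch_count++;
    for (size_t i = 0; i < n; i++)
        result[i] = x[i] * s;
}

/* Hypothetical fused 'addAndMulScalar' kernel: result[i] = (a[i] + b[i]) * s. */
void kernel_add_and_mul_scalar(const float *a, const float *b, float s,
                               float *result, size_t n) {
    dispatch_count++;
    for (size_t i = 0; i < n; i++)
        result[i] = (a[i] + b[i]) * s;
}

/* Composed evaluation of (a + b) * s, the way the DSL works today:
   two dispatches plus an intermediate buffer 'temp'. */
void eval_composed(const float *a, const float *b, float s,
                   float *temp, float *result, size_t n) {
    assert(temp != NULL && result != NULL);
    kernel_add(a, b, temp, n);
    kernel_mul_scalar(temp, s, result, n);
}

/* Fused evaluation, like hand-written code: one dispatch, no intermediate buffer. */
void eval_fused(const float *a, const float *b, float s,
                float *result, size_t n) {
    kernel_add_and_mul_scalar(a, b, s, result, n);
}
```

Both versions compute the same values. The composed one pays one extra launch and one extra buffer per operation, and a code generation step would remove that cost.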

Ok, I understand. What would be the speed benefit of code generation
compared to the current version?

> Another problem is the proliferation of intermediate buffers. Some
> samples avoid intermediate buffers by using add:into: instead of +.
> The into buffer is just where the result is going to be
> placed. Look at VirtualGPUSamples >> #imageChangedForGradient,
> VirtualGPUSamples >> #imageChangedForGradientOptimized and
> VirtualGPUSamples >> #imageChangedForGradientOptimizedMore.
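The difference between + and add:into: can be sketched in plain C as follows. This is an illustration of the idea, not the VGPU implementation. vec_add stands in for +: it allocates a fresh result buffer on every call. vec_add_into stands in for add:into: and writes into a buffer the caller provides. Each malloc stands in for creating a device buffer, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of result buffers allocated; each one stands in for a new device buffer. */
static int buffers_allocated = 0;

/* Like '+': allocates and returns a new buffer. The caller must free it. */
float *vec_add(const float *a, const float *b, size_t n) {
    float *result = malloc(n * sizeof *result);
    assert(result != NULL);
    buffers_allocated++;
    for (size_t i = 0; i < n; i++)
        result[i] = a[i] + b[i];
    return result;
}

/* Like add:into: writes a + b into the caller-supplied buffer, no allocation.
   'into' may alias 'a', because each element depends only on a[i] and b[i]. */
void vec_add_into(const float *a, const float *b, float *into, size_t n) {
    assert(into != NULL);
    for (size_t i = 0; i < n; i++)
        into[i] = a[i] + b[i];
}

/* Adds b to start 'steps' times, allocating a new buffer at every step
   (for example, once per image update). Requires steps >= 1. */
float *accumulate_alloc(const float *start, const float *b, size_t n, int steps) {
    assert(steps >= 1);
    float *current = NULL;
    const float *input = start;
    for (int s = 0; s < steps; s++) {
        float *next = vec_add(input, b, n);
        free(current); /* free(NULL) is a no-op on the first step */
        current = next;
        input = current;
    }
    return current;
}

/* Same computation, reusing the single buffer 'acc' for every step. */
void accumulate_into(float *acc, const float *b, size_t n, int steps) {
    for (int s = 0; s < steps; s++)
        vec_add_into(acc, b, acc, n);
}
```

The results are identical. The allocating version creates one buffer per step, while the into version creates none after setup. That is presumably the saving the "Optimized" samples aim for.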
>
>
>>
>> In our context, for building a GSSA algorithm, I guess we just have to
>> combine existing VGPU instructions (matrix computations), but for SPH
>> simulations we will have to provide our own instructions. Is there
>> any documentation on adding our own kernels and instructions?
>
>
> There is no documentation. We should have a tool like Doxygen so that I
> can write the documentation while I am writing the methods.
>
> Anyway, I will document it here for now.

Great ;-)

Can we start a Pillar chapter in the
https://github.com/SquareBracketAssociates/PharoLimbo repository?

> First of all, look at the existing kernels. For that, put this in a
> playground and do it (you need the GTInspector):
>
> EmbObjectBrowser openBrowser.
>
> That will open a browser that I use to edit the OpenCL C code. It has
> some bugs, but it is better than editing a huge string in a Smalltalk
> method. Let's look at VGPULinearAlgebraSources. There you will see the
> 'kernels' category and inside it two methods: #matrixKernels and
> #vectorKernels. If you look at #vectorKernels, you will see just the
> OpenCL C code.
>
> If you now go to Nautilus and look at the VGPULinearAlgebraSources class,
> you will see that it is a subclass of EmbObjectContainer. #vectorKernels and
> #matrixKernels are Smalltalk methods. #vectorKernels looks like this:
>
> vectorKernels
>     <embeddedObject>
>     ^ '
> // Vector binary operations
> __kernel void floatVector_add(__global float *left, __global float *right,
> __global float *result)
> ...
> '
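The body of floatVector_add is elided above, so the following is a hypothetical illustration and not the actual VGPU source. An element-wise OpenCL kernel with that signature usually has each work-item handle the single index returned by the OpenCL C builtin get_global_id(0). The C sketch below emulates that model on the CPU. The work-item's global ID is passed as an explicit argument, and "enqueuing" is a plain loop over the global work size. The function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* CPU stand-in for one work-item of a hypothetical element-wise add kernel.
   In OpenCL C, global_id would come from get_global_id(0). */
static void float_vector_add_work_item(size_t global_id, const float *left,
                                       const float *right, float *result) {
    result[global_id] = left[global_id] + right[global_id];
}

/* Emulates enqueuing the kernel over a 1-D range of global_size work-items.
   A device runs work-items in parallel and in no guaranteed order. That is
   safe here because each work-item writes a distinct element of result. */
void enqueue_float_vector_add(const float *left, const float *right,
                              float *result, size_t global_size) {
    assert(left != NULL && right != NULL && result != NULL);
    for (size_t id = 0; id < global_size; id++)
        float_vector_add_work_item(id, left, right, result);
}
```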
>
> For examples of actually invoking a kernel, look at the
> following methods:
> VGPUFloatMatrix >> #abs
> VGPUFloatMatrix >> #absInto
> VGPUFloatMatrix >> #discreteGradient
> VGPUFloatMatrix >> #discreteGradientInto
>
> As for the OpenCL package, it just provides bindings for the C OpenCL API.
> You can also use it directly if you want, but you are on your own :) .

I already understand part of that.

Thank you for your help.
Regards,

-- 
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/
