On Tue, May 19, 2015 at 11:30 AM, Ronie Salgado <ronies...@gmail.com> wrote:
> Hi all,
>
> Sorry for answering so late. I am busy at ICSE, where tomorrow I have to
> defend my poster for the Student Research Competition. The topic is
> profiling over the OpenCL bindings.

No problem, Ronie. Thank you for your reply.
Do you have a copy of your poster somewhere ?

>> We are looking at the OpenCL and VirtualGPU code written by Ronie.
>> What we have understood so far:
>> - OpenCL package: low-level code for interfacing OpenCL
>> kernels with Pharo
>> - VirtualGPU: a high-level API on top of OpenCL that eases the
>> task of people who want to use OpenCL. VirtualGPU provides high-level
>> operations on matrices and images at the moment.
>
> This is correct.
>
>> @Ronie: What is not yet clear to us: when you build a
>> VirtualGPU program with the DSL, do you pay the communication
>> overhead every time you execute a VirtualGPU instruction, or are all
>> the instructions sent at the same time and run on the GPU ?
>
> The DSL is actually an abstraction over the OpenCL API. Each operation is
> stored in a simple intermediate representation that is used to call a
> single OpenCL kernel.
>
> There is no overhead in terms of memory transfers between intermediate
> operations, because the intermediate results are kept on the GPU the whole
> time. There is an overhead in terms of kernel dispatching. For example, the
> expression a + b * 0.5 in the VGPU DSL is interpreted as the following
> pseudo code:
>
> temp := opencl invokeKernel: 'add' with: a with: b.
> temp2 := opencl invokeKernel: 'mulScalar' with: temp with: 0.5.
>
> Hand-crafted code would do something like this:
>
> temp := opencl invokeKernel: 'addAndMulScalar' with: a with: b with: 0.5
>
> The VGPU does not do the latter, for simplicity. Currently, it does not
> generate any kind of OpenCL C code. It works by composing simple functions.
> Perhaps in the future I will add a code generation step for optimization.

OK, I understand. What would the speed benefit be if you did code
generation, compared with the current version ?

> Another problem is the proliferation of intermediate buffers.
> There are some samples that avoid intermediate buffers by using add:into:
> instead of +. The into: buffer is just where the result is going to be
> placed. Look at VirtualGPUSamples >> #imageChangedForGradient,
> VirtualGPUSamples >> #imageChangedForGradientOptimized, and
> VirtualGPUSamples >> #imageChangedForGradientOptimizedMore.
>
>> In our context, for building a GSSA algorithm, I guess we just have to
>> combine some VGPU instructions (matrix computations), but for doing SPH
>> simulations we will have to provide our own instructions. Is there
>> any documentation on adding our own kernels and instructions ?
>
> There is no documentation. We should have some tool like doxygen so that I
> can write the documentation while I am writing the methods.
>
> Anyway, I will document it here for now.

Great ;-)
Can we start a Pillar chapter in
https://github.com/SquareBracketAssociates/PharoLimbo ?

> First of all, look at the existing kernels. For that, put this in a
> playground and do it (you need the GTInspector):
>
> EmbObjectBrowser openBrowser.
>
> That will open a browser that I use to edit the OpenCL C code. It does have
> some bugs, but it is better than editing a huge string in a Smalltalk
> method. Let's look in VGPULinearAlgebraSources. There you will see the
> 'kernels' category and inside it two methods: #matrixKernels and
> #vectorKernels. If you look at #vectorKernels, you will see just the
> OpenCL C code.
>
> If you now go to Nautilus and look at the VGPULinearAlgebraSources class,
> you will see that it is a subclass of EmbObjectContainer. vectorKernels and
> matrixKernels are Smalltalk methods. vectorKernels looks like this:
>
> vectorKernels
>     <embeddedObject>
>     ^ '
> // Vector binary operations
> __kernel void floatVector_add(__global float *left, __global float *right,
> __global float *result)
> ...
> '
>
> For an example of actually invoking a kernel, you should look at the
> following methods:
> VGPUFloatMatrix >> #abs
> VGPUFloatMatrix >> #absInto
> VGPUFloatMatrix >> #discreteGradient
> VGPUFloatMatrix >> #discreteGradientInto
>
> As for the OpenCL package, it just provides bindings for the C OpenCL API.
> You can also use it if you want, but you are on your own :) .

I already understand part of that.
Thank you for your help.

Regards,
--
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/
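
To make the dispatch-overhead and intermediate-buffer points from the thread concrete, here is a minimal sketch in plain C that emulates the two ways of evaluating a + b * 0.5 (which Smalltalk reads left to right, i.e. (a + b) * 0.5). This is not VirtualGPU or OpenCL code: the function names, the `dispatch_count` counter, and the host-side loops are all assumptions made for illustration, with each function call standing in for one kernel launch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter: each increment stands in for one kernel dispatch. */
int dispatch_count = 0;

/* Emulates an 'add' kernel: out[i] = a[i] + b[i]. */
void kernel_add(const float *a, const float *b, float *out, size_t n) {
    assert(a != NULL && b != NULL && out != NULL);
    dispatch_count++;
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Emulates a 'mulScalar' kernel: out[i] = in[i] * s. */
void kernel_mul_scalar(const float *in, float s, float *out, size_t n) {
    assert(in != NULL && out != NULL);
    dispatch_count++;
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * s;
    }
}

/* Emulates a fused 'addAndMulScalar' kernel: out[i] = (a[i] + b[i]) * s. */
void kernel_add_and_mul_scalar(const float *a, const float *b, float s,
                               float *out, size_t n) {
    assert(a != NULL && b != NULL && out != NULL);
    dispatch_count++;
    for (size_t i = 0; i < n; i++) {
        out[i] = (a[i] + b[i]) * s;
    }
}

/* Composed evaluation: two dispatches and one intermediate buffer (temp). */
void composed(const float *a, const float *b, float s,
              float *temp, float *out, size_t n) {
    kernel_add(a, b, temp, n);
    kernel_mul_scalar(temp, s, out, n);
}

/* Fused evaluation: one dispatch and no intermediate buffer. */
void fused(const float *a, const float *b, float s, float *out, size_t n) {
    kernel_add_and_mul_scalar(a, b, s, out, n);
}
```

Calling `composed` and `fused` on the same inputs should give the same values, but the first costs two dispatches and a temporary buffer while the second costs one dispatch. How much time that saves on a real GPU depends on the device and the data size, so it would have to be measured.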