Hi Pekka,

thanks for the comprehensive introduction! :)

On 2/1/13 12:10 PM, Pekka Jääskeläinen wrote:
> AFAIU, in the pocl's point of view, WFV is a method to generate multi-WI
> work group functions out from the single work item kernel bitcodes
> produced by Clang.
>
> [ ... ]
>
> However, as I understood your WFV is a complete work group generation
> solution that also detects the parallel regions during the process
> etc., it can skip most of the default pocl-workgroup's optimization
> pass list and replace it with its own (unless you want to modularize and
> share code with the other methods). So, maybe the best way
> is to define a new method, e.g. 'wfv'. Then, in pocl-workgroup,
> if wfv is selected, you can then call your optimization passes,
> or, e.g., a pass that wraps calls to your library.

Your description is fairly close :).
All libWFV really does is create a vector variant of a given 
function; there is nothing OpenCL-specific in it. In fact, we started 
off by applying the technique to RenderMan shaders called from a 
real-time ray tracer.

Our own OpenCL driver works as follows:
1. An old AMD frontend creates LLVM IR from the .cl file.
    The code still contains calls to get_global_id(), barrier(), etc.
2. libWFV transforms the kernel such that it works on a vector of the
    next W consecutive local ids of dimension 0 (*).
    The code still contains calls to get_global_id(), barrier(), etc.
3. a) If there are no barriers,
       we create a wrapper for the kernel which contains loops over the
       local ids (probably the way you also do it). Inside that loop
       nest, there's a single call to the old kernel that we inline.
    b) If there are barriers,
       we create functions for every code block between barriers and call
       them from a "trampoline", a switch statement inside a while loop,
       where each case loops over the local ids and calls one of the
       "continuations", which returns the ID of the next case to execute.
       This was described in more detail in our CC'12 paper.
    After this step, the code no longer contains any barrier() calls.
    Note that with WFV enabled, one of the generated loops (*) has a
    loop increment of W instead of 1.
4. We replace calls to get_local_id() etc. by accesses to the induction
    variables (similar to pocl I guess).

(*) The choice of dimension is up to a heuristic that does not exist 
yet; it could decide that dimension 1 or 2 would be better for 
vectorization.

Obviously, this is only possible if the number of local work items is 
at least as large as the vectorization factor W and a multiple of it, 
but this is the case for all relevant applications that I have seen so 
far.

As you can see, most of the transformations are done by the OpenCL 
driver; only step 2 is where libWFV itself is used.
So, I think it should be possible to just modify your current pipeline 
at two specific points: First, invoke WFV on the kernel at a pretty 
early stage (basically directly after the frontend). Then, when 
creating the local work item loops, make the one over dimension 0 
increment by W instead of 1, and inline the vectorized kernel instead 
of the original one (unless vectorization failed, of course ;) ).


> Hopefully this will get you started. I'll be happy to answer any
> further questions, here in the list or in #pocl.
>
> In practical terms, you could push a branch to https://code.launchpad.net/pocl
> which we can then merge to trunk after reaching some level of
> stability.
>
> The test suite executed with 'make check' contains regression tests
> and can also run some external OpenCL projects automatically.

This all sounds great :).

Cheers,
Ralf

_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
