Hi,

This has been prototyped with the initial AMD GPU work, and
I've also played around with it locally, but nothing related
has been committed to the master repo yet. Patches to add the
required infra for "SPMD-optimized" hardware are warmly
welcome -- it should not be much work.

I propose the following approach:

1) Add a new hook function to the driver layer that allows overriding
the work-group generation phase of the kernel compiler. In the hook you
can then skip the passes that generate the WG function (WorkitemHandlers)
and generate just the single-WI function. See pocl_device_ops in pocl_cl.h.
Override this new function in your target's device layer implementation.

2) Add new fields to the pocl_context struct that contain the local
ids and local sizes (in case you do not have some specific mechanism
for getting the thread ids etc. in your device). The struct is passed
at run time to the function as an additional argument added by
Workgroup.cc. See include/pocl_device.h and how the basic/pthread
drivers populate these structs when they launch the WG functions.

3) Add an overriding implementation of get_local_id() to your
device's kernel built-in lib that uses the above struct indirectly
via global variables that are "privatized" in Workgroup.cc. This is
similar to how the other ids, such as the group ids, work now: the
built-in refers to global variables in the kernel library, the
device driver populates a struct with their values, and the kernel
compiler converts the magic global variables to point to the struct
fields in Workgroup.cc.
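
To make the three steps a bit more concrete, here is a minimal,
self-contained sketch in plain C of how they could fit together. It is
not pocl code: every name prefixed with sketch_ is hypothetical, the
struct is only a stand-in for pocl_context, and the "binding" of the
globals is done by a plain copy here, whereas in pocl the kernel compiler
(Workgroup.cc) would do that rewrite. The launch loop merely emulates
serially what the hardware threads would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pocl_context, extended with the fields
   proposed in step 2. Not the real struct layout. */
struct sketch_context {
  size_t group_id[3];
  size_t local_size[3]; /* new: work-group dimensions */
  size_t local_id[3];   /* new: id of the work-item being launched */
};

/* Step 1 (hypothetical): an optional driver hook that selects how the
   kernel compiler generates code. NULL means the default WG function. */
enum sketch_codegen { SKETCH_WG_FUNCTION, SKETCH_SINGLE_WI };

struct sketch_device_ops {
  enum sketch_codegen (*select_codegen)(void);
};

enum sketch_codegen sketch_spmd_hook(void) { return SKETCH_SINGLE_WI; }

enum sketch_codegen sketch_select_codegen(const struct sketch_device_ops *ops) {
  return ops->select_codegen ? ops->select_codegen() : SKETCH_WG_FUNCTION;
}

/* Step 3: "magic" globals that the built-ins refer to. */
static size_t sketch_local_id_var[3];
static size_t sketch_local_size_var[3];

/* Stand-in for the compiler rewrite: copy the struct fields into the
   globals (assumption: the real mechanism redirects the references). */
void sketch_bind_context(const struct sketch_context *ctx) {
  for (int d = 0; d < 3; ++d) {
    sketch_local_id_var[d] = ctx->local_id[d];
    sketch_local_size_var[d] = ctx->local_size[d];
  }
}

/* Overriding built-ins for the device's kernel library. */
size_t sketch_get_local_id(unsigned dim) {
  return dim < 3 ? sketch_local_id_var[dim] : 0;
}

size_t sketch_get_local_size(unsigned dim) {
  return dim < 3 ? sketch_local_size_var[dim] : 1;
}

/* Signature of a single-WI function: kernel args plus the context. */
typedef void (*sketch_wi_fn)(void *args, struct sketch_context *ctx);

/* Step 2: the driver populates the context once per work-item and
   launches the single-WI function. Serial emulation of SPMD threads. */
void sketch_launch_spmd(sketch_wi_fn kernel, void *args,
                        const size_t local_size[3]) {
  struct sketch_context ctx = {{0}, {0}, {0}};
  assert(kernel != NULL);
  for (int d = 0; d < 3; ++d)
    ctx.local_size[d] = local_size[d];
  for (size_t z = 0; z < local_size[2]; ++z)
    for (size_t y = 0; y < local_size[1]; ++y)
      for (size_t x = 0; x < local_size[0]; ++x) {
        ctx.local_id[0] = x;
        ctx.local_id[1] = y;
        ctx.local_id[2] = z;
        sketch_bind_context(&ctx);
        kernel(args, &ctx);
      }
}

/* Example single-WI "kernel": each work-item writes twice its
   linearized local id into its own slot of the output buffer. */
void sketch_kernel(void *args, struct sketch_context *ctx) {
  size_t *out = args;
  size_t lid = sketch_get_local_id(0)
             + sketch_get_local_size(0) * sketch_get_local_id(1);
  (void)ctx; /* ids are read through the built-ins instead */
  out[lid] = 2 * lid;
}
```

On real SPMD hardware the triple loop would go away: each hardware
thread would get its own context, or the built-ins would read the ids
straight from whatever mechanism the device provides.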

HTH,
Pekka

On 02/05/2014 01:07 PM, Panagiotis Apostolou wrote:
> Hello.
>
> I'm trying to get pocl to work on a custom multicore processor.
>
> As far as i understand pocl expands all work items of a work group into
> a single binary (either by expanding the code or with loops) in the
> call_pocl_workgroup (or in pocl-workgroup script alternately) and then
> it is executed serially. This is done by setting
> pocl::LocalSize.addValue(local_x) and respectively for y and z (or with
> the opt's -local-size parameter in the script).
>
> My intention is to be able to override this functionality and avoid
> expanding the code and run a unique thread for each work item. My
> question is which is the cleanest way to do this, with minimal
> modification in the code. Any guidelines or hints would be appreciated.
>
> Thanks for your time,
> Panayiotis Apostolou.
>

_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
