On 08/07/2013 07:28 AM, Alun Evans wrote:
> i.e. I'm compiling pocl natively on x86_64, but trying to add a new
> device that is an ARM based platform.
>
> I've actually got a few more questions on that, but I think I'm making
> some progress. I just thought about checking whether this has been
> successfully attempted before?

Yes, we use the basic heterogeneous setup all the time: the host
is an x86_64 Linux system and the device is something else.

If you want to make it run, you need to write a device-layer
implementation that defines how to interact with your ARM device from
the host. See the existing device-layer implementations under
lib/CL/devices/. The current heterogeneous devices there are
the experimental cellspu and tce. The former offloads kernels to the
SPUs of a Cell processor, and the latter offloads kernels to
a simulator of TTA-based accelerators designed using TCE.

There is a set of functions you need to implement for this; it
should be straightforward.
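To make the shape of such a function set concrete, here is a minimal sketch of a device-layer function table in C. The struct and function names (`device_ops`, `wg_func_t`, `loopback_*`) are hypothetical and do not reproduce pocl's actual device-layer API; the real callbacks are in the existing drivers under lib/CL/devices/. The "loopback" device below simply uses host memory, whereas a real ARM driver would forward each request over whatever link connects the host to the board.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical work-group function type: one call runs one work-group. */
typedef void (*wg_func_t)(void **args, size_t group_id);

/* Hypothetical device-layer interface (NOT pocl's real API): the host
   runtime talks to the device only through these callbacks. */
typedef struct device_ops {
    const char *name;
    void *(*alloc_mem)(size_t size);
    void  (*free_mem)(void *dev_ptr);
    void  (*write)(void *dev_dst, const void *host_src, size_t size);
    void  (*read)(void *host_dst, const void *dev_src, size_t size);
    void  (*run)(wg_func_t wg, void **args, size_t num_groups);
} device_ops;

/* Loopback implementation: "device memory" is plain host memory.
   A real ARM driver would send these requests to the board instead. */
static void *loopback_alloc(size_t size) { return malloc(size); }
static void loopback_free(void *p) { free(p); }
static void loopback_write(void *dst, const void *src, size_t size) {
    memcpy(dst, src, size);
}
static void loopback_read(void *dst, const void *src, size_t size) {
    memcpy(dst, src, size);
}
static void loopback_run(wg_func_t wg, void **args, size_t num_groups) {
    assert(wg != NULL);
    /* Run the work-group function once per work-group id. */
    for (size_t g = 0; g < num_groups; ++g)
        wg(args, g);
}

static const device_ops loopback_device = {
    "loopback",
    loopback_alloc, loopback_free,
    loopback_write, loopback_read,
    loopback_run,
};
```

The point of the table is that the generic host runtime never needs to know how the device is reached; porting to a new board means filling in these callbacks.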

> So far I've managed to get a pocl binary (example1) to spit out some arm 
> .so's.

Then, through the device-layer implementation, you need to get the
work-group function to the device, run it, read/write its buffers, etc.
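A rough sketch of that launch sequence, with `memcpy` into a static array standing in for transfers to the device's memory. `vadd_wg`, `launch_vadd`, `device_mem` and the fixed `LOCAL_SIZE` of 4 are all hypothetical; the real work-group functions are produced by pocl's kernel compiler, and the real transfers go through your device-layer callbacks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for RAM on the ARM board. */
static float device_mem[1024];

/* A work-group function as a kernel compiler might emit it (illustrative
   only): runs all work-items of one work-group of c = a + b. */
#define LOCAL_SIZE 4
static void vadd_wg(const float *a, const float *b, float *c, size_t group_id) {
    for (size_t lid = 0; lid < LOCAL_SIZE; ++lid) {
        size_t gid = group_id * LOCAL_SIZE + lid;
        c[gid] = a[gid] + b[gid];
    }
}

/* Host-side launch: write input buffers to the device, run the
   work-group function once per work-group, read the result back. */
static void launch_vadd(const float *a, const float *b, float *c, size_t n) {
    assert(n % LOCAL_SIZE == 0);
    assert(3 * n <= sizeof device_mem / sizeof device_mem[0]);
    float *da = device_mem, *db = device_mem + n, *dc = device_mem + 2 * n;
    memcpy(da, a, n * sizeof *a);                 /* write buffer a */
    memcpy(db, b, n * sizeof *b);                 /* write buffer b */
    for (size_t g = 0; g < n / LOCAL_SIZE; ++g)   /* run each work-group */
        vadd_wg(da, db, dc, g);
    memcpy(c, dc, n * sizeof *c);                 /* read buffer c */
}
```

For example, `launch_vadd(a, b, c, 8)` runs two work-groups of four work-items each.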

> Well, in fact the device is a bit space-limited, so holding a
> toolchain out there would have been a bit of a pain.

You do not need the toolchain on the device with the standalone setup.
Everything (the program + the kernels) will be (cross-)compiled
offline to a single binary. This is how it works in TCE now (though
currently using its own host API stubs):
http://tce.cs.tut.fi/user_manual/TCE/node21.html

So, as Kalle wrote, in this setup you need to precompile the kernel
for the work-group size you will need. An alternative would be to
create a kernel compiler mode that generates work-item loops with
variable iteration counts (the WG dimensions), but then fine-grained
parallelization of multiple work-items (e.g., vectorization) becomes
more challenging. Yet another improvement would be the ability to
compile the kernel for multiple work-group sizes with the kernel
compiler, without using the attribute.
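The trade-off between the two modes comes down to whether the work-item loop bound is known at compile time. A simplified 1-D sketch in C of what the generated work-group function could look like in each case; the function names and `FIXED_LOCAL_SIZE` are hypothetical, and pocl's kernel compiler actually works on LLVM IR rather than emitting C like this.

```c
#include <assert.h>
#include <stddef.h>

/* Variant 1: work-group size fixed when the kernel is precompiled.
   The trip count is a compile-time constant, so a compiler can fully
   unroll the loop or map it onto SIMD lanes without a remainder loop. */
#define FIXED_LOCAL_SIZE 8
static void scale_wg_fixed(float *x, float k, size_t group_id) {
    float *base = x + group_id * FIXED_LOCAL_SIZE;
    for (size_t lid = 0; lid < FIXED_LOCAL_SIZE; ++lid)
        base[lid] *= k;
}

/* Variant 2: work-group size passed at run time. One binary serves
   every local size, but the trip count is unknown when compiling, so
   vectorizing the loop needs extra handling for leftover iterations. */
static void scale_wg_variable(float *x, float k, size_t group_id,
                              size_t local_size) {
    assert(local_size > 0);
    float *base = x + group_id * local_size;
    for (size_t lid = 0; lid < local_size; ++lid)
        base[lid] *= k;
}
```

Compiling several fixed-size variants of the same kernel would keep the constant trip counts of variant 1 while covering more than one work-group size.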


-- 
--Pekka

_______________________________________________
pocl-devel mailing list
https://lists.sourceforge.net/lists/listinfo/pocl-devel
