I have spent the past week or so making pocl build on ARMv6, i.e. a Raspberry 
Pi. The basic problem I encountered is the following:

The target triple is insufficient to describe what code should be generated: 
it does not describe all the features that a CPU provides.

Some people call this the "hardware floating point problem", although this is 
something of a misnomer. Trying to support hardware floating point operations 
triggers this problem, but the problem is not intrinsically about hardware 
floating point operations.



This is already somewhat of a problem on Intel, where the triple does not 
specify, for example, whether AVX instructions are available. There, this may 
lead to sub-optimal code, which is not something that one would immediately 
observe -- one typically has to disassemble the generated kernel code, and most 
people don't do that.

On ARM, different CPU types offer vastly different features. For performance 
reasons, ARM offers several incompatible ABIs. Unfortunately, the target triple 
does not choose the ABI! The reason is somewhat indirect -- although the ABI is 
actually specified in the target triple, llvm will ignore this (!) unless one 
also specifies a CPU type that has sufficient features to use this ABI. 
Otherwise, llvm will generate code for a "basic" CPU, which may lack features, 
and will then (silently!) switch over to a different ABI. I would consider this 
a design bug in llvm, but that's what we have.
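
To make the difference concrete, here is a minimal sketch (in Python, only to 
assemble the command lines) of an llc invocation with and without an explicit 
CPU type. The triple armv6-unknown-linux-gnueabihf, the CPU name arm1176jzf-s, 
the file names and the helper llc_command are my assumptions for a Raspberry 
Pi and are not taken from pocl; -mtriple, -mcpu and -float-abi are existing 
llc options.

```python
def llc_command(input_file, triple, cpu=None, float_abi=None):
    """Build an llc argument list; CPU and float ABI are added only when given."""
    args = ["llc", f"-mtriple={triple}"]
    if cpu is not None:
        args.append(f"-mcpu={cpu}")
    if float_abi is not None:
        args.append(f"-float-abi={float_abi}")
    args += [input_file, "-o", "kernel.s"]
    return args


# Assumed values for a Raspberry Pi; adjust for the actual target.
TRIPLE = "armv6-unknown-linux-gnueabihf"

# Triple only: llc is free to fall back to a basic CPU and another ABI.
triple_only = llc_command("kernel.bc", TRIPLE)

# Triple plus explicit CPU type and float ABI.
explicit = llc_command("kernel.bc", TRIPLE, cpu="arm1176jzf-s", float_abi="hard")

print(" ".join(triple_only))
print(" ".join(explicit))
```

Disassembling the output of both invocations is probably the only reliable way 
to check which ABI was actually used.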

(In my case, host and target are the same, and the default target triple and 
CPU type that llc uses are already what I want. Still, since pocl explicitly 
specifies (the very same) target triple, llvm stops using the default CPU type 
and uses a more basic CPU type that does not actually support the ABI specified 
in this target triple...)



To remedy this, one needs to specify -march= or -mcpu= in various places. I 
have not yet determined the minimum set of such options that would lead to 
correct and/or efficient code, but at the very least, the llc invocation in 
devices/common.c seems to be affected. To add insult to injury, different llvm 
tools use different option names (why?) to specify the target triple: the clang 
front-end uses -target, whereas llc uses -mtriple. Finding the list of 
available triples / CPU types / architecture attributes is also an adventure.
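
As a rough sketch of the naming mismatch, the following Python snippet spells 
out the triple option for each tool and builds the llc queries that list CPU 
types and attributes for a triple. The helper names triple_args and 
help_queries and the example triple are hypothetical; -target, -mtriple, 
-mcpu=help and -mattr=help are the actual option spellings as far as I know.

```python
def triple_args(tool, triple):
    """Return the arguments that select the target triple for a given tool."""
    if tool == "clang":
        return ["-target", triple]      # separate argument
    if tool == "llc":
        return [f"-mtriple={triple}"]   # joined with '='
    raise ValueError(f"unknown tool: {tool}")


def help_queries(triple):
    """Return llc command lines that list the CPUs and attributes of a target."""
    base = ["llc", f"-mtriple={triple}"]
    return [base + ["-mcpu=help"], base + ["-mattr=help"]]


# Assumed example triple; not taken from the pocl configuration.
TRIPLE = "armv6-unknown-linux-gnueabihf"

print("clang", " ".join(triple_args("clang", TRIPLE)))
print("llc", " ".join(triple_args("llc", TRIPLE)))
for query in help_queries(TRIPLE):
    print(" ".join(query))
```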

I think the following is necessary for pocl: When configuring a target, we 
should not look for a target triple, but should rather accept a set of options 
that may include CPU type, FPU type, ABI specification, and architecture 
attributes as well. This would likely also improve performance on other 
architectures, such as x86_64. We may need to ask for this information twice, 
since clang and llc expect it in different forms, but maybe we can also 
translate it ourselves.
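
One possible shape for such a configuration, sketched in Python rather than in 
pocl's own C: a single target description that is translated into clang-style 
and llc-style options. The class TargetSpec, its field names and the example 
values are hypothetical. The sketch keeps the clang-style FPU name and the 
llc-style attribute list as separate fields because I have not worked out a 
reliable translation between them, and it only handles the "hard" and "soft" 
float ABIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TargetSpec:
    """Hypothetical target description; not an existing pocl structure."""
    triple: str
    cpu: Optional[str] = None
    fpu: Optional[str] = None          # clang-style FPU name, e.g. "vfp"
    float_abi: Optional[str] = None    # "hard" or "soft"
    attrs: List[str] = field(default_factory=list)  # llc-style, e.g. ["+vfp2"]

    def clang_args(self):
        """Options in the form the clang front-end expects."""
        args = ["-target", self.triple]
        if self.cpu:
            args.append(f"-mcpu={self.cpu}")
        if self.fpu:
            args.append(f"-mfpu={self.fpu}")
        if self.float_abi:
            args.append(f"-mfloat-abi={self.float_abi}")
        return args

    def llc_args(self):
        """Options in the form llc expects."""
        args = [f"-mtriple={self.triple}"]
        if self.cpu:
            args.append(f"-mcpu={self.cpu}")
        if self.attrs:
            args.append("-mattr=" + ",".join(self.attrs))
        if self.float_abi:
            args.append(f"-float-abi={self.float_abi}")
        return args


# Assumed values for a Raspberry Pi.
spec = TargetSpec(
    triple="armv6-unknown-linux-gnueabihf",
    cpu="arm1176jzf-s",
    fpu="vfp",
    float_abi="hard",
    attrs=["+vfp2"],
)
print(" ".join(spec.clang_args()))
print(" ".join(spec.llc_args()))
```

Asking the user once and deriving both option lists would avoid the two 
getting out of sync, at the cost of maintaining the translation ourselves.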

Of course, at the moment, configuring host and target is a bit of a mess, and 
it's not clear which configuration (environment) variable is used under what 
circumstances; a bit of cleanup here would help.

-erik

-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/

My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu/.

_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
