On 2013-09-08, at 14:43 , Kalle Raiskila <[email protected]> wrote:

> On Sun, 8 Sep 2013 14:15:54 -0400
> wrote Re: [pocl-devel] Triples, targets, cpus, and features... and HOST
> vs. TARGET:
>
>> On 2013-09-08, at 14:02 , Kalle Raiskila <[email protected]> wrote:
>>
>>> On Sun, 08 Sep 2013 12:20:18 +0300
>>> wrote Re: [pocl-devel] Triples, targets, cpus, and features... and
>>> HOST vs. TARGET:
>>>
>>>> If you think of binary distributions of pocl
>
>>> How about the runtime kernel library, then? There is some "#ifdef
>>> SSE"-like stuff there already, and we compile this into .bc before
>>> we know the exact processor that we are going to run on.
>>
>>
>> These #ifdefs are set by the compiler, depending on the options that
>> one passes to the compiler. They do not depend on the host, but on
>> the target. In other words, we're good here.
>
> But in the context of binary distributions, the kernel library is
> compiled before we know what the exact flavour of the target CPU is.

In this case, we need to make the same (conservative) assumptions as
all the other binaries in this distribution. They probably set the CPU
type to i686, which essentially disables most performance features.

Alternatively, we can build and distribute multiple kernel libraries.
Many performance-critical libraries do this; I expect e.g. VLC to have
several back-ends available that are dynamically selected at run time,
depending on the CPU type.

>> However, if the
>> target is the host, then we simply re-use the information we
>> gathered. This allows e.g. detecting AVX, SSE, hardware
>> floating-point etc. automatically, without having to detect this at
>> run time.
>
> Thinking of this... I have never specified the type of my CPU to
> my OS in any greater detail than "x86_64". Clearly there exists some
> "autodetection magic" *somewhere*. Perhaps we can reuse that code
> in runtime detection of the platform.

Yes, LLVM has such magic built in. If you type e.g.

    llc --version | grep 'Host CPU'

then this reports the current host CPU (i.e. it examines the hardware),
from which its attributes can be inferred (LLVM also does this
automatically).

Unfortunately, this automatic inference is not well documented, and it
also depends on the target triple. Unless you pass exactly the same
flags to every clang/llvm invocation, you can end up with inconsistent
libraries. This is what happened to me in the past week.

I am now using this "llc --version | grep 'Host CPU'" to determine the
host CPU in configure.ac, and ensure that exactly this CPU type is used
for all kernel build commands.

We should be able to do this at run time as well. However, as I
mentioned before, the kernel library needs to be built with the same
flags.

> Falling back to "host" should always be safe, but when pocl is gotten
> from a distro, that CPU might be quite old.
> This problem would not
> exist when pocl is compiled from sources, as the compiler/OS would then
> have detected (hopefully) the exact CPU version that is the "host".

Yes. This is probably not a problem for most applications, but if a
kernel is CPU bound, then knowing the exact CPU type seems important.

>> This needs to be detected before the kernel library is built, i.e. at
>> the time pocl is configured and built. Bytecode is not generic enough
>> to handle different architectures; e.g. type widths, calling
>> conventions, and name mangling depend on the target, and these are
>> present in bytecode already.
>
> Separate architectures would certainly require separate kernel
> libraries. But can we keep the fine-tuning within one architecture
> (e.g. the SSE version) generic within the kernel lib?

Yes, we can use the same kernel library for all x86-64 architectures.
However, this would require disabling many performance features.

> If not, we need one kernel per x86 generation and ARM version.
> Or we need to accept a performance hit in the binary distributed
> library.

The command

    llvm-as < /dev/null | llc -mcpu=help

lists the CPUs and "attributes" (features) available for the current
target. There are almost 50 of these attributes for x86-64. So, building
a kernel library "for every x86-64 generation" is not practical. But we
could select a few important combinations of attributes and pre-build
kernel libraries for these. I suspect that five versions or so should
suffice, since all people (like me) who want to squeeze the last percent
of performance out of pocl can still build from source.

I think a large performance hit (such as disabling AVX support) would
not be acceptable.

-erik

-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/

My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu/.
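
The host-CPU detection described above could be wrapped roughly as
follows. This is only a sketch, not what pocl's configure.ac actually
contains: the function names parse_host_cpu and detect_host_cpu are
hypothetical, and the fallback to "generic" when llc is missing is an
assumption. The only thing taken from the message is that
"llc --version" prints a "Host CPU:" line.

```shell
# Extract the CPU name from `llc --version` output read on stdin.
# (parse_host_cpu is a hypothetical helper name.)
parse_host_cpu() {
    # Keep only what follows the "Host CPU:" label; use the first match.
    sed -n 's/^[[:space:]]*Host CPU:[[:space:]]*//p' | head -n 1
}

# Print the host CPU as reported by llc. Falls back to "generic"
# (an assumed default) if llc is not installed or reports nothing.
detect_host_cpu() {
    cpu=""
    if command -v llc >/dev/null 2>&1; then
        cpu=$(llc --version 2>/dev/null | parse_host_cpu)
    fi
    printf '%s\n' "${cpu:-generic}"
}
```

A build script could then run HOST_CPU=$(detect_host_cpu) once and pass
the same -mcpu="$HOST_CPU" to every llc invocation, so that the kernel
library and the kernels are built with identical CPU settings.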
_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
