Hi,

This summarizes the discussion of the issues with the bitcode libs for the OpenCL C builtin library. Most of the discussion took place in the pull request https://github.com/pocl/pocl/pull/12#issuecomment-25010864 and on the IRC channel.

At least the following problems have been identified with the current approach of shipping pre-built bitcode libs:

The LLVM bitcode libs depend on the target flags in use. In particular, instruction set extensions such as vector extensions affect the bitcode. They affect the calling convention, and accessing the extensions requires target-specific LLVM builtin calls, which are not portable to variants of the same CPU family that lack the extension. In practice, a single generic x86_64 or ARM builtin bitcode library does not work for all x86_64 or ARM variants unless it is built for a safe (least common denominator) target that cannot exploit the special features.

This affects at least the scenario of binary-distributed pocl. There we have to build and distribute a generic bitcode lib, because we do not know the actual device variants the end users have.

Moreover, once we support some more widely useful heterogeneous devices (e.g. via libcuda or gallium), we need to ship a bitcode lib for each potential device in the binary distributions. The devices installed in the system will be auto-probed and the device list populated dynamically, so all possible bitcode libs have to be shipped in the distribution, just in case.

Not to mention the use case of the customizable processors of TCE. There the device at hand can support various combinations of operations. The kernel lib implementations could exploit those operations in different combinations, chosen by preprocessor macros set by the TCE compiler. This is currently not possible, and we ship a generic TCE lib that cannot explicitly exploit any special operations.

One solution I proposed was to distribute and install the sources of the kernel lib and build the optimized kernel bitcode libs on demand.
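
To make the on-demand idea a bit more concrete, here is a rough sketch in Python (not pocl code). It assumes a device variant can be described by a target triple, a CPU name, a set of feature flags and optional preprocessor macros. The names variant_key and get_kernel_lib and the example flag strings are hypothetical, made up for illustration only.

```python
import hashlib
from typing import Callable, Dict, Iterable


def variant_key(triple: str, cpu: str, features: Iterable[str],
                macros: Iterable[str] = ()) -> str:
    """Return a stable identifier for one device variant (hypothetical helper)."""
    # Sort the flags so the key does not depend on the order they were given in.
    parts = [triple, cpu, ",".join(sorted(features)), ",".join(sorted(macros))]
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return f"{triple}-{cpu}-{digest[:12]}"


def get_kernel_lib(key: str, built: Dict[str, str],
                   build: Callable[[str], str]) -> str:
    """Return the kernel lib for `key`, calling `build` only on first request."""
    if key not in built:
        # Assumption: `build` compiles the installed kernel lib sources for
        # this variant and returns the name of the resulting bitcode file.
        built[key] = build(key)
    return built[key]


if __name__ == "__main__":
    # Illustrative values only; real triples and flags depend on the device.
    avx = variant_key("x86_64-unknown-linux-gnu", "corei7-avx", ["+avx", "+sse4.2"])
    old = variant_key("x86_64-unknown-linux-gnu", "core2", ["+ssse3"])
    libs: Dict[str, str] = {}
    print(get_kernel_lib(avx, libs, lambda k: f"kernel-{k}.bc"))
    print(get_kernel_lib(old, libs, lambda k: f"kernel-{k}.bc"))
```

The point of the key is that two variants of the same CPU family with different extensions get different libs, while the same variant always maps to the same one, so the build happens once per variant.
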
The resulting bitcode libs would then be cached in the user's home dir, so only the first use would take the performance hit (the installation of the binary could include a separate step that populates the cache). I think this could work all right, especially if we still allow installing bitcodes too as "sources" for the kernels: then we can choose case by case between compiling from sources and using prebuilt bitcodes.

What do you think?

--
Pekka

_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
