On Wed, Aug 17, 2016 at 11:16 AM, Matthias Noack <[email protected]> wrote:

> Hi all,
>
> I've recently started to evaluate PoCL on the recent Intel Xeon Phi
> generation (KNL). My motivation is the lack of OpenCL support by Intel
> for HPC platforms. The Intel OpenCL SDK runs with a work-around in AVX2
> mode. My hope is to get AVX-512 support, i.e. performance competitive
> with OpenMP, via PoCL/LLVM.
>

Very good! I'm excited to see this.

> I am currently gathering benchmark results. Therefore I use a micro
> benchmark to compare the performance of basic arithmetic operators and
> math functions between OpenMP and OpenCL on different target
> architectures, and also some real world scientific codes that use OpenCL.
>
> I hope this is the right place to ask questions. The first one would be:
>
> Q: Does it matter, performance-wise, on which platform I build PoCL vs.
> on which I use PoCL - all within the x86-64 world?
>

Yes, it does.

pocl will compile the OpenCL kernel library at build time. This is the
support library containing the definitions of functions such as sin, cos,
sqrt, etc., including their vector counterparts. For best performance, you
need an architecture-optimized version of this library.

This is more problematic than one would naively think, because the x86-64
ABI assumes that SSE2 is present, but assumes that AVX and higher are not
present. pocl now has to make a choice -- either to not use AVX features in
function calls (unacceptable for performance, cannot pass vector values in
registers), or silently use different ABI "versions" depending on the
hardware that is detected. I don't know exactly how many ABI versions there
are, but I assume there are at least three, depending on whether the ymm
and zmm registers exist or not, respectively. Since this is not addressed
by the official x86-64 ABI, there exist basically no features to prevent or
detect ABI mismatches. (In writing this, I think we should add such a
feature to pocl.)

Other features (e.g. whether SSE4 etc. are present) do not influence the
ABI, but still influence how particular run-time functions should be
implemented. pocl currently does not contain the respective fine-grained
function selection mechanism at run time.

In short -- pocl works best if compiled on the host where it will run.

On a side note, pocl's support for AVX-512 was implemented targeting the
Intel MIC architecture found e.g. on Stampede. I don't know whether the
compiler intrinsics are identical to the current version of AVX-512. If
not, and if the kernel library is important to you, then I will be happy to
assist in updating the respective parts of pocl. Since I do not (yet!) have
access to a KNL system, this might involve some trial and error.

It's been some time since I looked at the makefile and run-time magic in
pocl that handles architecture selection, so part of my answer might be
outdated.

-erik

> Background: I use PoCL on a Xeon (Haswell, AVX2) host, and a Xeon Phi
> (KNL, AVX-512) which share a home. So do I need two builds of PoCL to
> generate optimal code for my OpenCL kernels, or will the PoCL OpenCL
> compiler figure out whether to use AVX2 or AVX-512 at runtime?
>
> Thanks,
> Matthias

--
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
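
To make the ABI-mismatch idea from the reply above concrete, here is a minimal sketch of such a check. It is not pocl code: the function names and the three ABI classes ("sse2", "avx", "avx512") are hypothetical and simply follow the guess above that the register width (xmm, ymm, zmm) determines the ABI "version". It assumes the CPU flags are given as names in the style of Linux /proc/cpuinfo (sse2, avx, avx512f).

```python
def parse_cpuinfo_flags(cpuinfo_text):
    """Return the set of CPU flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found


def vector_abi_class(flags):
    """Map CPU flags to a (hypothetical) ABI class by widest vector register."""
    if "avx512f" in flags:
        return "avx512"  # zmm registers available
    if "avx" in flags:
        return "avx"     # ymm registers available
    return "sse2"        # xmm only, the x86-64 baseline


def abi_matches(build_flags, run_flags):
    """True if the build host and the run host fall into the same ABI class."""
    return vector_abi_class(build_flags) == vector_abi_class(run_flags)


if __name__ == "__main__":
    # Illustrative flag sets only, not complete lists for either CPU.
    haswell = {"sse2", "avx", "avx2"}
    knl = {"sse2", "avx", "avx2", "avx512f"}
    print(vector_abi_class(haswell), vector_abi_class(knl))
    print(abi_matches(haswell, knl))
```

For the Haswell/KNL setup described in the question, the two hosts fall into different classes under this scheme, which is the situation where a kernel library built on one host should not be silently reused on the other.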
------------------------------------------------------------------------------
_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
