Re: [pocl-devel] PoCL on Intel Xeon Phi (KNL)

Erik Schnetter Wed, 17 Aug 2016 10:38:47 -0700

On Wed, Aug 17, 2016 at 11:16 AM, Matthias Noack <[email protected]>
wrote:


> Hi all,
>
> I've recently started to evaluate PoCL on the recent Intel Xeon Phi
> generation (KNL). My motivation is the lack of OpenCL support by Intel
> for HPC platforms. The Intel OpenCL SDK runs with a work-around in AVX2
> mode. My hope is to get AVX-512 support, i.e. performance competitive
> with OpenMP, via PoCL/LLVM.
>

Very good! I'm excited to see this.

> I am currently gathering benchmark results. Therefore I use a micro
> benchmark to compare the performance of basic arithmetic operators and
> math functions between OpenMP and OpenCL on different target
> architectures, and also some real world scientific codes that use OpenCL.
>
> I hope this is the right place to ask questions. The first one would be:
>
> Q: Does it matter, performance-wise, on which platform I build PoCL vs.
> on which I use PoCL - all within the x86-64 world?
>

Yes, it does.

pocl will compile the OpenCL kernel library at build time. This is the
support library containing the definitions of functions such as sin, cos,
sqrt, etc., including their vector counterparts. For best performance, you
need an architecture-optimized version of this library.

This is more problematic than one would naively think, because the x86-64
ABI assumes that SSE2 is present, but assumes that AVX and higher are not
present. pocl now has to make a choice -- either to not use AVX features in
function calls (unacceptable for performance, cannot pass vector values in
registers), or silently use different ABI "versions" depending on the
hardware that is detected. I don't know exactly how many ABI versions there
are, but I assume there are at least three: one for CPUs with only xmm
registers, one for CPUs that also have ymm registers, and one for CPUs that
also have zmm registers. Since the official x86-64 ABI does not address
this, there is basically no mechanism to prevent or detect ABI mismatches.
(In writing this, I think we should add such a mechanism to pocl.)
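
A minimal sketch of what such a mismatch check might look like. This is not
pocl code. It assumes the three register classes guessed at above (xmm, ymm,
zmm), assumes the class the kernel library was built for has been recorded
somewhere at build time, and reads the host's feature flags from
/proc/cpuinfo, which only works on Linux. The names abi_class, host_flags,
and check_abi are made up for illustration.

```python
# Hypothetical sketch, not part of pocl: detect a vector-ABI mismatch between
# the machine that built the kernel library and the machine that runs it.


def abi_class(flags):
    """Map a set of CPU feature flags to the widest vector register class."""
    if "avx512f" in flags:
        return "zmm"  # 512-bit registers (AVX-512)
    if "avx" in flags:
        return "ymm"  # 256-bit registers (AVX, AVX2)
    return "xmm"      # 128-bit registers (SSE2 baseline of x86-64)


def host_flags(path="/proc/cpuinfo"):
    """Read the feature flags of the current host (assumption: Linux)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


def check_abi(build_class, run_flags):
    """Raise if the library was built for a different register class."""
    run_class = abi_class(run_flags)
    if build_class != run_class:
        raise RuntimeError(
            f"kernel library built for {build_class}, "
            f"host provides {run_class}")
    return run_class
```

With a library built on a Haswell host, check_abi("ymm", host_flags()) would
pass on that host and raise on a KNL node, where the class is zmm.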

Other features (e.g. whether SSE 4 etc. are present) do not influence the
ABI, but still influence how particular run-time functions should be
implemented. pocl currently does not contain the respective fine-grained
function selection mechanism at run time.
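
To illustrate what a fine-grained selection mechanism could mean here, a
rough sketch follows. It is not how pocl works today. The variant names are
placeholders, and the feature flag names follow the Linux /proc/cpuinfo
spelling, which is an assumption about the host. Each variant lists the
features it requires, and the first variant whose requirements are all
present on the host wins.

```python
# Hypothetical sketch, not pocl code: choose an implementation of one
# kernel-library function from the CPU features available at run time.

# Variants ordered from most to least specialised; all names are placeholders.
SQRT_VARIANTS = [
    ({"avx512f"}, "sqrt_avx512"),
    ({"avx2", "fma"}, "sqrt_avx2_fma"),
    ({"sse4_1"}, "sqrt_sse41"),
    (set(), "sqrt_generic"),  # no requirements, always eligible
]


def select_variant(variants, flags):
    """Return the first variant whose required features are all in flags."""
    for required, name in variants:
        if required <= flags:  # subset test
            return name
    raise LookupError("no eligible variant")
```

Note that this only covers features that leave the calling convention
unchanged. Variants that pass vectors in wider registers would also have to
agree with the ABI class of the calling code.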

In short -- pocl works best if compiled on the host where it will run.

On a side note, pocl's support for AVX-512 was implemented targeting the
Intel MIC architecture found e.g. on Stampede. I don't know whether the
compiler intrinsics are identical to the current version of AVX-512. If
not, and if the kernel library is important to you, then I will be happy to
assist updating the respective parts of pocl. Since I do not (yet!) have
access to a KNL system, this might involve some trial and error.

It's been some time since I looked at the makefile and run-time magic in
pocl that handles architecture selection, so part of my answer might be
outdated.

-erik

> Background: I use PoCL on a Xeon (Haswell, AVX2) host, and a Xeon Phi
> (KNL, AVX-512) which share a home. So do I need two builds of PoCL to
> generate optimal code for my OpenCL kernels, or will the PoCL OpenCL
> compiler figure out whether to use AVX2 or AVX-512 at runtime?
>
> Thanks,
> Matthias



-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/

------------------------------------------------------------------------------

_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
