On Wed, Aug 17, 2016 at 2:45 PM, Matthias Noack <[email protected]> wrote:
> Hi Erik,
>
> thanks for your reply.
>
> On 17.08.2016 19:37, Erik Schnetter wrote:
>> On Wed, Aug 17, 2016 at 11:16 AM, Matthias Noack <[email protected]> wrote:
>>
>> pocl will compile the OpenCL kernel library at build time. This is the
>> support library containing the definitions of functions such as sin,
>> cos, sqrt, etc., including their vector counterparts. For best
>> performance, you need an architecture-optimized version of this library.
>
> I understand. I was wondering about the origin of the math built-ins, as
> the library mentions some external library. The documentation mentions
> Vecmathlib, which seems to be written by you. Is it always used, or do I
> need to activate it somehow?

Yes, correct. Vecmathlib is active by default, but can be explicitly
deactivated. If it finds an unsupported architecture, it will automatically
scalarize all vector operations (obviously very slow).

I haven't worked on Vecmathlib in quite some time, hoping that other
libraries will become more mature and e.g. become part of LLVM. This has
not happened yet, though. My interest in Vecmathlib is mostly in
efficiently iterating over multi-dimensional arrays, taking vector size,
cache line size, alignment, etc. into account; however, these features are
not part of OpenCL.

> A colleague of mine and I recently looked into different SIMD coding
> techniques using gcc, clang and the Intel compiler. Intel has its libsvml
> for vectorised math functions, GNU comes with libmvec in newer glibc
> versions, but LLVM/clang seems to lack an equivalent. Manual
> vectorisation with intrinsic-wrapping C++ class libraries like Vc, which
> comes with its own math function implementations, was the only way to
> get good performance with LLVM/clang.

That is my experience as well.
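For illustration, here is a minimal sketch of that intrinsic-wrapping
technique (my own toy example, not Vc's or Vecmathlib's actual API): a thin
value type over an AVX register with overloaded operators plus a
hand-written vector math function, so that performance does not depend on
the loop vectoriser or on libsvml/libmvec. It assumes AVX hardware and
compiles with -mavx:

  #include <immintrin.h>  // AVX intrinsics
  #include <cstddef>

  // 8 packed floats in one 256-bit register, wrapped in a value type.
  struct vec8f {
      __m256 v;
      vec8f() : v(_mm256_setzero_ps()) {}
      explicit vec8f(float x) : v(_mm256_set1_ps(x)) {}
      vec8f(__m256 x) : v(x) {}

      static vec8f load(const float* p) { return vec8f(_mm256_loadu_ps(p)); }
      void store(float* p) const        { _mm256_storeu_ps(p, v); }

      friend vec8f operator+(vec8f a, vec8f b) { return vec8f(_mm256_add_ps(a.v, b.v)); }
      friend vec8f operator*(vec8f a, vec8f b) { return vec8f(_mm256_mul_ps(a.v, b.v)); }
  };

  // Math functions are provided directly on the vector type. This one is a
  // deliberately crude polynomial "exp" (valid only for small |x|), just to
  // show the structure; real libraries use range-reduced, accurate kernels.
  inline vec8f exp_approx(vec8f x) {
      vec8f x2 = x * x;
      vec8f x3 = x2 * x;
      return vec8f(1.0f) + x + x2 * vec8f(0.5f) + x3 * vec8f(1.0f / 6.0f);
  }

  // Usage: 8 floats per iteration; n is assumed to be a multiple of 8 here
  // to keep the sketch short (a real version needs a scalar tail loop).
  void scale_exp(const float* in, float* out, std::size_t n) {
      for (std::size_t i = 0; i < n; i += 8) {
          vec8f x = vec8f::load(in + i);
          (exp_approx(x) * vec8f(2.0f)).store(out + i);
      }
  }

Vc and Vecmathlib follow the same basic pattern, just with properly
range-reduced, accurate math kernels and support for several instruction
sets.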
>> On a side note, pocl's support for AVX-512 was implemented targeting the
>> Intel MIC architecture found e.g. on Stampede.
>
> Just to be sure we are on the same page, and for everyone else who might
> read this, some terminology and facts:
>
> - MIC (Many Integrated Cores) is the architecture on which the Intel Xeon
>   Phi product line is based
>
> - KNC (Knights Corner) was the first Intel Xeon Phi 71xx product line
>   - coprocessor only, almost x86-64, but with its own binary format
>     ('k1om', because it has no SSE2 registers, which the x86-64 calling
>     convention requires)
>   - 512-bit SIMD units, but *not* AVX-512
>   - cross compilation with the Intel compiler only for applications (and
>     a patched gcc for its Linux-based OS)
>   - Intel OpenCL with KNC-SIMD instructions is available but was
>     discontinued in newer releases
>
> - KNL (Knights Landing) is the current Intel Xeon Phi 72xx product line
>   (officially released at ISC'16, in June)
>   - bootable CPU, fully x86-64 compatible (i.e. a
>   - AVX-512, and everything before it
>   - all x86-64 compilers and frameworks work, but performance depends on
>     AVX-512 support and platform-specific optimisations
>     => lots of stuff to try
>   - no official Intel OpenCL support, but the x86-64 SDK kind of works
>     with AVX2 (not competitive with OpenMP performance)

Thanks. I was not aware of the state of affairs of OpenCL there...

> Here are some very early numbers comparing AVX2 and AVX-512 using
> basically the same benchmarks with mostly OpenMP, which I now use for
> OpenCL:
> https://drive.google.com/open?id=0B9D5EnxRqcaZaU1vbWJHUklMSWs
>
>> I don't know whether the compiler intrinsics are identical to the
>> current version of AVX-512. If not, and if the kernel library is
>> important to you, then I will be happy to assist updating the respective
>> parts of pocl.
>
> There are slight differences, but it's not much. Sadly, I don't know of
> any publicly available document listing them, only the Intel intrinsics
> guide:
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/
>
> Also, there is "AVX-512 Common" (the portable portion), "AVX512 MIC" with
> some extensions for numerical workloads, and some others.
>
> For PoCL on KNL, the first question is: Is the KNC MIC implementation of
> the kernel library usable on KNL, and is it already used? My best guess
> is that PoCL's build system won't use it if built natively on a KNL
> system. So maybe we should try to enforce it, see what happens, and fix
> it if necessary.
>
>> Since I do not (yet!) have access to a KNL system, this might involve
>> some trial and error.
>
> Can't provide you with direct access. ;-) But I guess we could work
> together in a PoCL fork on GitHub and I can run tests as needed.
>
> Currently, I get messages like:
>   remark: <unknown>:0:0: loop not vectorized: value that could not be
>   identified as reduction is used outside the loop
>   remark: <unknown>:0:0: loop not vectorized: use
>   -Rpass-analysis=loop-vectorize for more info
> so it seems that LLVM has trouble vectorising the kernel (while Intel
> OpenCL does).
>
> Any hint on how I can pass through that "-Rpass-analysis=loop-vectorize"?

Sorry, I don't know off-hand. I hope one of the other developers will
comment.

-erik

> Performance for basic arithmetic operators and FMA is off by 4 to 5x
> (i.e. slower than Intel OpenCL with AVX2) - hopefully that's the 4x
> vectorisation advantage of Intel OpenCL. Built-in runtimes are close for
> e.g. exp(), but off by > 10x for log().
>
> Well, any input is welcome. :-)
>
> Cheers,
> Matthias
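Regarding the FMA gap above: here is a minimal AVX-512F sketch of the kind
of loop the vectoriser would ideally produce on KNL (a hand-written
illustration, not pocl's actual kernel library code; it assumes double
precision, unit-stride arrays and n being a multiple of 8, and compiles
with -mavx512f):

  #include <immintrin.h>  // AVX-512F intrinsics
  #include <cstddef>

  // out[i] = a[i] * b[i] + c[i], processing 8 doubles per 512-bit register.
  // A real kernel would also need a masked or scalar tail loop.
  void fma_kernel(const double* a, const double* b, const double* c,
                  double* out, std::size_t n) {
      for (std::size_t i = 0; i < n; i += 8) {
          __m512d va = _mm512_loadu_pd(a + i);
          __m512d vb = _mm512_loadu_pd(b + i);
          __m512d vc = _mm512_loadu_pd(c + i);
          _mm512_storeu_pd(out + i, _mm512_fmadd_pd(va, vb, vc));
      }
  }

If pocl ends up emitting scalar code instead, a roughly 4x gap against an
AVX2-vectorising implementation (four double lanes per 256-bit register)
for FMA-bound loops is about what one would expect, which would be
consistent with the numbers quoted above.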
--
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/

------------------------------------------------------------------------------
_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
