Thanks for the great summary, Anton. I'm curious: has anybody successfully 
built mxnet with ICC/ICPC?

-----Original Message-----
From: Anton Chernov [mailto:mecher...@gmail.com] 
Sent: Thursday, November 22, 2018 8:36 PM
To: d...@mxnet.apache.org
Subject: [Discussion] Remove bundled llvm OpenMP

Dear MXNet community,

I would like to draw attention to an important issue in the MXNet CMake 
build: the use of a bundled LLVM OpenMP library.

I have opened a PR to remove it:
https://github.com/apache/incubator-mxnet/pull/12160

The issue was closed, but I firmly believe that it's the right thing to do.

*Background*
If you want to use OpenMP pragmas in your code for parallelization you would 
supply a special flag to the compiler:

- Clang / -fopenmp
https://openmp.llvm.org/

- GCC / -fopenmp
https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html

- Intel / -qopenmp (Linux) or /Qopenmp (Windows)
https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1

- Visual Studio: /openmp (Enable OpenMP 2.0 Support) 
https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx

Each of these compilers enables the '#pragma omp' directive during C/C++ 
compilation and arranges for automatic linking of the OpenMP runtime library 
that the compiler itself supplies.

Thus, to use the advantages of an OpenMP implementation one has to compile the 
code with the corresponding compiler.

Currently, the MXNet CMake build scripts use a bundled version of LLVM OpenMP 
([1] and [2]) to replace the OpenMP library supplied by the compiler.

I will quote here the README from the MKL-DNN (Intel(R) Math Kernel Library for 
Deep Neural Networks):

"Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime 
library to work. As different OpenMP runtimes may not be binary compatible it's 
important to ensure that only one OpenMP runtime is used throughout the 
application. Having more than one OpenMP runtime initialized may lead to 
undefined behavior resulting in incorrect results or crashes." [3]

And:

"Using GNU compiler with -fopenmp and -liomp5 options will link the application 
with both Intel and GNU OpenMP runtime libraries. This will lead to undefined 
behavior of the application." [4]

As the ldd output for MXNet shows, the binary is linked against both the 
bundled LLVM OpenMP runtime and the GNU one:

$ ldd build/tests/mxnet_unit_tests | grep omp
    libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x00007f697bc55000)
    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f69660cd000)
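
A check like the one above could be automated. The following is only a minimal sketch, not part of the MXNet build: the function name find_openmp_runtimes and the list of runtime names are assumptions made for illustration, and the sample text is the ldd output quoted above.

```python
import re

# Names of common OpenMP runtimes (an illustrative, non-exhaustive assumption).
OPENMP_RUNTIMES = ("libgomp", "libiomp5", "libomp")


def find_openmp_runtimes(ldd_output):
    """Return the sorted names of OpenMP runtimes that appear in ldd output."""
    found = set()
    for line in ldd_output.splitlines():
        # Each ldd line starts with the library name, e.g. "libgomp.so.1 => ...".
        match = re.match(r"\s*(lib\w+)\.so", line)
        if match and match.group(1) in OPENMP_RUNTIMES:
            found.add(match.group(1))
    return sorted(found)


# The two lines reported by ldd in this message.
SAMPLE = """\
    libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x00007f697bc55000)
    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f69660cd000)
"""

runtimes = find_openmp_runtimes(SAMPLE)
if len(runtimes) > 1:
    print("Multiple OpenMP runtimes linked: " + ", ".join(runtimes))
```

For the sample above, the check reports both libgomp and libomp, which is exactly the situation the MKL-DNN README warns about.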

*Performance*

The only performance data related to OpenMP in MXNet I was able to find is
here:
https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172

As I understand it, that comment tests the impact of different environment 
variables on the same setup (using the same bundled OpenMP library).

The libraries may differ in implementation and the Thread Affinity Interface 
[5] may have significant impact on performance.

All compilers support it:

- Clang / KMP_AFFINITY
https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp

- GCC / GOMP_CPU_AFFINITY
https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html

- Intel / KMP_AFFINITY
https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1

- Visual Studio / SetThreadAffinityMask
https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
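
Because the affinity variable depends on which runtime is actually loaded, a launcher has to pick the right one. The sketch below illustrates this for the environment-variable based runtimes only; the helper name affinity_env is hypothetical, and the explicit-proclist form of KMP_AFFINITY is just one of the settings the Intel and LLVM runtimes accept.

```python
import os


def affinity_env(runtime, cpu_ids):
    """Return the affinity environment variable for the given OpenMP runtime."""
    if runtime == "libgomp":
        # GNU libgomp reads a space-separated list of CPU ids.
        return {"GOMP_CPU_AFFINITY": " ".join(str(c) for c in cpu_ids)}
    if runtime in ("libomp", "libiomp5"):
        # The LLVM and Intel runtimes both read KMP_AFFINITY.
        proclist = ",".join(str(c) for c in cpu_ids)
        return {"KMP_AFFINITY": "granularity=fine,proclist=[%s],explicit" % proclist}
    raise ValueError("unknown OpenMP runtime: %s" % runtime)


# Environment for a child process pinned to CPUs 0-3 under GNU libgomp.
child_env = dict(os.environ, **affinity_env("libgomp", [0, 1, 2, 3]))
print(child_env["GOMP_CPU_AFFINITY"])
```

A setting tuned for one runtime is silently ignored by the other, which is one more reason to know which library a given build links against.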

*Issues*

Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
https://github.com/apache/incubator-mxnet/issues/10856

libomp.so dependency (need REAL fix)
https://github.com/apache/incubator-mxnet/issues/11417

mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL
https://github.com/apache/incubator-mxnet/issues/8532

Performance regression when OMP_NUM_THREADS environment variable is not set
https://github.com/apache/incubator-mxnet/issues/9744

Poor concat CPU performance on CUDA builds
https://github.com/apache/incubator-mxnet/issues/11905

I would appreciate hearing your thoughts.


Best
Anton

[1] https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
[2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
[3] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
[4] https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
[5] https://software.intel.com/en-us/node/522691
