3x less overhead* (correction to the "1/3 less overhead" figure below)

On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier <cjolivie...@gmail.com> wrote:

> Someone is doing something unhealthy when they fork, which is causing an
> assertion in the OpenMP library; the same assertion would fire in MKL,
> which is linked to libiomp5 (the exact same OMP library). This is new
> behavior and most likely due to an error or a suboptimal approach in the
> forking logic in MXNet.
>
> In order to circumvent the assert, the CI team is proposing to remove the
> library completely, which is equivalent to cutting off your leg to make the
> pain from stubbing your toe go away.
>
> We get a lot of performance gain from OMP. It has about 1/3 less overhead
> for entering omp regions and also supports omp regions after a fork, which
> libgomp does not.
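>
> As a rough illustration of what "omp regions after a fork" means (a
> standalone toy program, not MXNet code; behavior after fork() depends on
> the runtime, so this is only a sketch of the claim above):
>
> #include <cstdio>
> #include <sys/wait.h>
> #include <unistd.h>
> #include <omp.h>
>
> int main() {
>   // Initialize the OpenMP runtime in the parent before forking.
>   #pragma omp parallel
>   { }
>
>   pid_t pid = fork();
>   if (pid == 0) {
>     // Child: enter an omp region after the fork. libomp/libiomp5 handle
>     // this case; with libgomp it is reported to hang or misbehave.
>     #pragma omp parallel
>     {
>       #pragma omp single
>       std::printf("child omp threads: %d\n", omp_get_num_threads());
>     }
>     return 0;
>   }
>   int status = 0;
>   waitpid(pid, &status, 0);
>   return 0;
> }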
>
> In many months, no investigation has occurred as to WHY the assertion is
> failing.
>
> The PR is vetoed until such time as the actual root cause of the problem
> is known.
>
>
> thanks,
>
> -Chris.
>
>
>
>
> On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov <mecher...@gmail.com> wrote:
>
>> Dear MXNet community,
>>
>> I would like to draw attention to an important issue present in the MXNet
>> CMake build: the usage of the bundled LLVM OpenMP library.
>>
>> I have opened a PR to remove it:
>> https://github.com/apache/incubator-mxnet/pull/12160
>>
>> The issue was closed, but I am firm in my opinion that it's the right
>> thing to do.
>>
>> *Background*
>> If you want to use OpenMP pragmas in your code for parallelization, you
>> supply a special flag to the compiler:
>>
>> - Clang / -fopenmp
>> https://openmp.llvm.org/
>>
>> - GCC / -fopenmp
>> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
>>
>> - Intel / -qopenmp (Linux, macOS) or /Qopenmp (Windows)
>>
>> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
>>
>> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
>> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
>>
>> Each of these flags enables the '#pragma omp' directive during C/C++
>> compilation and arranges for automatic linking of the OpenMP runtime
>> library supplied by that compiler.
>>
>> Thus, to take advantage of a particular OpenMP implementation, one has to
>> compile the code with the corresponding compiler.
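>>
>> As a small illustration (a standalone toy program, not MXNet code; the
>> compile commands are the usual invocations for the flags listed above):
>>
>> // omp_hello.cpp
>> //   clang++ -fopenmp omp_hello.cpp -o omp_hello   (links libomp)
>> //   g++     -fopenmp omp_hello.cpp -o omp_hello   (links libgomp)
>> //   icpc    -qopenmp omp_hello.cpp -o omp_hello   (links libiomp5)
>> //   cl      /openmp  omp_hello.cpp                (links vcomp)
>> #include <cstdio>
>> #include <omp.h>
>>
>> int main() {
>>   // The same pragma works everywhere; only the runtime library differs.
>>   #pragma omp parallel
>>   std::printf("hello from thread %d of %d\n",
>>               omp_get_thread_num(), omp_get_num_threads());
>>   return 0;
>> }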
>>
>> Currently, the MXNet CMake build scripts use a bundled version of LLVM
>> OpenMP ([1] and [2]) to replace the OpenMP library supplied by the
>> compiler.
>>
>> I will quote here the README from MKL-DNN (Intel(R) Math Kernel Library
>> for Deep Neural Networks):
>>
>> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime
>> library to work. As different OpenMP runtimes may not be binary compatible
>> it's important to ensure that only one OpenMP runtime is used throughout
>> the application. Having more than one OpenMP runtime initialized may lead
>> to undefined behavior resulting in incorrect results or crashes." [3]
>>
>> And:
>>
>> "Using GNU compiler with -fopenmp and -liomp5 options will link the
>> application with both Intel and GNU OpenMP runtime libraries. This will
>> lead to undefined behavior of the application." [4]
>>
>> As can be seen from the ldd output for MXNet, two different OpenMP
>> runtimes end up being linked:
>>
>> $ ldd build/tests/mxnet_unit_tests | grep omp
>>     libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so
>> (0x00007f697bc55000)
>>     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
>> (0x00007f69660cd000)
>>
>> *Performance*
>>
>> The only performance data related to OpenMP in MXNet that I was able to
>> find is here:
>>
>> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172
>>
>> In my understanding, this tests the impact of different environment
>> variables on the same setup (using the same bundled OpenMP library).
>>
>> The libraries may differ in implementation, and the Thread Affinity
>> Interface [5] may have a significant impact on performance.
>>
>> All compilers support some form of it (see the sketch after this list):
>>
>> - Clang / KMP_AFFINITY
>>
>> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp
>>
>> - GCC / GOMP_CPU_AFFINITY
>>
>> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html
>>
>> - Intel / KMP_AFFINITY
>>
>> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
>>
>> - Visual Studio / SetThreadAffinityMask
>>
>> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
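>>
>> As a sketch of how affinity settings can be observed (a standalone toy
>> program, not MXNet code; the environment variable values are illustrative
>> examples only, and sched_getcpu makes it Linux/glibc-specific):
>>
>> // affinity_check.cpp
>> //   g++ -fopenmp affinity_check.cpp -o affinity_check
>> // Example runs, one per runtime:
>> //   KMP_AFFINITY=granularity=fine,compact ./affinity_check   # libomp / libiomp5
>> //   GOMP_CPU_AFFINITY="0-3"               ./affinity_check   # libgomp
>> //   OMP_PROC_BIND=close OMP_PLACES=cores  ./affinity_check   # portable OpenMP 4.0
>> #ifndef _GNU_SOURCE
>> #define _GNU_SOURCE   // for sched_getcpu on glibc
>> #endif
>> #include <cstdio>
>> #include <sched.h>
>> #include <omp.h>
>>
>> int main() {
>>   // Print which CPU each OpenMP thread lands on under the chosen policy.
>>   #pragma omp parallel
>>   std::printf("thread %d runs on cpu %d\n",
>>               omp_get_thread_num(), sched_getcpu());
>>   return 0;
>> }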
>>
>> *Issues*
>>
>> Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
>> https://github.com/apache/incubator-mxnet/issues/10856
>>
>> libomp.so dependency (need REAL fix)
>> https://github.com/apache/incubator-mxnet/issues/11417
>>
>> mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL
>> https://github.com/apache/incubator-mxnet/issues/8532
>>
>> Performance regression when OMP_NUM_THREADS environment variable is not
>> set
>> https://github.com/apache/incubator-mxnet/issues/9744
>>
>> Poor concat CPU performance on CUDA builds
>> https://github.com/apache/incubator-mxnet/issues/11905
>>
>> I would appreciate hearing your thoughts.
>>
>>
>> Best
>> Anton
>>
>> [1]
>>
>> https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
>> [2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
>> [3] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
>> [4] https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
>> [5] https://software.intel.com/en-us/node/522691
>>
>
