Hi Chris,
Following up on the issue, are all things resolved in the discussion?
If yes, I kindly ask you to reopen this PR and remove ‘requesting changes’
status:
https://github.com/apache/incubator-mxnet/pull/12160
Thank you.
Best
Anton
вт, 27 нояб. 2018 г. в 17:15, Anton Chernov :
> Another thing to take into consideration:
>
> All python artefacts that are created (PyPi) are built with make and are
> not using the bundled OpenMP library.
>
> One step for the switch to CMake to happen is the approval and merging of
> the mentioned PR:
>
> https://github.com/apache/incubator-mxnet/pull/12160
>
> If there are no other objections I kindly ask Chris Olivier to remove his
> 'requesting changes' veto on it to unblock the CMake overhaul work.
>
> Thank you.
>
> Best
> Anton
>
> чт, 22 нояб. 2018 г. в 17:11, Anton Chernov :
>
>>
>> Thank you for you answer, Chris.
>>
>> > The whole “mixing omp libraries” is something that occurs in production
>> every day and certainly in everything that uses mkl.
>>
>> I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures that
>> this mixture is not happening:
>>
>> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
>> runtime library to work. As different OpenMP runtimes may not be binary
>> compatible it's important to ensure that only one OpenMP runtime is used
>> throughout the application. Having more than one OpenMP runtime initialized
>> may lead to undefined behavior resulting in incorrect results or crashes."
>> [1]
>>
>> That is why 2 different MKLML libraries are provided:
>>
>> lib/libmklml_gnu.so | Intel MKL small library for GNU* OpenMP runtime
>> lib/libmklml_intel.so | Intel MKL small library for Intel(R) OpenMP
>> runtime
>>
>> > is the suggestion that libiomp be removed from mkl?
>>
>> That is certainly not my suggestion.
>>
>> > have you spoken with intel? have you consulted Intel at all?
>>
>> Yes, I have asked for comments on the issue.
>>
>> > “hard to debug random crash”. you’re seeing an assertion which is
>> probably ...
>>
>> I'm seeing the result of undefined behaviour. And I want to put emphasis
>> on the following statement:
>>
>> I disregards of whether there is a particular reason for the assert - it
>> is a result of behaviour that should not happen. There are valid ways how
>> to use llvm OpenMP in MXNet and the current way is not one of them.
>>
>> > The lack of root-causing the problem and knee-jerk solution here makes
>> me
>> uncomfortable.
>>
>> I hope that my efforts highlighting the problems reach you to mitigate
>> your uncomfort.
>>
>> > if you want to see performance differences there’s an environment
>> variable
>> you can set in the mxnet omp tuning code that will print overhead and
>> execution times for the current omp library.
>>
>> I don't want to see performance differences in the current OpenMP
>> library. I want to remove the current OpenMP library and use the one
>> provided by the compiler.
>>
>>
>>
>> Best
>> Anton
>>
>> [1] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
>>
>> чт, 22 нояб. 2018 г. в 16:50, Chris Olivier :
>>
>>> Do you not work on CI mostly? My apologies for thinking that was some
>>> sort
>>> of team effort between you and a few others that were passionate about CI
>>> keeping the CI system running smoothly.
>>>
>>> You have source code, you have the line the assertion is on. If you can’t
>>> describe what’s going wrong that causes the assertion, then I don’t
>>> really
>>> have anything more to add to this conversation beyond what’s below:
>>>
>>> The whole “mixing omp libraries” is something that occurs in production
>>> every day and certainly in everything that uses mkl. It may occasionally
>>> cause problems for some edge cases when there is super-complex linking
>>> strategies and dynamic loading. But this is not one of those edge cases.
>>> Mostly blaming this is a red herring for other thread-related problems
>>> and
>>> people switch omp library and the timing of their code changes and they
>>> stop seeing the problem. I’ve spent my entire career doing heavily
>>> multiphreaded c++ development and i’ve seen that a million times. is the
>>> suggestion that libiomp be removed from mkl? have you spoken with intel?
>>> have you consulted Intel at all?
>>>
>>> and what you are seeing isn’t some “hard to debug random crash”. you’re
>>> seeing an assertion which is probably related to omp trying to create a
>>> thread pool after a fork and something was done in the mxnet code to make
>>> that sketchy to do. I’d suggest filing an issue with the llvm openmp just
>>> like you’d file with any other not-well-understood behavior in mxnet.
>>>
>>> The lack of root-causing the problem and knee-jerk solution here makes me
>>> uncomfortable.
>>>
>>> if you want to see performance differences there’s an environment
>>> variable
>>> you can set in the mxnet omp tuning code that will print overhead and
>>> execution times for the current omp library.
>>>
>>>