1) I don't think the problem is related to environment variables, but it's good to know that you agree we can remove the modification to OMP_NUM_THREADS, which was creating random crashes with low probability. To answer your question, I suggest using a debugger and reasoning about initialization, static construction, and thread creation via pthread_atfork. We are mixing all of these difficult and subtle actions, each with side effects, and the result is a perfect storm.
I think one possibility is that two threads can initialize OpenMP at the same time; or at least, if OMP initialization and the operator tuning code run concurrently, a thread can get inside the OpenMP code before or during its initialization. __kmp_team_pool and other volatile variables used inside the OMP engine are changed by different threads. I verified this myself by modifying OpenMP and setting a write watchpoint on that memory region: they were indeed changing value during __kmp_do_serial_initialize, producing the assert described in this issue: https://github.com/apache/incubator-mxnet/issues/10856 Read below for a more detailed explanation of this process, with backtraces included.

Kind regards.

====

omp_get_num_procs and other OpenMP functions are called concurrently from different places, such as operator tuning and the static initialization of OpenMP here: https://github.com/apache/incubator-mxnet/blob/master/src/engine/openmp.cc#L37 While static initialization is thread safe, the constructor of a statically initialized object might not be. Operator tuning runs during (static) library initialization, and the C++ standard leaves the order of static initialization implementation defined.

2) Is what I described above correct, or could it cause some problems? At the very least it triggers an assertion inside OpenMP, so it seems to violate an invariant that the OpenMP developers expect to hold, which concerns me a bit. This explains the assertion inside OpenMP. Let me know if you have any other questions or if you think this is incorrect. If I'm not mistaken, you contributed some parts of this code.

3) Why do the pip packages use libgomp, while builds from source use LLVM OpenMP? (I asked this question on the 1.5 release thread.)

Below are some stack traces, captured with a debugger, that support the observations and reasoning above.
__kmp_allocate_thread kmp_runtime.cpp:4153
__kmp_allocate_team kmp_runtime.cpp:4965
__kmp_fork_call kmp_runtime.cpp:1991
__kmp_GOMP_fork_call kmp_gsupport.cpp:290
__kmp_api_GOMP_parallel kmp_gsupport.cpp:1080
mxnet::op::OperatorTune<float>::GetOMPLoopOverhead operator_tune-inl.h:342
mxnet::op::OperatorTune<float>::GetOMPLoopOverhead operator_tune-inl.h:370
mxnet::op::OperatorTune<float>::Initialize operator_tune-inl.h:174
mxnet::op::OperatorTune<float>::TuneAll operator_tune-inl.h:220
mxnet::op::OperatorTune<float>::OperatorTune operator_tune-inl.h:116
mxnet::op::UnaryOpTune<float>::UnaryOpTune operator_tune-inl.h:534
mxnet::op::BinaryOpTune<float>::BinaryOpTune operator_tune-inl.h:724
__static_initialization_and_destruction_0 operator_tune.cc:369
_GLOBAL__sub_I_operator_tune.cc(void) operator_tune.cc:378
call_init 0x00007f8f4e41d733
_dl_init 0x00007f8f4e41d733
dl_open_worker 0x00007f8f4e4221ff
__GI__dl_catch_exception 0x00007f8f4e1832df
_dl_open 0x00007f8f4e4217ca
dlopen_doit 0x00007f8f4dbf9f96
__GI__dl_catch_exception 0x00007f8f4e1832df
__GI__dl_catch_error 0x00007f8f4e18336f
_dlerror_run 0x00007f8f4dbfa735
__dlopen 0x00007f8f4dbfa051
<unknown> 0x00007f8f4b3eacda
<unknown> 0x0000000000502d6f
[... Python interpreter frames (_PyEval_EvalFrameDefault, PyImport_ImportModuleLevelObject, ...) elided ...]
PyEval_EvalCode 0x0000000000506393
<unknown> 0x0000000000634d52
PyRun_FileExFlags 0x0000000000634e0a
PyRun_SimpleFileExFlags 0x00000000006385c8
Py_Main 0x000000000063915a
main 0x00000000004a6f10
__libc_start_main 0x00007f8f4e03db97
_start 0x00000000005afa0a

(py3_venv) piotr@panther:0: ~/d/mxnet [master]> nosetests -v -s tests/python/unittest/test_gluon.py 2>&1 | head
kmp __kmp_do_serial_initialize: kmp_team_pool: 0
kmp __kmp_do_serial_initialize xx: kmp_team_pool: 0
kmp __kmp_do_serial_initialize: kmp_team_pool: 1
Assertion failure at kmp_runtime.cpp(6488): __kmp_team_pool == __null.

Due to the pthread_atfork handlers, we are incurring significant overhead when creating any thread. I think we are creating and destroying the engine thread pool on EVERY thread initialization, as shown in this backtrace:

Thread-1-[python]:
__GI___pthread_timedjoin_ex 0x00007f9d7708bd2d
std::thread::join() 0x00007f9d199428c3
mxnet::engine::ThreadPool::~ThreadPool thread_pool.h:84
std::default_delete<mxnet::engine::ThreadPool>::operator() unique_ptr.h:78
std::unique_ptr<mxnet::engine::ThreadPool, std::default_delete<mxnet::engine::ThreadPool> >::~unique_ptr unique_ptr.h:268
mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)1>::~ThreadWorkerBlock threaded_engine_perdevice.cc:212
std::default_delete<mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)1> >::operator() unique_ptr.h:78
std::unique_ptr<mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)1>, std::default_delete<mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)1> > >::reset unique_ptr.h:376
mxnet::engine::ThreadedEnginePerDevice::StopNoWait threaded_engine_perdevice.cc:67
mxnet::engine::ThreadedEnginePerDevice::Stop threaded_engine_perdevice.cc:73
mxnet::LibraryInitializer::LibraryInitializer()::{lambda()#1}::operator()() const initialize.cc:61
mxnet::LibraryInitializer::LibraryInitializer()::{lambda()#1}::_FUN() initialize.cc:62
__libc_fork 0x00007f9d77386aca
<unknown> 0x00000000005e8646
<unknown> 0x0000000000502d6f
[... Python interpreter frames (_PyEval_EvalFrameDefault, PyImport_ImportModuleLevelObject, ...) elided ...]
PyEval_EvalCode 0x0000000000506393
<unknown> 0x0000000000634d52
PyRun_FileExFlags 0x0000000000634e0a
PyRun_SimpleFileExFlags 0x00000000006385c8
Py_Main 0x000000000063915a
main 0x00000000004a6f10
__libc_start_main 0x00007f9d772c3b97
_start 0x00000000005afa0a

At the same time, another thread is also inside the omp runtime:

Thread-82:
__kmp_free_team kmp_runtime.cpp:5331
__kmp_reset_root kmp_runtime.cpp:3850
__kmp_unregister_root_current_thread kmp_runtime.cpp:3942
__kmp_internal_end_thread kmp_runtime.cpp:6057
__kmp_internal_end_dest kmp_runtime.cpp:5620
__nptl_deallocate_tsd 0x00007f9d77089408
__nptl_deallocate_tsd 0x00007f9d7708a81b
start_thread 0x00007f9d7708a81b
clone 0x00007f9d773c388f

==================== Different timepoint

__kmp_do_serial_initialize kmp_runtime.cpp:6485
__kmp_do_middle_initialize kmp_runtime.cpp:6597
__kmp_middle_initialize kmp_runtime.cpp:6706
__kmp_api_omp_get_num_procs kmp_ftn_entry.h:405
mxnet::engine::OpenMP::OpenMP openmp.cc:49
mxnet::engine::OpenMP::Get openmp.cc:37
__static_initialization_and_destruction_0 openmp.cc:110
_GLOBAL__sub_I_openmp.cc(void) openmp.cc:113
call_init 0x00007f439c83a733
_dl_init 0x00007f439c83a733
dl_open_worker 0x00007f439c83f1ff
__GI__dl_catch_exception 0x00007f439c5a02df
_dl_open 0x00007f439c83e7ca
dlopen_doit 0x00007f439c016f96
__GI__dl_catch_exception 0x00007f439c5a02df
__GI__dl_catch_error 0x00007f439c5a036f
_dlerror_run 0x00007f439c017735
__dlopen 0x00007f439c017051
<unknown> 0x00007f4399807cda
<unknown> 0x0000000000502d6f
[... Python interpreter frames (_PyEval_EvalFrameDefault, PyImport_ImportModuleLevelObject, ...) elided ...]
PyEval_EvalCode 0x0000000000506393
<unknown> 0x0000000000634d52
PyRun_FileExFlags 0x0000000000634e0a
PyRun_SimpleFileExFlags 0x00000000006385c8
Py_Main 0x000000000063915a
main 0x00000000004a6f10
__libc_start_main 0x00007f439c45ab97
_start 0x00000000005afa0a

On Tue, Jun 25, 2019 at 1:55 PM Chris Olivier <cjolivie...@gmail.com> wrote:
>
> 1) I don't see how that code could cause reentrancy problems in omp. It doesn't make any OMP calls at all. Still doesn't look related to me. Setting an environment variable probably doesn't even do anything, because:
> a) It probably doesn't check the environment variable except at initial startup
> b) Even if it did, whether this code ran before or after the OMP init code would be nondeterministic
> c) It for sure doesn't check the environment variable every time it hits an omp region. That would be ridiculously expensive and, checking the OMP source code, it doesn't. You can't affect the OMP behavior at arbitrary points in time by setting the "OMP_NUM_THREADS" environment variable.
>
> On Tue, Jun 25, 2019 at 1:20 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
>
> > Nobody claimed that the original lockup has to do with OMP, but the fix caused re-entrancy into OMP initialization as explained below.
> > So I agree with your statement that the bug that using pthread_atfork was fixing is not related to OMP, but the fix is causing interactions with OMP as described above.
> >
> > Pedro.
> >
> > On Tue, Jun 25, 2019 at 12:33 PM Chris Olivier <cjolivie...@gmail.com> wrote:
> >
> > > The call stacks there are mostly associated with the execution engine threads, which are not OMP threads. That lockup doesn't look to me to be related to OMP -- the execution engine uses its own thread pool logic -- I'm pretty familiar with that part of the code. Unless I am missing one -- can you point to the one that looks OMP-related?
> > >
> > > On Tue, Jun 25, 2019 at 10:35 AM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > >
> > > > Thanks for digging that out Kellen. That's good info, so maybe it would be good to rework the fix with the info you provided and remove the pthread_atfork handlers.
> > > > Do you think setting the device would avoid the problem seen in the backtrace you provided? Specifically here:
> > > > https://gist.github.com/KellenSunderland/893d11165e19d1efcf5c0fe8e8584600#file-hang_bt-L24
> > > >
> > > > On Mon, Jun 24, 2019 at 6:43 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:
> > > >
> > > > > I remember at the time we also had a read through of this blog post, but the code looked like it was following the advice:
> > > > > https://devblogs.nvidia.com/cuda-pro-tip-always-set-current-device-avoid-multithreading-bugs/
> > > > >
> > > > > On Mon, Jun 24, 2019 at 6:39 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:
> > > > >
> > > > > > I remember this hang as well; it was pretty hard to reproduce IIRC. I believe the stacks for the hang are here: https://gist.github.com/KellenSunderland/893d11165e19d1efcf5c0fe8e8584600 and the trick was we could only debug it up to the point that we hit:
> > > > > >
> > > > > > #0 0x00007fec6df1ba4f in futex_wait (private=0, expected=1, futex_word=0x7fec60843758) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
> > > > > > #1 futex_wait_simple (private=0, expected=1, futex_word=0x7fec60843758) at ../sysdeps/nptl/futex-internal.h:135
> > > > > > #2 __pthread_once_slow (once_control=0x7fec60843758, init_routine=0x7fec605f38f0) at pthread_once.c:105
> > > > > > ...
> > > > > > #6 0x00007fec6061c577 in cudaSetDevice () from /usr/local/cuda/lib64/libcudart.so.9.0
> > > > > >
> > > > > > because the code in libcudart is obviously closed source, we couldn't dig into what threading work was going on when we called cudaSetDevice.
> > > > > >
> > > > > > On Mon, Jun 24, 2019 at 6:13 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > > > > >
> > > > > > > If you check initialize.cc, we seem to be explicitly disabling that behaviour in pthread_at_fork, which seems to cause thread contention during multiprocessing. Why do we need this major advantage for the library if that's the case?
> > > > > > >
> > > > > > > Related PRs:
> > > > > > >
> > > > > > > https://github.com/apache/incubator-mxnet/pull/10820
> > > > > > > https://github.com/apache/incubator-mxnet/issues/14396
> > > > > > >
> > > > > > > The original code was authored in this PR:
> > > > > > >
> > > > > > > https://github.com/apache/incubator-mxnet/pull/8677
> > > > > > >
> > > > > > > I actually remember this fix; it was done during a release, as the cuda runtime was forking and the engine was being re-entered. If that situation is not happening anymore, it might not be needed any longer. I don't think we know the cause of the fork inside cuda, so the code has grown around a fix for an issue whose root cause was not understood, plus the side effects this fix caused afterwards.
> > > > > > >
> > > > > > > My build uses MKL+LLVM OMP+DEBUG as seen in the container provided in the link above, no libgomp.
> > > > > > >
> > > > > > > I didn't try the Make build.
> > > > > > >
> > > > > > > I would refactor the code linked above and stop using pthread_at_fork, since OMP assumes it won't be initialized twice, but this needs to be very well tested to make sure it doesn't cause bugs or affect the fixes done in the linked PRs above.
> > > > > > >
> > > > > > > Pedro.
> > > > > > >
> > > > > > > On Mon, Jun 24, 2019 at 5:38 PM Chris Olivier <cjolivie...@gmail.com> wrote:
> > > > > > >
> > > > > > > > one major advantage of intel/llvm omp is that it spawns a new thread pool after fork if a thread pool was already created. this is so that omp can be used in the forked processes. libgomp doesn't do this, so it'll just lock up if you try to do omp in the forked process.
> > > > > > > >
> > > > > > > > is your build linking libgomp as well?
> > > > > > > >
> > > > > > > > standard mkl build (from Makefile) uses the same omp library. are there problems with that build?
> > > > > > > >
> > > > > > > > what changes need to be made to make the assertion not fire?
> > > > > > > > On Mon, Jun 24, 2019 at 5:32 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > There's an assertion which is easily reproducible, and there's also a crash including a core dump; the latter is not easy for me to reproduce in different environments. I have also seen mxnet get stuck without progressing, using no CPU at all, when running unit tests with this build configuration.
> > > > > > > > >
> > > > > > > > > In my view, the root cause of the assertion is that we are re-entering OMP initialization when spawning threads in the following code through pthread_at_fork:
> > > > > > > > >
> > > > > > > > > https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L58
> > > > > > > > >
> > > > > > > > > This causes double initialization of the OMP engine, including the assertion which you are asking about, and I suspect some additional overhead. That's the shady forking part you are asking for.
> > > > > > > > >
> > > > > > > > > A question for you: What is the cause of the runtime differences between OMP runtimes? Shouldn't the implementation overhead diminish as threads run longer?
> > > > > > > > >
> > > > > > > > > Pedro.
> > > > > > > > >
> > > > > > > > > On Mon, Jun 24, 2019 at 5:10 PM Chris Olivier <cjolivie...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > What's the reason for the assertion failure? btw, classifying an assertion failure as a "crash" is debatable. As I stated in the original issue a long time ago, it's possible something shady is being done when forking that should be fixed. The assertion should be root caused.
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 24, 2019 at 1:22 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Added a dockerfile, and reports of a crash on my local machine when running MKL+OMP+DEBUG; with Anton's branch the crash happened as well. I couldn't reproduce the crash on my EC2 machine. Added the backtrace of the crash as well:
> > > > > > > > > > >
> > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/10856
> > > > > > > > > > >
> > > > > > > > > > > Dockerfile here:
> > > > > > > > > > >
> > > > > > > > > > > https://github.com/larroy/mxnet_omp
> > > > > > > > > > >
> > > > > > > > > > > Kind regards.
> > > > > > > > > > >
> > > > > > > > > > > Pedro.
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 20, 2019 at 5:29 PM Marco de Abreu <marco.g.ab...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > As already proposed, I think the easiest way to get a common understanding is if we start with a few docker containers. Pedro, would it be possible for you to wrap your benchmarks into a few containers that will produce your shown results? That way, we can avoid possible misunderstandings and also pinpoint the exact parts where people disagree or misunderstood each other.
-Marco

Pedro Larroy <pedro.larroy.li...@gmail.com> wrote on Thu., 20 June 2019, 21:47:

I can confirm that we are linking with two versions of omp. I'm gaining more clarity into this topic, but I still have questions. The facts I have so far are the following:

* #1: We are linking with two versions of omp, Intel's omp and LLVM openmp, when building with MKL enabled.
* #2: We have 3 different possible OMP versions: Intel OMP (comes with MKL), LLVM OpenMP (3rdparty/openmp), libgomp (comes with gcc) (this one is used on the PR proposed by Anton).

Questions:

* #1 Is it ok to have two versions of openmp linked at the same time?
* #2 Which implementation of OMP gives the best performance? (See the total training time of my measurements for a partial answer.)
* #3 Should we have a build flag so we can choose the OMP version at runtime?
* #4 Which compiler and build flags did Chris use to get the 10x slowdown?
* #5 @Stas: is there a script to replicate your benchmarks easily? If so could you provide a link? I think we would need to reproduce your benchmarks and verify which versions are being linked. It's possible that while compiling with MKL, Intel's omp was pulled in instead of GNU OpenMP.
* #6 @Chris: how should we maintain the copy of LLVM's OpenMP? Should we update the subrepo regularly?

My conclusions so far:

* #1 We should avoid linking two versions of omp if possible and allow users to choose one in the build, as we do for BLAS.
* #2 For performance reasons, and for more control across different compiler versions, it indeed seems to make sense to keep the LLVM OpenMP version in 3rdparty for now. So unless more data is gathered, it makes sense not to remove it as of now.
* #3 We should provide build options to choose which openmp library is to be used from the three options available, including libgomp.
* #4 Refining the build, we could also enable OpenMP on Mac without additional contortions (doesn't work as of today): https://iscinumpy.gitlab.io/post/omp-on-high-sierra/
* #5 We should add the different omp versions to our benchmarks and track the performance, so this data is available for prescribing the best build options and for binary releases.

This is also an interesting related gh issue posted in the mkl-dnn repository: https://github.com/intel/mkl-dnn/issues/230

I don't observe the order-of-magnitude divergence reported by Chris on vanilla Ubuntu 18.04 in samples/s, but the full training indeed finishes faster with the OMP from 3rdparty (LLVM openmp) vs libgomp.

There are also differences in training time when using MKL; it's actually a bit slower. I don't know if it's related to OMP.
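Fact #1 above can be checked mechanically. A minimal sketch (a hypothetical helper, not part of MXNet) that scans `ldd` output for known OpenMP runtime sonames, so a build or CI step could fail fast when more than one runtime is linked:

```python
import re

# Shared objects that each provide an OpenMP runtime.
OMP_RUNTIMES = ("libgomp", "libomp", "libiomp5")

def find_omp_libs(ldd_output: str) -> list:
    """Return the OpenMP runtime sonames found in `ldd` output."""
    found = []
    for line in ldd_output.splitlines():
        m = re.match(r"\s*(\S+)\s*=>", line)
        if m and m.group(1).split(".")[0] in OMP_RUNTIMES:
            found.append(m.group(1))
    return found

# Example with the "Master, MKL ON" ldd output from this message:
ldd_text = """
        libomp.so => /home/piotr/mxnet_master/build/3rdparty/openmp/runtime/src/libomp.so (0x00007f05ba38f000)
        libiomp5.so => /home/piotr/mxnet_master/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libiomp5.so (0x00007f05b09f4000)
"""
libs = find_omp_libs(ldd_text)
print(libs)          # ['libomp.so', 'libiomp5.so']
assert len(libs) > 1  # two runtimes linked at once: the bug under discussion
```

Running `ldd build/libmxnet.so` through such a check in CI would catch conclusion #1 (double-linking) automatically instead of by manual inspection.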
gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)

Anton's branch: g...@github.com:lebeg/incubator-mxnet.git branch 'omp'

(py3_venv) piotr@ec2 cpu:0: ~/mxnet_openmp [omp]> ldd build/libmxnet.so | grep -i omp
        libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fd99a51d000)

time python train_mnist.py

INFO:root:Epoch[18] Validation-accuracy=0.984176
INFO:root:Epoch[19] Batch [0-100]      Speed: 41617.00 samples/sec    accuracy=1.000000
INFO:root:Epoch[19] Batch [100-200]    Speed: 47990.69 samples/sec    accuracy=0.999531
INFO:root:Epoch[19] Batch [200-300]    Speed: 47517.01 samples/sec    accuracy=0.999687
INFO:root:Epoch[19] Batch [300-400]    Speed: 47430.53 samples/sec    accuracy=1.000000
INFO:root:Epoch[19] Batch [400-500]    Speed: 47649.77 samples/sec    accuracy=0.999687
INFO:root:Epoch[19] Batch [500-600]    Speed: 51708.12 samples/sec    accuracy=0.999687
INFO:root:Epoch[19] Batch [600-700]    Speed: 57228.63 samples/sec    accuracy=0.999375
INFO:root:Epoch[19] Batch [700-800]    Speed: 50887.85 samples/sec    accuracy=0.999844
INFO:root:Epoch[19] Batch [800-900]    Speed: 53947.98 samples/sec    accuracy=0.999531
INFO:root:Epoch[19] Train-accuracy=0.999717
INFO:root:Epoch[19] Time cost=1.219
INFO:root:Epoch[19] Validation-accuracy=0.983977
1011.98user 26.78system 0:31.54elapsed 3292%CPU (0avgtext+0avgdata 1146052maxresident)k
0inputs+0outputs (0major+3496364minor)pagefaults 0swaps

Master, MKL ON:

(py3_venv) piotr@ec2 cpu:1: ~/m/e/image-classification [master]> ldd ../../build/libmxnet.so | grep -i omp
        libomp.so => /home/piotr/mxnet_master/build/3rdparty/openmp/runtime/src/libomp.so (0x00007f05ba38f000)
        libiomp5.so => /home/piotr/mxnet_master/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libiomp5.so (0x00007f05b09f4000)

INFO:root:Epoch[18] Validation-accuracy=0.982484
INFO:root:Epoch[19] Batch [0-100]      Speed: 36651.63 samples/sec    accuracy=0.999691
INFO:root:Epoch[19] Batch [100-200]    Speed: 45093.98 samples/sec    accuracy=0.999844
INFO:root:Epoch[19] Batch [200-300]    Speed: 45146.84 samples/sec    accuracy=0.999687
INFO:root:Epoch[19] Batch [300-400]    Speed: 45119.90 samples/sec    accuracy=0.999687
INFO:root:Epoch[19] Batch [400-500]    Speed: 44998.96 samples/sec    accuracy=0.999531
INFO:root:Epoch[19] Batch [500-600]    Speed: 45072.25 samples/sec    accuracy=0.999844
INFO:root:Epoch[19] Batch [600-700]    Speed: 44969.79 samples/sec    accuracy=0.999844
INFO:root:Epoch[19] Batch [700-800]    Speed: 44962.78 samples/sec    accuracy=0.999844
INFO:root:Epoch[19] Batch [800-900]    Speed: 44945.47 samples/sec    accuracy=0.999375
INFO:root:Epoch[19] Train-accuracy=0.999717
INFO:root:Epoch[19] Time cost=1.367
INFO:root:Epoch[19] Validation-accuracy=0.982783
854.97user 847.21system 0:41.44elapsed 4106%CPU (0avgtext+0avgdata 1154348maxresident)k
0inputs+0outputs (0major+3624361minor)pagefaults 0swaps

MKL OFF:

(py3_venv) piotr@ec2 cpu:0: ~/mxnet_master [master]> grep -i MKL cmake_options.yml
USE_MKL_IF_AVAILABLE: "OFF" # Use MKL if found
USE_MKLML_MKL: "OFF" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_MKLDNN: "OFF" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
(py3_venv) piotr@ec2 cpu:0: ~/mxnet_master [master]> ldd build/libmxnet.so | grep -i omp
        libomp.so => /home/piotr/mxnet_master/build/3rdparty/openmp/runtime/src/libomp.so (0x00007fb720c54000)

INFO:root:Epoch[18] Validation-accuracy=0.983479
INFO:root:Epoch[19] Batch [0-100]      Speed: 46784.02 samples/sec    accuracy=1.000000
INFO:root:Epoch[19] Batch [100-200]    Speed: 48824.29 samples/sec    accuracy=0.999687
INFO:root:Epoch[19] Batch [200-300]    Speed: 49190.31 samples/sec    accuracy=0.999687
INFO:root:Epoch[19] Batch [300-400]    Speed: 51518.77 samples/sec    accuracy=0.999844
INFO:root:Epoch[19] Batch [400-500]    Speed: 51551.62 samples/sec    accuracy=0.999844
INFO:root:Epoch[19] Batch [500-600]    Speed: 49026.35 samples/sec    accuracy=0.999844
INFO:root:Epoch[19] Batch [600-700]    Speed: 49002.46 samples/sec    accuracy=0.999375
INFO:root:Epoch[19] Batch [700-800]    Speed: 48980.55 samples/sec    accuracy=0.999687
INFO:root:Epoch[19] Batch [800-900]    Speed: 47402.56 samples/sec    accuracy=0.999844
INFO:root:Epoch[19] Train-accuracy=0.999767
INFO:root:Epoch[19] Time cost=1.259
INFO:root:Epoch[19] Validation-accuracy=0.983181
755.36user 754.94system 0:35.89elapsed 4207%CPU (0avgtext+0avgdata 1147008maxresident)k
0inputs+3112outputs (0major+3568826minor)pagefaults 0swaps

Let me know what you think.

Link to the original PR: https://github.com/apache/incubator-mxnet/pull/12160

Thanks.

On Wed, Jun 19, 2019 at 5:35 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:

"if you're linking in two then you're doing something wrong." Correct, that's one thing I believe we've got consensus on. So let's call that out as a bug to be fixed.

Let's move forward with some reproducible numbers and then discuss the pros/cons of which particular OMP implementation we should use.

On Wed, Jun 19, 2019 at 3:06 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:

Hi Chris

I would ask you to have a bit of patience and help us with your experience in this matter.
Nobody is ignoring anything. I think we are individually gathering feedback and trying to understand the multiple contributions made on this topic, including yours, and then going step by step, understanding what is going on, running experiments, and reporting back to the list or the corresponding github item. It was suggested by Kellen to prepare some containers; this takes effort.

Regarding your final comment, most of us also have many other things to do and responsibilities, even if our daytime jobs might involve MXNet in some form or another. I think that's part of the privilege and responsibility of working closely with an open source project and the magic of collaboration across organizations. Let's all be patient and take some time to understand and reason about this topic, which is not simple. Since we decided to step back and gather more data, let's take time and do it properly.

Personally I hope to find time to look again into this issue before the end of the week.

Thanks.

Pedro.

On Wed, Jun 19, 2019 at 2:43 PM Chris Olivier <cjolivie...@apache.org> wrote:

if you're linking in two then you're doing something wrong. You can see from my email yesterday that only one is linked in. This is also the case with the mkl version built by the Makefile — only the Intel OMP library is used (no libgomp).

That being said, do you have clear evidence that using Intel OMP is both problematic and that the situation isn't fixable? The burden of proof is on the ones requesting the change — it is not my responsibility to justify the current state. There must be something "terrible" and unfixable to justify a change. I have seen no proof of this in all this time.

On a side note, I mentioned a couple of things in my email yesterday that still are not being responded to (they were also ignored in the last incarnation of this "discussion" — I have enough experience in this matter to assume "discussion" is a waste of my time, seeing as I am not paid to "work on" mxnet like y'all are).

-C

On Wed, Jun 19, 2019 at 10:28 AM kellen sunderland <kellen.sunderl...@gmail.com> wrote:

I've also quite often seen two versions of OpenMP linked. I think we can all agree we probably want to avoid linking in two libraries that do effectively the same thing.

The performance questions should be fairly straightforward to demonstrate, right?
Could we just collaborate on a few minimal Dockerfiles that show (or don't show) Intel OpenMP performance speedups with the workloads Chris is referencing?

On Wed, Jun 19, 2019 at 4:44 AM Tsukrov, Stanislav <stanislav.tsuk...@gmail.com> wrote:

Hi, Chris!

Stas here - I've gathered that performance data. Sure thing, I can be wrong, but please elaborate a bit on what we are missing. Be assured, intentional misdirection was never the case.

Thanks a lot for being constructive.

> Turning Intel OMP on and off (and MKL as well, since it tends to pull in omp, depending which one is linked in).

We never ever considered turning MKL off. We are on the same page here - MKL is crucial for the performance. Why should we? There's a GOMP-linked version of MKL that we can use.

What we did: we measured whether using the compiler's default OpenMP implementation instead of the referenced source-code distribution of OpenMP makes anything slower. We have found the impact to be hardly measurable. The difference between GOMP and iOMP is <5% on our benchmarks, most of the time less than that.

We just suggest simplifying the build of mxnet by removing the unnecessary dependency.
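The kind of A/B comparison Stas describes can be scripted. A minimal sketch of such a harness (hypothetical code, not the actual benchmark scripts — the real measurements time full training runs such as train_mnist.py against differently linked builds of libmxnet.so):

```python
import time

def measure(fn, repeats=5):
    """Return the best-of-N wall-clock time of fn(), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def relative_slowdown(baseline_fn, candidate_fn, repeats=5):
    """How much slower candidate is vs baseline, e.g. 0.05 == 5% slower."""
    base = measure(baseline_fn, repeats)
    cand = measure(candidate_fn, repeats)
    return cand / base - 1.0

# Usage: each fn would launch the same training workload with the same
# hyperparameters, one against a GOMP-linked build and one against an
# iOMP-linked build; a result under 0.05 matches the "<5%" claim above.
```

Taking the best of several repeats reduces noise from CPU frequency scaling and cache effects, which matters when the difference being measured is only a few percent.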
During that we discovered, for example, the following amazing issue: https://github.com/apache/incubator-mxnet/issues/14087

Best Regards

Stas

On 18.06.19, 18:24, "Chris Olivier" <cjolivie...@gmail.com> wrote:

I am very reluctant to feed the trolls again, and this will be the last time I address Pedro or Anton on the subject, but since I think the numbers being presented are incorrect (either by the builders not really understanding what they are building, or possibly intentional misdirection):

Turning Intel OMP on and off (and MKL as well, since it tends to pull in omp, depending which one is linked in): there is a HUGE difference. This is consistent with my experience before, when it was added.

default mnist:

python ../example/image-classification/train_mnist.py
INFO:root:start with arguments Namespace(add_stn=False, batch_size=64, disp_batches=100, dtype='float32', gc_threshold=0.5, gc_type='none', gpus=None, image_shape='1, 28, 28', initializer='default', kv_store='device', load_epoch=None, loss='', lr=0.05, lr_factor=0.1, lr_step_epochs='10', macrobatch_size=0, model_prefix=None, mom=0.9, monitor=0, network='mlp', num_classes=10, num_epochs=20, num_examples=60000, num_layers=None, optimizer='sgd', profile_server_suffix='', profile_worker_suffix='', save_period=1, test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)

INTEL OMP:

ldd libmxnet.so | grep omp
        libomp.so =>
/home/chris/src/mxnet/cmake_omp/3rdparty/openmp/runtime/src/libomp.so (0x00007f978fde7000)

INFO:root:Epoch[0] Batch [0-100]       Speed: 31548.09 samples/sec    accuracy=0.780012
INFO:root:Epoch[0] Batch [100-200]     Speed: 16073.21 samples/sec    accuracy=0.920469
INFO:root:Epoch[0] Batch [200-300]     Speed: 19075.91 samples/sec    accuracy=0.928281
INFO:root:Epoch[0] Batch [300-400]     Speed: 23211.36 samples/sec    accuracy=0.942813
INFO:root:Epoch[0] Batch [400-500]     Speed: 22139.79 samples/sec    accuracy=0.938750
INFO:root:Epoch[0] Batch [500-600]     Speed: 23225.52 samples/sec    accuracy=0.946562
INFO:root:Epoch[0] Batch [600-700]     Speed: 19547.41 samples/sec    accuracy=0.953281
INFO:root:Epoch[0] Batch [700-800]     Speed: 24111.73 samples/sec    accuracy=0.951562
INFO:root:Epoch[0] Batch [800-900]     Speed: 13959.88 samples/sec    accuracy=0.957500
INFO:root:Epoch[0] Train-accuracy=0.925423
INFO:root:Epoch[0] Time cost=3.806
INFO:root:Epoch[0] Validation-accuracy=0.962580
INFO:root:Epoch[1] Batch [0-100]       Speed: 24560.21 samples/sec    accuracy=0.968131
INFO:root:Epoch[1] Batch [100-200]     Speed: 23457.03 samples/sec    accuracy=0.966250

LIBGOMP:

ldd libmxnet.so | grep omp
        libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f25c25dd000)

INFO:root:Epoch[0] Batch [0-100]       Speed: 1731.01 samples/sec     accuracy=0.782488
INFO:root:Epoch[0] Batch [100-200]     Speed: 3551.32 samples/sec     accuracy=0.907813
INFO:root:Epoch[0] Batch [200-300]     Speed: 1991.00 samples/sec     accuracy=0.927188
INFO:root:Epoch[0] Batch [300-400]     Speed: 2175.45 samples/sec     accuracy=0.937969
INFO:root:Epoch[0] Batch [400-500]     Speed: 1644.95 samples/sec     accuracy=0.942187
INFO:root:Epoch[0] Batch [500-600]     Speed: 6444.58 samples/sec     accuracy=0.950156
INFO:root:Epoch[0] Batch [600-700]     Speed: 7842.16 samples/sec     accuracy=0.947969
INFO:root:Epoch[0] Batch [700-800]     Speed: 9412.07 samples/sec     accuracy=0.953750
INFO:root:Epoch[0] Batch [800-900]     Speed: 12707.58 samples/sec    accuracy=0.953125

That being said, there are other issues beyond speed. The DEFAULT build from the Makefile (not CMake) uses Intel OMP mkl (I showed before) and mysteriously it has no issues? This seems highly suspicious. All I see is a lot of hand-waving and conjecture and pointing to StackOverflow posts made by people who may be of questionable pedigree to begin with.
This smells of a Pedro-ego-fight rather than one of purely technical merit. Also, if one knows how OMP works, they would be very suspicious of the "intermittent hangs" claim -- that's probably just broken race conditions elsewhere until proven differently. It'd tend to freeze on the first use if something is wrong (try using libgomp after a fork and see), since worker threads wouldn't be assigned/joined properly. IntelOMP is faster, but also has other advantages, such as allowing OMP after a fork.

I actually addressed a lot of issues and asked for clarification in the original PRs way back when, but they're all just ignored.

-Chris
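Chris's point about libgomp after a fork rests on a general property of fork(): the child inherits only the calling thread, so runtime state that assumes live worker threads (thread pools, held locks) is stale in the child. A minimal Python analogue of that hazard (an illustration of the mechanism, not of libgomp itself; Unix-only because it uses os.fork): a lock held by a worker thread at fork time stays held forever in the child, because the thread that would release it was never copied.

```python
import os
import threading
import time

def child_sees_stale_lock() -> bool:
    """Fork while a worker thread holds a lock; report whether the
    child observes the lock as still held. The worker thread does not
    exist in the child, so nobody there will ever release the lock."""
    lock = threading.Lock()
    ready = threading.Event()

    def worker():
        with lock:
            ready.set()
            time.sleep(2.0)  # keep holding the lock across the fork

    threading.Thread(target=worker, daemon=True).start()
    ready.wait()  # make sure the worker has the lock before forking

    pid = os.fork()
    if pid == 0:
        # Child: only this thread exists, but the lock state was copied,
        # so acquiring it times out -- the same class of deadlock as an
        # OMP runtime waiting on worker threads that were never re-created.
        stale = not lock.acquire(timeout=0.5)
        os._exit(0 if stale else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0

if __name__ == "__main__":
    print("child sees stale lock:", child_sees_stale_lock())
```

An OpenMP runtime that wants to survive fork has to repair exactly this kind of state in the child (for example via pthread_atfork handlers), which is the interaction the top of this thread discusses.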