I have not seen any proof that any crashes are due to LLVM OpenMP usage.

On Thu, Nov 22, 2018 at 2:35 AM Anton Chernov <mecher...@gmail.com> wrote:
> Dear MXNet community,
>
> I propose to raise the minimum cmake version required to build MXNet to
> 3.10 (3.10.3 was tagged on March 16, 2018 [1]).
>
> The broader effort of repairing the cmake scripts aims to deprecate make
> and maintain only one build system.
>
> *Need*
>
> The build system is the foundation of every software project. Its quality
> directly impacts the quality of the project. The MXNet build system is
> fragile, partially broken and not maintained.
>
> Users and developers of MXNet are confused by the fact that two build
> systems exist at the same time: make and CMake.
>
> The main functional areas impacted by the current state of the cmake
> files are:
>
> *OpenMP*
> The current CMake files mix OpenMP libraries from different compilers,
> which is undefined behaviour. It leads to nondeterministic crashes on some
> platforms. Build and deployment are very hard. There is no evidence that
> having the llvm OpenMP library as a submodule in MXNet brings any benefit.
>
> *BLAS and LAPACK*
> Basic math library usage is mixed up. It is hard and confusing to
> configure, and there is no logic for choosing the most suitable library.
> MKL and OpenBLAS are intermixed in an unpredictable manner.
>
> *Profiling*
> The profiler is always on, even for production release builds, because
> MXNet cannot be built without it [2].
>
> *CUDA*
> CUDA is detected by 3 different files in the current cmake scripts, and
> the choice between them is based on obscure logic that depends on the
> cmake version and the platform being built on:
>
> * CMakeLists.txt
> * cmake/FirstClassLangCuda.cmake
> * 3rdparty/mshadow/cmake/Cuda.cmake
>
> *Confusing and misleading cmake user options*
> For example, USE_CUDA / USE_OLDCMAKECUDA. Some of them will or will not do
> what they are supposed to, depending on the cmake generator and the cmake
> version [3].
> There are currently more than 30 build parameters for MXNet, none of them
> documented. Some of them are not even located in the main CMakeLists.txt
> file, for example 'BLAS'.
>
> *Issues*
> There is a significant number of GitHub issues related to cmake or the
> build in general. New tickets are opened frequently.
>
> * #8702 (https://github.com/apache/incubator-mxnet/issues/8702) [DISCUSSION] Should we deprecate Makefile and only use CMake?
> * #5079 (https://github.com/apache/incubator-mxnet/issues/5079) troubles building python interface on raspberry pi 3
> * #1722 (https://github.com/apache/incubator-mxnet/issues/1722) problem: compile mxnet with hdfs
> * #11549 (https://github.com/apache/incubator-mxnet/issues/11549) Pip package can be much faster (OpenCV version?)
> * #11417 (https://github.com/apache/incubator-mxnet/issues/11417) libomp.so dependency (need REAL fix)
> * #8532 (https://github.com/apache/incubator-mxnet/issues/8532) mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL // (indirectly)
> * #11131 (https://github.com/apache/incubator-mxnet/issues/11131) mxnet-cu92 low efficiency // (indirectly)
> * #10743 (https://github.com/apache/incubator-mxnet/issues/10743) CUDA 9.1.xx failed if not set OLDCMAKECUDA on cmake 3.10.3 with unix makefile or Ninja generator
> * #10742 (https://github.com/apache/incubator-mxnet/issues/10742) typo in cpp-package/CMakeLists.txt
> * #10737 (https://github.com/apache/incubator-mxnet/issues/10737) Cmake is running again when execute make install
> * #10543 (https://github.com/apache/incubator-mxnet/issues/10543) Failed to build from source when set USE_CPP_PACKAGE = 1, fatal error C1083: unabel to open file: “mxnet-cpp/op.h”: No such file or directory
> * #10217 (https://github.com/apache/incubator-mxnet/issues/10217) Building with OpenCV causes link errors
> * #10175 (https://github.com/apache/incubator-mxnet/issues/10175) MXNet MKLDNN build dependency/flow discussion
> * #10009 (https://github.com/apache/incubator-mxnet/issues/10009) [CMAKE][IoT] Remove pthread from android_arm64 build
> * #9944 (https://github.com/apache/incubator-mxnet/issues/9944) MXNet MinGW-w64 build error // (indirectly)
> * #9868 (https://github.com/apache/incubator-mxnet/issues/9868) MKL and CMake
> * #9516 (https://github.com/apache/incubator-mxnet/issues/9516) cmake cuda arch issues
> * #9105 (https://github.com/apache/incubator-mxnet/issues/9105) libmxnet.so load path error
> * #9096 (https://github.com/apache/incubator-mxnet/issues/9096) MXNet built with GPerftools crashes
> * #8786 (https://github.com/apache/incubator-mxnet/issues/8786) Link failure on DEBUG=1 (static member symbol not defined) // (indirectly)
> * #8729 (https://github.com/apache/incubator-mxnet/issues/8729) Build amalgamation using a docker // (indirectly)
> * #8667 (https://github.com/apache/incubator-mxnet/issues/8667) Compiler/linker error while trying to build from source on Mac OSX Sierra 10.12.6
> * #8295 (https://github.com/apache/incubator-mxnet/issues/8295) Building with cmake - error
> * #7852 (https://github.com/apache/incubator-mxnet/issues/7852) Trouble installing MXNet on Raspberry Pi 3
> * #13303 (https://github.com/apache/incubator-mxnet/issues/13303) mxnet-cpp package cross-compilation fails with OSError: "wrong ELF class: ELFCLASS32"
> * #13245 (https://github.com/apache/incubator-mxnet/issues/13245) mxnet::cpp::NDArray::WaitAll() take about 160ms on gtx1080ti // (indirectly, cmake impact on performance)
> * #12849 (https://github.com/apache/incubator-mxnet/issues/12849) [cmake][cpp-package] Building with cmake does not install the cpp-package API
> * #12568 (https://github.com/apache/incubator-mxnet/issues/12568) [Scala][macOS] Trying to build from source
> * #12134 (https://github.com/apache/incubator-mxnet/issues/12134) why MKL and MKL-DNN can't be used simultaneously in ChooseBlas.cmake
> * #12107 (https://github.com/apache/incubator-mxnet/issues/12107) Faulty CUDA detection with cmake
> * #11769 (https://github.com/apache/incubator-mxnet/issues/11769) USE_BLAS=MKL fails due to mshadow requiring openblas
> * #11563 (https://github.com/apache/incubator-mxnet/issues/11563) Deprecate USE_PROFILER from make/cmake
> * #10856 (https://github.com/apache/incubator-mxnet/issues/10856) Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
>
> *Approach*
>
> We are going to iteratively fix and simplify the cmake build system and,
> once it is possible, deprecate and remove the make system. These PRs have
> been opened so far:
>
> * #11148 (https://github.com/apache/incubator-mxnet/pull/11148) [MXNET-679] Refactor handling BLAS libraries with cmake
> * #12160 (https://github.com/apache/incubator-mxnet/pull/12160) Remove conflicting llvm OpenMP from cmake builds
> * #10564 (https://github.com/apache/incubator-mxnet/pull/10564) Simplified CUDA language detection in cmake
> * #10530 (https://github.com/apache/incubator-mxnet/pull/10530) Jetson build with cmake and CUDA
>
> Unfortunately, none of them has been successful. The question of raising
> the minimum required version has not been asked before, so I'm raising it
> now.
>
> By upgrading the version we could remove all custom, error-prone cmake
> files related to CUDA, BLAS and LAPACK, which would cover most of the
> problems.
>
> OpenMP and profiling would need to be addressed separately.
>
> *Benefit*
>
> Easier maintenance of the MXNet build, clarity for users, quality and
> predictability.
>
> *Alternatives*
>
> * Leave the situation as is
> * Proceed with the make build
>
> I would appreciate hearing your thoughts.
>
> Best
> Anton
>
> [1] https://github.com/Kitware/CMake/releases/tag/v3.10.3
> [2] https://github.com/apache/incubator-mxnet/issues/11563
> [3] https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L46-L57
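
For illustration, here is a minimal sketch of what detecting CUDA, BLAS and LAPACK can look like when only modules shipped with CMake 3.10 are used instead of custom scripts. It is not taken from MXNet's build files: the project name, the SKETCH_USE_CUDA option and the sketch_deps target are hypothetical, and C is enabled next to C++ only as a precaution, on the assumption that the stock BLAS checks may want a C compiler.

```cmake
# Illustrative sketch only. All names below are hypothetical, not MXNet's.
cmake_minimum_required(VERSION 3.10)
project(build_sketch LANGUAGES C CXX)

# Hypothetical switch; MXNet's real options are named differently.
option(SKETCH_USE_CUDA "Enable CUDA if a CUDA compiler is found" ON)

# Interface target that only carries link requirements, so no sources are needed.
add_library(sketch_deps INTERFACE)

# CUDA as a first-class language (supported since CMake 3.8).
if(SKETCH_USE_CUDA)
  include(CheckLanguage)
  check_language(CUDA)            # sets CMAKE_CUDA_COMPILER when a compiler is found
  if(CMAKE_CUDA_COMPILER)
    enable_language(CUDA)
  else()
    message(STATUS "No CUDA compiler found; continuing without CUDA")
  endif()
endif()

# BLAS and LAPACK through the stock find modules.
# The user may set BLA_VENDOR to pin a specific vendor.
find_package(BLAS)
if(BLAS_FOUND)
  target_link_libraries(sketch_deps INTERFACE ${BLAS_LIBRARIES})
endif()

find_package(LAPACK)
if(LAPACK_FOUND)
  target_link_libraries(sketch_deps INTERFACE ${LAPACK_LIBRARIES})
endif()
```

Configuring with, for example, `cmake -DBLA_VENDOR=OpenBLAS <source dir>` would restrict the BLAS search to one vendor. Whether the stock modules cover every configuration MXNet needs is exactly what the proposal would have to verify.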