Dear MXNet community,

I propose to raise the minimum CMake version required to build MXNet to 3.10 (the 3.10.3 patch release was tagged on March 16, 2018 [1]).
The overall goal of repairing the CMake scripts is to deprecate make and maintain only one build system.

*Need*

The build system is the foundation of every software project. Its quality directly impacts the quality of the project. The MXNet build system is fragile, partially broken and not maintained. Users and developers of MXNet are confused by the fact that two build systems exist at the same time: make and CMake.

The main functional areas affected by the current state of the CMake files are:

*OpenMP*

The current CMake files mix OpenMP libraries from different compilers, which is undefined behaviour. It leads to nondeterministic crashes on some platforms. Build and deployment are very hard. There is no evidence of any benefit from having the LLVM OpenMP library as a submodule in MXNet.

*BLAS and LAPACK*

The usage of basic math libraries is mixed up. It is hard and confusing to configure, and there is no logic for choosing the best available library. MKL and OpenBLAS are intermixed in an unpredictable manner.

*Profiling*

The profiler is always on, even for production release builds, because MXNet cannot be built without it [2].

*CUDA*

CUDA is detected by three different files in the current CMake scripts, and the choice among them is based on obscure logic that involves the CMake version and the platform being built on:

* CMakeLists.txt
* cmake/FirstClassLangCuda.cmake
* 3rdparty/mshadow/cmake/Cuda.cmake

*Confusing and misleading cmake user options*

For example, USE_CUDA / USE_OLDCMAKECUDA. Some of them may or may not do what they are supposed to, depending on the CMake generator and the version of CMake [3]. MXNet currently has more than 30 build parameters, none of them documented. Some of them are not even located in the main CMakeLists.txt file, for example 'BLAS'.

*Issues*

There is a significant number of GitHub issues related to CMake or to the build in general. New tickets are opened frequently.

* #8702 (https://github.com/apache/incubator-mxnet/issues/8702) [DISCUSSION] Should we deprecate Makefile and only use CMake?
* #5079 (https://github.com/apache/incubator-mxnet/issues/5079) troubles building python interface on raspberry pi 3
* #1722 (https://github.com/apache/incubator-mxnet/issues/1722) problem: compile mxnet with hdfs
* #11549 (https://github.com/apache/incubator-mxnet/issues/11549) Pip package can be much faster (OpenCV version?)
* #11417 (https://github.com/apache/incubator-mxnet/issues/11417) libomp.so dependency (need REAL fix)
* #8532 (https://github.com/apache/incubator-mxnet/issues/8532) mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL // (indirectly)
* #11131 (https://github.com/apache/incubator-mxnet/issues/11131) mxnet-cu92 low efficiency // (indirectly)
* #10743 (https://github.com/apache/incubator-mxnet/issues/10743) CUDA 9.1.xx failed if not set OLDCMAKECUDA on cmake 3.10.3 with unix makefile or Ninja generator
* #10742 (https://github.com/apache/incubator-mxnet/issues/10742) typo in cpp-package/CMakeLists.txt
* #10737 (https://github.com/apache/incubator-mxnet/issues/10737) Cmake is running again when execute make install
* #10543 (https://github.com/apache/incubator-mxnet/issues/10543) Failed to build from source when set USE_CPP_PACKAGE = 1, fatal error C1083: unabel to open file: “mxnet-cpp/op.h”: No such file or directory
* #10217 (https://github.com/apache/incubator-mxnet/issues/10217) Building with OpenCV causes link errors
* #10175 (https://github.com/apache/incubator-mxnet/issues/10175) MXNet MKLDNN build dependency/flow discussion
* #10009 (https://github.com/apache/incubator-mxnet/issues/10009) [CMAKE][IoT] Remove pthread from android_arm64 build
* #9944 (https://github.com/apache/incubator-mxnet/issues/9944) MXNet MinGW-w64 build error // (indirectly)
* #9868 (https://github.com/apache/incubator-mxnet/issues/9868) MKL and CMake
* #9516 (https://github.com/apache/incubator-mxnet/issues/9516) cmake cuda arch issues
* #9105 (https://github.com/apache/incubator-mxnet/issues/9105) libmxnet.so load path error
* #9096 (https://github.com/apache/incubator-mxnet/issues/9096) MXNet built with GPerftools crashes
* #8786 (https://github.com/apache/incubator-mxnet/issues/8786) Link failure on DEBUG=1 (static member symbol not defined) // (indirectly)
* #8729 (https://github.com/apache/incubator-mxnet/issues/8729) Build amalgamation using a docker // (indirectly)
* #8667 (https://github.com/apache/incubator-mxnet/issues/8667) Compiler/linker error while trying to build from source on Mac OSX Sierra 10.12.6
* #8295 (https://github.com/apache/incubator-mxnet/issues/8295) Building with cmake - error
* #7852 (https://github.com/apache/incubator-mxnet/issues/7852) Trouble installing MXNet on Raspberry Pi 3
* #13303 (https://github.com/apache/incubator-mxnet/issues/13303) mxnet-cpp package cross-compilation fails with OSError: "wrong ELF class: ELFCLASS32"
* #13245 (https://github.com/apache/incubator-mxnet/issues/13245) mxnet::cpp::NDArray::WaitAll() take about 160ms on gtx1080ti // (indirectly, cmake impact on performance)
* #12849 (https://github.com/apache/incubator-mxnet/issues/12849) [cmake][cpp-package] Building with cmake does not install the cpp-package API
* #12568 (https://github.com/apache/incubator-mxnet/issues/12568) [Scala][macOS] Trying to build from source
* #12134 (https://github.com/apache/incubator-mxnet/issues/12134) why MKL and MKL-DNN can't be used simultaneously in ChooseBlas.cmake
* #12107 (https://github.com/apache/incubator-mxnet/issues/12107) Faulty CUDA detection with cmake
* #11769 (https://github.com/apache/incubator-mxnet/issues/11769) USE_BLAS=MKL fails due to mshadow requiring openblas
* #11563 (https://github.com/apache/incubator-mxnet/issues/11563) Deprecate USE_PROFILER from make/cmake
* #10856 (https://github.com/apache/incubator-mxnet/issues/10856) Failed OpenMP assertion when loading MXNet compiled with DEBUG=1

*Approach*

We are going to iteratively fix and simplify the CMake build system and, once it is possible, deprecate and remove the make system. These PRs have been opened so far:

* #11148 (https://github.com/apache/incubator-mxnet/pull/11148) [MXNET-679] Refactor handling BLAS libraries with cmake
* #12160 (https://github.com/apache/incubator-mxnet/pull/12160) Remove conflicting llvm OpenMP from cmake builds
* #10564 (https://github.com/apache/incubator-mxnet/pull/10564) Simplified CUDA language detection in cmake
* #10530 (https://github.com/apache/incubator-mxnet/pull/10530) Jetson build with cmake and CUDA

Unfortunately, none of them has had any success. The question of updating the minimum required version has not been asked before, so I'm raising it now. By upgrading the version we could remove all custom, error-prone CMake files related to CUDA, BLAS and LAPACK, which would cover most of the problems. OpenMP and profiling would need to be addressed separately.

*Benefit*

Easier maintenance of the MXNet build, clarity for users, quality and predictability.

*Alternatives*

* Leave the situation as is
* Proceed with the make build

I would appreciate hearing your thoughts.

Best
Anton

[1] https://github.com/Kitware/CMake/releases/tag/v3.10.3
[2] https://github.com/apache/incubator-mxnet/issues/11563
[3] https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L46-L57
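
For illustration, the kind of simplification described under *Approach* could look roughly like the sketch below. It is a minimal sketch, not the actual MXNet build files and not the content of any of the PRs listed above. It assumes CMake 3.10 and relies only on stock CMake features: CUDA as a first-class language, and the bundled FindBLAS and FindLAPACK modules. The project name `mxnet_sketch` is hypothetical, and the `USE_CUDA` option is redefined here for illustration only.

```cmake
# Illustrative sketch only; NOT the actual MXNet CMakeLists.txt.
cmake_minimum_required(VERSION 3.10)

# Hypothetical project name. C is enabled alongside C++ because the
# stock BLAS/LAPACK find modules run C-based link checks.
project(mxnet_sketch LANGUAGES C CXX)

# Redefined here for illustration; a single option, no USE_OLDCMAKECUDA.
option(USE_CUDA "Build with CUDA support" OFF)

if(USE_CUDA)
  # CUDA is a first-class language since CMake 3.8, so no custom
  # detection scripts are needed.
  include(CheckLanguage)
  check_language(CUDA)
  if(CMAKE_CUDA_COMPILER)
    enable_language(CUDA)
    message(STATUS "CUDA compiler: ${CMAKE_CUDA_COMPILER}")
  else()
    message(FATAL_ERROR "USE_CUDA=ON but no CUDA compiler was found")
  endif()
endif()

# Stock find modules; a vendor can be selected with the standard
# BLA_VENDOR variable, e.g. -DBLA_VENDOR=OpenBLAS.
find_package(BLAS REQUIRED)
find_package(LAPACK REQUIRED)
message(STATUS "BLAS libraries:   ${BLAS_LIBRARIES}")
message(STATUS "LAPACK libraries: ${LAPACK_LIBRARIES}")
```

Saved as CMakeLists.txt in an empty directory, this could be configured with, for example, `cmake -DUSE_CUDA=ON -DBLA_VENDOR=OpenBLAS .`. A real build would additionally link `${LAPACK_LIBRARIES}` and `${BLAS_LIBRARIES}` into the library target.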