Dear MXNet community,

I propose raising the minimum CMake version required to build MXNet to
3.10 (release 3.10.3 was tagged on March 16, 2018 [1]).

The broader effort to repair the CMake scripts aims to deprecate make so
that only one build system has to be maintained.

*Need*

The build system is the foundation of every software project. Its quality
directly affects the quality of the project. The MXNet build system is
fragile, partially broken and not maintained.

Both users and developers of MXNet are confused by the fact that two build
systems exist at the same time: make and CMake.

The main functional areas affected by the current state of the CMake files
are:

*OpenMP*
The current CMake files mix OpenMP libraries from different compilers,
which is undefined behaviour. It leads to nondeterministic crashes on some
platforms and makes building and deploying very hard. There is no evidence
of any benefit from having the LLVM OpenMP library as a submodule in MXNet.

*BLAS and LAPACK*
Basic math library usage is muddled. It is hard and confusing to
configure, and there is no logic for choosing the optimal library. MKL and
OpenBLAS are intermixed in an unpredictable manner.

*Profiling*
The profiler is always on, even for production release builds, because
MXNet cannot be built without it [2].

*CUDA*
CUDA is detected by 3 different files in the current CMake scripts, and
the choice between them is based on obscure logic that involves different
CMake versions and the platform being built on:

* CMakeLists.txt
* cmake/FirstClassLangCuda.cmake
* 3rdparty/mshadow/cmake/Cuda.cmake


*Confusing and misleading cmake user options*
For example, USE_CUDA / USE_OLDCMAKECUDA: some options do or do not do
what they are supposed to, depending on the CMake generator and the CMake
version [3].
There are currently more than 30 build parameters for MXNet, none of them
documented. Some are not even located in the main CMakeLists.txt file, for
example 'BLAS'.


*Issues*
There is a significant number of GitHub issues related to CMake or to the
build in general, and new tickets are filed frequently.

* #8702 (https://github.com/apache/incubator-mxnet/issues/8702)
 [DISCUSSION] Should we deprecate Makefile and only use CMake?
* #5079 (https://github.com/apache/incubator-mxnet/issues/5079)   troubles
building python interface on raspberry pi 3
* #1722 (https://github.com/apache/incubator-mxnet/issues/1722)   problem:
compile mxnet with hdfs
* #11549 (https://github.com/apache/incubator-mxnet/issues/11549) Pip
package can be much faster (OpenCV version?)
* #11417 (https://github.com/apache/incubator-mxnet/issues/11417) libomp.so
dependency (need REAL fix)
* #8532 (https://github.com/apache/incubator-mxnet/issues/8532)   mxnet-mkl
(v0.12.0) crash when using (conda-installed) numpy with MKL // (indirectly)
* #11131 (https://github.com/apache/incubator-mxnet/issues/11131)
mxnet-cu92 low efficiency  // (indirectly)
* #10743 (https://github.com/apache/incubator-mxnet/issues/10743) CUDA
9.1.xx failed if not set OLDCMAKECUDA on cmake 3.10.3 with unix makefile or
Ninja generator
* #10742 (https://github.com/apache/incubator-mxnet/issues/10742) typo in
cpp-package/CMakeLists.txt
* #10737 (https://github.com/apache/incubator-mxnet/issues/10737) Cmake is
running again when execute make install
* #10543 (https://github.com/apache/incubator-mxnet/issues/10543) Failed to
build from source when set USE_CPP_PACKAGE = 1, fatal error C1083: unabel
to open file: “mxnet-cpp/op.h”: No such file or directory
* #10217 (https://github.com/apache/incubator-mxnet/issues/10217) Building
with OpenCV causes link errors
* #10175 (https://github.com/apache/incubator-mxnet/issues/10175) MXNet
MKLDNN build dependency/flow discussion
* #10009 (https://github.com/apache/incubator-mxnet/issues/10009)
[CMAKE][IoT] Remove pthread from android_arm64 build
* #9944 (https://github.com/apache/incubator-mxnet/issues/9944)   MXNet
MinGW-w64 build error // (indirectly)
* #9868 (https://github.com/apache/incubator-mxnet/issues/9868)   MKL and
CMake
* #9516 (https://github.com/apache/incubator-mxnet/issues/9516)   cmake
cuda arch issues
* #9105 (https://github.com/apache/incubator-mxnet/issues/9105)
 libmxnet.so load path error
* #9096 (https://github.com/apache/incubator-mxnet/issues/9096)   MXNet
built with GPerftools crashes
* #8786 (https://github.com/apache/incubator-mxnet/issues/8786)   Link
failure on DEBUG=1 (static member symbol not defined) // (indirectly)
* #8729 (https://github.com/apache/incubator-mxnet/issues/8729)   Build
amalgamation using a docker // (indirectly)
* #8667 (https://github.com/apache/incubator-mxnet/issues/8667)
 Compiler/linker error while trying to build from source on Mac OSX Sierra
10.12.6
* #8295 (https://github.com/apache/incubator-mxnet/issues/8295)   Building
with cmake - error
* #7852 (https://github.com/apache/incubator-mxnet/issues/7852)   Trouble
installing MXNet on Raspberry Pi 3
* #13303 (https://github.com/apache/incubator-mxnet/issues/13303) mxnet-cpp
package cross-compilation fails with OSError: "wrong ELF class: ELFCLASS32"
* #13245 (https://github.com/apache/incubator-mxnet/issues/13245)
mxnet::cpp::NDArray::WaitAll() take about 160ms on gtx1080ti //
(indirectly, cmake impact on performance)
* #12849 (https://github.com/apache/incubator-mxnet/issues/12849)
[cmake][cpp-package] Building with cmake does not install the cpp-package
API
* #12568 (https://github.com/apache/incubator-mxnet/issues/12568)
[Scala][macOS] Trying to build from source
* #12134 (https://github.com/apache/incubator-mxnet/issues/12134) why MKL
and MKL-DNN can't be used simultaneously in ChooseBlas.cmake
* #12107 (https://github.com/apache/incubator-mxnet/issues/12107) Faulty
CUDA detection with cmake
* #11769 (https://github.com/apache/incubator-mxnet/issues/11769)
USE_BLAS=MKL fails due to mshadow requiring openblas
* #11563 (https://github.com/apache/incubator-mxnet/issues/11563) Deprecate
USE_PROFILER from make/cmake
* #10856 (https://github.com/apache/incubator-mxnet/issues/10856) Failed
OpenMP assertion when loading MXNet compiled with DEBUG=1


*Approach*

We are going to iteratively fix and simplify the CMake build system and,
once possible, deprecate and remove the make build. The following PRs have
been opened so far:


* #11148 (https://github.com/apache/incubator-mxnet/pull/11148) [MXNET-679]
Refactor handling BLAS libraries with cmake
* #12160 (https://github.com/apache/incubator-mxnet/pull/12160) Remove
conflicting llvm OpenMP from cmake builds
* #10564 (https://github.com/apache/incubator-mxnet/pull/10564) Simplified
CUDA language detection in cmake
* #10530 (https://github.com/apache/incubator-mxnet/pull/10530) Jetson
build with cmake and CUDA

Unfortunately, none of them has been successful. The question of raising
the minimum required version has not come up before, so I am bringing it
up now.

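By upgrading the version, we would be able to remove all the custom,
error-prone CMake files related to CUDA, BLAS and LAPACK, since CMake 3.10
supports CUDA as a first-class language and ships its own FindBLAS and
FindLAPACK modules. This would address most of the problems.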

OpenMP and profiling would need to be addressed separately.

*Benefit*

Easier maintenance of the MXNet build, clarity for users, and better
quality and predictability.

*Alternatives*

* Leave the situation as is
* Proceed with the make build


I would appreciate hearing your thoughts.

Best
Anton

[1] https://github.com/Kitware/CMake/releases/tag/v3.10.3
[2] https://github.com/apache/incubator-mxnet/issues/11563
[3] https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L46-L57
