Proposal For Build Improvements

The Mesos build process is in dire need of some build infrastructure 
improvements. These changes will make work in individual components faster and 
easier, and will dramatically reduce overall build time, especially in the 
Windows environment, but likely in the Linux environment as well.


Background:

It is currently recommended to use the ccache project with the Mesos build 
process. This makes the Linux build process more tolerable in terms of speed, 
but unfortunately such software is not available on Windows. Ultimately, 
though, the caching software is covering up two fundamental flaws in the 
overall build process:

1. Lack of use of libraries
2. Lack of precompiled headers

Because libraries are not used, the overall build process is often much longer 
than it needs to be, particularly when a lot of work is being done in a 
particular component. With libraries, only the library for that component 
would need to be rebuilt (and then the overall image relinked). Currently, 
since there is no such modularization, all source files must be considered at 
build time. Interestingly enough, there is such modularization in the source 
code layout; that modularization just isn't utilized at the compiler level.

Precompiled headers exist on both Windows and Linux. For Linux, you can refer 
to https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html. Straight from 
the GNU CC documentation: "The time the compiler takes to process these header 
files over and over again can account for nearly all of the time required to 
build the project."

In my prior use of precompiled headers, each C or C++ file generally took about 
4 seconds to compile. After switching to precompiled headers, the precompiled 
header creation took about 4 seconds, but each C/C++ file now took about 200 
milliseconds to compile. The overall build time was thus dramatically reduced.
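
To put rough numbers on that, here is a minimal sketch of the arithmetic. The 
helper name and the 100-file project size are my own assumptions for 
illustration; only the 4 second and 200 millisecond figures come from the 
experience described above, and the model ignores parallel builds and link 
time.

```cpp
#include <cassert>

// Hypothetical helper (not part of Mesos): estimated serial compile time in
// milliseconds for file_count source files.
//   per_file_ms: cost to compile one source file
//   pch_ms:      one-time cost to build the precompiled header (0 if unused)
long long estimate_compile_ms(int file_count, long long per_file_ms,
                              long long pch_ms) {
    assert(file_count >= 0);
    return pch_ms + static_cast<long long>(file_count) * per_file_ms;
}
```

Under those assumptions, estimate_compile_ms(100, 4000, 0) gives 400000 ms 
(400 seconds) without a precompiled header, while 
estimate_compile_ms(100, 200, 4000) gives 24000 ms (24 seconds) with one.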


Scope of Changes:

These changes are only being proposed for the CMake system. Going forward, the 
CMake system is the easiest way to maintain some level of portability between 
the Linux and Windows platforms.


Details for Modularization:

For the modularization, the intent is simply to compile each source directory, 
if functionally separate, into an archive (.a) file. These archive files will 
then be linked together to form the actual executables. These changes will 
primarily be in the CMake system, and should have limited effect on any actual 
source code.

At a later date, if it makes sense, we can look at building shared library 
(.so) files. However, this only makes sense if the code is truly 
shared between different executable files. If that's not the case, then it 
likely makes sense just to stick with .a files. Regardless, generation of .so 
files is out of scope for this change.


Details for Precompiled Header Changes:

Precompiled headers will make use of stout (a very large header-only library) 
essentially "free" from a compile-time overhead point of view. Basically, the 
compiler takes a list of header files (including very long header files, like 
"windows.h"), and generates its in-memory structures for their 
representation.

During precompiled header generation, these memory structures are flushed to 
disk. Then, when components are built, the memory structures are reloaded from 
disk, which is dramatically faster than actually parsing the tens of thousands 
of lines of header files and building the memory structures.

For precompiled headers to be useful, a relatively "consistent" set of headers 
must be included by all of the C/C++ files. So, for example, consider the 
following C file:

#if defined(windows)
#include <windows.h>
#endif

#include <header-a>
#include <header-b>
#include <header-c>

< - Remainder of module - >

To make a precompiled header for this module, all of the #include files would 
be included in a new file, mesos_common.h. The C file would then be changed as 
follows:

#include "mesos_common.h"

< - Remainder of module - >

Structurally, the code is identical, and need not be built with precompiled 
headers. However, use of precompiled headers will make file compilation 
dramatically faster.
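
As a rough sketch of what mesos_common.h might look like: the standard library 
headers below are only stand-ins for <header-a> through <header-c>, and the 
guard name and the _WIN32 check are my assumptions, not the final contents. 
The include guard is what lets the file act as an ordinary header when 
precompiled headers are turned off.

```cpp
// mesos_common.h (hypothetical contents, for illustration only)
#ifndef MESOS_COMMON_H
#define MESOS_COMMON_H

// _WIN32 is predefined by Windows compilers; the real header may use a
// different, project-specific macro.
#if defined(_WIN32)
#include <windows.h>
#endif

// Stand-ins for <header-a>, <header-b>, <header-c>: headers that most
// source files would otherwise include one by one.
#include <cassert>
#include <map>
#include <string>
#include <vector>

#endif  // MESOS_COMMON_H
```

A source file that starts with #include "mesos_common.h" then sees the same 
declarations whether or not the compiler loads a precompiled version of it.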

Note that other include files can be included after the precompiled header if 
appropriate. For example, the following is valid:

#include "mesos_common.h"
#include <header-d>

< - Remainder of module - >

For efficiency purposes, if a header file is included by 50% or more of the 
source files, it should be included in the precompiled header. If a header is 
included in fewer than 50% of the source files, then it can be separately 
included (and thus would not benefit from precompiled headers). Note that this 
is a guideline; even if a header is used by less than 50% of source files, if 
it's very large, we still may decide to throw it in the precompiled header.
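
The 50% guideline could be applied mechanically along these lines. This is 
only a sketch: the function name and the idea of feeding it per-header include 
counts are my assumptions, and it does not model the exception for very large 
headers, which would still be a judgment call.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of Mesos): given how many source files
// include each header, return the headers included by at least half of them.
std::vector<std::string> select_pch_headers(
    const std::map<std::string, int>& include_counts,
    int total_source_files) {
    assert(total_source_files > 0);
    std::vector<std::string> selected;
    for (const auto& entry : include_counts) {
        // Comparing 2 * count against the total avoids floating point
        // in the "50% or more" check.
        if (2 * entry.second >= total_source_files) {
            selected.push_back(entry.first);
        }
    }
    // std::map iterates in key order, so the result is sorted by header name.
    return selected;
}
```

For example, with 100 source files and counts of 90 for header-a, 50 for 
header-b, and 10 for header-d, the first two would be selected and header-d 
would stay as a separate #include.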

Note that adopting precompiled headers will cause a great deal of code churn 
(almost exclusively in the #include lists of source files). This will mean 
that there will be a lot of code merges, but ultimately no "code logic" 
changes. If merges are not done in a timely fashion, this can easily result in 
needless hand merging of changes: this kind of patch is easily invalidated 
when another developer changes an include list, forcing us to redo it. Because 
of this, we will need a dedicated shepherd who will integrate the patches 
quickly. [Note that Joseph has stepped up to the plate for this, thanks 
Joseph!]


This is the end of my proposal. Feedback would be appreciated.
