On Fri, Aug 21, 2020 at 12:00 AM Giuliano Belinassi
<giuliano.belina...@usp.br> wrote:
>
> This patch series adds a new flag "-fparallel-jobs=" to control whether the
> compiler should try to compile the current file in parallel.
>
> Three modes are supported for now:
>
> 1. -fparallel-jobs=<N>: Try to compile the file using a maximum of N
> jobs.
>
> 2. -fparallel-jobs=jobserver: Check whether a GNU Make Jobserver is
> running. If so, communicate with it in order to launch jobs; otherwise
> alert the user, since using the jobserver requires modifications to
> the project Makefile.
>
> 3. -fparallel-jobs=auto: Same as 2., but quietly fall back to a maximum
> of 2 jobs if the jobserver was not found.
>
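
A minimal sketch of how the three modes above could be resolved, not the driver code from the patch. It assumes the jobserver is detected by looking for --jobserver-auth= (or the older --jobserver-fds=) in MAKEFLAGS, which is how GNU Make advertises it. The function names, the returned tuple, and the serial fallback for mode 2 are hypothetical.

```python
import re

DEFAULT_AUTO_JOBS = 2  # fallback for "auto", as stated in the cover letter


def find_jobserver(makeflags):
    """Return the jobserver auth string found in a MAKEFLAGS value, or None."""
    match = re.search(r"--jobserver-(?:auth|fds)=(\S+)", makeflags or "")
    return match.group(1) if match else None


def resolve_parallel_jobs(value, makeflags=""):
    """Map a -fparallel-jobs= argument to (mode, max_jobs, warning).

    Hypothetical helper: the tuple layout is an illustration only.
    """
    if value in ("jobserver", "auto"):
        if find_jobserver(makeflags) is not None:
            # The job count is governed by the tokens make hands out.
            return ("jobserver", None, None)
        if value == "auto":
            # Mode 3: fall back quietly.
            return ("fixed", DEFAULT_AUTO_JOBS, None)
        # Mode 2: alert the user. Continuing serially is an assumption here.
        return ("serial", 1, "jobserver not found; is the recipe marked with '+'?")
    jobs = int(value)  # raises ValueError for non-numeric input
    if jobs < 1:
        raise ValueError("job count must be positive")
    return ("fixed", jobs, None)
```

For instance, resolve_parallel_jobs("auto") falls back to two jobs, while the same call with a MAKEFLAGS value containing --jobserver-auth=3,4 selects the jobserver.
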
> The parallelization works by using a modified LTO engine in which no IR
> is dumped to disk, and a new partitioner is employed to find
> symbols which must be partitioned together.
>
> The parallelism feature is implemented as follows:
>
> 1. The driver will pass a hidden -fsplit-outputs=<filename> to cc1*.
>
> 2. After IPA, cc1* will search for symbols which must be partitioned
> together.  If the user allows GCC to automatically promote symbols to
> globals through "--param=promote-statics=1" for better parallel
> compilation performance, that promotion is done as well.  However, if
> cc1* decides
> that partitioning is a bad idea, it will continue with a default serial
> compilation, and the additional <filename> will not be created.  It will
> avoid compiling in parallel if and only if:
>
>   * File size exceeds the minimum file size specified by LTO default
>   --param=lto-min-partition.

less than the minimum size I suppose.

>   * The partitioner is unable to find any point of partitioning in the
>   file.

It might make sense to increase the minimum partition size and also
check the partitioning result against unreasonable bias (one very
large and one very small partition).
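
A rough sketch of such a check, assuming the partitioner can report a size per partition in the same unit as --param=lto-min-partition. The function name and the max_ratio threshold are hypothetical and are not taken from the patch.

```python
def partitioning_is_worthwhile(sizes, min_partition, max_ratio=4.0):
    """Decide whether a partitioning result is worth compiling in parallel.

    sizes         -- estimated size of each partition
    min_partition -- smallest size a partition may have
    max_ratio     -- hypothetical bound on largest/smallest partition size
    """
    if len(sizes) < 2:
        # No point of partitioning was found.
        return False
    smallest, largest = min(sizes), max(sizes)
    if smallest < min_partition:
        # At least one partition is too small to pay for a fork.
        return False
    # Reject heavily biased results: one huge and one tiny partition.
    return largest <= max_ratio * smallest
```

With a minimum of 500, sizes of [1000, 1200] pass, while [10000, 600] is rejected as too biased.
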

> 3. cc1* will fork itself, with one fork for each partition. Each child
> process will apply the partition mask generated for it by the partitioner
> and write the name of its new assembler file to the <filename> given by
> the driver.

For the first partition there's no fork (but the main process is used) and
the main output file will be used, correct?
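
A sketch of the scheme in question, assuming the first partition stays in the main process and uses the main output file. The output naming scheme and the compile_one callback are hypothetical. As a simplification, the parent writes the list of output names here, whereas in the patch description each child writes its own name.

```python
import os


def compile_partitions(partitions, main_output, split_list, compile_one):
    """Compile partition 0 in this process and fork once per remaining one."""
    outputs = [main_output] + [
        f"{main_output}.part{i}" for i in range(1, len(partitions))
    ]
    children = []
    for part, out in zip(partitions[1:], outputs[1:]):
        pid = os.fork()
        if pid == 0:
            # Child: handle exactly one partition, then exit without
            # running the parent's cleanup handlers.
            status = 1
            try:
                compile_one(part, out)
                status = 0
            finally:
                os._exit(status)
        children.append(pid)

    # Main process: the first partition goes to the main output file.
    compile_one(partitions[0], outputs[0])

    ok = True
    for pid in children:
        _, status = os.waitpid(pid, 0)
        ok = ok and os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0

    # Record every output name so the driver can find the pieces.
    with open(split_list, "w") as listing:
        listing.write("\n".join(outputs) + "\n")
    return ok, outputs
```

Waiting for every child before writing the list keeps the listing consistent even if one partition fails.
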

> 4. The driver will open each file and partially link them together into
> a single .o file, if -c was requested, else into a binary.  -S and -E
> are unsupported for now and will probably remain so.

I assume that also applies to -save-temps mode, which makes
debugging issues a bit tricky: it involves manually invoking
the cc1 command to keep the file with the output filenames.
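
For reference, step 4 could look roughly like the command sequence below, assuming each per-partition assembler file is assembled separately and the objects are then combined with a relocatable link (ld -r). The helper only builds argument lists and runs nothing. Its name and the ".o" naming scheme are hypothetical, and the real driver goes through its spec machinery instead.

```python
def driver_commands(asm_files, output, compile_only=True):
    """Build the assembler and link commands for the split outputs."""
    objects = [name + ".o" for name in asm_files]
    commands = [["as", "-o", obj, src] for src, obj in zip(asm_files, objects)]
    if compile_only:
        # -c was requested: merge the pieces into one relocatable object.
        commands.append(["ld", "-r", "-o", output] + objects)
    else:
        # Otherwise hand the objects to the normal link step.
        commands.append(["gcc", "-o", output] + objects)
    return commands
```
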

>
> Speedups ranged from 0.95x to 1.9x on a Quad-Core Intel Core-i7 8565U
> when testing with two files in GCC, as shown in the following table.
> Each timing is from a single run preceded by a warm-up run. The GCC
> under test was built with checking enabled, so a release build might
> show better timings in both sequential and parallel modes, but the
> speedup may remain the same.
>
> |                |            | Without Static | With Static |   Max   |
> | File           | Sequential |    Promotion   |  Promotion  | Speedup |
> |----------------|------------|----------------|-------------|---------|
> | gimple-match.c |     60s    |       63s      |     34s     |   1.7x  |
> | insn-emit.c    |     37s    |       19s      |     20s     |   1.9x  |
>
> Note that enabling it causes a slowdown in some cases, which is why
> the parallelism feature is enabled only by a flag for now.
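
The speedup figures can be rechecked from the table as sequential time divided by parallel time. The snippet below is only that arithmetic applied to the quoted timings. The dictionary keys are illustrative names.

```python
# Timings in seconds, copied from the table above.
timings = {
    "gimple-match.c": {"sequential": 60, "no_promotion": 63, "promotion": 34},
    "insn-emit.c": {"sequential": 37, "no_promotion": 19, "promotion": 20},
}


def speedups(row):
    """Return sequential/parallel ratios for each parallel configuration."""
    sequential = row["sequential"]
    return {
        name: sequential / seconds
        for name, seconds in row.items()
        if name != "sequential"
    }


for filename, row in timings.items():
    ratios = speedups(row)
    print(filename, {name: round(value, 2) for name, value in ratios.items()})
```

This gives 0.95 and 1.76 for gimple-match.c and 1.95 and 1.85 for insn-emit.c, so the 1.7x and 1.9x in the last column are truncated values, and the 0.95x is the slowdown without static promotion.
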

One reason why promote-statics is not enabled by default is that
it creates new hidden symbols (LTO does so as well) which might
be undesirable.  If deemed OK in general we could enable it by
default.  Note that originally I wanted to have -fparallel-jobs=auto
be enabled by default which should not end up with visible changes
like this(?)

> Bootstrapped and Regtested on Linux x86_64.
>
> Giuliano Belinassi (6):
>   Modify gcc driver for parallel compilation
>   Implement a new partitioner for parallel compilation
>   Implement fork-based parallelism engine
>   Add `+' for Jobserver Integration
>   Add invoke documentation
>   New tests for parallel compilation feature
>
>  gcc/Makefile.in                               |    6 +-
>  gcc/cgraph.c                                  |   16 +
>  gcc/cgraph.h                                  |   13 +
>  gcc/cgraphunit.c                              |  198 ++-
>  gcc/common.opt                                |    4 +
>  gcc/doc/invoke.texi                           |   32 +-
>  gcc/gcc.c                                     | 1219 +++++++++++++----
>  gcc/ipa-fnsummary.c                           |    2 +-
>  gcc/ipa-icf.c                                 |    3 +-
>  gcc/ipa-visibility.c                          |    3 +-
>  gcc/ipa.c                                     |    4 +-
>  gcc/jobserver.cc                              |  168 +++
>  gcc/jobserver.h                               |   33 +
>  gcc/lto-cgraph.c                              |  172 +++
>  gcc/{lto => }/lto-partition.c                 |  463 ++++++-
>  gcc/{lto => }/lto-partition.h                 |    4 +-
>  gcc/lto-streamer.h                            |    4 +
>  gcc/lto/Make-lang.in                          |    4 +-
>  gcc/lto/lto.c                                 |    2 +-
>  gcc/params.opt                                |    8 +
>  gcc/symtab.c                                  |   46 +-
>  gcc/testsuite/driver/a.c                      |    6 +
>  gcc/testsuite/driver/b.c                      |    6 +
>  gcc/testsuite/driver/driver.exp               |   80 ++
>  gcc/testsuite/driver/empty.c                  |    0
>  gcc/testsuite/driver/foo.c                    |    7 +
>  .../gcc.dg/parallel-early-constant.c          |   22 +
>  gcc/testsuite/gcc.dg/parallel-static-1.c      |   21 +
>  gcc/testsuite/gcc.dg/parallel-static-2.c      |   21 +
>  .../gcc.dg/parallel-static-clash-1.c          |   23 +
>  .../gcc.dg/parallel-static-clash-aux.c        |   14 +
>  gcc/toplev.c                                  |   58 +-
>  gcc/toplev.h                                  |    3 +
>  gcc/tree.c                                    |   23 +-
>  gcc/varasm.c                                  |   26 +-
>  intl/Makefile.in                              |    2 +-
>  libbacktrace/Makefile.in                      |    2 +-
>  libcpp/Makefile.in                            |    2 +-
>  libdecnumber/Makefile.in                      |    2 +-
>  libiberty/Makefile.in                         |  212 +--
>  zlib/Makefile.in                              |   64 +-
>  41 files changed, 2539 insertions(+), 459 deletions(-)
>  create mode 100644 gcc/jobserver.cc
>  create mode 100644 gcc/jobserver.h
>  rename gcc/{lto => }/lto-partition.c (78%)
>  rename gcc/{lto => }/lto-partition.h (89%)
>  create mode 100644 gcc/testsuite/driver/a.c
>  create mode 100644 gcc/testsuite/driver/b.c
>  create mode 100644 gcc/testsuite/driver/driver.exp
>  create mode 100644 gcc/testsuite/driver/empty.c
>  create mode 100644 gcc/testsuite/driver/foo.c
>  create mode 100644 gcc/testsuite/gcc.dg/parallel-early-constant.c
>  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-aux.c
>
> --
> 2.28.0
>
