How to debug/improve excessive compiler memory usage and compile times

Matthias Kretz via Gcc Tue, 01 Oct 2024 08:20:04 -0700

Hi,

the <experimental/simd> unit tests have long been my pain point when it comes 
to excessive compiler memory usage and compile times. I've always worked 
around the memory usage problem by splitting the test matrix into multiple 
translation units (built with different -D flags) from the same source file, 
i.e. paying with a huge number of compiler invocations just to be able to 
compile at all. OOM kills and thrashing aren't fun.
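A minimal sketch of that kind of split, assuming a test matrix over element types. The macros TEST_CHUNK/TEST_CHUNKS, the alias all_types and the helpers keep_if_in_chunk, select_chunk and test_one are hypothetical and are not how the <experimental/simd> tests are actually organized. Each compiler invocation selects one slice of the matrix with -D, so no single TU has to instantiate every test (requires C++17):

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical chunk selection, set per compiler invocation, e.g.
//   g++ -std=c++17 -DTEST_CHUNKS=2 -DTEST_CHUNK=1 -c tests.cpp
#ifndef TEST_CHUNKS
#define TEST_CHUNKS 2
#endif
#ifndef TEST_CHUNK
#define TEST_CHUNK 0
#endif

// Hypothetical full test matrix: every element type the tests cover.
using all_types = std::tuple<signed char, unsigned char, short, unsigned short,
                             int, unsigned, long, unsigned long, float, double>;

// Keep the I-th type only if I % TEST_CHUNKS == TEST_CHUNK.
template <std::size_t I>
using keep_if_in_chunk = std::conditional_t<
    I % TEST_CHUNKS == TEST_CHUNK,
    std::tuple<std::tuple_element_t<I, all_types>>,
    std::tuple<>>;

// Concatenate the kept one-element tuples (only used in unevaluated context).
template <std::size_t... Is>
auto select_chunk(std::index_sequence<Is...>)
    -> decltype(std::tuple_cat(std::declval<keep_if_in_chunk<Is>>()...));

// The subset of types this translation unit instantiates tests for.
using chunk_types = decltype(select_chunk(
    std::make_index_sequence<std::tuple_size_v<all_types>>{}));

// Stand-in for the real (expensive) per-type test body.
template <typename T>
constexpr bool test_one() { return sizeof(T) > 0; }

template <typename... Ts>
constexpr bool run_chunk(std::tuple<Ts...>*) { return (test_one<Ts>() && ...); }

// Only the types in chunk_types get instantiated in this TU.
static_assert(run_chunk(static_cast<chunk_types*>(nullptr)));
```

With the defaults (TEST_CHUNKS=2, TEST_CHUNK=0) this TU instantiates the tests for signed char, short, int, long and float only; a second invocation with -DTEST_CHUNK=1 covers the rest. Peak memory per invocation drops, at the price of more invocations.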

Recently, the GNU Radio 4 implementation hit a similar issue of excessive
compiler memory usage and compile times. The worst case I have tested (a
single TU on a Xeon @ 4.50 GHz with 64 GB RAM, no swapping while compiling):

GCC 15: 13m03s, 30.413 GB (checking enabled)
GCC 14: 12m03s, 15.248 GB
GCC 13: 11m40s, 14.862 GB
Clang 18: 8m10s, 10.811 GB

That's supposed to be a unit test, but it's obviously nothing one can use for
test-driven development. So how do mere mortals optimize code for better
compile times? -ftime-report is interesting but not really helpful. -Q has
interesting information, but its output format is unusable for C++ and really
hard to post-process.

When compiler memory usage goes through the roof, it's fairly obvious that
compile times will suffer. So I was wondering whether there is any
low-hanging fruit to pick. I've managed to come up with a small torture test
that shows interesting behavior. I put it at
https://github.com/mattkretz/template-torture-test. Simply run
git clone https://github.com/mattkretz/template-torture-test
cd template-torture-test
make STRESS=7
make TORTURE=1 STRESS=5

These STRESS levels can already "kill" smaller machines. Be prepared to kill
cc1plus before things get out of hand.

The part of this test I find interesting is toggled by the GO_FAST macro
(-D GO_FAST); the 'all' target always compiles both with and without it. With
the macro, the template arguments to 'Operand<typename...>' are tree-like and
the resulting type name is *longer*, yet GGC usage is only 442M. Without
GO_FAST, the template arguments to 'Operand<typename...>' form a flat list,
yet GGC usage is 22890M. The latter variant takes 24x longer to compile.
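To make the two shapes concrete, here is a minimal sketch. It is not the code from the template-torture-test repository: this Operand definition, the leaf types X/Y/Z and the helpers combine_flat/combine_tree are hypothetical. It only shows how the resulting types differ, not why GCC's memory usage differs between them (requires C++17 for std::is_same_v):

```cpp
#include <type_traits>

// Hypothetical stand-in for the variadic Operand template named above.
template <typename... Ts>
struct Operand {};

// Hypothetical leaf operands.
struct X {};
struct Y {};
struct Z {};

// Flat shape: combining two operands concatenates their argument packs,
// so a single Operand's pack grows by one entry per leaf.
template <typename... As, typename... Bs>
Operand<As..., Bs...> combine_flat(Operand<As...>, Operand<Bs...>);

// Tree shape: combining two operands nests them, so every Operand has at
// most two template arguments, but the spelled-out type name gets longer.
template <typename L, typename R>
Operand<L, R> combine_tree(L, R);

// Both functions are only declared; decltype never evaluates the calls.
using flat3 = decltype(combine_flat(combine_flat(Operand<X>{}, Operand<Y>{}),
                                    Operand<Z>{}));
using tree3 = decltype(combine_tree(combine_tree(Operand<X>{}, Operand<Y>{}),
                                    Operand<Z>{}));
```

Here flat3 is Operand<X, Y, Z>, while tree3 is Operand<Operand<Operand<X>, Operand<Y>>, Operand<Z>>: the tree variant has the longer name, but no single template argument list grows with the number of leaves.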

Are long flat template argument/parameter lists a special problem? Why does it
make overload resolution *so much more* expensive?

Beyond that torture test (should I turn it into a PR?), what can I do to help?

Thanks,
Matthias

--
──────────────────────────────────────────────────────────────────────────
Dr. Matthias Kretz https://mattkretz.github.io
GSI Helmholtz Center for Heavy Ion Research https://gsi.de
std::simd
──────────────────────────────────────────────────────────────────────────
