> On 01.10.2024 at 17:11, Matthias Kretz via Gcc <gcc@gcc.gnu.org> wrote:
> 
> Hi,
> 
> the <experimental/simd> unit tests are my long-standing pain point of
> excessive compiler memory usage and compile times. I've always worked around
> the memory usage problem by splitting the test matrix into multiple
> translation units (with different -D flags) of the same source file. I.e., I
> pay with a huge number of compiler invocations to be able to compile at all.
> OOM kills and thrashing aren't fun.
> 
> Recently, the GNU Radio 4 implementation hit a similar issue of excessive
> compiler memory usage and compile times. Worst case example I have tested (a
> single TU on a Xeon @ 4.50 GHz, 64 GB RAM (no swapping while compiling)):
> 
> GCC 15: 13m03s, 30.413 GB (checking enabled)
> GCC 14: 12m03s, 15.248 GB
> GCC 13: 11m40s, 14.862 GB
> Clang 18: 8m10s, 10.811 GB
> 
> That's supposed to be a unit test. But it's nothing one can use for test-
> driven development, obviously. But how do mere mortals optimize code for
> better compile times? -ftime-report is interesting but not really helpful. -Q
> has interesting information, but the output format is unusable for C++ and
> it's really hard to post-process.
> 
> When compiler memory usage goes through the roof it's fairly obvious that
> compile times have to suffer. So I was wondering whether there is any low-
> hanging fruit to pick. I've managed to come up with a small torture test that
> shows interesting behavior. I put it at
> https://github.com/mattkretz/template-torture-test. Simply do
> 
> git clone https://github.com/mattkretz/template-torture-test
> cd template-torture-test
> make STRESS=7
> make TORTURE=1 STRESS=5
> 
> These numbers can already "kill" smaller machines. Be prepared to kill cc1plus
> before things get out of hand.
> 
> The bit I find interesting in this test is switched with the -D GO_FAST macro
> (the 'all' target always compiles with and without GO_FAST). With the macro,
> template arguments to 'Operand<typename...>' are tree-like and the resulting
> type name is *longer*. But GGC usage is only at 442M. Without GO_FAST,
> template arguments to 'Operand<typename...>' are a flat list. But GGC usage is
> at 22890M. The latter variant takes 24x longer to compile.
> 
> Are long flat template argument/parameter lists a special problem? Why do they
> make overload resolution *so much more* expensive?
> 
> Beyond that torture test (should I turn it into a PR?), what can I do to help?

Analyze where the compile time is spent and where memory is spent.  Identify
unsuitable data structures and algorithms causing the issue.  Replace them with
better ones.  That’s what I do for these kinds of issues in the middle end.
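
For anyone following along, the two argument-list shapes described in the
quoted message can be sketched roughly as below. This is only an illustration
under assumptions: 'Operand' here is a hypothetical stand-in, and 'FlatJoin',
'TreeJoin', 'Width' and 'X' are made-up helper names. None of it is taken from
the template-torture-test repository, and it does not by itself reproduce the
memory blow-up.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the 'Operand<typename...>' template named in the
// message; the real definition in the torture test may look different.
template <typename... Ts>
struct Operand {};

// Flat shape: joining two operands concatenates their argument lists, so the
// top-level list grows with the number of leaves.
template <typename A, typename B>
struct FlatJoin;

template <typename... As, typename... Bs>
struct FlatJoin<Operand<As...>, Operand<Bs...>> {
    using type = Operand<As..., Bs...>;
};

// Tree shape: joining two operands nests them, so every argument list has
// exactly two entries regardless of the number of leaves.
template <typename A, typename B>
struct TreeJoin {
    using type = Operand<A, B>;
};

// Number of entries in the top-level template argument list.
template <typename T>
struct Width;

template <typename... Ts>
struct Width<Operand<Ts...>>
    : std::integral_constant<std::size_t, sizeof...(Ts)> {};

struct X {};              // placeholder leaf payload
using Leaf = Operand<X>;

using Flat2 = FlatJoin<Leaf, Leaf>::type;
using Flat4 = FlatJoin<Flat2, Flat2>::type;
using Flat8 = FlatJoin<Flat4, Flat4>::type;

using Tree2 = TreeJoin<Leaf, Leaf>::type;
using Tree4 = TreeJoin<Tree2, Tree2>::type;
using Tree8 = TreeJoin<Tree4, Tree4>::type;

// Eight leaves in both cases, but very different top-level list widths.
static_assert(std::is_same<Flat4, Operand<X, X, X, X>>::value,
              "flat join concatenates the argument lists");
static_assert(Width<Flat8>::value == 8, "flat list has one entry per leaf");
static_assert(Width<Tree8>::value == 2, "tree list stays at two entries");
```

The nested type has the longer spelled-out name, yet each individual argument
list stays short, which matches the two variants as the message describes them.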

Richard 

> Thanks,
>  Matthias
> 
> --
> ──────────────────────────────────────────────────────────────────────────
> Dr. Matthias Kretz                           https://mattkretz.github.io
> GSI Helmholtz Center for Heavy Ion Research               https://gsi.de
> std::simd
> ──────────────────────────────────────────────────────────────────────────
