Hi,

the <experimental/simd> unit tests have long been my pain point when it comes to excessive compiler memory usage and compile times. I've always worked around the memory usage problem by compiling the same source file as multiple translation units, each with different -D flags, i.e. paying with a huge number of compiler invocations just to be able to compile at all. OOM kills and thrashing aren't fun.
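A minimal sketch of that workaround, assuming a hypothetical TEST_SHARD/NUM_SHARDS macro pair (these names are illustrative, not the actual libstdc++ testsuite mechanism): the same file is compiled once per shard, e.g. with -DTEST_SHARD=0 and -DTEST_SHARD=1, and `if constexpr` keeps each invocation from instantiating the test bodies of the other shards' types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <utility>

#ifndef TEST_SHARD
#define TEST_SHARD 0  // hypothetical: which slice of the matrix this invocation compiles
#endif
#ifndef NUM_SHARDS
#define NUM_SHARDS 2  // hypothetical: how many invocations the matrix is split into
#endif

// The full test matrix: every element type the tests should cover.
using AllTypes = std::tuple<std::int8_t, std::int16_t, std::int32_t,
                            std::int64_t, float, double>;

// Stand-in for an expensive, template-heavy test body.
template <typename T>
bool run_test()
{
  T x = T(1);
  return x + x == T(2);
}

// Instantiates run_test only for types whose index belongs to this shard;
// the discarded `if constexpr` branch is never instantiated.
template <std::size_t I>
int run_if_in_shard()
{
  if constexpr (I % NUM_SHARDS == TEST_SHARD)
    return run_test<std::tuple_element_t<I, AllTypes>>() ? 0 : 1;
  else
    return 0;  // another shard's type: nothing is instantiated here
}

// Returns the number of failed tests in this shard.
template <std::size_t... Is>
int run_shard(std::index_sequence<Is...>)
{
  return (0 + ... + run_if_in_shard<Is>());
}

// Counts how many types of the matrix this shard covers.
template <std::size_t... Is>
constexpr std::size_t shard_size(std::index_sequence<Is...>)
{
  return (std::size_t{0} + ... + std::size_t(Is % NUM_SHARDS == TEST_SHARD));
}

using Indices = std::make_index_sequence<std::tuple_size_v<AllTypes>>;

int run_this_shard() { return run_shard(Indices{}); }
constexpr std::size_t types_in_this_shard() { return shard_size(Indices{}); }
```

Each invocation then only pays for its slice of the matrix, at the cost of NUM_SHARDS compiler runs in total.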
Recently, the GNU Radio 4 implementation hit a similar issue of excessive compiler memory usage and compile times. The worst case I have tested (a single TU on a Xeon @ 4.50 GHz with 64 GB RAM, no swapping while compiling):

  GCC 15:    13m03s, 30.413 GB (checking enabled)
  GCC 14:    12m03s, 15.248 GB
  GCC 13:    11m40s, 14.862 GB
  Clang 18:   8m10s, 10.811 GB

That's supposed to be a unit test. Obviously, it's nothing one can use for test-driven development.

But how do mere mortals optimize code for better compile times? -ftime-report is interesting but not really helpful. -Q has interesting information, but its output format is unusable for C++ and really hard to post-process.

When compiler memory usage goes through the roof, it's fairly obvious that compile times have to suffer. So I was wondering whether there is any low-hanging fruit to pick. I've managed to come up with a small torture test that shows interesting behavior. I put it at https://github.com/mattkretz/template-torture-test. Simply do

  git clone https://github.com/mattkretz/template-torture-test
  cd template-torture-test
  make STRESS=7
  make TORTURE=1 STRESS=5

These numbers can already "kill" smaller machines. Be prepared to kill cc1plus before things get out of hand.

The bit I find interesting in this test is switched with the -D GO_FAST macro (the 'all' target always compiles with and without GO_FAST). With the macro, the template arguments to 'Operand<typename...>' are tree-like and the resulting type name is *longer*, yet GGC usage is only 442M. Without GO_FAST, the template arguments to 'Operand<typename...>' are a flat list, and GGC usage is 22890M. The latter variant takes 24x longer to compile.

Are long flat template argument/parameter lists a special problem? Why do they make overload resolution *so much more* expensive?

Beyond that torture test (should I turn it into a PR?), what can I do to help?

Thanks,
  Matthias

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research               https://gsi.de
 std::simd
──────────────────────────────────────────────────────────────────────────
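To make the GO_FAST distinction concrete, here is a minimal, hypothetical C++17 sketch of the two shapes of template argument lists. `Operand`, `Leaf`, `make_flat`, `make_tree`, `leaf_count` and `arity` are illustrative names; this is not the code from the template-torture-test repository, and it makes no claim about how GCC handles either shape internally. Both types carry the same eight leaves; only the nesting differs.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

template <typename... Ts>
struct Operand {};  // stand-in for the variadic 'Operand<typename...>'

template <int N>
struct Leaf {};     // distinct leaf types

// Flat shape: Operand<Leaf<0>, Leaf<1>, ..., Leaf<N-1>>
template <typename Seq>
struct make_flat;

template <int... Is>
struct make_flat<std::integer_sequence<int, Is...>>
{
  using type = Operand<Leaf<Is>...>;
};

// Tree shape: recursively split [Begin, End) in halves, so every Operand has two arguments.
template <int Begin, int End, bool IsLeaf = (End - Begin == 1)>
struct make_tree
{
  static constexpr int Mid = Begin + (End - Begin) / 2;
  using type = Operand<typename make_tree<Begin, Mid>::type,
                       typename make_tree<Mid, End>::type>;
};

template <int Begin, int End>
struct make_tree<Begin, End, true>
{
  using type = Leaf<Begin>;
};

// Number of leaves reachable from T (anything that is not an Operand is one leaf).
template <typename T>
struct leaf_count : std::integral_constant<std::size_t, 1> {};

template <typename... Ts>
struct leaf_count<Operand<Ts...>>
    : std::integral_constant<std::size_t, (std::size_t{0} + ... + leaf_count<Ts>::value)> {};

// Length of the top-level template argument list.
template <typename T>
struct arity;

template <typename... Ts>
struct arity<Operand<Ts...>> : std::integral_constant<std::size_t, sizeof...(Ts)> {};

constexpr int N = 8;
using Flat = make_flat<std::make_integer_sequence<int, N>>::type;
using Tree = make_tree<0, N>::type;

// The tree spelling is longer, but no single argument list exceeds two entries.
static_assert(std::is_same_v<Flat, Operand<Leaf<0>, Leaf<1>, Leaf<2>, Leaf<3>,
                                           Leaf<4>, Leaf<5>, Leaf<6>, Leaf<7>>>);
static_assert(std::is_same_v<Tree, Operand<Operand<Operand<Leaf<0>, Leaf<1>>,
                                                   Operand<Leaf<2>, Leaf<3>>>,
                                           Operand<Operand<Leaf<4>, Leaf<5>>,
                                                   Operand<Leaf<6>, Leaf<7>>>>>);
```

Raising N in a sketch like this is one way one could probe how compile time and memory scale with the length of a single flat argument list versus the depth of a nested one.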