https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116062
Bug ID: 116062
Summary: Exponentially slow compilation at -O3 with __attribute__((flatten))
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: valentin at tolmer dot fr
Target Milestone: ---
After much reduction, I arrived at the following example (as you can imagine,
it comes from heavily templated code):
$ cat example.cpp
#include <set>
using T = std::set<std::set<std::set<int>>>;
using TSeq = std::set<T>;
struct Inserter {
  TSeq& out;
  void operator()(const T& t) const __attribute__((flatten)) { out.insert(t); }
};
struct Forwarder {
  Inserter& consumer;
  void operator()(const T& t) const __attribute__((flatten)) { consumer(t); }
};
struct Container {
  void iterate(const Forwarder&) const __attribute__((flatten));
  TSeq sequence;
};
void Container::iterate(const Forwarder& consumer) const {
  return [&]() __attribute__((flatten)) {
    for (const T& elem : sequence) {
      consumer(elem);
    }
  }();
}
$ time g++-14.1.0 -std=c++20 -O3 example.cpp -o example.o -c
46.07s user 0.44s system 99% cpu 46.518 total
Compile time scales up _very_ fast with the number of nested std::set in T: 2
nested sets take 2.8s, 3 take 46s, and 4 take over 6min35s. It's very sensitive
to small changes: inlining the lambda inside iterate brings the compilation
time down to <1s, and removing pretty much any flatten attribute makes it
compile very fast as well. Interestingly, even moving the definition of iterate
into the class makes the bug go away.
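For reference, here is a sketch of what I mean by "inlining the lambda inside iterate". It assumes this means replacing the immediately-invoked lambda with its body; everything else is unchanged from example.cpp. It only illustrates the variant described above and is not a fix for the underlying problem.

```cpp
#include <set>

using T = std::set<std::set<std::set<int>>>;
using TSeq = std::set<T>;

struct Inserter {
  TSeq& out;
  void operator()(const T& t) const __attribute__((flatten)) { out.insert(t); }
};

struct Forwarder {
  Inserter& consumer;
  void operator()(const T& t) const __attribute__((flatten)) { consumer(t); }
};

struct Container {
  void iterate(const Forwarder&) const __attribute__((flatten));
  TSeq sequence;
};

// Variant: the loop is written directly in iterate instead of inside an
// immediately-invoked lambda carrying its own flatten attribute.
void Container::iterate(const Forwarder& consumer) const {
  for (const T& elem : sequence) {
    consumer(elem);
  }
}
```

The behavior is the same as in the original example: every element of sequence is forwarded to the Inserter and ends up in out.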
The size of the output scales similarly, going from 77K to 374K to 1.5M as
nested std::set levels are added.
Interestingly, compiling with -O2 -fgcse-after-reload -fipa-cp-clone
-floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning
-fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre
-funswitch-loops -fvect-cost-model=dynamic -fversion-loops-for-strides instead
of -O3 (which should be equivalent) is also very fast. I compared the output of
-Q --help=optimizers for the two sets of options, but there is no diff, so I
was unable to pinpoint the optimization pass responsible.
This is not new behavior: it also compiled very slowly with gcc 4.9.4
(obviously with -std=c++11), and with the few versions in between that I
spot-checked. I didn't go any further back because that started to require
meaningful changes to the code.