https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119153

Arthur O'Dwyer <arthur.j.odwyer at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arthur.j.odwyer at gmail dot com

--- Comment #4 from Arthur O'Dwyer <arthur.j.odwyer at gmail dot com> ---
"There will always be one copy on the stack though" — Agreed (since P2752
doesn't extend to array literals). But a sufficiently smart compiler *could*
observe that the array temporary on the stack can be initialized *from* the
unrelated initializer_list's backing array in rodata; we don't need to keep a
whole second copy of the data just to initialize the array temporary from.

This is complicated in practice because GCC chunks up the array temporary's
initializer into a series of 16-byte values stored in .rodata.cst16 — it can do
this because they're loaded onto the stack one by one — whereas the
initializer_list's backing array cannot be chunked up like that. Chunking up
the array's initializer into .rodata.cst16 is a good idea because it allows the
compiler and linker to deduplicate repeated chunks, as shown here:

// https://godbolt.org/z/3GfsP99EP
#include <cstddef>
#include <initializer_list>
void f(std::initializer_list<int> il);
template <std::size_t N> void g(int const (&&il)[N]);
void t() {
    f({3, 1, 4, 1, 3, 1, 4, 1});
    g({3, 1, 4, 1, 3, 1, 4, 1});
}

That code (on GCC 16) puts C.0.0={3,1,4,1,3,1,4,1} in .rodata.cst32 as the
backing array of the initializer_list, and puts .LC0={3,1,4,1} in .rodata.cst16
as the only chunk we need in order to initialize the array temporary. A
sufficiently smart GCC could figure out that the latter *could* be a pointer
into the former; but I bet that requires several kinds of smarts that GCC
doesn't currently have and that would be annoying to implement.
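
To make the chunk-deduplication point concrete, here is a rough sketch that
counts how many distinct 16-byte chunks an int array contains. It only models
the idea of merging identical fixed-size constants; it is not how GCC or the
linker actually implement it. The names count_unique_chunks and demo_chunks are
made up for this illustration, and the sketch assumes sizeof(int) == 4.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <set>

// Hypothetical helper (not part of GCC): split an int array into 16-byte
// chunks and count how many distinct chunks there are. This loosely models
// what merging identical .rodata.cst16 entries buys you.
// Assumption: sizeof(int) == 4, so four ints fill one 16-byte chunk.
template <std::size_t N>
std::size_t count_unique_chunks(const int (&data)[N]) {
    constexpr std::size_t kChunk = 16;
    static_assert((N * sizeof(int)) % kChunk == 0,
                  "array size must be a multiple of 16 bytes");
    std::set<std::array<unsigned char, kChunk>> seen;
    const unsigned char *bytes =
        reinterpret_cast<const unsigned char *>(data);
    for (std::size_t off = 0; off < sizeof(data); off += kChunk) {
        std::array<unsigned char, kChunk> chunk;
        std::memcpy(chunk.data(), bytes + off, kChunk);
        seen.insert(chunk);  // identical chunks collapse into one entry
    }
    return seen.size();
}

// Illustration only: the array from the test case above is two copies of
// the same 16-byte chunk {3,1,4,1}, so a single stored chunk is enough.
inline void demo_chunks() {
    const int repeated[8] = {3, 1, 4, 1, 3, 1, 4, 1};
    assert(count_unique_chunks(repeated) == 1);
}
```

The initializer_list's backing array gets no such benefit, since it has to
exist as one contiguous 32-byte object.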

If I were trying to make this issue sound like a big deal, I'd leave the "array
temporary" part out of it and simply give a test case like this one:

// https://godbolt.org/z/Yb9qaYY8Y
#include <initializer_list>
void f1(std::initializer_list<int> il);
void f2(std::initializer_list<unsigned> il);
void t() {
    f1({3, 1, 4, 1, 3, 1, 4, 1, 3, 1, 4, 1, 3, 1, 4, 1});
    f2({3, 1, 4, 1, 3, 1, 4, 1, 3, 1, 4, 1, 3, 1, 4, 1});
}

GCC 13 would put .LC0={3,1,4,1} into .rodata.cst16, and do a bunch of loads
from there onto the stack for both f1 and f2. GCC 16 puts
C.0.0={3,1,4,1,3,1,4,1,3,1,4,1,3,1,4,1} into .rodata, and then again puts
C.1.1={3u,1u,4u,1u,3u,1u,4u,1u,3u,1u,4u,1u,3u,1u,4u,1u} into .rodata; GCC 16 is
not smart enough to merge these because the types differ, and the linker can't
merge them either because they're not in an SHF_MERGE section. So while GCC 16
avoids the runtime cost and stack-blowing risk (thank you!), and we must expect
some tradeoff in rodata size as a result, the tradeoff in this specific case is
costlier than I wish it were.
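
For what it's worth, the two backing arrays in that test case really are
byte-for-byte identical, which is why merging them would be harmless in
principle. A small sketch to check that follows; same_bytes and demo_same_bytes
are names invented for this illustration, and the sketch says nothing about how
GCC itself would have to detect the equivalence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of GCC): compare the object representations
// of an int array and an unsigned array of the same length.
template <std::size_t N>
bool same_bytes(const int (&a)[N], const unsigned (&b)[N]) {
    static_assert(sizeof(int) == sizeof(unsigned),
                  "int and unsigned have the same size");
    return std::memcmp(a, b, sizeof(a)) == 0;
}

// Illustration only: the values from f1 and f2 above. Nonnegative values
// have the same representation in int and unsigned, so the bytes match.
inline void demo_same_bytes() {
    const int      a[16] = {3, 1, 4, 1, 3, 1, 4, 1, 3, 1, 4, 1, 3, 1, 4, 1};
    const unsigned b[16] = {3, 1, 4, 1, 3, 1, 4, 1, 3, 1, 4, 1, 3, 1, 4, 1};
    assert(same_bytes(a, b));
}
```

So the duplication comes from the differing element types and the section
placement, not from any difference in the stored bytes.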