https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92964

            Bug ID: 92964
           Summary: order of base class members generates vastly different
                    code
           Product: gcc
           Version: 9.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: john at drouhard dot dev
  Target Milestone: ---

Created attachment 47510
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47510&action=edit
reproducible c++ source

I'm new at submitting bug reports, sorry if this one is invalid.

I have attached a repro file that actually exhibits what I believe to be two
issues, but I'm unsure of the second. If I need to open a second report for it
(or if it's really not a problem) just let me know. I've reproduced this
behavior on gcc 8.3, gcc 9.2, and gcc trunk.

Godbolt link if interested: https://godbolt.org/z/F756Pe

Anyway, the generated assembly when compiling the program with
-std=c++17 -O2 (and -O3) seems to indicate a missed optimization of some kind.

bar1() and bar3() are equivalent, other than the order of the union and the
bool in the base storage class. I believe gcc should be able to generate
basically the same assembly as well (other than swapping %rax/%rdx), but bar3
utilizes the stack, even using SIMD instructions for the false branch. bar1
doesn't do this. clang generates equivalent code for bar1/bar3.

I'm assuming it has something to do with alignment, but what's strange is that
if the intermediary Payload class is bypassed by using a PayloadBase directly
in Foo<>, bar1 and bar3 are generated (almost) equivalently as expected (no
SIMD instructions or stack usage).
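
For readers without the attachment, the shape being described can be sketched
roughly as follows. This is only an illustration: every name in it
(FlagFirstBase, UnionFirstBase, Empty, the helper check_bar1_bar3_agree) is
hypothetical and not copied from the attached file, and which ordering belongs
to bar1 versus bar3 is inferred from the register discussion later in this
report. The reduced form is not guaranteed to reproduce the codegen difference;
the attachment and the Godbolt link remain the authoritative repro.

```cpp
#include <cassert>
#include <cstdint>

// All names below are hypothetical; they are not copied from the attachment.
struct Empty {};

// Ordering A: bool first, union second (assumed here to be what bar1 uses).
template <typename T>
struct FlagFirstBase {
    constexpr FlagFirstBase() noexcept : engaged{false}, empty{} {}
    constexpr explicit FlagFirstBase(T v) noexcept : engaged{true}, value{v} {}

    bool engaged;
    union {
        Empty empty;
        T value;
    };
};

// Ordering B: union first, bool second (assumed here to be what bar3 uses).
template <typename T>
struct UnionFirstBase {
    constexpr UnionFirstBase() noexcept : empty{}, engaged{false} {}
    constexpr explicit UnionFirstBase(T v) noexcept : value{v}, engaged{true} {}

    union {
        Empty empty;
        T value;
    };
    bool engaged;
};

// Intermediary class sitting between the storage base and Foo<>.
template <typename Base>
struct Payload : Base {
    using Base::Base;
};

template <typename T, template <typename> class BaseT>
class Foo {
public:
    constexpr Foo() noexcept = default;
    constexpr Foo(T v) noexcept : payload_{v} {}

    constexpr bool has_value() const noexcept { return payload_.engaged; }
    constexpr T value() const noexcept { return payload_.value; }

private:
    Payload<BaseT<T>> payload_;
};

// Same logic, differing only in the member order of the storage base.
Foo<std::int64_t, FlagFirstBase> bar1(bool b) {
    if (b) return 42;
    return {};
}

Foo<std::int64_t, UnionFirstBase> bar3(bool b) {
    if (b) return 42;
    return {};
}

// Sanity check that the two orderings are observably equivalent.
void check_bar1_bar3_agree() {
    assert(bar1(true).has_value() == bar3(true).has_value());
    assert(bar1(false).has_value() == bar3(false).has_value());
    assert(bar1(true).value() == bar3(true).value());
}
```

Since the two functions behave identically at the source level, any difference
in the emitted code should come only from the member order (and possibly the
Payload layer), which is the point of the comparison.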



The other issue (please let me know if I should just open another report or if
this is expected) is the difference within each of the pairs bar1/bar2 and
bar3/bar4. The two functions in each pair differ only in how they return the
Foo<> object: bar1/bar3 return a default-constructed Foo<>, but bar2/bar4
return a sentinel and rely on converting it to Foo<> using the user-defined
constructor. The generated assembly within each pair is very different (at
least at -O2).
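
A stripped-down sketch of the two return styles might look like the code below.
The names None, none, Empty, and check_bar1_bar2_agree are hypothetical and not
taken from the attachment, the Payload layer is left out for brevity, and
having the sentinel constructor delegate to the default constructor is an
assumption about how such a conversion would typically be written.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; not copied from the attachment.
struct Empty {};
struct None {};                // sentinel tag type (assumed)
inline constexpr None none{};  // sentinel value (assumed)

template <typename T>
class Foo {
public:
    constexpr Foo() noexcept : engaged_{false}, empty_{} {}
    // User-defined converting constructor from the sentinel; delegating to
    // the default constructor is an assumption made for this sketch.
    constexpr Foo(None) noexcept : Foo() {}
    constexpr Foo(T v) noexcept : engaged_{true}, value_{v} {}

    constexpr bool has_value() const noexcept { return engaged_; }
    constexpr T value() const noexcept { return value_; }

private:
    bool engaged_;
    union {
        Empty empty_;
        T value_;
    };
};

// Returns a default-constructed Foo<> in the false branch.
Foo<std::int64_t> bar1(bool b) {
    if (b) return 42;
    return {};
}

// Returns the sentinel in the false branch, converted by Foo(None).
Foo<std::int64_t> bar2(bool b) {
    if (b) return 42;
    return none;
}

// Sanity check that both return styles produce the same observable state.
void check_bar1_bar2_agree() {
    assert(bar1(true).has_value() == bar2(true).has_value());
    assert(bar1(false).has_value() == bar2(false).has_value());
    assert(bar1(true).value() == bar2(true).value());
}
```

Both functions end up with a disengaged Foo<> in the false branch, so one would
expect the optimizer to be able to treat them alike.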

With the -O2 level optimization, bar2 does not set %rdx in the false branch
(bar1 does). clang sets it in both. I realize it's not strictly necessary to
ensure that a value is returned in %rdx since the empty union member is the one
being initialized, but bar1 takes care to make sure it's set to 0 (as does
clang for both). Bumping the optimization level up to -O3 actually causes bar2
to ensure both are set to 0 explicitly.

bar3 and bar4, however, both intentionally populate %rax with uninitialized
junk from the stack in the false branch (at both -O2 and -O3), though it's
clearer in bar4 since it doesn't use SIMD instructions there (related to the
first point?). If initializing %rax isn't necessary, why does it move memory
from the uninitialized stack into the return register at all?

I apologize if this is not well-thought-out or if there's something obvious I'm
missing. Let me know if you need me to provide any more information.
