https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67886

            Bug ID: 67886
           Summary: Incomplete optimization for virtual function call into
                    freshly constructed object
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Simon.Richter at hogyros dot de
  Target Milestone: ---

This is a bit of an academic corner case, but it came up in a Stack Overflow
discussion:

    struct Base {
        virtual void func() = 0;
    };

    struct Derived : Base {
        virtual void func() { };
    };

    void test()
    {
        Base* base = new Derived;

        for (int i = 0; i < 1000; ++i)
        {
            base->func();
        }
    }

The assembly generated on x86_64 with -O3 is:

Disassembly of section .text:

0000000000000000 <test()>:
   0:   55                      push   %rbp
   1:   53                      push   %rbx
   2:   bf 08 00 00 00          mov    $0x8,%edi
   7:   bb e8 03 00 00          mov    $0x3e8,%ebx
   c:   48 83 ec 08             sub    $0x8,%rsp
  10:   e8 00 00 00 00          callq  15 <test()+0x15>
                        11: R_X86_64_PC32       operator new(unsigned long)-0x4
  15:   ba 00 00 00 00          mov    $0x0,%edx
                        16: R_X86_64_32 vtable for Derived+0x10
  1a:   48 89 c5                mov    %rax,%rbp
  1d:   48 c7 00 00 00 00 00    movq   $0x0,(%rax)
                        20: R_X86_64_32S        vtable for Derived+0x10
  24:   eb 13                   jmp    39 <test()+0x39>
  26:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  2d:   00 00 00 
  30:   83 eb 01                sub    $0x1,%ebx
  33:   74 1a                   je     4f <test()+0x4f>
  35:   48 8b 55 00             mov    0x0(%rbp),%rdx
  39:   48 8b 12                mov    (%rdx),%rdx
  3c:   48 81 fa 00 00 00 00    cmp    $0x0,%rdx
                        3f: R_X86_64_32S        Derived::func()
  43:   74 eb                   je     30 <test()+0x30>
  45:   48 89 ef                mov    %rbp,%rdi
  48:   ff d2                   callq  *%rdx
  4a:   83 eb 01                sub    $0x1,%ebx
  4d:   75 e6                   jne    35 <test()+0x35>
  4f:   48 83 c4 08             add    $0x8,%rsp
  53:   5b                      pop    %rbx
  54:   5d                      pop    %rbp
  55:   c3                      retq   

Disassembly of section .text._ZN7Derived4funcEv:

0000000000000000 <Derived::func()>:
   0:   f3 c3                   repz retq 

This looks like a half-finished optimization. The optimizer correctly inlines
Derived::func() into the loop and guards the inlined copy with a check that the
function pointer found in the vtable is indeed the function that was inlined.
If the check fails, the inlined code is skipped and a regular indirect call is
made.
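
At source level, the loop in the disassembly corresponds roughly to the guarded
form below. This is a hand-written sketch, not compiler output: typeid stands in
for the comparison of the vtable slot against the address of Derived::func()
(it compares dynamic types, which is only an approximation of that check), and
Counting, run_guarded and the virtual destructor are additions made up for the
illustration.

```cpp
#include <typeinfo>

struct Base {
    virtual void func() = 0;
    virtual ~Base() {}  // added for the sketch; not in the original test case
};

struct Derived : Base {
    void func() override {}
};

// Hypothetical second class, used only to exercise the fallback path.
struct Counting : Base {
    int calls = 0;
    void func() override { ++calls; }
};

// Guarded loop: the guard is re-evaluated on every iteration.
// Returns how many iterations took the inlined (empty) fast path.
int run_guarded(Base* base, int iterations)
{
    int fast = 0;
    for (int i = 0; i < iterations; ++i) {
        if (typeid(*base) == typeid(Derived)) {
            // Inlined body of Derived::func(): nothing to do.
            ++fast;
        } else {
            base->func();  // regular virtual call
        }
    }
    return fast;
}
```

With a Derived object every iteration takes the fast path; with any other
class deriving from Base every iteration falls back to the virtual call.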

I presume that the pointer is rechecked on every loop iteration because the
called function could destroy the object and create a new one in its place that
still derives from Base, so the recheck is correct.

With -fPIC, the addresses of the vtable and of Derived::func() are loaded once
outside the loop, and the check is still repeated on each loop iteration, which
is again correct.

However, without -fPIC there is no way to get a different definition of
Derived::func() without invoking UB, so the function pointer check is
tautological and can be optimized out. The rest then unravels: the inlined
function does not destroy the object, so inlining it into the loop should leave
an empty loop that can be removed.

Also, wouldn't -fvisibility=hidden take Derived's symbols out of the dynamic
symbol table, in which case I wouldn't be able to override them at runtime with
a preload library?

From an assembly programmer's perspective, the optimal solution would use the
knowledge that the inlined function does not touch the object's vtable pointer
and create a path that handles the remaining loop iterations once the object
has been shown to be a Derived. That path would probably be reduced to a
conditional jump to the ret instruction in the RTL passes. I don't know enough
to tell whether that would be easy to do in this case.
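
The suggested transformation can be sketched at source level as follows. This
is only an illustration under assumptions, not what GCC does or would emit:
typeid again stands in for the vtable-slot comparison, and Counting,
run_hoisted and the virtual destructor are hypothetical names added for the
example.

```cpp
#include <typeinfo>

struct Base {
    virtual void func() = 0;
    virtual ~Base() {}  // added for the sketch; not in the original test case
};

struct Derived : Base {
    void func() override {}
};

// Hypothetical second class, used only to exercise the slow path.
struct Counting : Base {
    int calls = 0;
    void func() override { ++calls; }
};

// Once the object is known to be a Derived, the inlined (empty) body cannot
// change its vtable pointer, so the guard would succeed on every remaining
// iteration and the rest of the loop can be skipped.
// Returns the number of real virtual calls that were made.
int run_hoisted(Base* base, int iterations)
{
    for (int i = 0; i < iterations; ++i) {
        if (typeid(*base) == typeid(Derived)) {
            return i;  // remaining iterations are no-ops: jump to the exit
        }
        base->func();  // unknown callee: the guard must be re-evaluated after it
    }
    return iterations;
}
```

For the test case in this report the guard succeeds on the first iteration, so
the function returns immediately, which matches the "conditional jump to the
ret instruction" described above.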
