I know this might be better suited to gcc-help, but if I am right,
this is a bug report.

I'm using gcc 4.5 branch, rev. 165881 (a week old), on x86-64 Linux.

This testcase is derived from a larger program. I looked at the
generated assembly and was puzzled.

#include <set>
#include <stdio.h>

int main()
{
  static const int array[] = { 1,2,3,4,5,6,7,8,9,10,6 };
  std::set<int> the_set;
  int count = 0;
  for (unsigned i = 0; i < sizeof(array)/sizeof(*array); i++)
  {
    std::pair<std::set<int>::iterator, bool> result =
      the_set.insert(array[i]);
    if (result.second)
      count++;
  }
  printf("%d unique items in array.\n", count);
  return 0;
}

Compiled with g++ -std=c++98 -Os, this produces what looks to me like
very inefficient code, particularly this loop in main():

  40076d:       89 d8                   mov    %ebx,%eax
  40076f:       4c 89 e7                mov    %r12,%rdi
  400772:       48 8d 34 85 60 0a 40    lea    0x400a60(,%rax,4),%rsi
  400779:       00
  40077a:       e8 b1 01 00 00          callq  400930 <std::set<int, std::less<int>, std::allocator<int> >::insert(int const&)>
  40077f:       48 89 04 24             mov    %rax,(%rsp)
  400783:       89 54 24 08             mov    %edx,0x8(%rsp)
  400787:       48 89 44 24 40          mov    %rax,0x40(%rsp)
  40078c:       48 8b 44 24 08          mov    0x8(%rsp),%rax
  400791:       3c 01                   cmp    $0x1,%al
  400793:       48 89 44 24 48          mov    %rax,0x48(%rsp)
  400798:       83 dd ff                sbb    $0xffffffffffffffff,%ebp
  40079b:       ff c3                   inc    %ebx
  40079d:       83 fb 0b                cmp    $0xb,%ebx
  4007a0:       75 cb                   jne    40076d <main+0x19>
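
To spell out my reading of the loop: as far as I can tell, the only
useful work after the call is the cmp/sbb pair, which adds 1 to the
counter when the low byte of the returned bool is nonzero. The stores to
(%rsp), 0x40(%rsp) and 0x48(%rsp) do not seem to be read again inside
this loop. Below is a small C++ model of the cmp/sbb arithmetic only.
The function name is made up for illustration and does not appear in
the program.

```cpp
// Models:  cmp $0x1,%al ; sbb $0xffffffffffffffff,%ebp
// cmp sets the carry flag when al < 1 (unsigned), i.e. when al == 0.
// sbb computes ebp = ebp - (-1) - CF, which is ebp + 1 - CF.
// add_if_inserted is a hypothetical name used only for this sketch.
unsigned add_if_inserted(unsigned count, unsigned char inserted_flag)
{
    unsigned carry = (inserted_flag < 1) ? 1u : 0u;  // CF after the cmp
    return count + 1u - carry;                       // result of the sbb
}
```

So the count goes up only when the flag is nonzero, which matches
"if (result.second) count++;" in the source.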

And the uninlined set::insert():

0000000000400930 <std::set<int, std::less<int>, std::allocator<int> >::insert(int const&)>:
  400930:       48 83 ec 48             sub    $0x48,%rsp
  400934:       e8 5f ff ff ff          callq  400898 <std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int> >::_M_insert_unique(int const&)>
  400939:       89 54 24 18             mov    %edx,0x18(%rsp)
  40093d:       8a 54 24 18             mov    0x18(%rsp),%dl
  400941:       88 54 24 28             mov    %dl,0x28(%rsp)
  400945:       8b 54 24 28             mov    0x28(%rsp),%edx
  400949:       48 83 c4 48             add    $0x48,%rsp
  40094d:       c3                      retq
  40094e:       90                      nop
  40094f:       90                      nop
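
If I read the libstdc++ header correctly, set::insert() only forwards
to _M_insert_unique() and converts the returned pair from one iterator
type to another. Here is a library-free sketch of that shape, which
might make a smaller testcase. Every name in it (Node, Iter, ConstIter,
insert_unique, insert, count_unique) is a hypothetical stand-in, and I
have not checked whether this reduced form triggers the same code.

```cpp
#include <utility>

// Hypothetical stand-ins; none of these names come from libstdc++.
struct Node { int value; };

struct Iter {
    Node* node;
    explicit Iter(Node* n) : node(n) {}
};

struct ConstIter {
    const Node* node;
    ConstIter(const Iter& it) : node(it.node) {}  // implicit conversion
};

// Plays the role of _M_insert_unique: finds or adds `value` in a pool.
// The caller must guarantee that the pool has room for one more item.
__attribute__((noinline))
std::pair<Iter, bool> insert_unique(Node* pool, unsigned& used, int value)
{
    for (unsigned i = 0; i < used; ++i)
        if (pool[i].value == value)
            return std::pair<Iter, bool>(Iter(&pool[i]), false);
    pool[used].value = value;
    return std::pair<Iter, bool>(Iter(&pool[used++]), true);
}

// Plays the role of set::insert: forwards, then converts the pair type.
__attribute__((noinline))
std::pair<ConstIter, bool> insert(Node* pool, unsigned& used, int value)
{
    std::pair<Iter, bool> p = insert_unique(pool, used, value);
    return std::pair<ConstIter, bool>(p.first, p.second);
}

// Same loop as in main() above, against the stand-in container.
// Only the first 16 items are considered, to stay inside the pool.
int count_unique(const int* array, unsigned n)
{
    Node pool[16];
    unsigned used = 0;
    int count = 0;
    for (unsigned i = 0; i < n && i < 16; ++i)
        if (insert(pool, used, array[i]).second)
            count++;
    return count;
}
```

Calling count_unique() on the array from the testcase should return 10,
the same count the original program prints.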

In the larger program, I got almost exactly these results using
"-fprofile-use -O3", but since I have reproduced it with -Os and
without PGO, maybe it will be easier to debug/optimize.

The stack shuffling, which has no net effect, and the unused stack
frames look strange to me. Am I reading the code wrong? Should I be
using a different version of the compiler? Is this a known bug?

Please advise.
