http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50213

             Bug #: 50213
           Summary: Regression in space-optimized code relative to 4.5.x
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: big...@acm.org


Some change between 4.5.x and 4.6.x decreased generated code quality,
especially noticeable with -Os, for the following function:

int checkTLV (unsigned char* tlv)
{
  unsigned int csum;
  extern unsigned int __info_segment_size;
  const unsigned int* p = (const unsigned int*)(tlv + __info_segment_size);

  csum = 0;
  while (--p > (const unsigned int*)tlv)
    csum ^= *p;
  csum += *p;
  return 0 == csum;
}
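
To compare compilers on this function in isolation, a small harness along
the following lines may help. It is only a sketch: the definition of
__info_segment_size and the helper sealTLV are hypothetical additions (the
report only declares the symbol extern), and the buffer is kept as an array
of unsigned int so that the cast inside checkTLV stays aligned. The listings
below resemble output from something like "gcc -Os -dp -Wa,-adhln", but the
report does not state the command line, so treat that as an assumption.

```c
#include <stddef.h>

/* Hypothetical definition, for the harness only: the report merely declares
   this symbol extern. The size is in bytes; here it covers four words. */
unsigned int __info_segment_size = 4 * sizeof (unsigned int);

/* Function from the report, unchanged. */
int checkTLV (unsigned char* tlv)
{
  unsigned int csum;
  extern unsigned int __info_segment_size;
  const unsigned int* p = (const unsigned int*)(tlv + __info_segment_size);

  csum = 0;
  while (--p > (const unsigned int*)tlv)
    csum ^= *p;
  csum += *p;
  return 0 == csum;
}

/* Hypothetical helper: choose word 0 so that adding it to the XOR of the
   remaining words wraps around to zero, which is what checkTLV accepts. */
void sealTLV (unsigned int* words, size_t count)
{
  unsigned int x = 0;
  size_t i;

  for (i = 1; i < count; i++)
    x ^= words[i];
  words[0] = 0u - x;
}
```

A sealed four-word buffer should make checkTLV return 1, and flipping a bit
in any word other than the first should make it return 0.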

With 4.5.2 at -Os, the loop has an 11-byte body:

  11 000c EB02                  jmp     .L2     # 52    jump    [length = 2]
  12                    .L3:
  13 000e 3310                  xorl    (%rax), %edx    # 15    *xorsi_1/1      [length = 2]
  14                    .L2:
  15 0010 4883E804              subq    $4, %rax        # 18    *adddi_1/1      [length = 4]
  16 0014 4839F8                cmpq    %rdi, %rax      # 20    *cmpdi_1/1      [length = 3]
  17 0017 77F5                  ja      .L3     # 21    *jcc_1  [length = 2]

while with 4.6.1 at -Os the loop has a 15-byte body:

  11 000b EB06                  jmp     .L2     # 66    jump    [length = 2]
  12                    .L3:
  13 000d 3350FC                xorl    -4(%rax), %edx  # 29    *xorsi_1/1      [length = 3]
  14 0010 4889C8                movq    %rcx, %rax      # 21    *movdi_internal_rex64/2 [length = 3]
  15                    .L2:
  16 0013 488D48FC              leaq    -4(%rax), %rcx  # 57    *lea_1  [length = 4]
  17 0017 4839F9                cmpq    %rdi, %rcx      # 34    *cmpdi_1/1      [length = 3]
  18 001a 77F1                  ja      .L3     # 35    *jcc_1  [length = 2]

My hypothesis is that the label .L2, the loop entry point targeted by the
initial jump, is preventing the pseudo registers for the loop variable and for
the memory access address from being merged into the same hardware register.
This is not specific to the x86 back-end: I first noticed it in an unsupported
back-end, where the suboptimal register allocation causes a spill.
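
For illustration, the two loop shapes can be transcribed back into C, one
pointer variable per register. This is only a reading of the listings above:
the function names are hypothetical, the setup before the loop is not shown
in the listings (assumed here to be %rax = tlv + size, with %rdi = tlv and
%edx = csum), and the trailing "csum += *p" is left out.

```c
/* Loop shape from the 4.5.2 listing: %rax serves as both the loop variable
   and the load address. */
unsigned int xorOneReg (const unsigned int* base, const unsigned int* end)
{
  unsigned int csum = 0;            /* %edx */
  const unsigned int* a = end;      /* %rax */

  for (;;) {
    a = a - 1;                      /* subq $4, %rax */
    if (!(a > base))                /* cmpq %rdi, %rax ; ja .L3 */
      break;
    csum ^= *a;                     /* xorl (%rax), %edx */
  }
  return csum;
}

/* Loop shape from the 4.6.1 listing: the decremented pointer lives in %rcx,
   the load still goes through %rax, and an extra move copies %rcx back. */
unsigned int xorTwoRegs (const unsigned int* base, const unsigned int* end)
{
  unsigned int csum = 0;            /* %edx */
  const unsigned int* a = end;      /* %rax */
  const unsigned int* c;            /* %rcx */

  for (;;) {
    c = a - 1;                      /* leaq -4(%rax), %rcx */
    if (!(c > base))                /* cmpq %rdi, %rcx ; ja .L3 */
      break;
    csum ^= a[-1];                  /* xorl -4(%rax), %edx */
    a = c;                          /* movq %rcx, %rax */
  }
  return csum;
}
```

Both functions compute the same value; the second simply carries the pointer
in two variables, which corresponds to the extra movq and the longer xorl
encoding in the 15-byte body.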

The current trunk (svn 178132) generates the same suboptimal code as 4.6.1.
