http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60890

            Bug ID: 60890
           Summary: Performance regression in 4.8 for memory postinc
           Product: gcc
           Version: 4.8.1
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hariharan.gcc at gmail dot com

Created attachment 32632
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32632&action=edit
The source testcase

The attached testcase is a simplified version of the code where I first saw
this problem. At the end of the tree stages, the 4.7 compiler generated two
consecutive stores followed by an add to update the pointer, as shown below for
the stores in the innermost loop.

  MEM[base: aptr_54, offset: 0B] = res1_22;
  MEM[base: aptr_54, offset: 4B] = res2_27;
  D.1771_63 = (sizetype) aptr_54;
  D.1772_64 = D.1771_63 + 8;

The 4.8 compiler generates

  MEM[base: base_76, offset: 0B] = res1_32;
  _29 = (unsigned long) base_76;
  _83 = _29 + 8;
  base_84 = (int *) _83;
  MEM[base: base_84, offset: -4B] = res2_37;

for the same two stores. In our private port, which can do post-inc on
load/store operations, 4.7 generated optimal code, whereas the code from 4.8
is worse.

The problem seems to stem from the fix for Bug 48814, which generates post-inc
operations in a different order from 4.7. Should the tree-optimization passes
have fixed it up?

At the end of the tree-optimization passes, I can see the problem on x86 as
well. Compile the attached code with 4.7.x and 4.8.x to see the difference.
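
For readers without the attachment, the kind of source that leads to the
stores above can be sketched as follows. This is only an illustration and not
the attached testcase: the function name store_pairs, its parameters, and the
placeholder arithmetic are all made up, and only the two post-incremented
stores per iteration mirror the dumps quoted earlier.

```c
#include <stddef.h>

/* Hypothetical reduction, NOT the attached testcase: each iteration
   writes two results through a pointer that is post-incremented. */
void store_pairs(int *aptr, const int *in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int res1 = in[i] + 1;   /* placeholder computation */
        int res2 = in[i] * 2;   /* placeholder computation */
        *aptr++ = res1;         /* first store, then advance the pointer */
        *aptr++ = res2;         /* second store, then advance again */
    }
}
```

Building a file like this with -O2 -fdump-tree-optimized on each compiler
should make it possible to compare the final tree dumps, although whether this
particular sketch shows the same difference as the attachment has not been
checked.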
