https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705

            Bug ID: 64705
           Summary: Bad code generation of sieve on x86-64 because of too
                    aggressive IV optimizations
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vmakarov at gcc dot gnu.org

Created attachment 34510
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34510&action=edit
preprocessed sieve program from aburto benchmarks

GCC on trunk generates bad x86-64 code for the hottest loop in sieve
(preprocessed file in the attachment) with -Ofast -march=core-avx2:

for(k = i + prime ; k<=size ; k+=prime)
 {
   ci++;
   *(flags+k)=0;
 }

The code generated by GCC is:

.L82:
        movb    $0, (%rax)
        addq    %rsi, %rax
        addq    $1, %rbx
        leaq    (%rax,%rcx), %rdx
        cmpq    %rdx, %rbp
        jge     .L82

Here is the code generated by LLVM-3.5 using the same options:

   .LBB41_51:                              # %for.body26.unr
                                        #   Parent Loop BB41_45 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
        incq    %r12
        movb    $0, (%rbx,%rax)
        addq    %r14, %rax
        cmpq    %r15, %rax
        jle     .LBB41_51

LLVM generates a 5-insn loop, whereas GCC generates a 6-insn loop.

LLVM achieves this by using base+index addressing.  GCC uses plain base
addressing instead, which is a result of turning the flags+k expression
into an induction variable.
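For illustration, the two loop shapes can be written out at the C level. This is only a sketch of what the assembly listings appear to do, not GCC's or LLVM's actual intermediate code. The function names clear_indexed, clear_pointer and same_result are hypothetical, and so is the reading of the leaq in the GCC loop as recovering a k-like value from the pointer for the exit test.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* LLVM-like shape: k is the only induction variable for the address.
   The store uses base+index (flags + k) and the exit test uses k directly. */
static long clear_indexed(char *flags, long i, long prime, long size)
{
    long ci = 0;
    assert(prime > 0);
    for (long k = i + prime; k <= size; k += prime) {
        ci++;
        flags[k] = 0;
    }
    return ci;
}

/* GCC-like shape (assumed): flags + k becomes an address induction
   variable p, so the exit test needs one extra operation per iteration
   to get back a value comparable with size.  uintptr_t is used so the
   address may step past the array without pointer-arithmetic UB. */
static long clear_pointer(char *flags, long i, long prime, long size)
{
    long ci = 0;
    uintptr_t base = (uintptr_t)flags;
    uintptr_t p;
    assert(prime > 0 && i >= 0);
    if (i + prime > size)
        return 0;                       /* loop body never executes */
    p = base + (uintptr_t)(i + prime);
    do {
        *(char *)p = 0;                 /* plain base addressing */
        p += (uintptr_t)prime;
        ci++;
    } while ((long)(p - base) <= size); /* extra arithmetic before the compare */
    return ci;
}

/* Runs both shapes on identical buffers and reports whether the
   resulting flags and iteration counts agree. */
static int same_result(long i, long prime, long size)
{
    char a[256], b[256];
    long ca, cb;
    assert(i >= 0 && size >= 0 && size < 256);
    memset(a, 1, sizeof a);
    memset(b, 1, sizeof b);
    ca = clear_indexed(a, i, prime, size);
    cb = clear_pointer(b, i, prime, size);
    return ca == cb && memcmp(a, b, sizeof a) == 0;
}
```

Both functions compute the same result.  The difference is only in how many values must be updated or recomputed per iteration, which is what the insn counts above reflect.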

I tried to make base+index addressing zero-cost with the following
patch:

Index: tree-ssa-loop-ivopts.c
===================================================================
--- tree-ssa-loop-ivopts.c      (revision 219705)
+++ tree-ssa-loop-ivopts.c      (working copy)
@@ -3458,7 +3458,7 @@ get_address_cost (bool symbol_present, b
          end_sequence ();

          acost = seq_cost (seq, speed);
-         acost += address_cost (addr, mem_mode, as, speed);
+         acost = 0;

          if (!acost)
            acost = 1;


With the patch I got base+index addressing, but it is still a 6-insn
loop because of induction of other expressions:

.L82:
        movb    $0, (%r8,%rax)
        addq    %rcx, %rax
        addq    $1, %rbx
        leaq    (%rsi,%rax), %rdx
        cmpq    %rdx, %r14
        jge     .L82

Again, too aggressive IV optimization results in worse code generated
by GCC.  The code can be better, as LLVM-3.5 demonstrates.
