https://bugs.llvm.org/show_bug.cgi?id=44488

            Bug ID: 44488
           Summary: Miscompile with opt -loop-vectorize
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Loop Optimizer
          Assignee: unassignedb...@nondot.org
          Reporter: mikael.hol...@ericsson.com
                CC: llvm-bugs@lists.llvm.org

Created attachment 22998
  --> https://bugs.llvm.org/attachment.cgi?id=22998&action=edit
bbi-37172.ll reproducer

I think I've found a case where -loop-vectorize produces wrong code.

Reproduce with
 opt -loop-vectorize -S -o - bbi-37172.ll

The input looks like

----------------------
@v_38 = global i16 12061, align 1
@v_39 = global i16 11333, align 1

define i16 @main() {
entry:
  br label %for.body

for.cond.cleanup:                                 ; preds = %cond.end5
  %0 = load i16, i16* @v_39, align 1
  ret i16 %0

for.body:                                         ; preds = %entry, %cond.end5
  %i.07 = phi i16 [ 99, %entry ], [ %inc7, %cond.end5 ]
  %1 = load i16, i16* @v_38, align 1
  %cmp1 = icmp eq i16 %1, 32767
  br i1 %cmp1, label %cond.end, label %cond.end

cond.end:                                         ; preds = %for.body, %for.body
  %cmp2 = icmp eq i16 %1, 0
  br i1 %cmp2, label %cond.end5, label %cond.false4

cond.false4:                                      ; preds = %cond.end
  %rem = srem i16 5786, %1
  br label %cond.end5

cond.end5:                                        ; preds = %cond.end, %cond.false4
  %cond6 = phi i16 [ %rem, %cond.false4 ], [ 5786, %cond.end ]
  store i16 %cond6, i16* @v_39, align 1
  %inc7 = add nsw i16 %i.07, 1
  %cmp = icmp slt i16 %inc7, 111
  br i1 %cmp, label %for.body, label %for.cond.cleanup, !llvm.loop !26
}
----------------------

If we focus on just the beginning of the loop, we have

----------------------
for.body:                                         ; preds = %entry, %cond.end5
  %i.07 = phi i16 [ 99, %entry ], [ %inc7, %cond.end5 ]
  %1 = load i16, i16* @v_38, align 1
  %cmp1 = icmp eq i16 %1, 32767
  br i1 %cmp1, label %cond.end, label %cond.end

cond.end:                                         ; preds = %for.body, %for.body
  %cmp2 = icmp eq i16 %1, 0
  br i1 %cmp2, label %cond.end5, label %cond.false4

cond.false4:                                      ; preds = %cond.end
  %rem = srem i16 5786, %1
  br label %cond.end5
----------------------

v_38 is 12061 and we will always branch from for.body to cond.end.
Then we do


  %cmp2 = icmp eq i16 %1, 0

which will be false, so

  br i1 %cmp2, label %cond.end5, label %cond.false4

will branch to cond.false4 where we will execute the srem

  %rem = srem i16 5786, %1

v_38 doesn't change throughout the program, so this is the path we will take
on every loop iteration.
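
As a rough cross-check of the scalar behaviour described above, here is a
minimal Python sketch of the original loop. The helper names (srem,
run_scalar_loop) are made up for illustration and are not part of the
reproducer; the sketch only assumes that srem takes the sign of the dividend.

```python
def srem(a, b):
    """Signed remainder whose sign follows the dividend, like LLVM's srem."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def run_scalar_loop(v_38=12061):
    """Model of the scalar loop; returns (number of srem executions, final v_39)."""
    srem_count = 0
    v_39 = 11333
    i = 99
    while True:
        # for.body: both successors of the branch on %cmp1 are cond.end,
        # so the comparison against 32767 does not affect control flow.
        if v_38 == 0:                     # %cmp2 true: skip the srem
            cond6 = 5786
        else:                             # cond.false4
            cond6 = srem(5786, v_38)
            srem_count += 1
        v_39 = cond6                      # store to @v_39
        i += 1                            # %inc7
        if not (i < 111):                 # %cmp
            break
    return srem_count, v_39


if __name__ == "__main__":
    print(run_scalar_loop())
```

With v_38 = 12061 the model takes the cond.false4 path on each of the 12
iterations (i = 99 .. 110), so the srem is executed every time.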

After vectorization, we instead get the following:

----------------------
vector.body:                                      ; preds = %pred.srem.continue2, %vector.ph
  %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.srem.continue2 ]
  %0 = trunc i32 %index to i16
  %offset.idx = add i16 99, %0
  %broadcast.splatinsert = insertelement <2 x i16> undef, i16 %offset.idx, i32 0
  %broadcast.splat = shufflevector <2 x i16> %broadcast.splatinsert, <2 x i16> undef, <2 x i32> zeroinitializer
  %induction = add <2 x i16> %broadcast.splat, <i16 0, i16 1>
  %1 = add i16 %offset.idx, 0
  %2 = load i16, i16* @v_38, align 1
  %3 = load i16, i16* @v_38, align 1
  %4 = insertelement <2 x i16> undef, i16 %2, i32 0
  %5 = insertelement <2 x i16> %4, i16 %3, i32 1
  %6 = icmp eq <2 x i16> %5, <i16 32767, i16 32767>
  %7 = icmp eq <2 x i16> %5, zeroinitializer
  %8 = or <2 x i1> %6, %6
  %9 = xor <2 x i1> %7, <i1 true, i1 true>
  %10 = and <2 x i1> %9, %8
  %11 = extractelement <2 x i1> %10, i32 0
  br i1 %11, label %pred.srem.if, label %pred.srem.continue

pred.srem.if:                                     ; preds = %vector.body
  %12 = srem i16 5786, %2
  %13 = insertelement <2 x i16> undef, i16 %12, i32 0
  br label %pred.srem.continue
----------------------

Since v_38 is 12061, %5 will be <12061, 12061>, and thus both %6 and %7 will
be <0, 0>.

From this we get that %8 is <0, 0>, %9 is <1, 1>, and finally %10 is <0, 0>.
%11 will be 0, so we branch to pred.srem.continue and do not execute
the srem.

In pred.srem.continue we then have

----------------------
pred.srem.continue:                               ; preds = %pred.srem.if, %vector.body
  %14 = phi <2 x i16> [ undef, %vector.body ], [ %13, %pred.srem.if ]
  %15 = extractelement <2 x i1> %10, i32 1
  br i1 %15, label %pred.srem.if1, label %pred.srem.continue2

pred.srem.if1:                                    ; preds = %pred.srem.continue
  %16 = srem i16 5786, %3
  %17 = insertelement <2 x i16> %14, i16 %16, i32 1
  br label %pred.srem.continue2
----------------------

%10 is still <0, 0>, so %15 will also be 0, and we will branch to
pred.srem.continue2 and skip the srem in pred.srem.if1 as well.
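
The mask arithmetic above can be replayed lane by lane. The following is a
minimal Python sketch, not vectorizer output: the function name vector_masks
is hypothetical, and booleans stand in for the i1 lanes of %6 .. %10.

```python
def vector_masks(lanes):
    """Evaluate %6 .. %10 from the vectorized IR for the given lanes of %5."""
    v5 = list(lanes)
    v6 = [x == 32767 for x in v5]             # %6 = icmp eq %5, 32767
    v7 = [x == 0 for x in v5]                 # %7 = icmp eq %5, 0
    v8 = [a or b for a, b in zip(v6, v6)]     # %8 = or %6, %6
    v9 = [not x for x in v7]                  # %9 = xor %7, true
    v10 = [a and b for a, b in zip(v9, v8)]   # %10 = and %9, %8
    return {"%6": v6, "%7": v7, "%8": v8, "%9": v9, "%10": v10}


if __name__ == "__main__":
    masks = vector_masks([12061, 12061])
    for name, value in masks.items():
        print(name, value)
    # %11 and %15 are lane 0 and lane 1 of %10.
    print("%11 =", masks["%10"][0], " %15 =", masks["%10"][1])
```

For lanes <12061, 12061> both lanes of %10 come out false, so neither
pred.srem.if nor pred.srem.if1 is entered.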

Phew.

So, before vectorization we would execute the srem, but after vectorization
we don't anymore. This looks wrong to me.

I don't really know what is happening inside the vectorizer, but I have to say
I'm suspicious of the generated

  %8 = or <2 x i1> %6, %6
  %9 = xor <2 x i1> %7, <i1 true, i1 true>
  %10 = and <2 x i1> %9, %8

Especially the

  %8 = or <2 x i1> %6, %6

looks pointless. I suppose in some way it's connected to the

  br i1 %cmp1, label %cond.end, label %cond.end

part in the input, but while that branch always takes us to cond.end,
%8 will be <0, 0> whenever %1 is not 32767, and it then prevents us from
reaching any of the "srem" instructions since %8 is used in the and to
calculate %10.
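
To make the suspicion concrete, here is a small Python sketch comparing the
mask as generated with what one might expect. It rests on an assumption that
is not confirmed anywhere in this report: that the mask for cond.end is the
OR of the masks of its two incoming edges, where the "true" edge carries
%cmp1 and the "false" edge should carry its negation. All function names
are made up for illustration.

```python
def block_mask_as_generated(cmp1):
    """Matches '%8 = or %6, %6': both edges appear to use the same condition."""
    return cmp1 or cmp1


def block_mask_expected(cmp1):
    """Assumed intent: true edge OR negated false edge, which is always true."""
    return cmp1 or (not cmp1)


def srem_mask(v_38, block_mask):
    """Mask guarding the srem: (not %cmp2) and <mask of cond.end>."""
    cmp1 = v_38 == 32767
    cmp2 = v_38 == 0
    return (not cmp2) and block_mask(cmp1)


if __name__ == "__main__":
    for v in (12061, 32767, 0):
        print(v,
              srem_mask(v, block_mask_as_generated),
              srem_mask(v, block_mask_expected))
```

Under that assumption the two variants agree only when v_38 is 32767 or 0;
for any other value, such as 12061, the generated mask disables the srem
while the expected one enables it.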
