https://bugs.llvm.org/show_bug.cgi?id=44488
Bug ID: 44488
Summary: Miscompile with opt -loop-vectorize
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Loop Optimizer
Assignee: unassignedb...@nondot.org
Reporter: mikael.hol...@ericsson.com
CC: llvm-bugs@lists.llvm.org
Created attachment 22998
--> https://bugs.llvm.org/attachment.cgi?id=22998&action=edit
bbi-37172.ll reproducer
I think I've found a case where -loop-vectorize produces wrong code.
Reproduce with
opt -loop-vectorize -S -o - bbi-37172.ll
The input looks like
----------------------
@v_38 = global i16 12061, align 1
@v_39 = global i16 11333, align 1
define i16 @main() {
entry:
br label %for.body
for.cond.cleanup: ; preds = %cond.end5
%0 = load i16, i16* @v_39, align 1
ret i16 %0
for.body: ; preds = %entry, %cond.end5
%i.07 = phi i16 [ 99, %entry ], [ %inc7, %cond.end5 ]
%1 = load i16, i16* @v_38, align 1
%cmp1 = icmp eq i16 %1, 32767
br i1 %cmp1, label %cond.end, label %cond.end
cond.end: ; preds = %for.body,
%for.body
%cmp2 = icmp eq i16 %1, 0
br i1 %cmp2, label %cond.end5, label %cond.false4
cond.false4: ; preds = %cond.end
%rem = srem i16 5786, %1
br label %cond.end5
cond.end5: ; preds = %cond.end,
%cond.false4
%cond6 = phi i16 [ %rem, %cond.false4 ], [ 5786, %cond.end ]
store i16 %cond6, i16* @v_39, align 1
%inc7 = add nsw i16 %i.07, 1
%cmp = icmp slt i16 %inc7, 111
br i1 %cmp, label %for.body, label %for.cond.cleanup, !llvm.loop !26
}
----------------------
If we just focus at the beginning of the loop we have
----------------------
for.body: ; preds = %entry, %cond.end5
%i.07 = phi i16 [ 99, %entry ], [ %inc7, %cond.end5 ]
%1 = load i16, i16* @v_38, align 1
%cmp1 = icmp eq i16 %1, 32767
br i1 %cmp1, label %cond.end, label %cond.end
cond.end: ; preds = %for.body,
%for.body
%cmp2 = icmp eq i16 %1, 0
br i1 %cmp2, label %cond.end5, label %cond.false4
cond.false4: ; preds = %cond.end
%rem = srem i16 5786, %1
br label %cond.end5
----------------------
v_38 is 12061 and we will always branch from for.body to cond.end.
Then we do
%cmp2 = icmp eq i16 %1, 0
which will be false, so
br i1 %cmp2, label %cond.end5, label %cond.false4
will branch to cond.false4 where we will execute the srem
%rem = srem i16 5786, %1
v_38 doesn't change throughout the program so this is the path we will use
every loop round.
After vectorization, we instead get the following:
----------------------
vector.body: ; preds =
%pred.srem.continue2, %vector.ph
%index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.srem.continue2 ]
%0 = trunc i32 %index to i16
%offset.idx = add i16 99, %0
%broadcast.splatinsert = insertelement <2 x i16> undef, i16 %offset.idx, i32
0
%broadcast.splat = shufflevector <2 x i16> %broadcast.splatinsert, <2 x i16>
undef, <2 x i32> zeroinitializer
%induction = add <2 x i16> %broadcast.splat, <i16 0, i16 1>
%1 = add i16 %offset.idx, 0
%2 = load i16, i16* @v_38, align 1
%3 = load i16, i16* @v_38, align 1
%4 = insertelement <2 x i16> undef, i16 %2, i32 0
%5 = insertelement <2 x i16> %4, i16 %3, i32 1
%6 = icmp eq <2 x i16> %5, <i16 32767, i16 32767>
%7 = icmp eq <2 x i16> %5, zeroinitializer
%8 = or <2 x i1> %6, %6
%9 = xor <2 x i1> %7, <i1 true, i1 true>
%10 = and <2 x i1> %9, %8
%11 = extractelement <2 x i1> %10, i32 0
br i1 %11, label %pred.srem.if, label %pred.srem.continue
pred.srem.if: ; preds = %vector.body
%12 = srem i16 5786, %2
%13 = insertelement <2 x i16> undef, i16 %12, i32 0
br label %pred.srem.continue
----------------------
Since v_38 is 12061, %5 will be <12061, 12061>, and thus both %6 and %7 will
be <0, 0>.
>From this we get that %8 is <0, 0>, %9 is <1, 1> and finally %10 is then <0,
0>.
%11 will be 0, and we will thus branch to pred.srem.continue and not execute
the srem.
In pred.srem.continue we then have
----------------------
pred.srem.continue: ; preds = %pred.srem.if,
%vector.body
%14 = phi <2 x i16> [ undef, %vector.body ], [ %13, %pred.srem.if ]
%15 = extractelement <2 x i1> %10, i32 1
br i1 %15, label %pred.srem.if1, label %pred.srem.continue2
pred.srem.if1: ; preds = %pred.srem.continue
%16 = srem i16 5786, %3
%17 = insertelement <2 x i16> %14, i16 %16, i32 1
br label %pred.srem.continue2
----------------------
%10 is still <0, 0> so also %15 will be 0, and we will branch to
pred.srem.continue2 and also skip the srem in pred.srem.if1.
Phew.
So, before vectorization we would execute the srem, but after vectorization
we don't anymore. This looks wrong to me.
I don't know really what is happening inside the vectorizer, but I have to say
I'm suspicious towards the generated
%8 = or <2 x i1> %6, %6
%9 = xor <2 x i1> %7, <i1 true, i1 true>
%10 = and <2 x i1> %9, %8
Especially the
%8 = or <2 x i1> %6, %6
looks pointless. I suppose in some way it's connected to the
br i1 %cmp1, label %cond.end, label %cond.end
part in the input, but while that branch always takes us to cond.end,
the %8 will always be <0, 0> and then prevent us from reaching any of the
"srem" instructions since %8 is used in the and to calculate %10.
--
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs