[PATCH] D144015: [OpenMP]Fix PR55970: Miscompile of collapse(3) with non-rectangular loop nest.

2023-02-14 Thread Alexey Bataev via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGddde06906be1: [OpenMP]Fix PR55970: Miscompile of collapse(3) 
with non-rectangular loop nest. (authored by ABataev).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144015/new/

https://reviews.llvm.org/D144015

Files:
  clang/lib/Sema/SemaOpenMP.cpp
  clang/test/OpenMP/for_codegen.cpp
  clang/test/OpenMP/tile_codegen.cpp
  openmp/runtime/test/worksharing/for/omp_for_collapse_non_rectangular.c

Index: openmp/runtime/test/worksharing/for/omp_for_collapse_non_rectangular.c
===
--- /dev/null
+++ openmp/runtime/test/worksharing/for/omp_for_collapse_non_rectangular.c
@@ -0,0 +1,22 @@
+// RUN: %libomp-compile-and-run
+
+#include 
+
+#define N 3
+
+int arr[N][N][N];
+int main() {
+#pragma omp for collapse(3)
+  for (unsigned int i = 0; i < N; ++i)
+    for (unsigned int j = i; j < N; ++j)
+      for (unsigned int k = j; k < N; ++k)
+        arr[i][j][k] = 1;
+  int num_failed = 0;
+  for (unsigned int i = 0; i < N; ++i)
+    for (unsigned int j = 0; j < N; ++j)
+      for (unsigned int k = 0; k < N; ++k)
+        if (arr[i][j][k] == (j >= i && k >= j) ? 0 : 1)
+          ++num_failed;
+
+  return num_failed;
+}
Index: clang/test/OpenMP/tile_codegen.cpp
===
--- clang/test/OpenMP/tile_codegen.cpp
+++ clang/test/OpenMP/tile_codegen.cpp
@@ -179,8 +179,8 @@
 // CHECK1-NEXT:[[I:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:[[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:[[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4
+// CHECK1-NEXT:[[DOTNEW_STEP:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:[[DOTCAPTURE_EXPR_2:%.*]] = alloca i32, align 4
-// CHECK1-NEXT:[[DOTCAPTURE_EXPR_3:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:[[DOTFLOOR_0_IV_I:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:[[DOTTILE_0_IV_I:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:store i32 [[START]], ptr [[START_ADDR]], align 4
@@ -191,56 +191,56 @@
 // CHECK1-NEXT:[[TMP1:%.*]] = load i32, ptr [[END_ADDR]], align 4
 // CHECK1-NEXT:store i32 [[TMP1]], ptr [[DOTCAPTURE_EXPR_1]], align 4
 // CHECK1-NEXT:[[TMP2:%.*]] = load i32, ptr [[STEP_ADDR]], align 4
-// CHECK1-NEXT:store i32 [[TMP2]], ptr [[DOTCAPTURE_EXPR_2]], align 4
+// CHECK1-NEXT:store i32 [[TMP2]], ptr [[DOTNEW_STEP]], align 4
 // CHECK1-NEXT:[[TMP3:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1]], align 4
 // CHECK1-NEXT:[[TMP4:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_]], align 4
 // CHECK1-NEXT:[[SUB:%.*]] = sub i32 [[TMP3]], [[TMP4]]
-// CHECK1-NEXT:[[SUB4:%.*]] = sub i32 [[SUB]], 1
-// CHECK1-NEXT:[[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_2]], align 4
-// CHECK1-NEXT:[[ADD:%.*]] = add i32 [[SUB4]], [[TMP5]]
-// CHECK1-NEXT:[[TMP6:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_2]], align 4
+// CHECK1-NEXT:[[SUB3:%.*]] = sub i32 [[SUB]], 1
+// CHECK1-NEXT:[[TMP5:%.*]] = load i32, ptr [[DOTNEW_STEP]], align 4
+// CHECK1-NEXT:[[ADD:%.*]] = add i32 [[SUB3]], [[TMP5]]
+// CHECK1-NEXT:[[TMP6:%.*]] = load i32, ptr [[DOTNEW_STEP]], align 4
 // CHECK1-NEXT:[[DIV:%.*]] = udiv i32 [[ADD]], [[TMP6]]
-// CHECK1-NEXT:[[SUB5:%.*]] = sub i32 [[DIV]], 1
-// CHECK1-NEXT:store i32 [[SUB5]], ptr [[DOTCAPTURE_EXPR_3]], align 4
+// CHECK1-NEXT:[[SUB4:%.*]] = sub i32 [[DIV]], 1
+// CHECK1-NEXT:store i32 [[SUB4]], ptr [[DOTCAPTURE_EXPR_2]], align 4
 // CHECK1-NEXT:store i32 0, ptr [[DOTFLOOR_0_IV_I]], align 4
 // CHECK1-NEXT:br label [[FOR_COND:%.*]]
 // CHECK1:   for.cond:
 // CHECK1-NEXT:[[TMP7:%.*]] = load i32, ptr [[DOTFLOOR_0_IV_I]], align 4
-// CHECK1-NEXT:[[TMP8:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_3]], align 4
-// CHECK1-NEXT:[[ADD6:%.*]] = add i32 [[TMP8]], 1
-// CHECK1-NEXT:[[CMP:%.*]] = icmp ult i32 [[TMP7]], [[ADD6]]
-// CHECK1-NEXT:br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_END18:%.*]]
+// CHECK1-NEXT:[[TMP8:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_2]], align 4
+// CHECK1-NEXT:[[ADD5:%.*]] = add i32 [[TMP8]], 1
+// CHECK1-NEXT:[[CMP:%.*]] = icmp ult i32 [[TMP7]], [[ADD5]]
+// CHECK1-NEXT:br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_END17:%.*]]
 // CHECK1:   for.body:
 // CHECK1-NEXT:[[TMP9:%.*]] = load i32, ptr [[DOTFLOOR_0_IV_I]], align 4
 // CHECK1-NEXT:store i32 [[TMP9]], ptr [[DOTTILE_0_IV_I]], align 4
-// CHECK1-NEXT:br label [[FOR_COND7:%.*]]
-// CHECK1:   for.cond7:
+// CHECK1-NEXT:br label [[FOR_COND6:%.*]]
+// CHECK1:   for.cond6:
 // CHECK1-NEXT:[[TMP10:%.*]] = load i32, ptr [[DOTTILE_0_IV_I]], align 4
-// CHECK1-NEXT:[[TMP11:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_3]], align 4
-// CHECK1-NEXT:[[ADD8:%.*]] = add i32 [[TMP11]], 1
+// CHECK1-NEXT:[[TMP11:%.*]]

[PATCH] D144015: [OpenMP]Fix PR55970: Miscompile of collapse(3) with non-rectangular loop nest.

2023-02-14 Thread Mike Rice via Phabricator via cfe-commits
mikerice accepted this revision.
mikerice added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144015/new/

https://reviews.llvm.org/D144015

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D144015: [OpenMP]Fix PR55970: Miscompile of collapse(3) with non-rectangular loop nest.

2023-02-14 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev created this revision.
ABataev added a reviewer: mikerice.
Herald added subscribers: guansong, yaxunl.
Herald added a project: All.
ABataev requested review of this revision.
Herald added a reviewer: jdoerfert.
Herald added subscribers: openmp-commits, sstefan1.
Herald added projects: clang, OpenMP.

The calculated lower bound needs to be assigned back to the temp variable;
otherwise an incorrect value (the upper bound instead of the lower bound)
might be used.
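
For readers unfamiliar with non-rectangular nests, the following is a minimal
serial sketch of the iteration space that the new runtime test exercises. It is
not the code Clang generates and is not part of this patch; the helper names
`count_iterations` and `delinearize` are hypothetical and chosen only for
illustration. The point it tries to show is that each inner loop must restart
from its lower bound (`j = i`, `k = j`) when a logical iteration number is
mapped back to `(i, j, k)`, so starting from any other value would visit the
wrong elements.

```c
#include <assert.h>

/* Hypothetical helper (not from the patch): counts the (i, j, k) triples
   visited by the triangular nest
     i in [0, n), j in [i, n), k in [j, n). */
static unsigned count_iterations(unsigned n) {
  unsigned count = 0;
  for (unsigned i = 0; i < n; ++i)
    for (unsigned j = i; j < n; ++j)
      for (unsigned k = j; k < n; ++k)
        ++count;
  return count;
}

/* Hypothetical helper (not from the patch): recovers (i, j, k) for the
   logical iteration number `iv` by walking the nest in source order.
   Every inner loop starts at its lower bound, which depends on the
   enclosing induction variable. Returns 1 on success, 0 if `iv` is past
   the end of the iteration space. */
static int delinearize(unsigned n, unsigned iv,
                       unsigned *i_out, unsigned *j_out, unsigned *k_out) {
  assert(i_out && j_out && k_out);
  unsigned current = 0;
  for (unsigned i = 0; i < n; ++i)
    for (unsigned j = i; j < n; ++j)
      for (unsigned k = j; k < n; ++k) {
        if (current == iv) {
          *i_out = i;
          *j_out = j;
          *k_out = k;
          return 1;
        }
        ++current;
      }
  return 0;
}
```

With `n = 3`, as in the test, the nest visits 10 triples rather than the 27 of
a rectangular nest, and logical iteration 3 corresponds to `(0, 1, 1)`. A
collapsed loop has to reproduce this mapping for every logical iteration.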


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D144015

Files:
  clang/lib/Sema/SemaOpenMP.cpp
  clang/test/OpenMP/for_codegen.cpp
  clang/test/OpenMP/tile_codegen.cpp
  openmp/runtime/test/worksharing/for/omp_for_collapse_non_rectangular.c

Index: openmp/runtime/test/worksharing/for/omp_for_collapse_non_rectangular.c
===
--- /dev/null
+++ openmp/runtime/test/worksharing/for/omp_for_collapse_non_rectangular.c
@@ -0,0 +1,22 @@
+// RUN: %libomp-compile-and-run
+
+#include 
+
+#define N 3
+
+int arr[N][N][N];
+int main() {
+#pragma omp for collapse(3)
+  for (unsigned int i = 0; i < N; ++i)
+    for (unsigned int j = i; j < N; ++j)
+      for (unsigned int k = j; k < N; ++k)
+        arr[i][j][k] = 1;
+  int num_failed = 0;
+  for (unsigned int i = 0; i < N; ++i)
+    for (unsigned int j = 0; j < N; ++j)
+      for (unsigned int k = 0; k < N; ++k)
+        if (arr[i][j][k] == (j >= i && k >= j) ? 0 : 1)
+          ++num_failed;
+
+  return num_failed;
+}
Index: clang/test/OpenMP/tile_codegen.cpp
===
--- clang/test/OpenMP/tile_codegen.cpp
+++ clang/test/OpenMP/tile_codegen.cpp
@@ -179,8 +179,8 @@
 // CHECK1-NEXT:[[I:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:[[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:[[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4
+// CHECK1-NEXT:[[DOTNEW_STEP:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:[[DOTCAPTURE_EXPR_2:%.*]] = alloca i32, align 4
-// CHECK1-NEXT:[[DOTCAPTURE_EXPR_3:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:[[DOTFLOOR_0_IV_I:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:[[DOTTILE_0_IV_I:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:store i32 [[START]], ptr [[START_ADDR]], align 4
@@ -191,56 +191,56 @@
 // CHECK1-NEXT:[[TMP1:%.*]] = load i32, ptr [[END_ADDR]], align 4
 // CHECK1-NEXT:store i32 [[TMP1]], ptr [[DOTCAPTURE_EXPR_1]], align 4
 // CHECK1-NEXT:[[TMP2:%.*]] = load i32, ptr [[STEP_ADDR]], align 4
-// CHECK1-NEXT:store i32 [[TMP2]], ptr [[DOTCAPTURE_EXPR_2]], align 4
+// CHECK1-NEXT:store i32 [[TMP2]], ptr [[DOTNEW_STEP]], align 4
 // CHECK1-NEXT:[[TMP3:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1]], align 4
 // CHECK1-NEXT:[[TMP4:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_]], align 4
 // CHECK1-NEXT:[[SUB:%.*]] = sub i32 [[TMP3]], [[TMP4]]
-// CHECK1-NEXT:[[SUB4:%.*]] = sub i32 [[SUB]], 1
-// CHECK1-NEXT:[[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_2]], align 4
-// CHECK1-NEXT:[[ADD:%.*]] = add i32 [[SUB4]], [[TMP5]]
-// CHECK1-NEXT:[[TMP6:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_2]], align 4
+// CHECK1-NEXT:[[SUB3:%.*]] = sub i32 [[SUB]], 1
+// CHECK1-NEXT:[[TMP5:%.*]] = load i32, ptr [[DOTNEW_STEP]], align 4
+// CHECK1-NEXT:[[ADD:%.*]] = add i32 [[SUB3]], [[TMP5]]
+// CHECK1-NEXT:[[TMP6:%.*]] = load i32, ptr [[DOTNEW_STEP]], align 4
 // CHECK1-NEXT:[[DIV:%.*]] = udiv i32 [[ADD]], [[TMP6]]
-// CHECK1-NEXT:[[SUB5:%.*]] = sub i32 [[DIV]], 1
-// CHECK1-NEXT:store i32 [[SUB5]], ptr [[DOTCAPTURE_EXPR_3]], align 4
+// CHECK1-NEXT:[[SUB4:%.*]] = sub i32 [[DIV]], 1
+// CHECK1-NEXT:store i32 [[SUB4]], ptr [[DOTCAPTURE_EXPR_2]], align 4
 // CHECK1-NEXT:store i32 0, ptr [[DOTFLOOR_0_IV_I]], align 4
 // CHECK1-NEXT:br label [[FOR_COND:%.*]]
 // CHECK1:   for.cond:
 // CHECK1-NEXT:[[TMP7:%.*]] = load i32, ptr [[DOTFLOOR_0_IV_I]], align 4
-// CHECK1-NEXT:[[TMP8:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_3]], align 4
-// CHECK1-NEXT:[[ADD6:%.*]] = add i32 [[TMP8]], 1
-// CHECK1-NEXT:[[CMP:%.*]] = icmp ult i32 [[TMP7]], [[ADD6]]
-// CHECK1-NEXT:br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_END18:%.*]]
+// CHECK1-NEXT:[[TMP8:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_2]], align 4
+// CHECK1-NEXT:[[ADD5:%.*]] = add i32 [[TMP8]], 1
+// CHECK1-NEXT:[[CMP:%.*]] = icmp ult i32 [[TMP7]], [[ADD5]]
+// CHECK1-NEXT:br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_END17:%.*]]
 // CHECK1:   for.body:
 // CHECK1-NEXT:[[TMP9:%.*]] = load i32, ptr [[DOTFLOOR_0_IV_I]], align 4
 // CHECK1-NEXT:store i32 [[TMP9]], ptr [[DOTTILE_0_IV_I]], align 4
-// CHECK1-NEXT:br label [[FOR_COND7:%.*]]
-// CHECK1:   for.cond7:
+// CHECK1-NEXT:br label [[FOR_COND6:%.*]]
+// CHECK1:   for.cond6:
 // CHECK1-NEXT:[[TMP10:%.*]] = load i32, ptr [[DOTTILE_0_IV_I]], align 4
-// CHECK1-NEXT:[[TMP11:%.*]] = load i32, ptr