[PATCH] D144015: [OpenMP]Fix PR55970: Miscompile of collapse(3) with non-rectangular loop nest.
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGddde06906be1: [OpenMP]Fix PR55970: Miscompile of collapse(3) with non-rectangular loop nest. (authored by ABataev).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144015/new/

https://reviews.llvm.org/D144015

Files:
  clang/lib/Sema/SemaOpenMP.cpp
  clang/test/OpenMP/for_codegen.cpp
  clang/test/OpenMP/tile_codegen.cpp
  openmp/runtime/test/worksharing/for/omp_for_collapse_non_rectangular.c

Index: openmp/runtime/test/worksharing/for/omp_for_collapse_non_rectangular.c
===
--- /dev/null
+++ openmp/runtime/test/worksharing/for/omp_for_collapse_non_rectangular.c
@@ -0,0 +1,22 @@
+// RUN: %libomp-compile-and-run
+
+#include
+
+#define N 3
+
+int arr[N][N][N];
+int main() {
+#pragma omp for collapse(3)
+  for (unsigned int i = 0; i < N; ++i)
+    for (unsigned int j = i; j < N; ++j)
+      for (unsigned int k = j; k < N; ++k)
+        arr[i][j][k] = 1;
+  int num_failed = 0;
+  for (unsigned int i = 0; i < N; ++i)
+    for (unsigned int j = 0; j < N; ++j)
+      for (unsigned int k = 0; k < N; ++k)
+        if (arr[i][j][k] == (j >= i && k >= j) ? 0 : 1)
+          ++num_failed;
+
+  return num_failed;
+}
Index: clang/test/OpenMP/tile_codegen.cpp
===
--- clang/test/OpenMP/tile_codegen.cpp
+++ clang/test/OpenMP/tile_codegen.cpp
@@ -179,8 +179,8 @@
 // CHECK1-NEXT:    [[I:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:    [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:    [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4
+// CHECK1-NEXT:    [[DOTNEW_STEP:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:    [[DOTCAPTURE_EXPR_2:%.*]] = alloca i32, align 4
-// CHECK1-NEXT:    [[DOTCAPTURE_EXPR_3:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:    [[DOTFLOOR_0_IV_I:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:    [[DOTTILE_0_IV_I:%.*]] = alloca i32, align 4
 // CHECK1-NEXT:    store i32 [[START]], ptr [[START_ADDR]], align 4
@@ -191,56 +191,56 @@
 // CHECK1-NEXT:    [[TMP1:%.*]] = load i32, ptr [[END_ADDR]], align 4
 // CHECK1-NEXT:    store i32 [[TMP1]], ptr [[DOTCAPTURE_EXPR_1]], align 4
 // CHECK1-NEXT:    [[TMP2:%.*]] = load i32, ptr [[STEP_ADDR]], align 4
-// CHECK1-NEXT:    store i32 [[TMP2]], ptr [[DOTCAPTURE_EXPR_2]], align 4
+// CHECK1-NEXT:    store i32 [[TMP2]], ptr [[DOTNEW_STEP]], align 4
 // CHECK1-NEXT:    [[TMP3:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1]], align 4
 // CHECK1-NEXT:    [[TMP4:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_]], align 4
 // CHECK1-NEXT:    [[SUB:%.*]] = sub i32 [[TMP3]], [[TMP4]]
-// CHECK1-NEXT:    [[SUB4:%.*]] = sub i32 [[SUB]], 1
-// CHECK1-NEXT:    [[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_2]], align 4
-// CHECK1-NEXT:    [[ADD:%.*]] = add i32 [[SUB4]], [[TMP5]]
-// CHECK1-NEXT:    [[TMP6:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_2]], align 4
+// CHECK1-NEXT:    [[SUB3:%.*]] = sub i32 [[SUB]], 1
+// CHECK1-NEXT:    [[TMP5:%.*]] = load i32, ptr [[DOTNEW_STEP]], align 4
+// CHECK1-NEXT:    [[ADD:%.*]] = add i32 [[SUB3]], [[TMP5]]
+// CHECK1-NEXT:    [[TMP6:%.*]] = load i32, ptr [[DOTNEW_STEP]], align 4
 // CHECK1-NEXT:    [[DIV:%.*]] = udiv i32 [[ADD]], [[TMP6]]
-// CHECK1-NEXT:    [[SUB5:%.*]] = sub i32 [[DIV]], 1
-// CHECK1-NEXT:    store i32 [[SUB5]], ptr [[DOTCAPTURE_EXPR_3]], align 4
+// CHECK1-NEXT:    [[SUB4:%.*]] = sub i32 [[DIV]], 1
+// CHECK1-NEXT:    store i32 [[SUB4]], ptr [[DOTCAPTURE_EXPR_2]], align 4
 // CHECK1-NEXT:    store i32 0, ptr [[DOTFLOOR_0_IV_I]], align 4
 // CHECK1-NEXT:    br label [[FOR_COND:%.*]]
 // CHECK1:       for.cond:
 // CHECK1-NEXT:    [[TMP7:%.*]] = load i32, ptr [[DOTFLOOR_0_IV_I]], align 4
-// CHECK1-NEXT:    [[TMP8:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_3]], align 4
-// CHECK1-NEXT:    [[ADD6:%.*]] = add i32 [[TMP8]], 1
-// CHECK1-NEXT:    [[CMP:%.*]] = icmp ult i32 [[TMP7]], [[ADD6]]
-// CHECK1-NEXT:    br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_END18:%.*]]
+// CHECK1-NEXT:    [[TMP8:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_2]], align 4
+// CHECK1-NEXT:    [[ADD5:%.*]] = add i32 [[TMP8]], 1
+// CHECK1-NEXT:    [[CMP:%.*]] = icmp ult i32 [[TMP7]], [[ADD5]]
+// CHECK1-NEXT:    br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_END17:%.*]]
 // CHECK1:       for.body:
 // CHECK1-NEXT:    [[TMP9:%.*]] = load i32, ptr [[DOTFLOOR_0_IV_I]], align 4
 // CHECK1-NEXT:    store i32 [[TMP9]], ptr [[DOTTILE_0_IV_I]], align 4
-// CHECK1-NEXT:    br label [[FOR_COND7:%.*]]
-// CHECK1:       for.cond7:
+// CHECK1-NEXT:    br label [[FOR_COND6:%.*]]
+// CHECK1:       for.cond6:
 // CHECK1-NEXT:    [[TMP10:%.*]] = load i32, ptr [[DOTTILE_0_IV_I]], align 4
-// CHECK1-NEXT:    [[TMP11:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_3]], align 4
-// CHECK1-NEXT:    [[ADD8:%.*]] = add i32 [[TMP11]], 1
+// CHECK1-NEXT:    [[TMP11:%.*]]
[PATCH] D144015: [OpenMP]Fix PR55970: Miscompile of collapse(3) with non-rectangular loop nest.
mikerice accepted this revision.
mikerice added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144015/new/

https://reviews.llvm.org/D144015
[PATCH] D144015: [OpenMP]Fix PR55970: Miscompile of collapse(3) with non-rectangular loop nest.
ABataev created this revision.
ABataev added a reviewer: mikerice.
Herald added subscribers: guansong, yaxunl.
Herald added a project: All.
ABataev requested review of this revision.
Herald added a reviewer: jdoerfert.
Herald added subscribers: openmp-commits, sstefan1.
Herald added projects: clang, OpenMP.

The calculated lower bound needs to be assigned back to the temporary variable; otherwise an incorrect value (the upper bound instead of the lower bound) might be used.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D144015

Files:
  clang/lib/Sema/SemaOpenMP.cpp
  clang/test/OpenMP/for_codegen.cpp
  clang/test/OpenMP/tile_codegen.cpp
  openmp/runtime/test/worksharing/for/omp_for_collapse_non_rectangular.c
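
For readers who want to see which cells the non-rectangular nest in the new runtime test is expected to touch, here is a minimal serial sketch in plain C, without OpenMP. It is only an illustration of the iteration space (each inner lower bound depends on the enclosing loop variable), not the logic in SemaOpenMP.cpp. The helper names fill_reference and count_mismatches are hypothetical and do not appear in the patch.

```c
#define N 3

/* Hypothetical helper: walk the triangular nest serially, mark every
   visited cell, and return the number of iterations executed. */
static int fill_reference(int ref[N][N][N]) {
  int count = 0;
  for (unsigned int i = 0; i < N; ++i)
    for (unsigned int j = i; j < N; ++j)     /* lower bound depends on i */
      for (unsigned int k = j; k < N; ++k) { /* lower bound depends on j */
        ref[i][j][k] = 1;
        ++count;
      }
  return count;
}

/* Hypothetical helper: count cells whose value disagrees with the
   expected pattern, i.e. 1 exactly where i <= j <= k and 0 elsewhere. */
static int count_mismatches(int ref[N][N][N]) {
  int bad = 0;
  for (unsigned int i = 0; i < N; ++i)
    for (unsigned int j = 0; j < N; ++j)
      for (unsigned int k = 0; k < N; ++k) {
        int expected = (j >= i && k >= j) ? 1 : 0;
        if (ref[i][j][k] != expected)
          ++bad;
      }
  return bad;
}
```

Calling fill_reference on a zero-initialized array with N = 3 should visit 10 cells (the number of triples with 0 <= i <= j <= k < 3), and count_mismatches on the result should then return 0. A collapsed parallel version that starts an inner loop from the wrong bound would be expected to leave some of those cells unset or to write outside the pattern, which is the kind of failure the runtime test counts in num_failed.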