[clang] [llvm] [LoopUnroll] Add flag to enforce loop unroll pragma regardless of expensive trip count (PR #180961)

via cfe-commits Wed, 11 Feb 2026 07:55:51 -0800

llvmbot wrote:



@llvm/pr-subscribers-llvm-transforms

Author: Adel Ejjeh (adelejjeh)

<details>
<summary>Changes</summary>

This PR is intended to replace #171735.

There are cases where the compiler does not try to unroll a loop even though it
has an unroll pragma, for example because the SCEV expression for the loop's
trip count is considered expensive to compute. In such cases the user might
want to force the compiler to unroll pragma-annotated loops anyway. This PR
provides a clang flag, `-force-unroll-pragma`, that overrides the default
behavior and forces the compiler to honor the pragma and unroll the loop.
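As a hypothetical illustration (the function `strided_sum` and its parameters are assumptions, loosely modeled on `complex_loop` in the new test), this is the kind of loop the flag targets: the stride and bound are only known at run time, so runtime-unrolling it requires computing the trip count with a division by the stride, which is the sort of computation the unroller may judge too expensive.

```c
#include <assert.h>

/* Hypothetical example (not from the patch): a strided reduction whose trip
 * count depends on runtime values (start, step, n). For start < n,
 * runtime-unrolling it needs the trip count ceil((n - start) / step),
 * i.e. a division by step.
 *
 * Possible invocation, assuming the option has to be forwarded to cc1
 * because the patch lists no driver change:
 *   clang -O2 -Xclang -force-unroll-pragma -c strided.c
 */
float strided_sum(const float *buf, int start, int step, int n) {
    assert(step > 0 && "sketch assumes a positive stride");
    float acc = 0.0f;
#pragma unroll
    for (int i = start; i < n; i += step)
        acc += buf[i];
    return acc;
}
```

When the flag is set, the `CGLoopInfo.cpp` hunk in the patch emits an `llvm.loop.unroll.runtime.force` metadata node as part of the loop's partial-unroll metadata; how the pass consumes it is in the truncated `LoopUnrollPass.cpp` part of the patch.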

This change is intended as a stop-gap to unblock users who are facing issues
with the current unrolling capabilities of the AMDGPU compiler. Two follow-up
solutions are planned:
1. (shorter-term) Change the default behavior of the compiler for the AMDGPU
target to always enforce user-defined unroll pragmas, while providing an opt-out
flag that lets users revert to the legacy compiler behavior.
2. (longer-term) Improve the AMDGPU target's unrolling heuristics to enable more
automatic unrolling of loops even when users do not provide an unroll pragma.

---

Patch is 77.51 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/180961.diff


8 Files Affected:

- (modified) clang/include/clang/Basic/CodeGenOptions.def (+1) 
- (modified) clang/include/clang/Options/Options.td (+2) 
- (modified) clang/lib/CodeGen/CGLoopInfo.cpp (+14) 
- (modified) clang/lib/CodeGen/CGLoopInfo.h (+3) 
- (modified) clang/lib/Frontend/CompilerInvocation.cpp (+4-1) 
- (added) clang/test/CodeGen/force-unroll-pragma.c (+339) 
- (modified) llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp (+13-4) 
- (added) llvm/test/Transforms/LoopUnroll/expensive-tripcount.ll (+474) 


``````````diff
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index 8c056bb690690..1ff70ca69da23 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -339,6 +339,7 @@ VALUE_CODEGENOPT(TimeTraceGranularity, 32, 500, Benign) ///< Minimum time granul
 CODEGENOPT(InterchangeLoops  , 1, 0, Benign) ///< Run loop-interchange.
 CODEGENOPT(FuseLoops         , 1, 0, Benign) ///< Run loop-fusion.
 CODEGENOPT(UnrollLoops       , 1, 0, Benign) ///< Control whether loops are unrolled.
+CODEGENOPT(ForceUnrollPragma , 1, 0, Benign) ///< Force unroll runtime loops when pragma provided.
 CODEGENOPT(RerollLoops       , 1, 0, Benign) ///< Control whether loops are rerolled.
 CODEGENOPT(NoUseJumpTables   , 1, 0, Benign) ///< Set when -fno-jump-tables is enabled.
 VALUE_CODEGENOPT(UnwindTables, 2, 0, Benign) ///< Unwind tables (1, Benign) or asynchronous unwind tables (2, Benign)
diff --git a/clang/include/clang/Options/Options.td b/clang/include/clang/Options/Options.td
index 155f19fb00bd8..09a4219d0f378 100644
--- a/clang/include/clang/Options/Options.td
+++ b/clang/include/clang/Options/Options.td
@@ -4492,6 +4492,8 @@ def funroll_loops : Flag<["-"], "funroll-loops">, Group<f_Group>,
   HelpText<"Turn on loop unroller">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
 def fno_unroll_loops : Flag<["-"], "fno-unroll-loops">, Group<f_Group>,
   HelpText<"Turn off loop unroller">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
+def force_unroll_pragma : Flag<["-"], "force-unroll-pragma">, Group<f_Group>,
+  HelpText<"Force unroll runtime loops when an unroll pragma is provided">, Visibility<[ClangOption, CC1Option]>;
 def ffinite_loops: Flag<["-"],  "ffinite-loops">, Group<f_Group>,
   HelpText<"Assume all non-trivial loops are finite.">, Visibility<[ClangOption, CC1Option]>;
 def fno_finite_loops: Flag<["-"], "fno-finite-loops">, Group<f_Group>,
diff --git a/clang/lib/CodeGen/CGLoopInfo.cpp b/clang/lib/CodeGen/CGLoopInfo.cpp
index b2b569a43038c..93486a65de22d 100644
--- a/clang/lib/CodeGen/CGLoopInfo.cpp
+++ b/clang/lib/CodeGen/CGLoopInfo.cpp
@@ -122,6 +122,13 @@ LoopInfo::createPartialUnrollMetadata(const LoopAttributes &Attrs,
     Args.push_back(MDNode::get(Ctx, Vals));
   }
 
+  // Emit metadata to allow expensive trip count if ForceUnrollPragma is set
+  // This applies when unroll pragma is specified without an explicit count
+  if (Attrs.ForceUnrollPragma) {
+    Metadata *Vals[] = {MDString::get(Ctx, "llvm.loop.unroll.runtime.force")};
+    Args.push_back(MDNode::get(Ctx, Vals));
+  }
+
   if (FollowupHasTransforms)
     Args.push_back(
         createFollowupMetadata("llvm.loop.unroll.followup_all", Followup));
@@ -821,6 +828,13 @@ void LoopInfoStack::push(BasicBlock *Header, clang::ASTContext &Ctx,
          StagedAttrs.UnrollCount == 0))
       setUnrollState(LoopAttributes::Disable);
 
+  // Set ForceUnrollPragma flag if the flag is enabled and there's an unroll
+  // pragma without an explicit count (pragmas with explicit counts already
+  // enable expensive trip count)
+  if (CGOpts.ForceUnrollPragma) {
+    StagedAttrs.ForceUnrollPragma = true;
+  }
+
   /// Stage the attributes.
   push(Header, StartLoc, EndLoc);
 }
diff --git a/clang/lib/CodeGen/CGLoopInfo.h b/clang/lib/CodeGen/CGLoopInfo.h
index 3c57124f4137c..e8ec8af55a616 100644
--- a/clang/lib/CodeGen/CGLoopInfo.h
+++ b/clang/lib/CodeGen/CGLoopInfo.h
@@ -84,6 +84,9 @@ struct LoopAttributes {
 
   /// Value for whether the loop is required to make progress.
   bool MustProgress;
+
+  /// Value for whether to force unroll pragma even with expensive trip count.
+  bool ForceUnrollPragma = false;
 };
 
 /// Information used when generating a structured loop.
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 6aa2afb6f5918..005d1ae47b1a5 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -1603,7 +1603,8 @@ void CompilerInvocationBase::GenerateCodeGenArgs(const CodeGenOptions &Opts,
     GenerateArg(Consumer, OPT_funroll_loops);
   else if (!Opts.UnrollLoops && Opts.OptimizationLevel > 1)
     GenerateArg(Consumer, OPT_fno_unroll_loops);
-
+  if (Opts.ForceUnrollPragma)
+    GenerateArg(Consumer, OPT_force_unroll_pragma);
   if (Opts.InterchangeLoops)
     GenerateArg(Consumer, OPT_floop_interchange);
   else
@@ -1921,6 +1922,8 @@ bool CompilerInvocation::ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args,
   Opts.UnrollLoops =
       Args.hasFlag(OPT_funroll_loops, OPT_fno_unroll_loops,
                    (Opts.OptimizationLevel > 1));
+  Opts.ForceUnrollPragma = Args.hasFlag(
+      OPT_force_unroll_pragma, /*OPT_fno_force_unroll_pragma*/ {}, false);
   Opts.InterchangeLoops =
       Args.hasFlag(OPT_floop_interchange, OPT_fno_loop_interchange, false);
   Opts.FuseLoops = Args.hasFlag(OPT_fexperimental_loop_fusion,
diff --git a/clang/test/CodeGen/force-unroll-pragma.c b/clang/test/CodeGen/force-unroll-pragma.c
new file mode 100644
index 0000000000000..8c79d5b7a1f5d
--- /dev/null
+++ b/clang/test/CodeGen/force-unroll-pragma.c
@@ -0,0 +1,339 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -O2 %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK,CHECK-NOPRAGMA
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -O2 -force-unroll-pragma %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK,CHECK-PRAGMA
+
+const int output_vec_size = 4;
+struct ArgVec {
+  float v[output_vec_size];
+};
+
+// CHECK-LABEL: define dso_local i32 @calc_offset(
+// CHECK-SAME: i32 noundef [[INPUT_OFFSET:%.*]], i32 noundef [[OFF1:%.*]], i32 noundef [[OFF2:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[ADD:%.*]] = add nsw i32 [[OFF1]], [[INPUT_OFFSET]]
+// CHECK-NEXT:    [[ADD1:%.*]] = add nsw i32 [[ADD]], [[OFF2]]
+// CHECK-NEXT:    ret i32 [[ADD1]]
+//
+int calc_offset(int input_offset, int off1, int off2) {
+  return input_offset + off1 + off2;
+}
+
+// CHECK-NOPRAGMA-LABEL: define dso_local void @complex_loop(
+// CHECK-NOPRAGMA-SAME: i32 noundef [[INPUT_OFFSET:%.*]], i32 noundef [[STEP:%.*]], i32 noundef [[N:%.*]], i32 noundef [[OFF1:%.*]], i32 noundef [[OFF2:%.*]], ptr noundef readonly captures(none) [[REDUCE_BUFFER:%.*]], ptr noundef captures(none) [[VALUE:%.*]]) local_unnamed_addr #[[ATTR1:[0-9]+]] {
+// CHECK-NOPRAGMA-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOPRAGMA-NEXT:    [[CMP23:%.*]] = icmp slt i32 [[INPUT_OFFSET]], [[N]]
+// CHECK-NOPRAGMA-NEXT:    br i1 [[CMP23]], label %[[FOR_BODY_LR_PH:.*]], label %[[FOR_END14:.*]]
+// CHECK-NOPRAGMA:       [[FOR_BODY_LR_PH]]:
+// CHECK-NOPRAGMA-NEXT:    [[ADD_I:%.*]] = add i32 [[OFF2]], [[OFF1]]
+// CHECK-NOPRAGMA-NEXT:    [[TMP0:%.*]] = sext i32 [[INPUT_OFFSET]] to i64
+// CHECK-NOPRAGMA-NEXT:    [[TMP1:%.*]] = sext i32 [[STEP]] to i64
+// CHECK-NOPRAGMA-NEXT:    [[TMP2:%.*]] = sext i32 [[N]] to i64
+// CHECK-NOPRAGMA-NEXT:    [[DOTPRE:%.*]] = load float, ptr [[VALUE]], align 4, !tbaa [[FLOAT_TBAA6:![0-9]+]]
+// CHECK-NOPRAGMA-NEXT:    [[ARRAYIDX5_1_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds nuw i8, ptr [[VALUE]], i64 4
+// CHECK-NOPRAGMA-NEXT:    [[DOTPRE27:%.*]] = load float, ptr [[ARRAYIDX5_1_PHI_TRANS_INSERT]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-NOPRAGMA-NEXT:    [[ARRAYIDX5_2_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds nuw i8, ptr [[VALUE]], i64 8
+// CHECK-NOPRAGMA-NEXT:    [[DOTPRE28:%.*]] = load float, ptr [[ARRAYIDX5_2_PHI_TRANS_INSERT]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-NOPRAGMA-NEXT:    [[ARRAYIDX5_3_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds nuw i8, ptr [[VALUE]], i64 12
+// CHECK-NOPRAGMA-NEXT:    [[DOTPRE29:%.*]] = load float, ptr [[ARRAYIDX5_3_PHI_TRANS_INSERT]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-NOPRAGMA-NEXT:    br label %[[FOR_BODY:.*]]
+// CHECK-NOPRAGMA:       [[FOR_BODY]]:
+// CHECK-NOPRAGMA-NEXT:    [[TMP3:%.*]] = phi float [ [[DOTPRE29]], %[[FOR_BODY_LR_PH]] ], [ [[ADD_3:%.*]], %[[FOR_BODY]] ]
+// CHECK-NOPRAGMA-NEXT:    [[TMP4:%.*]] = phi float [ [[DOTPRE28]], %[[FOR_BODY_LR_PH]] ], [ [[ADD_2:%.*]], %[[FOR_BODY]] ]
+// CHECK-NOPRAGMA-NEXT:    [[TMP5:%.*]] = phi float [ [[DOTPRE27]], %[[FOR_BODY_LR_PH]] ], [ [[ADD_1:%.*]], %[[FOR_BODY]] ]
+// CHECK-NOPRAGMA-NEXT:    [[TMP6:%.*]] = phi float [ [[DOTPRE]], %[[FOR_BODY_LR_PH]] ], [ [[ADD:%.*]], %[[FOR_BODY]] ]
+// CHECK-NOPRAGMA-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[TMP0]], %[[FOR_BODY_LR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+// CHECK-NOPRAGMA-NEXT:    [[TMP7:%.*]] = trunc nsw i64 [[INDVARS_IV]] to i32
+// CHECK-NOPRAGMA-NEXT:    [[ADD1_I:%.*]] = add i32 [[ADD_I]], [[TMP7]]
+// CHECK-NOPRAGMA-NEXT:    [[IDXPROM:%.*]] = sext i32 [[ADD1_I]] to i64
+// CHECK-NOPRAGMA-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_ARGVEC:%.*]], ptr [[REDUCE_BUFFER]], i64 [[IDXPROM]]
+// CHECK-NOPRAGMA-NEXT:    [[NEXT_SROA_0_0_COPYLOAD:%.*]] = load float, ptr [[ARRAYIDX]], align 4
+// CHECK-NOPRAGMA-NEXT:    [[NEXT_SROA_4_0_ARRAYIDX_SROA_IDX:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX]], i64 4
+// CHECK-NOPRAGMA-NEXT:    [[NEXT_SROA_4_0_COPYLOAD:%.*]] = load float, ptr [[NEXT_SROA_4_0_ARRAYIDX_SROA_IDX]], align 4
+// CHECK-NOPRAGMA-NEXT:    [[NEXT_SROA_5_0_ARRAYIDX_SROA_IDX:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX]], i64 8
+// CHECK-NOPRAGMA-NEXT:    [[NEXT_SROA_5_0_COPYLOAD:%.*]] = load float, ptr [[NEXT_SROA_5_0_ARRAYIDX_SROA_IDX]], align 4
+// CHECK-NOPRAGMA-NEXT:    [[NEXT_SROA_6_0_ARRAYIDX_SROA_IDX:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX]], i64 12
+// CHECK-NOPRAGMA-NEXT:    [[NEXT_SROA_6_0_COPYLOAD:%.*]] = load float, ptr [[NEXT_SROA_6_0_ARRAYIDX_SROA_IDX]], align 4, !tbaa [[CHAR_TBAA8:![0-9]+]]
+// CHECK-NOPRAGMA-NEXT:    [[ADD]] = fadd float [[TMP6]], [[NEXT_SROA_0_0_COPYLOAD]]
+// CHECK-NOPRAGMA-NEXT:    store float [[ADD]], ptr [[VALUE]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-NOPRAGMA-NEXT:    [[ADD_1]] = fadd float [[TMP5]], [[NEXT_SROA_4_0_COPYLOAD]]
+// CHECK-NOPRAGMA-NEXT:    store float [[ADD_1]], ptr [[ARRAYIDX5_1_PHI_TRANS_INSERT]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-NOPRAGMA-NEXT:    [[ADD_2]] = fadd float [[TMP4]], [[NEXT_SROA_5_0_COPYLOAD]]
+// CHECK-NOPRAGMA-NEXT:    store float [[ADD_2]], ptr [[ARRAYIDX5_2_PHI_TRANS_INSERT]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-NOPRAGMA-NEXT:    [[ADD_3]] = fadd float [[TMP3]], [[NEXT_SROA_6_0_COPYLOAD]]
+// CHECK-NOPRAGMA-NEXT:    store float [[ADD_3]], ptr [[ARRAYIDX5_3_PHI_TRANS_INSERT]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-NOPRAGMA-NEXT:    [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], [[TMP1]]
+// CHECK-NOPRAGMA-NEXT:    [[CMP:%.*]] = icmp slt i64 [[INDVARS_IV_NEXT]], [[TMP2]]
+// CHECK-NOPRAGMA-NEXT:    br i1 [[CMP]], label %[[FOR_BODY]], label %[[FOR_END14]], !llvm.loop [[LOOP9:![0-9]+]]
+// CHECK-NOPRAGMA:       [[FOR_END14]]:
+// CHECK-NOPRAGMA-NEXT:    ret void
+//
+// CHECK-PRAGMA-LABEL: define dso_local void @complex_loop(
+// CHECK-PRAGMA-SAME: i32 noundef [[INPUT_OFFSET:%.*]], i32 noundef [[STEP:%.*]], i32 noundef [[N:%.*]], i32 noundef [[OFF1:%.*]], i32 noundef [[OFF2:%.*]], ptr noundef readonly captures(none) [[REDUCE_BUFFER:%.*]], ptr noundef captures(none) [[VALUE:%.*]]) local_unnamed_addr #[[ATTR1:[0-9]+]] {
+// CHECK-PRAGMA-NEXT:  [[ENTRY:.*:]]
+// CHECK-PRAGMA-NEXT:    [[CMP23:%.*]] = icmp slt i32 [[INPUT_OFFSET]], [[N]]
+// CHECK-PRAGMA-NEXT:    br i1 [[CMP23]], label %[[FOR_BODY_LR_PH:.*]], label %[[FOR_END14:.*]]
+// CHECK-PRAGMA:       [[FOR_BODY_LR_PH]]:
+// CHECK-PRAGMA-NEXT:    [[ADD_I:%.*]] = add i32 [[OFF2]], [[OFF1]]
+// CHECK-PRAGMA-NEXT:    [[TMP0:%.*]] = sext i32 [[INPUT_OFFSET]] to i64
+// CHECK-PRAGMA-NEXT:    [[TMP1:%.*]] = sext i32 [[STEP]] to i64
+// CHECK-PRAGMA-NEXT:    [[TMP2:%.*]] = sext i32 [[N]] to i64
+// CHECK-PRAGMA-NEXT:    [[DOTPRE:%.*]] = load float, ptr [[VALUE]], align 4, !tbaa [[FLOAT_TBAA6:![0-9]+]]
+// CHECK-PRAGMA-NEXT:    [[ARRAYIDX5_1_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds nuw i8, ptr [[VALUE]], i64 4
+// CHECK-PRAGMA-NEXT:    [[DOTPRE27:%.*]] = load float, ptr [[ARRAYIDX5_1_PHI_TRANS_INSERT]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-PRAGMA-NEXT:    [[ARRAYIDX5_2_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds nuw i8, ptr [[VALUE]], i64 8
+// CHECK-PRAGMA-NEXT:    [[DOTPRE28:%.*]] = load float, ptr [[ARRAYIDX5_2_PHI_TRANS_INSERT]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-PRAGMA-NEXT:    [[ARRAYIDX5_3_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds nuw i8, ptr [[VALUE]], i64 12
+// CHECK-PRAGMA-NEXT:    [[DOTPRE29:%.*]] = load float, ptr [[ARRAYIDX5_3_PHI_TRANS_INSERT]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-PRAGMA-NEXT:    [[TMP3:%.*]] = add nsw i64 [[TMP1]], [[TMP0]]
+// CHECK-PRAGMA-NEXT:    [[SMAX:%.*]] = tail call i64 @llvm.smax.i64(i64 [[TMP3]], i64 [[TMP2]])
+// CHECK-PRAGMA-NEXT:    [[TMP4:%.*]] = icmp slt i64 [[TMP3]], [[TMP2]]
+// CHECK-PRAGMA-NEXT:    [[UMIN:%.*]] = zext i1 [[TMP4]] to i64
+// CHECK-PRAGMA-NEXT:    [[TMP5:%.*]] = add nsw i64 [[TMP3]], [[UMIN]]
+// CHECK-PRAGMA-NEXT:    [[TMP6:%.*]] = sub i64 [[SMAX]], [[TMP5]]
+// CHECK-PRAGMA-NEXT:    [[TMP7:%.*]] = udiv i64 [[TMP6]], [[TMP1]]
+// CHECK-PRAGMA-NEXT:    [[TMP8:%.*]] = add i64 [[TMP7]], [[UMIN]]
+// CHECK-PRAGMA-NEXT:    [[TMP9:%.*]] = add i64 [[TMP8]], 1
+// CHECK-PRAGMA-NEXT:    [[XTRAITER:%.*]] = and i64 [[TMP9]], 7
+// CHECK-PRAGMA-NEXT:    [[LCMP_MOD_NOT:%.*]] = icmp eq i64 [[XTRAITER]], 0
+// CHECK-PRAGMA-NEXT:    br i1 [[LCMP_MOD_NOT]], label %[[FOR_BODY_PROL_LOOPEXIT:.*]], label %[[FOR_BODY_PROL:.*]]
+// CHECK-PRAGMA:       [[FOR_BODY_PROL]]:
+// CHECK-PRAGMA-NEXT:    [[TMP10:%.*]] = phi float [ [[ADD_3_PROL:%.*]], %[[FOR_BODY_PROL]] ], [ [[DOTPRE29]], %[[FOR_BODY_LR_PH]] ]
+// CHECK-PRAGMA-NEXT:    [[TMP11:%.*]] = phi float [ [[ADD_2_PROL:%.*]], %[[FOR_BODY_PROL]] ], [ [[DOTPRE28]], %[[FOR_BODY_LR_PH]] ]
+// CHECK-PRAGMA-NEXT:    [[TMP12:%.*]] = phi float [ [[ADD_1_PROL:%.*]], %[[FOR_BODY_PROL]] ], [ [[DOTPRE27]], %[[FOR_BODY_LR_PH]] ]
+// CHECK-PRAGMA-NEXT:    [[TMP13:%.*]] = phi float [ [[ADD_PROL:%.*]], %[[FOR_BODY_PROL]] ], [ [[DOTPRE]], %[[FOR_BODY_LR_PH]] ]
+// CHECK-PRAGMA-NEXT:    [[INDVARS_IV_PROL:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PROL:%.*]], %[[FOR_BODY_PROL]] ], [ [[TMP0]], %[[FOR_BODY_LR_PH]] ]
+// CHECK-PRAGMA-NEXT:    [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_NEXT:%.*]], %[[FOR_BODY_PROL]] ], [ 0, %[[FOR_BODY_LR_PH]] ]
+// CHECK-PRAGMA-NEXT:    [[TMP14:%.*]] = trunc nsw i64 [[INDVARS_IV_PROL]] to i32
+// CHECK-PRAGMA-NEXT:    [[ADD1_I_PROL:%.*]] = add i32 [[ADD_I]], [[TMP14]]
+// CHECK-PRAGMA-NEXT:    [[IDXPROM_PROL:%.*]] = sext i32 [[ADD1_I_PROL]] to i64
+// CHECK-PRAGMA-NEXT:    [[ARRAYIDX_PROL:%.*]] = getelementptr inbounds [[STRUCT_ARGVEC:%.*]], ptr [[REDUCE_BUFFER]], i64 [[IDXPROM_PROL]]
+// CHECK-PRAGMA-NEXT:    [[NEXT_SROA_0_0_COPYLOAD_PROL:%.*]] = load float, ptr [[ARRAYIDX_PROL]], align 4
+// CHECK-PRAGMA-NEXT:    [[NEXT_SROA_4_0_ARRAYIDX_SROA_IDX_PROL:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX_PROL]], i64 4
+// CHECK-PRAGMA-NEXT:    [[NEXT_SROA_4_0_COPYLOAD_PROL:%.*]] = load float, ptr [[NEXT_SROA_4_0_ARRAYIDX_SROA_IDX_PROL]], align 4
+// CHECK-PRAGMA-NEXT:    [[NEXT_SROA_5_0_ARRAYIDX_SROA_IDX_PROL:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX_PROL]], i64 8
+// CHECK-PRAGMA-NEXT:    [[NEXT_SROA_5_0_COPYLOAD_PROL:%.*]] = load float, ptr [[NEXT_SROA_5_0_ARRAYIDX_SROA_IDX_PROL]], align 4
+// CHECK-PRAGMA-NEXT:    [[NEXT_SROA_6_0_ARRAYIDX_SROA_IDX_PROL:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX_PROL]], i64 12
+// CHECK-PRAGMA-NEXT:    [[NEXT_SROA_6_0_COPYLOAD_PROL:%.*]] = load float, ptr [[NEXT_SROA_6_0_ARRAYIDX_SROA_IDX_PROL]], align 4, !tbaa [[CHAR_TBAA8:![0-9]+]]
+// CHECK-PRAGMA-NEXT:    [[ADD_PROL]] = fadd float [[TMP13]], [[NEXT_SROA_0_0_COPYLOAD_PROL]]
+// CHECK-PRAGMA-NEXT:    store float [[ADD_PROL]], ptr [[VALUE]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-PRAGMA-NEXT:    [[ADD_1_PROL]] = fadd float [[TMP12]], [[NEXT_SROA_4_0_COPYLOAD_PROL]]
+// CHECK-PRAGMA-NEXT:    store float [[ADD_1_PROL]], ptr [[ARRAYIDX5_1_PHI_TRANS_INSERT]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-PRAGMA-NEXT:    [[ADD_2_PROL]] = fadd float [[TMP11]], [[NEXT_SROA_5_0_COPYLOAD_PROL]]
+// CHECK-PRAGMA-NEXT:    store float [[ADD_2_PROL]], ptr [[ARRAYIDX5_2_PHI_TRANS_INSERT]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-PRAGMA-NEXT:    [[ADD_3_PROL]] = fadd float [[TMP10]], [[NEXT_SROA_6_0_COPYLOAD_PROL]]
+// CHECK-PRAGMA-NEXT:    store float [[ADD_3_PROL]], ptr [[ARRAYIDX5_3_PHI_TRANS_INSERT]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-PRAGMA-NEXT:    [[INDVARS_IV_NEXT_PROL]] = add nsw i64 [[INDVARS_IV_PROL]], [[TMP1]]
+// CHECK-PRAGMA-NEXT:    [[PROL_ITER_NEXT]] = add i64 [[PROL_ITER]], 1
+// CHECK-PRAGMA-NEXT:    [[PROL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[PROL_ITER_NEXT]], [[XTRAITER]]
+// CHECK-PRAGMA-NEXT:    br i1 [[PROL_ITER_CMP_NOT]], label %[[FOR_BODY_PROL_LOOPEXIT]], label %[[FOR_BODY_PROL]], !llvm.loop [[LOOP9:![0-9]+]]
+// CHECK-PRAGMA:       [[FOR_BODY_PROL_LOOPEXIT]]:
+// CHECK-PRAGMA-NEXT:    [[DOTUNR:%.*]] = phi float [ [[DOTPRE29]], %[[FOR_BODY_LR_PH]] ], [ [[ADD_3_PROL]], %[[FOR_BODY_PROL]] ]
+// CHECK-PRAGMA-NEXT:    [[DOTUNR30:%.*]] = phi float [ [[DOTPRE28]], %[[FOR_BODY_LR_PH]] ], [ [[ADD_2_PROL]], %[[FOR_BODY_PROL]] ]
+// CHECK-PRAGMA-NEXT:    [[DOTUNR31:%.*]] = phi float [ [[DOTPRE27]], %[[FOR_BODY_LR_PH]] ], [ [[ADD_1_PROL]], %[[FOR_BODY_PROL]] ]
+// CHECK-PRAGMA-NEXT:    [[DOTUNR32:%.*]] = phi float [ [[DOTPRE]], %[[FOR_BODY_LR_PH]] ], [ [[ADD_PROL]], %[[FOR_BODY_PROL]] ]
+// CHECK-PRAGMA-NEXT:    [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[TMP0]], %[[FOR_BODY_LR_PH]] ], [ [[INDVARS_IV_NEXT_PROL]], %[[FOR_BODY_PROL]] ]
+// CHECK-PRAGMA-NEXT:    [[TMP15:%.*]] = icmp ult i64 [[TMP8]], 7
+// CHECK-PRAGMA-NEXT:    br i1 [[TMP15]], label %[[FOR_END14]], label %[[FOR_BODY:.*]]
+// CHECK-PRAGMA:       [[FOR_BODY]]:
+// CHECK-PRAGMA-NEXT:    [[TMP16:%.*]] = phi float [ [[ADD_3_7:%.*]], %[[FOR_BODY]] ], [ [[DOTUNR]], %[[FOR_BODY_PROL_LOOPEXIT]] ]
+// CHECK-PRAGMA-NEXT:    [[TMP17:%.*]] = phi float [ [[ADD_2_7:%.*]], %[[FOR_BODY]] ], [ [[DOTUNR30]], %[[FOR_BODY_PROL_LOOPEXIT]] ]
+// CHECK-PRAGMA-NEXT:    [[TMP18:%.*]] = phi float [ [[ADD_1_7:%.*]], %[[FOR_BODY]] ], [ [[DOTUNR31]], %[[FOR_BODY_PROL_LOOPEXIT]] ]
+// CHECK-PRAGMA-NEXT:    [[TMP19:%.*]] = phi float [ [[ADD_7:%.*]], %[[FOR_BODY]] ], [ [[DOTUNR32]], %[[FOR_BODY_PROL_LOOPEXIT]] ]
+// CHECK-PRAGMA-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_7:%.*]], %[[FOR_BODY]] ], [ [[INDVARS_IV_UNR]], %[[FOR_BODY_PROL_LOOPEXIT]] ]
+// CHECK-PRAGMA-NEXT:    [[TMP20:%.*]] = trunc nsw i64 [[INDVARS_IV]] to i32
+// CHECK-PRAGMA-NEXT:    [[ADD1_I:%.*]] = add i32 [[ADD_I]], [[TMP20]]
+// CHECK-PRAGMA-NEXT:    [[IDXPROM:%.*]] = sext i32 [[ADD1_I]] to i64
+// CHECK-PRAGMA-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_ARGVEC]], ptr [[REDUCE_BUFFER]], i64 [[IDXPROM]]
+// CHECK-PRAGMA-NEXT:    [[NEXT_SROA_0_0_COPYLOAD:%.*]] = load float, ptr [[ARRAYIDX]], align 4
+// CHECK-PRAGMA-NEXT:    [[NEXT_SROA_4_0_ARRAYIDX_SROA_IDX:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX]], i64 4
+// CHECK-PRAGMA-NEXT:    [[NEXT_SROA_4_0_COPYLOAD:%.*]] = load float, ptr [[NEXT_SROA_4_0_ARRAYIDX_SROA_IDX]], align 4
+// CHECK-PRAGMA-NEXT:    [[NEXT_SROA_5_0_ARRAYIDX_SROA_IDX:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX]], i64 8
+// CHECK-PRAGMA-NEXT:    [[NEXT_SROA_5_0_COPYLOAD:%.*]] = load float, ptr [[NEXT_SROA_5_0_ARRAYIDX_SROA_IDX]], align 4
+// CHECK-PRAGMA-NEXT:    [[NEXT_SROA_6_0_ARRAYIDX_SROA_IDX:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX]], i64 12
+// CHECK-PRAGMA-NEXT:    [[NEXT_SROA_6_0_COPYLOAD:%.*]] = load float, ptr [[NEXT_SROA_6_0_ARRAYIDX_SROA_IDX]], align 4, !tbaa [[CHAR_TBAA8]]
+// CHECK-PRAGMA-NEXT:    [[ADD:%.*]] = fadd float [[TMP19]], [[NEXT_SROA_0_0_COPYLOAD]]
+// CHECK-PRAGMA-NEXT:    store float [[ADD]], ptr [[VALUE]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-PRAGMA-NEXT:    [[ADD_1:%.*]] = fadd float [[TMP18]], [[NEXT_SROA_4_0_COPYLOAD]]
+// CHECK-PRAGMA-NEXT:    store float [[ADD_1]], ptr [[ARRAYIDX5_1_PHI_TRANS_INSERT]], align 4, !tbaa [[FLOAT_TBAA6]]
+// CHECK-PRAGMA-NEXT:    [[ADD_2:%.*]] = fadd float [[TMP17]], [[NEXT_SROA_5_0_COPYLOAD]]
+// CHECK-PRAGMA-NEXT:    store float [[ADD_2]], ptr [[ARRAYIDX5_2_P...
[truncated]

``````````
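The CHECK-PRAGMA lines in the test show what forcing the pragma buys: the preheader computes the trip count at run time (including a `udiv` by the stride, `TMP7`), runs `trip & 7` prologue iterations (`XTRAITER`, block `for.body.prol`), and then enters an 8-way unrolled body (the `_7` suffixes). The following C sketch reproduces that arithmetic so it can be checked against a plain loop. The function names are hypothetical, a plain integer strided sum stands in for the test's 4-float reduction, and the sketch assumes `start < n`, `step > 0`, and no signed overflow (the IR relies on `nsw`).

```c
#include <assert.h>
#include <stdint.h>

/* Trip count of `for (i = start; i < n; i += step)`, entered only when start < n.
 * Mirrors the CHECK-PRAGMA preheader: TMP3 = start + step, SMAX = smax(TMP3, n),
 * UMIN = zext(TMP3 < n), TMP6 = SMAX - (TMP3 + UMIN), TMP7 = TMP6 udiv step,
 * trip count = TMP7 + UMIN + 1. */
uint64_t trip_count(int64_t start, int64_t step, int64_t n) {
    assert(start < n && step > 0);
    int64_t next = start + step;                     /* TMP3: first iv.next */
    int64_t smax = next > n ? next : n;              /* SMAX */
    uint64_t umin = next < n;                        /* UMIN */
    uint64_t span = (uint64_t)(smax - next) - umin;  /* TMP6 */
    return span / (uint64_t)step + umin + 1;         /* the udiv is the costly part */
}

/* Plain strided loop, standing in for the rolled version. */
int64_t strided_sum_ref(const int32_t *buf, int64_t start, int64_t step, int64_t n) {
    int64_t acc = 0;
    for (int64_t i = start; i < n; i += step)
        acc += buf[i];
    return acc;
}

/* Same sum, structured like runtime unrolling by 8 with a prologue. */
int64_t strided_sum_unrolled8(const int32_t *buf, int64_t start, int64_t step, int64_t n) {
    if (start >= n)                                  /* CMP23 guard */
        return 0;
    uint64_t trip = trip_count(start, step, n);
    uint64_t xtra = trip & 7;                        /* XTRAITER */
    int64_t acc = 0, i = start;
    for (uint64_t k = 0; k < xtra; ++k, i += step)   /* prologue iterations */
        acc += buf[i];
    for (uint64_t k = xtra; k < trip; k += 8) {      /* 8 copies per iteration */
        acc += buf[i];            acc += buf[i + step];
        acc += buf[i + 2 * step]; acc += buf[i + 3 * step];
        acc += buf[i + 4 * step]; acc += buf[i + 5 * step];
        acc += buf[i + 6 * step]; acc += buf[i + 7 * step];
        i += 8 * step;
    }
    return acc;
}
```

Because the prologue absorbs `trip % 8` iterations, the remaining count is always a multiple of 8, so the unrolled body needs no per-copy exit check; that is also why the IR can skip the main loop entirely when `TMP8 < 7`.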

</details>


https://github.com/llvm/llvm-project/pull/180961
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
