[clang] [CIR][OpenMP] Emit #pragma omp for as omp.wsloop + omp.loop_nest (PR #181841)

Luca Parigi via cfe-commits Thu, 26 Feb 2026 05:24:46 -0800

================
@@ -0,0 +1,188 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fopenmp -fclangir -emit-llvm %s -o %t-cir.ll
+// RUN: FileCheck %s --input-file %t-cir.ll
+
+void before(int);
+void during(int);
+void after(int);
+
+// Test simple for loop with constant bounds: for (int i = 0; i < 10; i++)
+void emit_simple_for() {
+  int j = 5;
+  before(j);
+#pragma omp parallel
+  {
+#pragma omp for
+    for (int i = 0; i < 10; i++) {
+      during(j);
+    }
+  }
+  after(j);
+}
+
+// CHECK-LABEL: define dso_local void @emit_simple_for()
+// CHECK: call void @before(i32 %{{.*}})
+// CHECK: call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @{{.*}}, i32 1, ptr @emit_simple_for..omp_par, ptr %{{.*}})
+// CHECK: call void @after(i32 %{{.*}})
+
+// CHECK-LABEL: define internal void @emit_simple_for..omp_par(
+// CHECK: store i32 0, ptr %p.lowerbound
+// CHECK: store i32 9, ptr %p.upperbound
+// CHECK: store i32 1, ptr %p.stride
+// CHECK: call void @__kmpc_for_static_init_4u(
+// CHECK: omp_loop.body:
+// CHECK: omp.loop_nest.region:
+// CHECK: store i32 %{{.*}}, ptr %{{.*}}, align 4
+// CHECK: call void @during(i32 %{{.*}})
+// CHECK: call void @__kmpc_for_static_fini(
+// CHECK: call void @__kmpc_barrier(
----------------
Parigi wrote:


Thank you very much for the review and the feedback.

I think there may have been a misunderstanding: I am testing the lowering pass, 
following the same approach used in the other test files under 
llvm-project/clang/test/CIR/Lowering/ (e.g. global-var-simple.cpp). For the 
lowering I use the -fclangir flag, which produces different output from the 
standard pipeline. I can also reproduce the output you linked by removing 
that flag.

I also wanted to point out that the compiler you shared is different from the 
one I am using: the version you linked does not support CIR. If you pass 
-fclangir it does not fail, but the output is unchanged. You can verify this 
by trying the -emit-cir flag.
If you want to test a version closer to what I am using, here is a Godbolt 
link: [https://godbolt.org/z/xfnYM4YT9](https://godbolt.org/z/xfnYM4YT9)
The linked example tests only the parallel construct because uncommenting the 
#pragma omp for breaks it, as that construct is not implemented there. You will 
already notice that the parallel output differs from the one you linked.

As an additional sanity check, I also built executables using a driver.c and 
tested both the parallel and for cases. Both .ll files produce identical 
runtime behavior; only the emitted IR differs. I have attached a zip with the 
tests I ran:
[PR-testing.zip](https://github.com/user-attachments/files/25576796/PR-testing.zip)
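
For readers who do not want to open the zip, a check of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the contents of the attachment: the call counters and `check_emit_simple_for` are hypothetical names, and `emit_simple_for` is copied into the same file only to keep the sketch self-contained (in the real setup it would come from the `.ll` file under test).

```c
#include <stdio.h>

/* Hypothetical instrumentation: count calls so two builds can be compared. */
static int before_calls, during_calls, after_calls;

void before(int x) { (void)x; before_calls++; }

void during(int x) {
  (void)x;
  /* Several threads may reach this inside the parallel region. */
#pragma omp atomic
  during_calls++;
}

void after(int x) { (void)x; after_calls++; }

/* Same shape as the function in the test file. In an actual driver this
   definition would be linked in from the compiled .ll instead. */
void emit_simple_for(void) {
  int j = 5;
  before(j);
#pragma omp parallel
  {
#pragma omp for
    for (int i = 0; i < 10; i++) {
      during(j);
    }
  }
  after(j);
}

/* Returns 1 when the observable behaviour matches what the source implies:
   one call before, ten loop iterations in total, one call after. */
int check_emit_simple_for(void) {
  before_calls = during_calls = after_calls = 0;
  emit_simple_for();
  printf("before=%d during=%d after=%d\n",
         before_calls, during_calls, after_calls);
  return before_calls == 1 && during_calls == 10 && after_calls == 1;
}
```

Calling `check_emit_simple_for()` from `main` and running the binary built from each pipeline gives a simple pass/fail comparison. The counts do not depend on the number of threads, because the worksharing loop distributes the ten iterations across the team rather than repeating them.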


Regarding the PR size: I agree with your concern and had already thought about 
it. However, I am not sure how to split it cleanly. Emitting only wsloop or 
only loop_nest in isolation would not make sense, since the construct requires 
both together; otherwise the emission would break. That means the patch has to 
touch both emitOMPFor (which emits wsloop) and emitForStmt (which emits 
loop_nest).

One option would be to drop support for variable loop bounds for now and 
restrict the implementation to constant bounds only. This would reduce the 
code size.
I would appreciate your thoughts on whether splitting it this way makes sense, 
or if you have other suggestions. I am happy to follow your guidance.
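
To make the constant versus variable bounds distinction concrete, here is a minimal C sketch. The function names are hypothetical and the trip-count formula assumes a positive step and a `<` comparison; it is not the code from the patch. With constant bounds the iteration count is known at compile time, which is why the test can check for a literal upper bound of 9 (ten iterations, numbered 0 to 9). With variable bounds the same value has to be computed at run time.

```c
/* Trip count of: for (i = lb; i < ub; i += step), assuming step > 0. */
unsigned trip_count(int lb, int ub, int step) {
  if (ub <= lb)
    return 0;
  return ((unsigned)ub - (unsigned)lb + (unsigned)step - 1u) / (unsigned)step;
}

/* Constant bounds: the loop from the test, with the body replaced by a
   counter so the number of iterations can be observed. */
int run_constant(void) {
  int count = 0;
#pragma omp parallel for reduction(+ : count)
  for (int i = 0; i < 10; i++)
    count++;
  return count;
}

/* Variable bounds: the trip count depends on values known only at run time. */
int run_variable(int lo, int hi, int step) {
  int count = 0;
#pragma omp parallel for reduction(+ : count)
  for (int i = lo; i < hi; i += step)
    count++;
  return count;
}
```

For the constant case, `trip_count(0, 10, 1) - 1` is 9, which matches the value the test expects to be stored to `%p.upperbound`.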

https://github.com/llvm/llvm-project/pull/181841
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
