[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-05-12 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added inline comments.



Comment at: 
openmp/libomptarget/deviceRTLs/common/generated_microtask_cases.gen:1
+case 0:
+((void (*)(kmp_int32 *, kmp_int32 *

JonChesterfield wrote:
> This is not very pretty. Why do we need runtime dispatch to a function 
> pointer?
because we have variadic functions right now. A patch to remove this is already 
underway:
 https://reviews.llvm.org/D102107


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-05-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: 
openmp/libomptarget/deviceRTLs/common/generated_microtask_cases.gen:1
+case 0:
+((void (*)(kmp_int32 *, kmp_int32 *

This is not very pretty. Why do we need runtime dispatch to a function pointer?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-29 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis added a comment.

In D95976#2725027 , @protze.joachim 
wrote:

> Please update the test with a NFC commit.

Thanks, @protze.joachim. The changes look good. I'll get that NFC commit in 
soon-ish, unless you would like to take over.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-29 Thread Joachim Protze via Phabricator via cfe-commits
protze.joachim added a comment.

Please update the test with a NFC commit.




Comment at: openmp/libomptarget/test/offloading/bug49779.cpp:1-5
+// RUN: %libomptarget-compilexx-run-and-check-aarch64-unknown-linux-gnu
+// RUN: %libomptarget-compilexx-run-and-check-powerpc64-ibm-linux-gnu
+// RUN: %libomptarget-compilexx-run-and-check-powerpc64le-ibm-linux-gnu
+// RUN: %libomptarget-compilexx-run-and-check-x86_64-pc-linux-gnu
+// RUN: %libomptarget-compilexx-run-and-check-nvptx64-nvidia-cuda

See D101326



Comment at: openmp/libomptarget/test/offloading/bug49779.cpp:29-36
+  assert(C >= 2 && C <= 6);
+
+  std::cout << "PASS\n";
+
+  return 0;
+}
+

Since the output goes to Filecheck anyways, I think we should avoid asserts, 
but let Filecheck test for expected results. 
The output for failing tests has more information with this approach.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-21 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGa2dbfb6b72db: [OpenMP] Simplify offloading parallel call 
codegen (authored by ggeorgakoudis).

Changed prior to commit:
  https://reviews.llvm.org/D95976?vs=339334=339441#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/nvptx_allocate_codegen.cpp
  clang/test/OpenMP/nvptx_data_sharing.cpp
  clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_multi_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_nested_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  
clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
  clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
  clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
  clang/test/OpenMP/target_parallel_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
  openmp/libomptarget/deviceRTLs/common/generated_microtask_cases.gen
  openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
  openmp/libomptarget/deviceRTLs/common/src/parallel.cu
  openmp/libomptarget/deviceRTLs/common/src/support.cu
  openmp/libomptarget/deviceRTLs/common/support.h
  openmp/libomptarget/deviceRTLs/interface.h
  openmp/libomptarget/test/offloading/bug49779.cpp
  openmp/libomptarget/utils/generate_microtask_cases.py

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-21 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis updated this revision to Diff 339334.
ggeorgakoudis added a comment.

Fix clang-format


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/nvptx_allocate_codegen.cpp
  clang/test/OpenMP/nvptx_data_sharing.cpp
  clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_multi_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_nested_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  
clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
  clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
  clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
  clang/test/OpenMP/target_parallel_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
  openmp/libomptarget/deviceRTLs/common/generated_microtask_cases.gen
  openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
  openmp/libomptarget/deviceRTLs/common/src/parallel.cu
  openmp/libomptarget/deviceRTLs/common/src/support.cu
  openmp/libomptarget/deviceRTLs/common/support.h
  openmp/libomptarget/deviceRTLs/interface.h
  openmp/libomptarget/test/offloading/bug49779.cpp
  openmp/libomptarget/utils/generate_microtask_cases.py

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-21 Thread Michael Kruse via Phabricator via cfe-commits
Meinersbur accepted this revision.
Meinersbur added a comment.
This revision is now accepted and ready to land.

This test seem to pass on Windows now. Please still fix the clang-format 
remarks, such as going over 80 characters on a line.




Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:1150
   llvm::Value *IsMaster =
   Bld.CreateICmpEQ(RT.getGPUThreadID(CGF), getMasterThreadID(CGF));
   Bld.CreateCondBr(IsMaster, MasterBB, EST.ExitBB);

ggeorgakoudis wrote:
> Meinersbur wrote:
> > There seem to be more unordered codegen calls, such as this one.
> Some previous emitted values can be re-used, e.g., GPUThreadID in line 1150 
> can re-use the value from line 1140 , instead of re-emitted. I've kept 
> emitting them as it was previously done. What is the preferred way to handle 
> those? 
`getGPUThreadID`/`getMasterThreadID` could cache the value if used multiple 
times., but it would also require to put them into the entry block to be 
available anywhere in the function.

Otherwise, use a best-effort to minimize overhead even if the optimizer cannot 
unify them or in debug builds.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-21 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis updated this revision to Diff 339265.
ggeorgakoudis added a comment.

Add tests, reduce microtask cases to avoid stack problems


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/nvptx_allocate_codegen.cpp
  clang/test/OpenMP/nvptx_data_sharing.cpp
  clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_multi_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_nested_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  
clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
  clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
  clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
  clang/test/OpenMP/target_parallel_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
  openmp/libomptarget/deviceRTLs/common/generated_microtask_cases.gen
  openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
  openmp/libomptarget/deviceRTLs/common/src/parallel.cu
  openmp/libomptarget/deviceRTLs/common/src/support.cu
  openmp/libomptarget/deviceRTLs/common/support.h
  openmp/libomptarget/deviceRTLs/interface.h
  openmp/libomptarget/test/offloading/bug49779.cpp
  openmp/libomptarget/utils/generate_microtask_cases.py

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-19 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis updated this revision to Diff 338554.
ggeorgakoudis marked 2 inline comments as done.
ggeorgakoudis added a comment.

Fix


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/nvptx_allocate_codegen.cpp
  clang/test/OpenMP/nvptx_data_sharing.cpp
  clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  
clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
  clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
  clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
  clang/test/OpenMP/target_parallel_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
  openmp/libomptarget/deviceRTLs/common/generated_microtask_cases.gen
  openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
  openmp/libomptarget/deviceRTLs/common/src/parallel.cu
  openmp/libomptarget/deviceRTLs/common/src/support.cu
  openmp/libomptarget/deviceRTLs/common/support.h
  openmp/libomptarget/deviceRTLs/interface.h
  openmp/libomptarget/utils/generate_microtask_cases.py

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-19 Thread Michael Kruse via Phabricator via cfe-commits
Meinersbur added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:591-592
 
   return Bld.CreateAnd(Bld.CreateNUWSub(NumThreads, Bld.getInt32(1)),
Bld.CreateNot(Mask), "master_tid");
 }

This is another undefined codegen call order which causes the current pre-merge 
checks to fail.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-19 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis marked 4 inline comments as done.
ggeorgakoudis added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:1150
   llvm::Value *IsMaster =
   Bld.CreateICmpEQ(RT.getGPUThreadID(CGF), getMasterThreadID(CGF));
   Bld.CreateCondBr(IsMaster, MasterBB, EST.ExitBB);

Meinersbur wrote:
> There seem to be more unordered codegen calls, such as this one.
Some previous emitted values can be re-used, e.g., GPUThreadID in line 1150 can 
re-use the value from line 1140 , instead of re-emitted. I've kept emitting 
them as it was previously done. What is the preferred way to handle those? 



Comment at: openmp/libomptarget/deviceRTLs/common/src/parallel.cu:294
+  // TODO: Add UNLIKELY to optimize?
+  if (!if_expr) {
+__kmpc_serialized_parallel(ident, global_tid);

jdoerfert wrote:
> ggeorgakoudis wrote:
> > jdoerfert wrote:
> > > This should allow us to remove the `SeqGen` in the Clang CodeGen *and* 
> > > fix PR49777 *and* fix PR49779, a win-win-win situation.
> > Please check
> Check? Can we add the two reproducers as tests, please. One should be a clang 
> test, the other maybe a runtime test, though clang test might suffice.
Ack, will do


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-19 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis updated this revision to Diff 338441.
ggeorgakoudis marked 2 inline comments as done.
ggeorgakoudis added a comment.

Update for comments, fixes


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/nvptx_allocate_codegen.cpp
  clang/test/OpenMP/nvptx_data_sharing.cpp
  clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  
clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
  clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
  clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
  clang/test/OpenMP/target_parallel_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
  openmp/libomptarget/deviceRTLs/common/generated_microtask_cases.gen
  openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
  openmp/libomptarget/deviceRTLs/common/src/parallel.cu
  openmp/libomptarget/deviceRTLs/common/src/support.cu
  openmp/libomptarget/deviceRTLs/common/support.h
  openmp/libomptarget/deviceRTLs/interface.h
  openmp/libomptarget/utils/generate_microtask_cases.py

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-16 Thread Michael Kruse via Phabricator via cfe-commits
Meinersbur added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:1141
   llvm::Value *IsWorker =
   Bld.CreateICmpULT(RT.getGPUThreadID(CGF), getThreadLimit(CGF));
   Bld.CreateCondBr(IsWorker, WorkerBB, MasterCheckBB);

There seem to be more unordered codegen calls, such as this one.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:1150
   llvm::Value *IsMaster =
   Bld.CreateICmpEQ(RT.getGPUThreadID(CGF), getMasterThreadID(CGF));
   Bld.CreateCondBr(IsMaster, MasterBB, EST.ExitBB);

There seem to be more unordered codegen calls, such as this one.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-16 Thread Michael Kruse via Phabricator via cfe-commits
Meinersbur added a comment.

I have not looked at the other mentioned problem yet:

> another that attribute regexes are not recognized 
> (https://reviews.llvm.org/harbormaster/unit/view/552593/ at 
> nvptx_target_codegen.cpp:723:17)

Which might still be there.

I would want for Harbormaster to complete the pre-merge check.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-16 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert accepted this revision.
jdoerfert added a comment.

With the nit to add the two reproducers, LGTM. (please make sure to run FAROS 
or some benchmarks we have before commiting).




Comment at: openmp/libomptarget/deviceRTLs/common/src/parallel.cu:294
+  // TODO: Add UNLIKELY to optimize?
+  if (!if_expr) {
+__kmpc_serialized_parallel(ident, global_tid);

ggeorgakoudis wrote:
> jdoerfert wrote:
> > This should allow us to remove the `SeqGen` in the Clang CodeGen *and* fix 
> > PR49777 *and* fix PR49779, a win-win-win situation.
> Please check
Check? Can we add the two reproducers as tests, please. One should be a clang 
test, the other maybe a runtime test, though clang test might suffice.



Comment at: openmp/libomptarget/utils/generate_microtask_cases.py:31
+if __name__ == "__main__":
+main()

Great. The output is not pretty but that was not the objective ;)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-16 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis marked 4 inline comments as done.
ggeorgakoudis added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:2192
 RCG(CGF);
   }
 }

jdoerfert wrote:
> Can we remove SeqGen while we are here please. We need to check in the 
> runtime anyway. That check is later folded, no need to make things more 
> complicated here.
Done



Comment at: openmp/libomptarget/deviceRTLs/common/src/parallel.cu:294
+  // TODO: Add UNLIKELY to optimize?
+  if (!if_expr) {
+__kmpc_serialized_parallel(ident, global_tid);

jdoerfert wrote:
> This should allow us to remove the `SeqGen` in the Clang CodeGen *and* fix 
> PR49777 *and* fix PR49779, a win-win-win situation.
Please check


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-16 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis updated this revision to Diff 338246.
ggeorgakoudis added a comment.

Update for comments, fix for windows fix


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/nvptx_allocate_codegen.cpp
  clang/test/OpenMP/nvptx_data_sharing.cpp
  clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  
clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
  clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
  clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
  clang/test/OpenMP/target_parallel_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
  openmp/libomptarget/deviceRTLs/common/generated_microtask_cases.gen
  openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
  openmp/libomptarget/deviceRTLs/common/src/parallel.cu
  openmp/libomptarget/deviceRTLs/common/src/support.cu
  openmp/libomptarget/deviceRTLs/common/support.h
  openmp/libomptarget/deviceRTLs/interface.h
  openmp/libomptarget/utils/generate_microtask_cases.py

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-16 Thread Michael Kruse via Phabricator via cfe-commits
Meinersbur requested changes to this revision.
Meinersbur added inline comments.
This revision now requires changes to proceed.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:569-570
+  else
+ThreadLimit = Bld.CreateNUWSub(RT.getGPUNumThreads(CGF),
+   RT.getGPUWarpSize(CGF), "thread_limit");
+  assert(ThreadLimit != nullptr && "Expected non-null ThreadLimit");

getGPUNumThreads and getGPUWarpSize still have undefined call order.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-16 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis updated this revision to Diff 338102.
ggeorgakoudis added a comment.

Fix for getThreadsLimit


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/nvptx_allocate_codegen.cpp
  clang/test/OpenMP/nvptx_data_sharing.cpp
  clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  
clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
  clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
  clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
  clang/test/OpenMP/target_parallel_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
  openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
  openmp/libomptarget/deviceRTLs/common/src/parallel.cu
  openmp/libomptarget/deviceRTLs/common/src/support.cu
  openmp/libomptarget/deviceRTLs/common/support.h
  openmp/libomptarget/deviceRTLs/interface.h

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-15 Thread Michael Kruse via Phabricator via cfe-commits
Meinersbur added a comment.

The transposition problem arises from:

  


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-15 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added a comment.

I have only minor remarks but I'd like you to check if my hunch is correct and 
the proposed modifications will fix fix PR49777 *and* fix PR49779.
Also, the number of arguments need to be increased, let's go big and automatic 
here.

Other than that I think this looks good.




Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:2192
 RCG(CGF);
   }
 }

Can we remove SeqGen while we are here please. We need to check in the runtime 
anyway. That check is later folded, no need to make things more complicated 
here.



Comment at: openmp/libomptarget/deviceRTLs/common/src/parallel.cu:294
+  // TODO: Add UNLIKELY to optimize?
+  if (!if_expr) {
+__kmpc_serialized_parallel(ident, global_tid);

This should allow us to remove the `SeqGen` in the Clang CodeGen *and* fix 
PR49777 *and* fix PR49779, a win-win-win situation.



Comment at: openmp/libomptarget/deviceRTLs/common/src/parallel.cu:368
+  //  __kmpc_push_proc_bind(ident, global_tid, proc_bind);
+}
+

FWIW, The implementation here is a stopgap until we move to the new runtime. 
The codegen and interface are the important parts.



Comment at: openmp/libomptarget/deviceRTLs/common/src/support.cu:370
+printf("Too many arguments in kmp_invoke_microtask, aborting 
execution.\n");
+return;
+  }

Not a return but a `__builtin_trap()`, please.
We also need this for more than 16 unfortunately, I've seen 20 in miniqmc.
We might want to create a script to print the cases, and then generate 128 or 
something like that in a file we include. The script can be in the utils folder 
too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-15 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis added a comment.

Hi @Meinersbur (got word you are a windows user), @jdoerfert, could I ask your 
help in detecting why the clang tests on windows are failing? There are two 
failures I'm spotting, one is that calls to llvm.nvvm intrinsics seem 
transposed (https://reviews.llvm.org/harbormaster/unit/view/552591/) and 
another that attribute regexes are not recognized 
(https://reviews.llvm.org/harbormaster/unit/view/552593/ at 
nvptx_target_codegen.cpp:723:17). Maybe there is something else I'm missing and 
I'd appreciate the extra eyeballing on the problem.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-14 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis updated this revision to Diff 337556.
ggeorgakoudis added a comment.

Fix llvm test


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/nvptx_allocate_codegen.cpp
  clang/test/OpenMP/nvptx_data_sharing.cpp
  clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  
clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
  clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
  clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
  clang/test/OpenMP/target_parallel_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
  openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
  openmp/libomptarget/deviceRTLs/common/src/parallel.cu
  openmp/libomptarget/deviceRTLs/common/src/support.cu
  openmp/libomptarget/deviceRTLs/common/support.h
  openmp/libomptarget/deviceRTLs/interface.h

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-13 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis updated this revision to Diff 337183.
ggeorgakoudis added a comment.

Add aux-triple to one test, check unit test builder on windows


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/nvptx_allocate_codegen.cpp
  clang/test/OpenMP/nvptx_data_sharing.cpp
  clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  
clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
  clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
  clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
  clang/test/OpenMP/target_parallel_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
  openmp/libomptarget/deviceRTLs/common/src/parallel.cu
  openmp/libomptarget/deviceRTLs/common/src/support.cu
  openmp/libomptarget/deviceRTLs/common/support.h
  openmp/libomptarget/deviceRTLs/interface.h

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-04-13 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis updated this revision to Diff 337141.
ggeorgakoudis added a comment.
Herald added a subscriber: hiraditya.

Add tests, update OpenMPOpt, rebase to main


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/nvptx_allocate_codegen.cpp
  clang/test/OpenMP/nvptx_data_sharing.cpp
  clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_lambda_capturing.cpp
  clang/test/OpenMP/nvptx_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_parallel_for_codegen.cpp
  clang/test/OpenMP/nvptx_target_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_codegen.cpp
  
clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_generic_mode_codegen.cpp
  clang/test/OpenMP/nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
  clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
  clang/test/OpenMP/remarks_parallel_in_multiple_target_state_machines.c
  clang/test/OpenMP/remarks_parallel_in_target_state_machine.c
  clang/test/OpenMP/target_parallel_debug_codegen.cpp
  clang/test/OpenMP/target_parallel_for_debug_codegen.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp
  openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
  openmp/libomptarget/deviceRTLs/common/src/parallel.cu
  openmp/libomptarget/deviceRTLs/common/src/support.cu
  openmp/libomptarget/deviceRTLs/common/support.h
  openmp/libomptarget/deviceRTLs/interface.h

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-02-04 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis updated this revision to Diff 321375.
ggeorgakoudis added a comment.

Fix type for IfCond, formatting


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95976/new/

https://reviews.llvm.org/D95976

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
  openmp/libomptarget/deviceRTLs/common/src/parallel.cu
  openmp/libomptarget/deviceRTLs/common/src/support.cu
  openmp/libomptarget/deviceRTLs/common/support.h
  openmp/libomptarget/deviceRTLs/interface.h

Index: openmp/libomptarget/deviceRTLs/interface.h
===
--- openmp/libomptarget/deviceRTLs/interface.h
+++ openmp/libomptarget/deviceRTLs/interface.h
@@ -177,6 +177,7 @@
  * The struct is identical to the one in the kmp.h file.
  * We maintain the same data structure for compatibility.
  */
+typedef short kmp_int16;
 typedef int kmp_int32;
 typedef struct ident {
   kmp_int32 reserved_1; /**<  might be used in Fortran; see above  */
@@ -437,6 +438,22 @@
 EXTERN void __kmpc_end_sharing_variables();
 EXTERN void __kmpc_get_shared_variables(void ***GlobalArgs);
 
+/// Entry point to start a new parallel region.
+///
+/// \param ident   The source identifier.
+/// \param global_tid  The global thread ID.
+/// \param if_expr The if(expr), or 1 if none given.
+/// \param num_threads The num_threads(expr), or -1 if none given.
+/// \param proc_bind   The proc_bind, or `proc_bind_default` if none given.
+/// \param fn  The outlined parallel region function.
+/// \param wrapper_fn  The worker wrapper function of fn.
+/// \param argsThe pointer array of arguments to fn.
+/// \param nargs   The number of arguments to fn.
+EXTERN void __kmpc_parallel_51(ident_t *ident, kmp_int32 global_tid,
+   kmp_int32 if_expr, kmp_int32 num_threads,
+   int proc_bind, void *fn, void *wrapper_fn,
+   void **args, size_t nargs);
+
 // SPMD execution mode interrogation function.
 EXTERN int8_t __kmpc_is_spmd_exec_mode();
 
Index: openmp/libomptarget/deviceRTLs/common/support.h
===
--- openmp/libomptarget/deviceRTLs/common/support.h
+++ openmp/libomptarget/deviceRTLs/common/support.h
@@ -95,4 +95,9 @@
 DEVICE unsigned int *GetTeamsReductionTimestamp();
 DEVICE char *GetTeamsReductionScratchpad();
 
+// Invoke an outlined parallel function unwrapping global, shared arguments (up
+// to 16).
+DEVICE void __kmp_invoke_microtask(kmp_int32 global_tid, kmp_int32 bound_tid,
+   void *fn, void **args, size_t nargs);
+
 #endif
Index: openmp/libomptarget/deviceRTLs/common/src/support.cu
===
--- openmp/libomptarget/deviceRTLs/common/src/support.cu
+++ openmp/libomptarget/deviceRTLs/common/src/support.cu
@@ -265,4 +265,110 @@
   return static_cast(ReductionScratchpadPtr) + 256;
 }
 
+// Invoke an outlined parallel function unwrapping arguments (up
+// to 16).
+DEVICE void __kmp_invoke_microtask(kmp_int32 global_tid, kmp_int32 bound_tid,
+   void *fn, void **args, size_t nargs) {
+  switch (nargs) {
+  case 0:
+((void (*)(kmp_int32 *, kmp_int32 *))fn)(_tid, _tid);
+break;
+  case 1:
+((void (*)(kmp_int32 *, kmp_int32 *, void *))fn)(_tid, _tid,
+ args[0]);
+break;
+  case 2:
+((void (*)(kmp_int32 *, kmp_int32 *, void *, void *))fn)(
+_tid, _tid, args[0], args[1]);
+break;
+  case 3:
+((void (*)(kmp_int32 *, kmp_int32 *, void *, void *, void *))fn)(
+_tid, _tid, args[0], args[1], args[2]);
+break;
+  case 4:
+((void (*)(kmp_int32 *, kmp_int32 *, void *, void *, void *, void *))fn)(
+_tid, _tid, args[0], args[1], args[2], args[3]);
+break;
+  case 5:
+((void (*)(kmp_int32 *, kmp_int32 *, void *, void *, void *, void *,
+   void *))fn)(_tid, _tid, args[0], args[1], args[2],
+   args[3], args[4]);
+break;
+  case 6:
+((void (*)(kmp_int32 *, kmp_int32 *, void *, void *, void *, void *, void *,
+   void *))fn)(_tid, _tid, args[0], args[1], args[2],
+   args[3], args[4], args[5]);
+break;
+  case 7:
+((void (*)(kmp_int32 *, kmp_int32 *, void *, void *, void *, void *, void *,
+   void *, void *))fn)(_tid, _tid, args[0], args[1],
+   args[2], args[3], args[4], args[5], args[6]);
+break;
+  case 8:
+((void (*)(kmp_int32 *, kmp_int32 *, void *, void *, void *, void *, void *,
+   void *, void *, void *))fn)(_tid, _tid, args[0],
+   args[1], args[2], args[3], args[4],
+   

[PATCH] D95976: [OpenMP] Simplify offloading parallel call codegen

2021-02-03 Thread Giorgis Georgakoudis via Phabricator via cfe-commits
ggeorgakoudis created this revision.
Herald added subscribers: jfb, guansong, yaxunl.
ggeorgakoudis requested review of this revision.
Herald added a reviewer: jdoerfert.
Herald added subscribers: llvm-commits, openmp-commits, cfe-commits, sstefan1.
Herald added projects: clang, OpenMP, LLVM.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D95976

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
  openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
  openmp/libomptarget/deviceRTLs/common/src/parallel.cu
  openmp/libomptarget/deviceRTLs/common/src/support.cu
  openmp/libomptarget/deviceRTLs/common/support.h
  openmp/libomptarget/deviceRTLs/interface.h

Index: openmp/libomptarget/deviceRTLs/interface.h
===
--- openmp/libomptarget/deviceRTLs/interface.h
+++ openmp/libomptarget/deviceRTLs/interface.h
@@ -177,6 +177,7 @@
  * The struct is identical to the one in the kmp.h file.
  * We maintain the same data structure for compatibility.
  */
+typedef short kmp_int16;
 typedef int kmp_int32;
 typedef struct ident {
   kmp_int32 reserved_1; /**<  might be used in Fortran; see above  */
@@ -437,6 +438,23 @@
 EXTERN void __kmpc_end_sharing_variables();
 EXTERN void __kmpc_get_shared_variables(void ***GlobalArgs);
 
+/// Entry point to start a new parallel region.
+///
+/// \param ident   The source identifier.
+/// \param global_tid  The global thread ID.
+/// \param if_expr The if(expr), or 1 if none given.
+/// \param num_threads The num_threads(expr), or -1 if none given.
+/// \param proc_bind   The proc_bind, or `proc_bind_default` if none given.
+/// \param fn  The outlined parallel region function.
+/// \param wrapper_fn  The worker wrapper function for the outlined parallel
+/// region. \param argsThe pointer array of arguments to the outlined
+/// region function. \param nargs   The number of arguments to the outlined
+/// region function.
+EXTERN void __kmpc_parallel_51(ident_t *ident, kmp_int32 global_tid,
+   kmp_int32 if_expr, kmp_int32 num_threads,
+   int proc_bind, void *fn, void *wrapper_fn,
+   void **SharedArgs, size_t nArgs);
+
 // SPMD execution mode interrogation function.
 EXTERN int8_t __kmpc_is_spmd_exec_mode();
 
Index: openmp/libomptarget/deviceRTLs/common/support.h
===
--- openmp/libomptarget/deviceRTLs/common/support.h
+++ openmp/libomptarget/deviceRTLs/common/support.h
@@ -95,4 +95,9 @@
 DEVICE unsigned int *GetTeamsReductionTimestamp();
 DEVICE char *GetTeamsReductionScratchpad();
 
+// Invoke an outlined parallel function unwrapping global, shared arguments (up
+// to 16).
+DEVICE void __kmp_invoke_microtask(kmp_int32 global_tid, kmp_int32 bound_tid,
+   void *fn, void **args, size_t nargs);
+
 #endif
Index: openmp/libomptarget/deviceRTLs/common/src/support.cu
===
--- openmp/libomptarget/deviceRTLs/common/src/support.cu
+++ openmp/libomptarget/deviceRTLs/common/src/support.cu
@@ -265,4 +265,109 @@
   return static_cast(ReductionScratchpadPtr) + 256;
 }
 
+// Invoke an outlined parallel function unwrapping global, shared arguments (up to 16).
+DEVICE void __kmp_invoke_microtask(kmp_int32 global_tid, kmp_int32 bound_tid,
+   void *fn, void **args, size_t nargs) {
+  switch (nargs) {
+  case 0:
+((void (*)(kmp_int32 *, kmp_int32 *))fn)(_tid, _tid);
+break;
+  case 1:
+((void (*)(kmp_int32 *, kmp_int32 *, void *))fn)(_tid, _tid,
+ args[0]);
+break;
+  case 2:
+((void (*)(kmp_int32 *, kmp_int32 *, void *, void *))fn)(
+_tid, _tid, args[0], args[1]);
+break;
+  case 3:
+((void (*)(kmp_int32 *, kmp_int32 *, void *, void *, void *))fn)(
+_tid, _tid, args[0], args[1], args[2]);
+break;
+  case 4:
+((void (*)(kmp_int32 *, kmp_int32 *, void *, void *, void *, void *))fn)(
+_tid, _tid, args[0], args[1], args[2], args[3]);
+break;
+  case 5:
+((void (*)(kmp_int32 *, kmp_int32 *, void *, void *, void *, void *,
+   void *))fn)(_tid, _tid, args[0], args[1], args[2],
+   args[3], args[4]);
+break;
+  case 6:
+((void (*)(kmp_int32 *, kmp_int32 *, void *, void *, void *, void *, void *,
+   void *))fn)(_tid, _tid, args[0], args[1], args[2],
+   args[3], args[4], args[5]);
+break;
+  case 7:
+((void (*)(kmp_int32 *, kmp_int32 *, void *, void *, void *, void *, void *,
+   void *, void *))fn)(_tid, _tid, args[0], args[1],
+   args[2], args[3], args[4], args[5], args[6]);
+break;
+  case 8:
+((void (*)(kmp_int32 *,