[PATCH] D80222: Replace Clang's createRuntimeFunction with the definitions in OMPKinds.def

2020-05-27 Thread Andrey Churbanov via Phabricator via cfe-commits
AndreyChurbanov added inline comments.



Comment at: openmp/runtime/test/tasking/kmp_taskloop.c:100
 th_counter[i] = 0;
-#pragma omp parallel num_threads(N)
+#pragma omp parallel // num_threads(N)
 {

jhuber6 wrote:
> jhuber6 wrote:
> > jdoerfert wrote:
> > > jhuber6 wrote:
> > > > I am not entirely sure why, but commenting this out causes the problem 
> > > > to go away. I tried adding proper names to the forward-declared 
> > > > functions but since clang already knew I had something called ident_t, 
> > > > I couldn't declare a new struct with the same name.
> > > This is not good. The difference should only be that the `kmpc_fork_call` 
> > > has a different argument, right? Does the segfault happen at compile or 
> > > runtime?
> > > 
> > > You can just use the ident_t clang created, right? Did you print the 
> > > function names requested by clang as we discussed?
> > I added an assertion and debug statements. If I try to declare a struct 
> > named "Ident_t" I get the following error message in the seg-fault. I think 
> > the seg-fault is compile-time.
> > 
> > Found OpenMP runtime function __kmpc_global_thread_num with type i32 
> > (%struct.ident_t.0*). Expected type is i32 (%struct.ident_t*)
> > clang: 
> > /home/jhuber/Documents/llvm-project/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp:124:
> >  static llvm::Function* 
> > llvm::OpenMPIRBuilder::getOrCreateRuntimeFunction(llvm::Module&, 
> > llvm::omp::RuntimeFunction): Assertion `FnTy == Fn->getFunctionType() && 
> > "Found OpenMP runtime function has mismatched types"' failed.
> I'm not sure if there's a way around this without changing the 
> getOrCreateRuntimeFunction method to return a FunctionCallee and removing the 
> assertion. Clang doesn't know about the ident_t struct when it's compiling 
> the file, but when its doing the codegen it sees two structs with the same 
> name and creates a new name. So when it gets the types it says that ident_t 
> and ident_t.0 don't match. As you said the old version got around this by 
> adding a bitcasting instruction so it knew how to turn it into an ident_t 
> pointer.
Note that this change breaks the test on any system with more that 4 procs.  
Because array th_counter[4] is indexed by thread number which can easily be 
greater than 3 if number of threads is not limited.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80222/new/

https://reviews.llvm.org/D80222



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type

2021-05-12 Thread Andrey Churbanov via Phabricator via cfe-commits
AndreyChurbanov updated this revision to Diff 344939.
AndreyChurbanov added a comment.

Addressed review comments.
Fixed tests.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/test/OpenMP/depobj_codegen.cpp
  clang/test/OpenMP/target_enter_data_depend_codegen.cpp
  clang/test/OpenMP/target_exit_data_depend_codegen.cpp
  clang/test/OpenMP/target_update_depend_codegen.cpp
  clang/test/OpenMP/task_codegen.c
  clang/test/OpenMP/task_codegen.cpp
  openmp/runtime/src/kmp.h
  openmp/runtime/src/kmp_taskdeps.cpp
  openmp/runtime/src/kmp_taskdeps.h
  openmp/runtime/test/tasking/bug_nested_proxy_task.c
  openmp/runtime/test/tasking/bug_proxy_task_dep_waiting.c
  openmp/runtime/test/tasking/hidden_helper_task/common.h
  openmp/runtime/test/tasking/hidden_helper_task/depend.cpp
  openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
  openmp/runtime/test/tasking/omp51_task_dep_inoutset.c

Index: openmp/runtime/test/tasking/omp51_task_dep_inoutset.c
===
--- /dev/null
+++ openmp/runtime/test/tasking/omp51_task_dep_inoutset.c
@@ -0,0 +1,258 @@
+// RUN: %libomp-compile-and-run
+// RUN: %libomp-cxx-compile-and-run
+// UNSUPPORTED: gcc
+
+// Tests OMP 5.0 task dependences "mutexinoutset" and 5.1 "inoutset",
+// emulates compiler codegen for new dep kinds
+// Mutually exclusive tasks get same input dependency info array
+//
+// Task tree created:
+//  task0 - task1 (in)
+// \
+//task2 - task3 (inoutset)
+// /
+//  task3 - task4 (in)
+//   /
+//  task6 <-->task7  (mutexinoutset)
+//   \/
+//   task8 (in)
+//
+#include 
+#include 
+
+#ifdef _WIN32
+#include 
+#define mysleep(n) Sleep(n)
+#else
+#include 
+#define mysleep(n) usleep((n)*1000)
+#endif
+
+// to check the # of concurrent tasks (must be 1 for MTX, <3 for other kinds)
+static int volatile checker = 0;
+static int err = 0;
+#ifndef DELAY
+#define DELAY 100
+#endif
+
+// ---
+// internal data to emulate compiler codegen
+typedef struct DEP {
+  size_t addr;
+  size_t len;
+  int flags;
+} dep;
+typedef struct task {
+  void** shareds;
+  void* entry;
+  int part_id;
+  void* destr_thunk;
+  int priority;
+  long long device_id;
+  int f_priv;
+} task_t;
+#define TIED 1
+typedef int(*entry_t)(int, task_t*);
+typedef struct ID {
+  int reserved_1;
+  int flags;
+  int reserved_2;
+  int reserved_3;
+  char *psource;
+} id;
+// thunk routine for tasks with MTX dependency
+int thunk_m(int gtid, task_t* ptask) {
+  int th = omp_get_thread_num();
+  #pragma omp atomic
+++checker;
+  printf("task _%d, th %d\n", ptask->f_priv, th);
+  if (checker != 1) { // no more than 1 task at a time
+err++;
+printf("Error1, checker %d != 1\n", checker);
+  }
+  mysleep(DELAY);
+  if (checker != 1) { // no more than 1 task at a time
+err++;
+printf("Error2, checker %d != 1\n", checker);
+  }
+  #pragma omp atomic
+--checker;
+  return 0;
+}
+// thunk routine for tasks with inoutset dependency
+int thunk_s(int gtid, task_t* ptask) {
+  int th = omp_get_thread_num();
+  #pragma omp atomic
+++checker;
+  printf("task _%d, th %d\n", ptask->f_priv, th);
+  if (checker > 2) { // no more than 2 tasks concurrently
+err++;
+printf("Error1, checker %d > 2\n", checker);
+  }
+  mysleep(DELAY);
+  if (checker > 2) { // no more than 2 tasks concurrently
+err++;
+printf("Error2, checker %d > 2\n", checker);
+  }
+  #pragma omp atomic
+--checker;
+  return 0;
+}
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+int __kmpc_global_thread_num(id*);
+extern task_t* __kmpc_omp_task_alloc(id *loc, int gtid, int flags,
+ size_t sz, size_t shar, entry_t rtn);
+int
+__kmpc_omp_task_with_deps(id *loc, int gtid, task_t *task, int nd, dep *dep_lst,
+  int nd_noalias, dep *noalias_dep_lst);
+static id loc = {0, 2, 0, 0, ";file;func;0;0;;"};
+#ifdef __cplusplus
+} // extern "C"
+#endif
+// End of internal data
+// ---
+
+int main()
+{
+  int i1,i2,i3;
+  omp_set_num_threads(4);
+  omp_set_dynamic(0);
+  #pragma omp parallel
+  {
+#pragma omp single nowait
+{
+  dep sdep[2];
+  task_t *ptr;
+  int gtid = __kmpc_global_thread_num(&loc);
+  int t = omp_get_thread_num();
+  #pragma omp task depend(in: i1, i2)
+  { int th = omp_get_thread_num();
+printf("task 0_%d, th %d\n", t, th);
+#pragma omp atomic
+  ++checker;
+if (checker > 2) { // no more than 2 tasks concurrently
+  err++;
+  printf("Error1, checker %d > 2\n", checker);
+}
+mysleep(DELAY);
+if (checker > 2) { // no more than 2 tasks concurrently
+  

[PATCH] D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type

2021-05-13 Thread Andrey Churbanov via Phabricator via cfe-commits
AndreyChurbanov updated this revision to Diff 345173.
AndreyChurbanov added a comment.

Fixed one more codegen test.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/test/OpenMP/depobj_codegen.cpp
  clang/test/OpenMP/target_enter_data_depend_codegen.cpp
  clang/test/OpenMP/target_exit_data_depend_codegen.cpp
  clang/test/OpenMP/target_update_depend_codegen.cpp
  clang/test/OpenMP/task_codegen.c
  clang/test/OpenMP/task_codegen.cpp
  clang/test/OpenMP/task_if_codegen.cpp
  openmp/runtime/src/kmp.h
  openmp/runtime/src/kmp_taskdeps.cpp
  openmp/runtime/src/kmp_taskdeps.h
  openmp/runtime/test/tasking/bug_nested_proxy_task.c
  openmp/runtime/test/tasking/bug_proxy_task_dep_waiting.c
  openmp/runtime/test/tasking/hidden_helper_task/common.h
  openmp/runtime/test/tasking/hidden_helper_task/depend.cpp
  openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
  openmp/runtime/test/tasking/omp51_task_dep_inoutset.c

Index: openmp/runtime/test/tasking/omp51_task_dep_inoutset.c
===
--- /dev/null
+++ openmp/runtime/test/tasking/omp51_task_dep_inoutset.c
@@ -0,0 +1,258 @@
+// RUN: %libomp-compile-and-run
+// RUN: %libomp-cxx-compile-and-run
+// UNSUPPORTED: gcc
+
+// Tests OMP 5.0 task dependences "mutexinoutset" and 5.1 "inoutset",
+// emulates compiler codegen for new dep kinds
+// Mutually exclusive tasks get same input dependency info array
+//
+// Task tree created:
+//  task0 - task1 (in)
+// \
+//task2 - task3 (inoutset)
+// /
+//  task3 - task4 (in)
+//   /
+//  task6 <-->task7  (mutexinoutset)
+//   \/
+//   task8 (in)
+//
+#include 
+#include 
+
+#ifdef _WIN32
+#include 
+#define mysleep(n) Sleep(n)
+#else
+#include 
+#define mysleep(n) usleep((n)*1000)
+#endif
+
+// to check the # of concurrent tasks (must be 1 for MTX, <3 for other kinds)
+static int volatile checker = 0;
+static int err = 0;
+#ifndef DELAY
+#define DELAY 100
+#endif
+
+// ---
+// internal data to emulate compiler codegen
+typedef struct DEP {
+  size_t addr;
+  size_t len;
+  int flags;
+} dep;
+typedef struct task {
+  void** shareds;
+  void* entry;
+  int part_id;
+  void* destr_thunk;
+  int priority;
+  long long device_id;
+  int f_priv;
+} task_t;
+#define TIED 1
+typedef int(*entry_t)(int, task_t*);
+typedef struct ID {
+  int reserved_1;
+  int flags;
+  int reserved_2;
+  int reserved_3;
+  char *psource;
+} id;
+// thunk routine for tasks with MTX dependency
+int thunk_m(int gtid, task_t* ptask) {
+  int th = omp_get_thread_num();
+  #pragma omp atomic
+++checker;
+  printf("task _%d, th %d\n", ptask->f_priv, th);
+  if (checker != 1) { // no more than 1 task at a time
+err++;
+printf("Error1, checker %d != 1\n", checker);
+  }
+  mysleep(DELAY);
+  if (checker != 1) { // no more than 1 task at a time
+err++;
+printf("Error2, checker %d != 1\n", checker);
+  }
+  #pragma omp atomic
+--checker;
+  return 0;
+}
+// thunk routine for tasks with inoutset dependency
+int thunk_s(int gtid, task_t* ptask) {
+  int th = omp_get_thread_num();
+  #pragma omp atomic
+++checker;
+  printf("task _%d, th %d\n", ptask->f_priv, th);
+  if (checker > 2) { // no more than 2 tasks concurrently
+err++;
+printf("Error1, checker %d > 2\n", checker);
+  }
+  mysleep(DELAY);
+  if (checker > 2) { // no more than 2 tasks concurrently
+err++;
+printf("Error2, checker %d > 2\n", checker);
+  }
+  #pragma omp atomic
+--checker;
+  return 0;
+}
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+int __kmpc_global_thread_num(id*);
+extern task_t* __kmpc_omp_task_alloc(id *loc, int gtid, int flags,
+ size_t sz, size_t shar, entry_t rtn);
+int
+__kmpc_omp_task_with_deps(id *loc, int gtid, task_t *task, int nd, dep *dep_lst,
+  int nd_noalias, dep *noalias_dep_lst);
+static id loc = {0, 2, 0, 0, ";file;func;0;0;;"};
+#ifdef __cplusplus
+} // extern "C"
+#endif
+// End of internal data
+// ---
+
+int main()
+{
+  int i1,i2,i3;
+  omp_set_num_threads(4);
+  omp_set_dynamic(0);
+  #pragma omp parallel
+  {
+#pragma omp single nowait
+{
+  dep sdep[2];
+  task_t *ptr;
+  int gtid = __kmpc_global_thread_num(&loc);
+  int t = omp_get_thread_num();
+  #pragma omp task depend(in: i1, i2)
+  { int th = omp_get_thread_num();
+printf("task 0_%d, th %d\n", t, th);
+#pragma omp atomic
+  ++checker;
+if (checker > 2) { // no more than 2 tasks concurrently
+  err++;
+  printf("Error1, checker %d > 2\n", checker);
+}
+mysleep(DELAY);
+if (checker > 2) { // no more than

[PATCH] D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type

2021-05-18 Thread Andrey Churbanov via Phabricator via cfe-commits
AndreyChurbanov updated this revision to Diff 346071.
AndreyChurbanov added a comment.

rebased


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/test/OpenMP/depobj_codegen.cpp
  clang/test/OpenMP/target_enter_data_depend_codegen.cpp
  clang/test/OpenMP/target_exit_data_depend_codegen.cpp
  clang/test/OpenMP/target_update_depend_codegen.cpp
  clang/test/OpenMP/task_codegen.c
  clang/test/OpenMP/task_codegen.cpp
  clang/test/OpenMP/task_if_codegen.cpp
  openmp/runtime/src/kmp.h
  openmp/runtime/src/kmp_taskdeps.cpp
  openmp/runtime/src/kmp_taskdeps.h
  openmp/runtime/test/tasking/bug_nested_proxy_task.c
  openmp/runtime/test/tasking/bug_proxy_task_dep_waiting.c
  openmp/runtime/test/tasking/hidden_helper_task/common.h
  openmp/runtime/test/tasking/hidden_helper_task/depend.cpp
  openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
  openmp/runtime/test/tasking/omp51_task_dep_inoutset.c

Index: openmp/runtime/test/tasking/omp51_task_dep_inoutset.c
===
--- /dev/null
+++ openmp/runtime/test/tasking/omp51_task_dep_inoutset.c
@@ -0,0 +1,258 @@
+// RUN: %libomp-compile-and-run
+// RUN: %libomp-cxx-compile-and-run
+// UNSUPPORTED: gcc
+
+// Tests OMP 5.0 task dependences "mutexinoutset" and 5.1 "inoutset",
+// emulates compiler codegen for new dep kinds
+// Mutually exclusive tasks get same input dependency info array
+//
+// Task tree created:
+//  task0 - task1 (in)
+// \
+//task2 - task3 (inoutset)
+// /
+//  task3 - task4 (in)
+//   /
+//  task6 <-->task7  (mutexinoutset)
+//   \/
+//   task8 (in)
+//
+#include 
+#include 
+
+#ifdef _WIN32
+#include 
+#define mysleep(n) Sleep(n)
+#else
+#include 
+#define mysleep(n) usleep((n)*1000)
+#endif
+
+// to check the # of concurrent tasks (must be 1 for MTX, <3 for other kinds)
+static int volatile checker = 0;
+static int err = 0;
+#ifndef DELAY
+#define DELAY 100
+#endif
+
+// ---
+// internal data to emulate compiler codegen
+typedef struct DEP {
+  size_t addr;
+  size_t len;
+  int flags;
+} dep;
+typedef struct task {
+  void** shareds;
+  void* entry;
+  int part_id;
+  void* destr_thunk;
+  int priority;
+  long long device_id;
+  int f_priv;
+} task_t;
+#define TIED 1
+typedef int(*entry_t)(int, task_t*);
+typedef struct ID {
+  int reserved_1;
+  int flags;
+  int reserved_2;
+  int reserved_3;
+  char *psource;
+} id;
+// thunk routine for tasks with MTX dependency
+int thunk_m(int gtid, task_t* ptask) {
+  int th = omp_get_thread_num();
+  #pragma omp atomic
+++checker;
+  printf("task _%d, th %d\n", ptask->f_priv, th);
+  if (checker != 1) { // no more than 1 task at a time
+err++;
+printf("Error1, checker %d != 1\n", checker);
+  }
+  mysleep(DELAY);
+  if (checker != 1) { // no more than 1 task at a time
+err++;
+printf("Error2, checker %d != 1\n", checker);
+  }
+  #pragma omp atomic
+--checker;
+  return 0;
+}
+// thunk routine for tasks with inoutset dependency
+int thunk_s(int gtid, task_t* ptask) {
+  int th = omp_get_thread_num();
+  #pragma omp atomic
+++checker;
+  printf("task _%d, th %d\n", ptask->f_priv, th);
+  if (checker > 2) { // no more than 2 tasks concurrently
+err++;
+printf("Error1, checker %d > 2\n", checker);
+  }
+  mysleep(DELAY);
+  if (checker > 2) { // no more than 2 tasks concurrently
+err++;
+printf("Error2, checker %d > 2\n", checker);
+  }
+  #pragma omp atomic
+--checker;
+  return 0;
+}
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+int __kmpc_global_thread_num(id*);
+extern task_t* __kmpc_omp_task_alloc(id *loc, int gtid, int flags,
+ size_t sz, size_t shar, entry_t rtn);
+int
+__kmpc_omp_task_with_deps(id *loc, int gtid, task_t *task, int nd, dep *dep_lst,
+  int nd_noalias, dep *noalias_dep_lst);
+static id loc = {0, 2, 0, 0, ";file;func;0;0;;"};
+#ifdef __cplusplus
+} // extern "C"
+#endif
+// End of internal data
+// ---
+
+int main()
+{
+  int i1,i2,i3;
+  omp_set_num_threads(4);
+  omp_set_dynamic(0);
+  #pragma omp parallel
+  {
+#pragma omp single nowait
+{
+  dep sdep[2];
+  task_t *ptr;
+  int gtid = __kmpc_global_thread_num(&loc);
+  int t = omp_get_thread_num();
+  #pragma omp task depend(in: i1, i2)
+  { int th = omp_get_thread_num();
+printf("task 0_%d, th %d\n", t, th);
+#pragma omp atomic
+  ++checker;
+if (checker > 2) { // no more than 2 tasks concurrently
+  err++;
+  printf("Error1, checker %d > 2\n", checker);
+}
+mysleep(DELAY);
+if (checker > 2) { // no more than 2 tasks concurrently
+  err++;
+  printf("E

[PATCH] D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type

2021-05-18 Thread Andrey Churbanov via Phabricator via cfe-commits
AndreyChurbanov added inline comments.



Comment at: openmp/runtime/src/kmp_taskdeps.cpp:344-346
+// link node as successor of all nodes in the prev_set if any
+npredecessors +=
+__kmp_depnode_link_successor(gtid, thread, task, node, prev_set);

protze.joachim wrote:
> If I understand this code right, this has O(n^2) complexity for two sets of 
> size n?
> 
> Consider:
> ```
> for (int i=0; i #pragma omp task depend(in:A)
> work(A,i);
> }
> for (int i=0; i #pragma omp task depend(inoutset:A)
> work(A,i);
> }
> ```
> All n tasks in the first loop would be predecessor for each of the tasks in 
> the second loop. This will also result in n^2 releases/notifications.
> 
> 
> I'd suggest to model the dependencies like:
> ```
> for (int i=0; i #pragma omp task depend(in:A)
> work(A,i);
> }
> #pragma omp taskwait depend(inout:A) nowait
> for (int i=0; i #pragma omp task depend(inoutset:A)
> work(A,i);
> }
> ```
> I.e., add a dummy dependency node between the two sets, which has n 
> predecessors and n successors. You probably understand the depnode code 
> better, than I do, but I think you would only need to add some code in line 
> 357, where `last_set` is moved to `prev_set`.
> This dummy node would reduce linking and releasing complexity to O(n).
This could be a separate research project.  Because such change may cause 
performance regressions on some real codes, so it needs thorough investigation. 
 I mean insertion of an auxiliary dummy node into dependency graph is 
definitely an overhead, which may or may not be overridden by reduction in 
number of edges in the graph.  Or it can be made optional optimization under an 
external control, if there is a significant gain in some particular case.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type

2021-05-18 Thread Andrey Churbanov via Phabricator via cfe-commits
AndreyChurbanov added inline comments.



Comment at: openmp/runtime/src/kmp_taskdeps.cpp:344-346
+// link node as successor of all nodes in the prev_set if any
+npredecessors +=
+__kmp_depnode_link_successor(gtid, thread, task, node, prev_set);

protze.joachim wrote:
> AndreyChurbanov wrote:
> > protze.joachim wrote:
> > > If I understand this code right, this has O(n^2) complexity for two sets 
> > > of size n?
> > > 
> > > Consider:
> > > ```
> > > for (int i=0; i > > #pragma omp task depend(in:A)
> > > work(A,i);
> > > }
> > > for (int i=0; i > > #pragma omp task depend(inoutset:A)
> > > work(A,i);
> > > }
> > > ```
> > > All n tasks in the first loop would be predecessor for each of the tasks 
> > > in the second loop. This will also result in n^2 releases/notifications.
> > > 
> > > 
> > > I'd suggest to model the dependencies like:
> > > ```
> > > for (int i=0; i > > #pragma omp task depend(in:A)
> > > work(A,i);
> > > }
> > > #pragma omp taskwait depend(inout:A) nowait
> > > for (int i=0; i > > #pragma omp task depend(inoutset:A)
> > > work(A,i);
> > > }
> > > ```
> > > I.e., add a dummy dependency node between the two sets, which has n 
> > > predecessors and n successors. You probably understand the depnode code 
> > > better, than I do, but I think you would only need to add some code in 
> > > line 357, where `last_set` is moved to `prev_set`.
> > > This dummy node would reduce linking and releasing complexity to O(n).
> > This could be a separate research project.  Because such change may cause 
> > performance regressions on some real codes, so it needs thorough 
> > investigation.  I mean insertion of an auxiliary dummy node into dependency 
> > graph is definitely an overhead, which may or may not be overridden by 
> > reduction in number of edges in the graph.  Or it can be made optional 
> > optimization under an external control, if there is a significant gain in 
> > some particular case.
> I don't suggest to insert the auxiliary node in the general case. Just for 
> the case that you see a transition of `in` -> `inoutset` or vice versa.
> 
> With current task dependencies, you always have 1 node notifying n nodes 
> (`out` -> `in`) or n nodes notifying one node (`in` -> `out`). You can see 
> this as an amortized O(1) operation per task in the graph.
> 
> Here you introduce a code pattern, where n nodes each will need to notify m 
> other nodes. This leads to an O(n) operation per task. I'm really worried 
> about the complexity of this pattern.
> If there is no use case with large n, I don't understand, why `inoutset` was 
> introduced in the first place.
> I don't suggest to insert the auxiliary node in the general case. Just for 
> the case that you see a transition of `in` -> `inoutset` or vice versa.
> 
> With current task dependencies, you always have 1 node notifying n nodes 
> (`out` -> `in`) or n nodes notifying one node (`in` -> `out`). You can see 
> this as an amortized O(1) operation per task in the graph.
> 
> Here you introduce a code pattern, where n nodes each will need to notify m 
> other nodes. This leads to an O(n) operation per task. I'm really worried 
> about the complexity of this pattern.
No, I don't introduce the pattern, as it already worked for sets of tasks with 
`in` dependency following set of tasks with `mutexinoutset` dependency or vice 
versa.

> If there is no use case with large n, I don't understand, why `inoutset` was 
> introduced in the first place.
I didn't like that `inoutset` was introduced as the clone of "in" in order to 
separate two (big) sets of mutually independent tasks, but this has already 
been added to specification, so we have to implement it. Of cause your 
suggestion can dramatically speed up dependency processing of some trivial case 
with two huge sets of tasks one wit `in` dependency another with `inoutset`; 
but it slightly changes the semantics of user's code, and actually can slowdown 
other cases.  I would let users do such optimizations.  Or compiler can do this 
once it sees big sets of tasks those don't have barrier-like "taskwait nowait". 
 For user this is one line of code, for compiler this is dozen lines of code, 
for runtime library this is pretty big change which does not worth efforts to 
implement, I think.  Especially given that such "optimization" will definitely 
slowdown some cases.  E.g. small set of tasks with `in` dependency can be 
followed single task with `inoutset` in a loop. Why should we slowdown this 
case?  Library has no idea of the structure of user's code.

In general, I don't like the idea when runtime library tries to optimize user's 
code.  Especially along with other changes in the same patch.



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/

[PATCH] D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type

2021-06-07 Thread Andrey Churbanov via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGa1f550e05254: [OpenMP] libomp: implement OpenMP 5.1 inoutset 
task dependence type (authored by AndreyChurbanov).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/test/OpenMP/depobj_codegen.cpp
  clang/test/OpenMP/target_enter_data_depend_codegen.cpp
  clang/test/OpenMP/target_exit_data_depend_codegen.cpp
  clang/test/OpenMP/target_update_depend_codegen.cpp
  clang/test/OpenMP/task_codegen.c
  clang/test/OpenMP/task_codegen.cpp
  clang/test/OpenMP/task_if_codegen.cpp
  openmp/runtime/src/kmp.h
  openmp/runtime/src/kmp_taskdeps.cpp
  openmp/runtime/src/kmp_taskdeps.h
  openmp/runtime/test/tasking/bug_nested_proxy_task.c
  openmp/runtime/test/tasking/bug_proxy_task_dep_waiting.c
  openmp/runtime/test/tasking/hidden_helper_task/common.h
  openmp/runtime/test/tasking/hidden_helper_task/depend.cpp
  openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
  openmp/runtime/test/tasking/omp51_task_dep_inoutset.c

Index: openmp/runtime/test/tasking/omp51_task_dep_inoutset.c
===
--- /dev/null
+++ openmp/runtime/test/tasking/omp51_task_dep_inoutset.c
@@ -0,0 +1,258 @@
+// RUN: %libomp-compile-and-run
+// RUN: %libomp-cxx-compile-and-run
+// UNSUPPORTED: gcc
+
+// Tests OMP 5.0 task dependences "mutexinoutset" and 5.1 "inoutset",
+// emulates compiler codegen for new dep kinds
+// Mutually exclusive tasks get same input dependency info array
+//
+// Task tree created:
+//  task0 - task1 (in)
+// \
+//task2 - task3 (inoutset)
+// /
+//  task3 - task4 (in)
+//   /
+//  task6 <-->task7  (mutexinoutset)
+//   \/
+//   task8 (in)
+//
+#include 
+#include 
+
+#ifdef _WIN32
+#include 
+#define mysleep(n) Sleep(n)
+#else
+#include 
+#define mysleep(n) usleep((n)*1000)
+#endif
+
+// to check the # of concurrent tasks (must be 1 for MTX, <3 for other kinds)
+static int volatile checker = 0;
+static int err = 0;
+#ifndef DELAY
+#define DELAY 100
+#endif
+
+// ---
+// internal data to emulate compiler codegen
+typedef struct DEP {
+  size_t addr;
+  size_t len;
+  int flags;
+} dep;
+typedef struct task {
+  void** shareds;
+  void* entry;
+  int part_id;
+  void* destr_thunk;
+  int priority;
+  long long device_id;
+  int f_priv;
+} task_t;
+#define TIED 1
+typedef int(*entry_t)(int, task_t*);
+typedef struct ID {
+  int reserved_1;
+  int flags;
+  int reserved_2;
+  int reserved_3;
+  char *psource;
+} id;
+// thunk routine for tasks with MTX dependency
+int thunk_m(int gtid, task_t* ptask) {
+  int th = omp_get_thread_num();
+  #pragma omp atomic
+++checker;
+  printf("task _%d, th %d\n", ptask->f_priv, th);
+  if (checker != 1) { // no more than 1 task at a time
+err++;
+printf("Error1, checker %d != 1\n", checker);
+  }
+  mysleep(DELAY);
+  if (checker != 1) { // no more than 1 task at a time
+err++;
+printf("Error2, checker %d != 1\n", checker);
+  }
+  #pragma omp atomic
+--checker;
+  return 0;
+}
+// thunk routine for tasks with inoutset dependency
+int thunk_s(int gtid, task_t* ptask) {
+  int th = omp_get_thread_num();
+  #pragma omp atomic
+++checker;
+  printf("task _%d, th %d\n", ptask->f_priv, th);
+  if (checker > 2) { // no more than 2 tasks concurrently
+err++;
+printf("Error1, checker %d > 2\n", checker);
+  }
+  mysleep(DELAY);
+  if (checker > 2) { // no more than 2 tasks concurrently
+err++;
+printf("Error2, checker %d > 2\n", checker);
+  }
+  #pragma omp atomic
+--checker;
+  return 0;
+}
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+int __kmpc_global_thread_num(id*);
+extern task_t* __kmpc_omp_task_alloc(id *loc, int gtid, int flags,
+ size_t sz, size_t shar, entry_t rtn);
+int
+__kmpc_omp_task_with_deps(id *loc, int gtid, task_t *task, int nd, dep *dep_lst,
+  int nd_noalias, dep *noalias_dep_lst);
+static id loc = {0, 2, 0, 0, ";file;func;0;0;;"};
+#ifdef __cplusplus
+} // extern "C"
+#endif
+// End of internal data
+// ---
+
+int main()
+{
+  int i1,i2,i3;
+  omp_set_num_threads(4);
+  omp_set_dynamic(0);
+  #pragma omp parallel
+  {
+#pragma omp single nowait
+{
+  dep sdep[2];
+  task_t *ptr;
+  int gtid = __kmpc_global_thread_num(&loc);
+  int t = omp_get_thread_num();
+  #pragma omp task depend(in: i1, i2)
+  { int th = omp_get_thread_num();
+printf("task 0_%d, th %d\n", t, th);
+#pragma omp atomic
+  ++checker;
+if (checker > 2) { // no more than 2 tasks concurrently
+   

[PATCH] D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type

2021-06-10 Thread Andrey Churbanov via Phabricator via cfe-commits
AndreyChurbanov added inline comments.



Comment at: openmp/runtime/src/kmp_taskdeps.cpp:418
 if (dep_list[i].base_addr != 0) {
+  KMP_DEBUG_ASSERT(
+  dep_list[i].flag == KMP_DEP_IN || dep_list[i].flag == KMP_DEP_OUT ||

protze.joachim wrote:
> I hit this assertion, when compiling the tests with clang-11. Is this 
> expected behavior? Does this patch break backwards compatibility, or should 
> this assertion just not look at the higher bits, as they might be 
> uninitialized?
> 
> Whey I change `dep_list[i].flag` to `(dep_list[i].flag&0xf)`, the assertion 
> doesn't trigger.
Thanks for pointing this out. I've reverted the patch in order to fix backwards 
compatibility problem. Will eliminate the dependence flag type size change in 
clang, and change the size in the library code instead.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type

2021-06-14 Thread Andrey Churbanov via Phabricator via cfe-commits
AndreyChurbanov updated this revision to Diff 352010.
AndreyChurbanov added a comment.

Fixed backwards compatibility problem introduced by previous version of the 
patch.
That is restored the size of task dependence flag to 8 bits in clang, and 
instead changed the size of the flag in the library from 32 to 8 bits.
Fixed two tests so that they set all 8 bits of the flag.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085

Files:
  openmp/runtime/src/kmp.h
  openmp/runtime/src/kmp_taskdeps.cpp
  openmp/runtime/src/kmp_taskdeps.h
  openmp/runtime/test/tasking/hidden_helper_task/common.h
  openmp/runtime/test/tasking/hidden_helper_task/depend.cpp
  openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
  openmp/runtime/test/tasking/omp51_task_dep_inoutset.c

Index: openmp/runtime/test/tasking/omp51_task_dep_inoutset.c
===
--- /dev/null
+++ openmp/runtime/test/tasking/omp51_task_dep_inoutset.c
@@ -0,0 +1,258 @@
+// RUN: %libomp-compile-and-run
+// RUN: %libomp-cxx-compile-and-run
+// UNSUPPORTED: gcc
+
+// Tests OMP 5.0 task dependences "mutexinoutset" and 5.1 "inoutset",
+// emulates compiler codegen for new dep kinds
+// Mutually exclusive tasks get same input dependency info array
+//
+// Task tree created:
+//  task0 - task1 (in)
+// \
+//task2 - task3 (inoutset)
+// /
+//  task3 - task4 (in)
+//   /
+//  task6 <-->task7  (mutexinoutset)
+//   \/
+//   task8 (in)
+//
+#include 
+#include 
+
+#ifdef _WIN32
+#include 
+#define mysleep(n) Sleep(n)
+#else
+#include 
+#define mysleep(n) usleep((n)*1000)
+#endif
+
+// to check the # of concurrent tasks (must be 1 for MTX, <3 for other kinds)
+static int volatile checker = 0;
+static int err = 0;
+#ifndef DELAY
+#define DELAY 100
+#endif
+
+// ---
+// internal data to emulate compiler codegen
+typedef struct DEP {
+  size_t addr;
+  size_t len;
+  unsigned char flags;
+} dep;
+typedef struct task {
+  void** shareds;
+  void* entry;
+  int part_id;
+  void* destr_thunk;
+  int priority;
+  long long device_id;
+  int f_priv;
+} task_t;
+#define TIED 1
+typedef int(*entry_t)(int, task_t*);
+typedef struct ID {
+  int reserved_1;
+  int flags;
+  int reserved_2;
+  int reserved_3;
+  char *psource;
+} id;
+// thunk routine for tasks with MTX dependency
+int thunk_m(int gtid, task_t* ptask) {
+  int th = omp_get_thread_num();
+  #pragma omp atomic
+++checker;
+  printf("task _%d, th %d\n", ptask->f_priv, th);
+  if (checker != 1) { // no more than 1 task at a time
+err++;
+printf("Error1, checker %d != 1\n", checker);
+  }
+  mysleep(DELAY);
+  if (checker != 1) { // no more than 1 task at a time
+err++;
+printf("Error2, checker %d != 1\n", checker);
+  }
+  #pragma omp atomic
+--checker;
+  return 0;
+}
+// thunk routine for tasks with inoutset dependency
+int thunk_s(int gtid, task_t* ptask) {
+  int th = omp_get_thread_num();
+  #pragma omp atomic
+++checker;
+  printf("task _%d, th %d\n", ptask->f_priv, th);
+  if (checker > 2) { // no more than 2 tasks concurrently
+err++;
+printf("Error1, checker %d > 2\n", checker);
+  }
+  mysleep(DELAY);
+  if (checker > 2) { // no more than 2 tasks concurrently
+err++;
+printf("Error2, checker %d > 2\n", checker);
+  }
+  #pragma omp atomic
+--checker;
+  return 0;
+}
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+int __kmpc_global_thread_num(id*);
+extern task_t* __kmpc_omp_task_alloc(id *loc, int gtid, int flags,
+ size_t sz, size_t shar, entry_t rtn);
+int
+__kmpc_omp_task_with_deps(id *loc, int gtid, task_t *task, int nd, dep *dep_lst,
+  int nd_noalias, dep *noalias_dep_lst);
+static id loc = {0, 2, 0, 0, ";file;func;0;0;;"};
+#ifdef __cplusplus
+} // extern "C"
+#endif
+// End of internal data
+// ---
+
+int main()
+{
+  int i1,i2,i3;
+  omp_set_num_threads(4);
+  omp_set_dynamic(0);
+  #pragma omp parallel
+  {
+#pragma omp single nowait
+{
+  dep sdep[2];
+  task_t *ptr;
+  int gtid = __kmpc_global_thread_num(&loc);
+  int t = omp_get_thread_num();
+  #pragma omp task depend(in: i1, i2)
+  { int th = omp_get_thread_num();
+printf("task 0_%d, th %d\n", t, th);
+#pragma omp atomic
+  ++checker;
+if (checker > 2) { // no more than 2 tasks concurrently
+  err++;
+  printf("Error1, checker %d > 2\n", checker);
+}
+mysleep(DELAY);
+if (checker > 2) { // no more than 2 tasks concurrently
+  err++;
+  printf("Error1, checker %d > 2\n", checker);
+}
+#pragma omp atomic
+  --checker;
+  }
+  #pragma omp task depend(in: i

[PATCH] D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type

2021-06-16 Thread Andrey Churbanov via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG610fea65e296: [OpenMP] libomp: fixed implementation of OMP 
5.1 inoutset task dependence type (authored by AndreyChurbanov).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085

Files:
  openmp/runtime/src/kmp.h
  openmp/runtime/src/kmp_taskdeps.cpp
  openmp/runtime/src/kmp_taskdeps.h
  openmp/runtime/test/tasking/hidden_helper_task/common.h
  openmp/runtime/test/tasking/hidden_helper_task/depend.cpp
  openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
  openmp/runtime/test/tasking/omp51_task_dep_inoutset.c

Index: openmp/runtime/test/tasking/omp51_task_dep_inoutset.c
===
--- /dev/null
+++ openmp/runtime/test/tasking/omp51_task_dep_inoutset.c
@@ -0,0 +1,258 @@
+// RUN: %libomp-compile-and-run
+// RUN: %libomp-cxx-compile-and-run
+// UNSUPPORTED: gcc
+
+// Tests OMP 5.0 task dependences "mutexinoutset" and 5.1 "inoutset",
+// emulates compiler codegen for new dep kinds
+// Mutually exclusive tasks get same input dependency info array
+//
+// Task tree created:
+//  task0 - task1 (in)
+// \
+//task2 - task3 (inoutset)
+// /
+//  task3 - task4 (in)
+//   /
+//  task6 <-->task7  (mutexinoutset)
+//   \/
+//   task8 (in)
+//
+#include 
+#include 
+
+#ifdef _WIN32
+#include 
+#define mysleep(n) Sleep(n)
+#else
+#include 
+#define mysleep(n) usleep((n)*1000)
+#endif
+
+// to check the # of concurrent tasks (must be 1 for MTX, <3 for other kinds)
+static int volatile checker = 0;
+static int err = 0;
+#ifndef DELAY
+#define DELAY 100
+#endif
+
+// ---
+// internal data to emulate compiler codegen
+typedef struct DEP {
+  size_t addr;
+  size_t len;
+  unsigned char flags;
+} dep;
+typedef struct task {
+  void** shareds;
+  void* entry;
+  int part_id;
+  void* destr_thunk;
+  int priority;
+  long long device_id;
+  int f_priv;
+} task_t;
+#define TIED 1
+typedef int(*entry_t)(int, task_t*);
+typedef struct ID {
+  int reserved_1;
+  int flags;
+  int reserved_2;
+  int reserved_3;
+  char *psource;
+} id;
+// thunk routine for tasks with MTX dependency
+int thunk_m(int gtid, task_t* ptask) {
+  int th = omp_get_thread_num();
+  #pragma omp atomic
+++checker;
+  printf("task _%d, th %d\n", ptask->f_priv, th);
+  if (checker != 1) { // no more than 1 task at a time
+err++;
+printf("Error1, checker %d != 1\n", checker);
+  }
+  mysleep(DELAY);
+  if (checker != 1) { // no more than 1 task at a time
+err++;
+printf("Error2, checker %d != 1\n", checker);
+  }
+  #pragma omp atomic
+--checker;
+  return 0;
+}
+// thunk routine for tasks with inoutset dependency
+int thunk_s(int gtid, task_t* ptask) {
+  int th = omp_get_thread_num();
+  #pragma omp atomic
+++checker;
+  printf("task _%d, th %d\n", ptask->f_priv, th);
+  if (checker > 2) { // no more than 2 tasks concurrently
+err++;
+printf("Error1, checker %d > 2\n", checker);
+  }
+  mysleep(DELAY);
+  if (checker > 2) { // no more than 2 tasks concurrently
+err++;
+printf("Error2, checker %d > 2\n", checker);
+  }
+  #pragma omp atomic
+--checker;
+  return 0;
+}
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+int __kmpc_global_thread_num(id*);
+extern task_t* __kmpc_omp_task_alloc(id *loc, int gtid, int flags,
+ size_t sz, size_t shar, entry_t rtn);
+int
+__kmpc_omp_task_with_deps(id *loc, int gtid, task_t *task, int nd, dep *dep_lst,
+  int nd_noalias, dep *noalias_dep_lst);
+static id loc = {0, 2, 0, 0, ";file;func;0;0;;"};
+#ifdef __cplusplus
+} // extern "C"
+#endif
+// End of internal data
+// ---
+
+int main()
+{
+  int i1,i2,i3;
+  omp_set_num_threads(4);
+  omp_set_dynamic(0);
+  #pragma omp parallel
+  {
+#pragma omp single nowait
+{
+  dep sdep[2];
+  task_t *ptr;
+  int gtid = __kmpc_global_thread_num(&loc);
+  int t = omp_get_thread_num();
+  #pragma omp task depend(in: i1, i2)
+  { int th = omp_get_thread_num();
+printf("task 0_%d, th %d\n", t, th);
+#pragma omp atomic
+  ++checker;
+if (checker > 2) { // no more than 2 tasks concurrently
+  err++;
+  printf("Error1, checker %d > 2\n", checker);
+}
+mysleep(DELAY);
+if (checker > 2) { // no more than 2 tasks concurrently
+  err++;
+  printf("Error1, checker %d > 2\n", checker);
+}
+#pragma omp atomic
+  --checker;
+  }
+  #pragma omp task depend(in: i1, i2)
+  { int th = omp_get_thread_num();
+printf("task 1_%d, th %d\n", t, th);
+#pragma omp atomic
+  ++checker;
+if (ch

[PATCH] D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type

2021-03-09 Thread Andrey Churbanov via Phabricator via cfe-commits
AndreyChurbanov updated this revision to Diff 329461.
AndreyChurbanov added a comment.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Changed the size of task dependences flags from 8 bit to 32 bit, so that 
runtime does not get garbage data in unused bits of the structure, and can use 
flags as an integer where it is more convenient than looking at particular bits.

Also adjusted some tests to clear unused bits of the task dependences structure.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  openmp/runtime/src/kmp.h
  openmp/runtime/src/kmp_taskdeps.cpp
  openmp/runtime/src/kmp_taskdeps.h
  openmp/runtime/test/tasking/bug_nested_proxy_task.c
  openmp/runtime/test/tasking/bug_proxy_task_dep_waiting.c
  openmp/runtime/test/tasking/hidden_helper_task/common.h
  openmp/runtime/test/tasking/hidden_helper_task/depend.cpp
  openmp/runtime/test/tasking/hidden_helper_task/gtid.cpp
  openmp/runtime/test/tasking/omp51_task_dep_inoutset.c

Index: openmp/runtime/test/tasking/omp51_task_dep_inoutset.c
===
--- /dev/null
+++ openmp/runtime/test/tasking/omp51_task_dep_inoutset.c
@@ -0,0 +1,258 @@
+// RUN: %libomp-compile-and-run
+// RUN: %libomp-cxx-compile-and-run
+// UNSUPPORTED: gcc
+
+// Tests OMP 5.0 task dependences "mutexinoutset" and 5.1 "inoutset",
+// emulates compiler codegen for new dep kinds
+// Mutually exclusive tasks get same input dependency info array
+//
+// Task tree created:
+//  task0 - task1 (in)
+// \
+//task2 - task3 (inoutset)
+// /
+//  task3 - task4 (in)
+//   /
+//  task6 <-->task7  (mutexinoutset)
+//   \/
+//   task8 (in)
+//
+#include 
+#include 
+
+#ifdef _WIN32
+#include 
+#define mysleep(n) Sleep(n)
+#else
+#include 
+#define mysleep(n) usleep((n)*1000)
+#endif
+
+// to check the # of concurrent tasks (must be 1 for MTX, <3 for other kinds)
+static int volatile checker = 0;
+static int err = 0;
+#ifndef DELAY
+#define DELAY 100
+#endif
+
+// ---
+// internal data to emulate compiler codegen
+typedef struct DEP {
+  size_t addr;
+  size_t len;
+  int flags;
+} dep;
+typedef struct task {
+  void** shareds;
+  void* entry;
+  int part_id;
+  void* destr_thunk;
+  int priority;
+  long long device_id;
+  int f_priv;
+} task_t;
+#define TIED 1
+typedef int(*entry_t)(int, task_t*);
+typedef struct ID {
+  int reserved_1;
+  int flags;
+  int reserved_2;
+  int reserved_3;
+  char *psource;
+} id;
+// thunk routine for tasks with MTX dependency
+int thunk_m(int gtid, task_t* ptask) {
+  int th = omp_get_thread_num();
+  #pragma omp atomic
+++checker;
+  printf("task _%d, th %d\n", ptask->f_priv, th);
+  if (checker != 1) { // no more than 1 task at a time
+err++;
+printf("Error1, checker %d != 1\n", checker);
+  }
+  mysleep(DELAY);
+  if (checker != 1) { // no more than 1 task at a time
+err++;
+printf("Error2, checker %d != 1\n", checker);
+  }
+  #pragma omp atomic
+--checker;
+  return 0;
+}
+// thunk routine for tasks with inoutset dependency
+int thunk_s(int gtid, task_t* ptask) {
+  int th = omp_get_thread_num();
+  #pragma omp atomic
+++checker;
+  printf("task _%d, th %d\n", ptask->f_priv, th);
+  if (checker > 2) { // no more than 2 tasks concurrently
+err++;
+printf("Error1, checker %d > 2\n", checker);
+  }
+  mysleep(DELAY);
+  if (checker > 2) { // no more than 2 tasks concurrently
+err++;
+printf("Error2, checker %d > 2\n", checker);
+  }
+  #pragma omp atomic
+--checker;
+  return 0;
+}
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+int __kmpc_global_thread_num(id*);
+extern task_t* __kmpc_omp_task_alloc(id *loc, int gtid, int flags,
+ size_t sz, size_t shar, entry_t rtn);
+int
+__kmpc_omp_task_with_deps(id *loc, int gtid, task_t *task, int nd, dep *dep_lst,
+  int nd_noalias, dep *noalias_dep_lst);
+static id loc = {0, 2, 0, 0, ";file;func;0;0;;"};
+#ifdef __cplusplus
+} // extern "C"
+#endif
+// End of internal data
+// ---
+
+int main()
+{
+  int i1,i2,i3;
+  omp_set_num_threads(4);
+  omp_set_dynamic(0);
+  #pragma omp parallel
+  {
+#pragma omp single nowait
+{
+  dep sdep[2];
+  task_t *ptr;
+  int gtid = __kmpc_global_thread_num(&loc);
+  int t = omp_get_thread_num();
+  #pragma omp task depend(in: i1, i2)
+  { int th = omp_get_thread_num();
+printf("task 0_%d, th %d\n", t, th);
+#pragma omp atomic
+  ++checker;
+if (checker > 2) { // no more than 2 tasks concurrently
+  err++;
+  printf("Error1, checker %d > 2\n", checker);
+}
+mysleep(DELAY);
+

[PATCH] D51331: [OPENMP] Create non-const ident_t structs.

2018-08-29 Thread Andrey Churbanov via Phabricator via cfe-commits
AndreyChurbanov added a comment.

In https://reviews.llvm.org/D51331#1216509, @ABataev wrote:

> When is ITT Notify used? Does it have some preconditions like debug info, 
> some optimizations level etc.?


The libittnotify library is used by Intel tools (VTune Amplifier, Inspector, 
Spatial Advisor, etc.), may be used by others but I am not aware of any.  Intel 
Tools development team still have no plans in near future to move to standard 
OMPT interface, mostly because the standard OpenMP Tools interface still lacks 
some important for customers features (like barrier imbalance reporting).

It does not have any preconditions, although the presence of debug info usually 
helps to get better info in the tool.  So, users are free to use tools without 
debug info and with any optimization level.


https://reviews.llvm.org/D51331



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D106674: Runtime for Interop directive

2022-01-28 Thread Andrey Churbanov via Phabricator via cfe-commits
AndreyChurbanov added inline comments.



Comment at: openmp/runtime/src/dllexports:553
+omp_get_interop_ptr 761
+omp_get_interop_str 762
 

mstorsjo wrote:
> mstorsjo wrote:
> > jdoerfert wrote:
> > > Those values are taken by now, you need new values that are not taken yet.
> > This broke building for Windows:
> > ```
> > generate-def.pl: (x) Error parsing file 
> > "llvm-project/openmp/runtime/src/dllexports" line 556:
> > generate-def.pl: (x) omp_get_interop_int 2514
> > generate-def.pl: (x) Ordinal of user-callable entry must be < 1000
> > ```
> > 
> ... and even with those ordinals changed to something in the right range, the 
> build later fails with linker errors:
> ```
> ld.lld: error: : undefined symbol: OMP_GET_INTEROP_INT
> ld.lld: error: : undefined symbol: OMP_GET_INTEROP_PTR
> ld.lld: error: : undefined symbol: OMP_GET_INTEROP_STR
> ```
> So please fix the Windows build of OpenMP, before the 14.x branch is made 
> early next week.
As the warning says these should be < 1000, because lowercase are automatically 
replicated in uppercase adding +1000 to the ordinal and still be < 2000 (for 
some particular Fortran usages).
There are free ordinals below 1000 starting at 807.  So, e.g. 807, 808, etc. 
should work fine.

Actually these interfaces described as "C/C++"-only in the specification, so 
there is no need to provide Fortran-specific versions, though it should not 
harm. Probably making them C-only does not worth efforts for now.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106674/new/

https://reviews.llvm.org/D106674

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits