Re: [gomp4] Library side of depend clause support

2013-09-30 Thread Ilya Tocar
On 27 Sep 12:08, Jakub Jelinek wrote:

Looks like you forgot some files. I've checked
http://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=202968
And e. g. hashtab.h is missing. So currently branch is failing to build,
with task.c:46:21: fatal error: hashtab.h: No such file or directory

 Here is what I've committed now, the incremental changes were really only
 using a structure with flex array member for the dependers vectors,
 removing/making redundant earlier !ent-is_in when adding !is_in into the
 chain and addition of new testcases.
 
 Let's improve it incrementally later.
 
 2013-09-27  Jakub Jelinek  ja...@redhat.com
 
   * libgomp.h: Include stdlib.h.
   (struct gomp_task_depend_entry,
   struct gomp_dependers_vec): New types.
   (struct gomp_task): Add dependers, depend_hash, depend_count,
   num_dependees and depend fields.
   (struct gomp_taskgroup): Add num_children field.
   (gomp_finish_task): Free depend_hash if non-NULL.
   * libgomp_g.h (GOMP_task): Add depend argument.
   * hashtab.h: New file.
   * task.c: Include hashtab.h.
   (hash_entry_type): New typedef.
   (htab_alloc, htab_free, htab_hash, htab_eq): New inlines.
   (gomp_init_task): Clear dependers, depend_hash and depend_count
   fields.
   (GOMP_task): Add depend argument, handle depend clauses.  Increment
   num_children field in taskgroup.
   (gomp_task_run_pre): Don't increment task_running_count here,
   nor clear task_pending bit.
   (gomp_task_run_post_handle_depend_hash,
   gomp_task_run_post_handle_dependers,
   gomp_task_run_post_handle_depend): New functions.
   (gomp_task_run_post_remove_parent): Clear in_taskwait before
   signalling corresponding semaphore.
   (gomp_task_run_post_remove_taskgroup): Decrement num_children
   field and make the decrement to 0 MEMMODEL_RELEASE operation,
   rather than storing NULL to taskgroup-children.  Clear
   in_taskgroup_wait before signalling corresponding semaphore.
   (gomp_barrier_handle_tasks): Move task_running_count increment
   and task_pending bit clearing here.  Call
   gomp_task_run_post_handle_depend.  If more than one new tasks
   have been queued, wake other threads if needed.
   (GOMP_taskwait): Call gomp_task_run_post_handle_depend.  If more
   than one new tasks have been queued, wake other threads if needed.
   After waiting on taskwait_sem, enter critical section again.
   (GOMP_taskgroup_start): Initialize num_children field.
   (GOMP_taskgroup_end): Check num_children instead of children
   before critical section.  If children is NULL, but num_children
   is non-zero, wait on taskgroup_sem.  Call
   gomp_task_run_post_handle_depend.  If more than one new tasks have
   been queued, wake other threads if needed.  After waiting on
   taskgroup_sem, enter critical section again.
   * testsuite/libgomp.c/depend-1.c: New test.
   * testsuite/libgomp.c/depend-2.c: New test.
   * testsuite/libgomp.c/depend-3.c: New test.
   * testsuite/libgomp.c/depend-4.c: New test.
 


Re: [gomp4] Library side of depend clause support

2013-09-30 Thread Jakub Jelinek
On Mon, Sep 30, 2013 at 05:04:23PM +0400, Ilya Tocar wrote:
 On 27 Sep 12:08, Jakub Jelinek wrote:
 
 Looks like you forgot some files. I've checked
 http://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=202968
 And e. g. hashtab.h is missing. So currently branch is failing to build,
 with task.c:46:21: fatal error: hashtab.h: No such file or directory

Fixed now, sorry.

Jakub


Re: [gomp4] Library side of depend clause support

2013-09-27 Thread Jakub Jelinek
On Fri, Sep 27, 2013 at 01:48:36AM +0200, Jakub Jelinek wrote:
 Perhaps.  What if I do just minor cleanup (use flexible array members for
 the reallocated vectors, and perhaps keep only the last out/inout task
 in the hash table chains rather than all of them), retest, commit and then
 we can discuss/incrementally improve it?

Here is what I've committed now, the incremental changes were really only
using a structure with flex array member for the dependers vectors,
removing/making redundant earlier !ent-is_in when adding !is_in into the
chain and addition of new testcases.

Let's improve it incrementally later.

2013-09-27  Jakub Jelinek  ja...@redhat.com

* libgomp.h: Include stdlib.h.
(struct gomp_task_depend_entry,
struct gomp_dependers_vec): New types.
(struct gomp_task): Add dependers, depend_hash, depend_count,
num_dependees and depend fields.
(struct gomp_taskgroup): Add num_children field.
(gomp_finish_task): Free depend_hash if non-NULL.
* libgomp_g.h (GOMP_task): Add depend argument.
* hashtab.h: New file.
* task.c: Include hashtab.h.
(hash_entry_type): New typedef.
(htab_alloc, htab_free, htab_hash, htab_eq): New inlines.
(gomp_init_task): Clear dependers, depend_hash and depend_count
fields.
(GOMP_task): Add depend argument, handle depend clauses.  Increment
num_children field in taskgroup.
(gomp_task_run_pre): Don't increment task_running_count here,
nor clear task_pending bit.
(gomp_task_run_post_handle_depend_hash,
gomp_task_run_post_handle_dependers,
gomp_task_run_post_handle_depend): New functions.
(gomp_task_run_post_remove_parent): Clear in_taskwait before
signalling corresponding semaphore.
(gomp_task_run_post_remove_taskgroup): Decrement num_children
field and make the decrement to 0 MEMMODEL_RELEASE operation,
rather than storing NULL to taskgroup-children.  Clear
in_taskgroup_wait before signalling corresponding semaphore.
(gomp_barrier_handle_tasks): Move task_running_count increment
and task_pending bit clearing here.  Call
gomp_task_run_post_handle_depend.  If more than one new tasks
have been queued, wake other threads if needed.
(GOMP_taskwait): Call gomp_task_run_post_handle_depend.  If more
than one new tasks have been queued, wake other threads if needed.
After waiting on taskwait_sem, enter critical section again.
(GOMP_taskgroup_start): Initialize num_children field.
(GOMP_taskgroup_end): Check num_children instead of children
before critical section.  If children is NULL, but num_children
is non-zero, wait on taskgroup_sem.  Call
gomp_task_run_post_handle_depend.  If more than one new tasks have
been queued, wake other threads if needed.  After waiting on
taskgroup_sem, enter critical section again.
* testsuite/libgomp.c/depend-1.c: New test.
* testsuite/libgomp.c/depend-2.c: New test.
* testsuite/libgomp.c/depend-3.c: New test.
* testsuite/libgomp.c/depend-4.c: New test.

--- libgomp/libgomp.h.jj2013-09-26 09:43:10.903930832 +0200
+++ libgomp/libgomp.h   2013-09-27 09:05:17.025402127 +0200
@@ -39,6 +39,7 @@
 
 #include pthread.h
 #include stdbool.h
+#include stdlib.h
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility push(hidden)
@@ -253,7 +254,26 @@ enum gomp_task_kind
   GOMP_TASK_TIED
 };
 
+struct gomp_task;
 struct gomp_taskgroup;
+struct htab;
+
+struct gomp_task_depend_entry
+{
+  void *addr;
+  struct gomp_task_depend_entry *next;
+  struct gomp_task_depend_entry *prev;
+  struct gomp_task *task;
+  bool is_in;
+  bool redundant;
+};
+
+struct gomp_dependers_vec
+{
+  size_t n_elem;
+  size_t allocated;
+  struct gomp_task *elem[];
+};
 
 /* This structure describes a task to be run by a thread.  */
 
@@ -268,6 +288,10 @@ struct gomp_task
   struct gomp_task *next_taskgroup;
   struct gomp_task *prev_taskgroup;
   struct gomp_taskgroup *taskgroup;
+  struct gomp_dependers_vec *dependers;
+  struct htab *depend_hash;
+  size_t depend_count;
+  size_t num_dependees;
   struct gomp_task_icv icv;
   void (*fn) (void *);
   void *fn_data;
@@ -277,6 +301,7 @@ struct gomp_task
   bool final_task;
   bool copy_ctors_done;
   gomp_sem_t taskwait_sem;
+  struct gomp_task_depend_entry depend[];
 };
 
 struct gomp_taskgroup
@@ -286,6 +311,7 @@ struct gomp_taskgroup
   bool in_taskgroup_wait;
   bool cancelled;
   gomp_sem_t taskgroup_sem;
+  size_t num_children;
 };
 
 /* This structure describes a team of threads.  These are the threads
@@ -525,6 +551,8 @@ extern void gomp_barrier_handle_tasks (g
 static void inline
 gomp_finish_task (struct gomp_task *task)
 {
+  if (__builtin_expect (task-depend_hash != NULL, 0))
+free (task-depend_hash);
   gomp_sem_destroy (task-taskwait_sem);
 }
 
--- 

[gomp4] Library side of depend clause support

2013-09-26 Thread Jakub Jelinek
Hi!

This patch adds depend clause support.
In GOMP_task, before queueing the task, if task has any depend clauses
we look up the addresses in a hash table (in the parent task, because
only sibling tasks are ordered through depend clause), and if there
are any dependencies on the earlier started tasks, the new task
isn't added to team, parent and taskgroup task queues, but instead
just added into the earlier task's depender vectors.  Each task
has also an integer number of how many other tasks it depends on.
When a task on which something depends on finishes, if parent exists,
it's records are removed from parent's depend address hash table,
and even if parent doesn't exist anymore, we decrement num_dependees
of every task mentioned in the dependers vector.  If any of those
counters go to zero, we insert them into all the relevant task queues.

Tested on x86_64-linux.  Will commit tomorrow unless somebody complains,
but in any case would appreciate review of the changes.

2013-09-26  Jakub Jelinek  ja...@redhat.com

* libgomp.h: Include stdlib.h.
(struct gomp_task_depend_entry): New type.
(struct gomp_task): Add dependers, depend_hash, depend_count,
num_dependees and depend fields.
(struct gomp_taskgroup): Add num_children field.
(gomp_finish_task): Free depend_hash if non-NULL.
* libgomp_g.h (GOMP_task): Add depend argument.
* hashtab.h: New file.
* task.c: Include hashtab.h.
(hash_entry_type): New typedef.
(htab_alloc, htab_free, htab_hash, htab_eq): New inlines.
(gomp_init_task): Clear dependers, depend_hash and depend_count
fields.
(GOMP_task): Add depend argument, handle depend clauses.  Increment
num_children field in taskgroup.
(gomp_task_run_pre): Don't increment task_running_count here,
nor clear task_pending bit.
(gomp_task_run_post_handle_depend_hash,
gomp_task_run_post_handle_dependers,
gomp_task_run_post_handle_depend): New functions.
(gomp_task_run_post_remove_parent): Clear in_taskwait before
signalling corresponding semaphore.
(gomp_task_run_post_remove_taskgroup): Decrement num_children
field and make the decrement to 0 MEMMODEL_RELEASE operation,
rather than storing NULL to taskgroup-children.  Clear
in_taskgroup_wait before signalling corresponding semaphore.
(gomp_barrier_handle_tasks): Move task_running_count increment
and task_pending bit clearing here.  Call
gomp_task_run_post_handle_depend.  If more than one new tasks
have been queued, wake other threads if needed.
(GOMP_taskwait): Call gomp_task_run_post_handle_depend.  If more
than one new tasks have been queued, wake other threads if needed.
After waiting on taskwait_sem, enter critical section again.
(GOMP_taskgroup_start): Initialize num_children field.
(GOMP_taskgroup_end): Check num_children instead of children
before critical section.  If children is NULL, but num_children
is non-zero, wait on taskgroup_sem.  Call
gomp_task_run_post_handle_depend.  If more than one new tasks have
been queued, wake other threads if needed.  After waiting on
taskgroup_sem, enter critical section again.
* testsuite/libgomp.c/depend-1.c: New test.
* testsuite/libgomp.c/depend-2.c: New test.

--- libgomp/libgomp.h.jj2013-09-26 09:43:10.903930832 +0200
+++ libgomp/libgomp.h   2013-09-26 17:17:28.267001263 +0200
@@ -39,6 +39,7 @@
 
 #include pthread.h
 #include stdbool.h
+#include stdlib.h
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility push(hidden)
@@ -253,7 +254,19 @@ enum gomp_task_kind
   GOMP_TASK_TIED
 };
 
+struct gomp_task;
 struct gomp_taskgroup;
+struct htab;
+
+struct gomp_task_depend_entry
+{
+  void *addr;
+  struct gomp_task_depend_entry *next;
+  struct gomp_task_depend_entry *prev;
+  struct gomp_task *task;
+  bool is_in;
+  bool redundant;
+};
 
 /* This structure describes a task to be run by a thread.  */
 
@@ -268,6 +281,10 @@ struct gomp_task
   struct gomp_task *next_taskgroup;
   struct gomp_task *prev_taskgroup;
   struct gomp_taskgroup *taskgroup;
+  struct gomp_task **dependers;
+  struct htab *depend_hash;
+  size_t depend_count;
+  size_t num_dependees;
   struct gomp_task_icv icv;
   void (*fn) (void *);
   void *fn_data;
@@ -277,6 +294,7 @@ struct gomp_task
   bool final_task;
   bool copy_ctors_done;
   gomp_sem_t taskwait_sem;
+  struct gomp_task_depend_entry depend[];
 };
 
 struct gomp_taskgroup
@@ -286,6 +304,7 @@ struct gomp_taskgroup
   bool in_taskgroup_wait;
   bool cancelled;
   gomp_sem_t taskgroup_sem;
+  size_t num_children;
 };
 
 /* This structure describes a team of threads.  These are the threads
@@ -525,6 +544,8 @@ extern void gomp_barrier_handle_tasks (g
 static void inline
 gomp_finish_task (struct gomp_task *task)
 {
+  if (__builtin_expect 

Re: [gomp4] Library side of depend clause support

2013-09-26 Thread Richard Henderson
On 09/26/2013 11:36 AM, Jakub Jelinek wrote:
 +struct gomp_task;
  struct gomp_taskgroup;
 +struct htab;
 +
 +struct gomp_task_depend_entry
 +{
 +  void *addr;
 +  struct gomp_task_depend_entry *next;
 +  struct gomp_task_depend_entry *prev;
 +  struct gomp_task *task;
 +  bool is_in;
 +  bool redundant;
 +};

I'm a bit confused about the combination of linked lists and reallocated
arrays.  When did you decide to use what?

 +  if ((flags  8)  thr-task  thr-task-depend_hash)
 + {
 +   struct gomp_task *parent = thr-task;
 +   struct gomp_task_depend_entry elem, *ent = NULL;
 +   size_t ndepend = (uintptr_t) depend[0];
 +   size_t nout = (uintptr_t) depend[1];
 +   size_t i;
 +   gomp_mutex_lock (team-task_lock);
 +   for (i = 0; i  ndepend; i++)
 + {
 +   elem.addr = depend[i + 2];
 +   ent = htab_find (parent-depend_hash, elem);
 +   for (; ent; ent = ent-next)
 + if (i = nout  ent-is_in)
 +   continue;
 + else
 +   break;

I wonder if we ought always defer tasks with dependencies and skip this lock
and search?  Unless the taskgroup is truly weird, we *are* going to have
dependencies between the tasks.  Dunno what exactly to do with final_tasks
that have unfulfilled dependencies...

I also think it would significantly clean up the code to declare a struct with
a variable tail for the depend argument.  That would eliminate all of the
casting to uintptr_t and give names to the first two entries.

 +   if (tsk-dependers == NULL)
 + {
 +   tsk-dependers
 + = gomp_malloc (8 * sizeof (struct gomp_task *));
 +   tsk-dependers[0]
 + = (struct gomp_task *) (uintptr_t) 1;
 +   tsk-dependers[1]
 + = (struct gomp_task *) (uintptr_t) (8 - 2);
 +   tsk-dependers[2] = task;
 +   task-num_dependees++;
 +   continue;

Another place for which a struct with variable tail would significantly clean
up the code.  And here's where I wonder why you're using realloc'd arrays here
as opposed to another linked list?

Perhaps what we need are smaller linked list entries like

  struct gomp_task_depend_node {
 struct gomp_task *task;
 struct gomp_task_depend_node *next;
 struct gomp_task_depend_node *prev;
  };

and a different hash table entry like

  struct gomp_task_depend_head {
void *addr;
struct gomp_task_depend_node *outs;
struct gomp_task_depend_node *ins;
size_t n_ins;
  };

If we scan the ndepend entries twice, we can find out how many nodes we need,
and allocate them with the task like you do now.  Scanning the ndepends array
twice can be sped by only looking up the hash table entries once -- allocate a
local array of size ndepend entries and cache the lookups.

I'd say we don't need a count of the n_outs because all of them on the list
must be sequentially dependent.  Thus any new task simply depends on the
previous task in the outs list.  Thus imo it makes sense to have ins/outs point
to the tail of the list as opposed to the head.

Is is really worthwhile to detect redundant dependencies?  It seems just as
easy to add multiple dependencies and let them just fall out naturally.

OTOH, perhaps you should just go ahead with this patch and we can evolve it
incrementally.  I don't see anything technically wrong with it.


r~


Re: [gomp4] Library side of depend clause support

2013-09-26 Thread Jakub Jelinek
On Thu, Sep 26, 2013 at 03:54:09PM -0700, Richard Henderson wrote:
 On 09/26/2013 11:36 AM, Jakub Jelinek wrote:
  +struct gomp_task;
   struct gomp_taskgroup;
  +struct htab;
  +
  +struct gomp_task_depend_entry
  +{
  +  void *addr;
  +  struct gomp_task_depend_entry *next;
  +  struct gomp_task_depend_entry *prev;
  +  struct gomp_task *task;
  +  bool is_in;
  +  bool redundant;
  +};
 
 I'm a bit confused about the combination of linked lists and reallocated
 arrays.  When did you decide to use what?

I initially wanted to use linked lists only, but while I can statically
preallocate the chains for the hash table, for the depender - dependee
chains where a task may depend on many other tasks that would mean having to
allocate small structures (or pool allocate them, per team?).

 I wonder if we ought always defer tasks with dependencies and skip this lock
 and search?  Unless the taskgroup is truly weird, we *are* going to have
 dependencies between the tasks.  Dunno what exactly to do with final_tasks
 that have unfulfilled dependencies...

I think final tasks aren't a problem, if the parent is a final task, then
all the children are non-deferred, thus we never record any dependencies
and the test for that will be cheap too (because parent-depend_hash will be
NULL).  The problem is if (0) tasks, the spec says that they must be
non-deferred unless they depend on some earlier non-finished task.  But
the cost in that case is primarily in taking the lock/unlock; the search
will stop on the first dependency found, if there aren't any, nothing will
be recorded and we don't jump to the defer label, if there are some, as soon
as we discover first we jump there.

 I also think it would significantly clean up the code to declare a struct with
 a variable tail for the depend argument.  That would eliminate all of the
 casting to uintptr_t and give names to the first two entries.

I agree if we keep using realloced vectors that flexible array would make it
cleaner.

 Perhaps what we need are smaller linked list entries like
 
   struct gomp_task_depend_node {
  struct gomp_task *task;
  struct gomp_task_depend_node *next;
  struct gomp_task_depend_node *prev;
   };

The dependee - depender vectors resp. linked lists are just pushed to first
(the only thing needed during insertion is to have a cheap check if the last
inserted task is the current one, to avoid having the same task multiple
times in the vector/linked list), and then just walked once when the
dependee finishes, so no removal is needed there, it can be freed at once;
thus, for linked list, it would be enough to use non-doubly linked list for
that.  For the hash table chains, unless we want to always lookup the hash
table and walk the chains for removal, we need doubly linked list.
 
 and a different hash table entry like
 
   struct gomp_task_depend_head {
 void *addr;
 struct gomp_task_depend_node *outs;
 struct gomp_task_depend_node *ins;
 size_t n_ins;
   };

You mean that the hash table instead would contain the structures, or
pointers to these structures?  If the latter (not sure what n_ins would be
for), then we'd again need to pool alloc them.
 
 If we scan the ndepend entries twice, we can find out how many nodes we need,
 and allocate them with the task like you do now.  Scanning the ndepends array
 twice can be sped by only looking up the hash table entries once -- allocate a
 local array of size ndepend entries and cache the lookups.
 
 I'd say we don't need a count of the n_outs because all of them on the list
 must be sequentially dependent.  Thus any new task simply depends on the
 previous task in the outs list.  Thus imo it makes sense to have ins/outs 
 point
 to the tail of the list as opposed to the head.

Ah, haven't thought about it this way, yes, you're right that for out/inout
dependencies it is enough to remember in the hash table the last one,
because the dependencies will form a chain on the same address and serialize
the tasks.

 Is is really worthwhile to detect redundant dependencies?  It seems just as
 easy to add multiple dependencies and let them just fall out naturally.

I just didn't want to have duplicates in the hash table chains, the
redundant flag is just a sign that the entry doesn't need to be removed
from the hash table chains.

 OTOH, perhaps you should just go ahead with this patch and we can evolve it
 incrementally.  I don't see anything technically wrong with it.

Perhaps.  What if I do just minor cleanup (use flexible array members for
the reallocated vectors, and perhaps keep only the last out/inout task
in the hash table chains rather than all of them), retest, commit and then
we can discuss/incrementally improve it?

Jakub