Re: [gomp4] Library side of depend clause support
On 27 Sep 12:08, Jakub Jelinek wrote: Looks like you forgot some files. I've checked http://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=202968 And e. g. hashtab.h is missing. So currently branch is failing to build, with task.c:46:21: fatal error: hashtab.h: No such file or directory Here is what I've committed now, the incremental changes were really only using a structure with flex array member for the dependers vectors, removing/making redundant earlier !ent-is_in when adding !is_in into the chain and addition of new testcases. Let's improve it incrementally later. 2013-09-27 Jakub Jelinek ja...@redhat.com * libgomp.h: Include stdlib.h. (struct gomp_task_depend_entry, struct gomp_dependers_vec): New types. (struct gomp_task): Add dependers, depend_hash, depend_count, num_dependees and depend fields. (struct gomp_taskgroup): Add num_children field. (gomp_finish_task): Free depend_hash if non-NULL. * libgomp_g.h (GOMP_task): Add depend argument. * hashtab.h: New file. * task.c: Include hashtab.h. (hash_entry_type): New typedef. (htab_alloc, htab_free, htab_hash, htab_eq): New inlines. (gomp_init_task): Clear dependers, depend_hash and depend_count fields. (GOMP_task): Add depend argument, handle depend clauses. Increment num_children field in taskgroup. (gomp_task_run_pre): Don't increment task_running_count here, nor clear task_pending bit. (gomp_task_run_post_handle_depend_hash, gomp_task_run_post_handle_dependers, gomp_task_run_post_handle_depend): New functions. (gomp_task_run_post_remove_parent): Clear in_taskwait before signalling corresponding semaphore. (gomp_task_run_post_remove_taskgroup): Decrement num_children field and make the decrement to 0 MEMMODEL_RELEASE operation, rather than storing NULL to taskgroup-children. Clear in_taskgroup_wait before signalling corresponding semaphore. (gomp_barrier_handle_tasks): Move task_running_count increment and task_pending bit clearing here. Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. (GOMP_taskwait): Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. After waiting on taskwait_sem, enter critical section again. (GOMP_taskgroup_start): Initialize num_children field. (GOMP_taskgroup_end): Check num_children instead of children before critical section. If children is NULL, but num_children is non-zero, wait on taskgroup_sem. Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. After waiting on taskgroup_sem, enter critical section again. * testsuite/libgomp.c/depend-1.c: New test. * testsuite/libgomp.c/depend-2.c: New test. * testsuite/libgomp.c/depend-3.c: New test. * testsuite/libgomp.c/depend-4.c: New test.
Re: [gomp4] Library side of depend clause support
On Mon, Sep 30, 2013 at 05:04:23PM +0400, Ilya Tocar wrote: On 27 Sep 12:08, Jakub Jelinek wrote: Looks like you forgot some files. I've checked http://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=202968 And e. g. hashtab.h is missing. So currently branch is failing to build, with task.c:46:21: fatal error: hashtab.h: No such file or directory Fixed now, sorry. Jakub
Re: [gomp4] Library side of depend clause support
On Fri, Sep 27, 2013 at 01:48:36AM +0200, Jakub Jelinek wrote: Perhaps. What if I do just minor cleanup (use flexible array members for the reallocated vectors, and perhaps keep only the last out/inout task in the hash table chains rather than all of them), retest, commit and then we can discuss/incrementally improve it? Here is what I've committed now, the incremental changes were really only using a structure with flex array member for the dependers vectors, removing/making redundant earlier !ent-is_in when adding !is_in into the chain and addition of new testcases. Let's improve it incrementally later. 2013-09-27 Jakub Jelinek ja...@redhat.com * libgomp.h: Include stdlib.h. (struct gomp_task_depend_entry, struct gomp_dependers_vec): New types. (struct gomp_task): Add dependers, depend_hash, depend_count, num_dependees and depend fields. (struct gomp_taskgroup): Add num_children field. (gomp_finish_task): Free depend_hash if non-NULL. * libgomp_g.h (GOMP_task): Add depend argument. * hashtab.h: New file. * task.c: Include hashtab.h. (hash_entry_type): New typedef. (htab_alloc, htab_free, htab_hash, htab_eq): New inlines. (gomp_init_task): Clear dependers, depend_hash and depend_count fields. (GOMP_task): Add depend argument, handle depend clauses. Increment num_children field in taskgroup. (gomp_task_run_pre): Don't increment task_running_count here, nor clear task_pending bit. (gomp_task_run_post_handle_depend_hash, gomp_task_run_post_handle_dependers, gomp_task_run_post_handle_depend): New functions. (gomp_task_run_post_remove_parent): Clear in_taskwait before signalling corresponding semaphore. (gomp_task_run_post_remove_taskgroup): Decrement num_children field and make the decrement to 0 MEMMODEL_RELEASE operation, rather than storing NULL to taskgroup-children. Clear in_taskgroup_wait before signalling corresponding semaphore. (gomp_barrier_handle_tasks): Move task_running_count increment and task_pending bit clearing here. Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. (GOMP_taskwait): Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. After waiting on taskwait_sem, enter critical section again. (GOMP_taskgroup_start): Initialize num_children field. (GOMP_taskgroup_end): Check num_children instead of children before critical section. If children is NULL, but num_children is non-zero, wait on taskgroup_sem. Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. After waiting on taskgroup_sem, enter critical section again. * testsuite/libgomp.c/depend-1.c: New test. * testsuite/libgomp.c/depend-2.c: New test. * testsuite/libgomp.c/depend-3.c: New test. * testsuite/libgomp.c/depend-4.c: New test. --- libgomp/libgomp.h.jj2013-09-26 09:43:10.903930832 +0200 +++ libgomp/libgomp.h 2013-09-27 09:05:17.025402127 +0200 @@ -39,6 +39,7 @@ #include pthread.h #include stdbool.h +#include stdlib.h #ifdef HAVE_ATTRIBUTE_VISIBILITY # pragma GCC visibility push(hidden) @@ -253,7 +254,26 @@ enum gomp_task_kind GOMP_TASK_TIED }; +struct gomp_task; struct gomp_taskgroup; +struct htab; + +struct gomp_task_depend_entry +{ + void *addr; + struct gomp_task_depend_entry *next; + struct gomp_task_depend_entry *prev; + struct gomp_task *task; + bool is_in; + bool redundant; +}; + +struct gomp_dependers_vec +{ + size_t n_elem; + size_t allocated; + struct gomp_task *elem[]; +}; /* This structure describes a task to be run by a thread. */ @@ -268,6 +288,10 @@ struct gomp_task struct gomp_task *next_taskgroup; struct gomp_task *prev_taskgroup; struct gomp_taskgroup *taskgroup; + struct gomp_dependers_vec *dependers; + struct htab *depend_hash; + size_t depend_count; + size_t num_dependees; struct gomp_task_icv icv; void (*fn) (void *); void *fn_data; @@ -277,6 +301,7 @@ struct gomp_task bool final_task; bool copy_ctors_done; gomp_sem_t taskwait_sem; + struct gomp_task_depend_entry depend[]; }; struct gomp_taskgroup @@ -286,6 +311,7 @@ struct gomp_taskgroup bool in_taskgroup_wait; bool cancelled; gomp_sem_t taskgroup_sem; + size_t num_children; }; /* This structure describes a team of threads. These are the threads @@ -525,6 +551,8 @@ extern void gomp_barrier_handle_tasks (g static void inline gomp_finish_task (struct gomp_task *task) { + if (__builtin_expect (task-depend_hash != NULL, 0)) +free (task-depend_hash); gomp_sem_destroy (task-taskwait_sem); } ---
[gomp4] Library side of depend clause support
Hi! This patch adds depend clause support. In GOMP_task, before queueing the task, if task has any depend clauses we look up the addresses in a hash table (in the parent task, because only sibling tasks are ordered through depend clause), and if there are any dependencies on the earlier started tasks, the new task isn't added to team, parent and taskgroup task queues, but instead just added into the earlier task's depender vectors. Each task has also an integer number of how many other tasks it depends on. When a task on which something depends on finishes, if parent exists, it's records are removed from parent's depend address hash table, and even if parent doesn't exist anymore, we decrement num_dependees of every task mentioned in the dependers vector. If any of those counters go to zero, we insert them into all the relevant task queues. Tested on x86_64-linux. Will commit tomorrow unless somebody complains, but in any case would appreciate review of the changes. 2013-09-26 Jakub Jelinek ja...@redhat.com * libgomp.h: Include stdlib.h. (struct gomp_task_depend_entry): New type. (struct gomp_task): Add dependers, depend_hash, depend_count, num_dependees and depend fields. (struct gomp_taskgroup): Add num_children field. (gomp_finish_task): Free depend_hash if non-NULL. * libgomp_g.h (GOMP_task): Add depend argument. * hashtab.h: New file. * task.c: Include hashtab.h. (hash_entry_type): New typedef. (htab_alloc, htab_free, htab_hash, htab_eq): New inlines. (gomp_init_task): Clear dependers, depend_hash and depend_count fields. (GOMP_task): Add depend argument, handle depend clauses. Increment num_children field in taskgroup. (gomp_task_run_pre): Don't increment task_running_count here, nor clear task_pending bit. (gomp_task_run_post_handle_depend_hash, gomp_task_run_post_handle_dependers, gomp_task_run_post_handle_depend): New functions. (gomp_task_run_post_remove_parent): Clear in_taskwait before signalling corresponding semaphore. (gomp_task_run_post_remove_taskgroup): Decrement num_children field and make the decrement to 0 MEMMODEL_RELEASE operation, rather than storing NULL to taskgroup-children. Clear in_taskgroup_wait before signalling corresponding semaphore. (gomp_barrier_handle_tasks): Move task_running_count increment and task_pending bit clearing here. Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. (GOMP_taskwait): Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. After waiting on taskwait_sem, enter critical section again. (GOMP_taskgroup_start): Initialize num_children field. (GOMP_taskgroup_end): Check num_children instead of children before critical section. If children is NULL, but num_children is non-zero, wait on taskgroup_sem. Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. After waiting on taskgroup_sem, enter critical section again. * testsuite/libgomp.c/depend-1.c: New test. * testsuite/libgomp.c/depend-2.c: New test. --- libgomp/libgomp.h.jj2013-09-26 09:43:10.903930832 +0200 +++ libgomp/libgomp.h 2013-09-26 17:17:28.267001263 +0200 @@ -39,6 +39,7 @@ #include pthread.h #include stdbool.h +#include stdlib.h #ifdef HAVE_ATTRIBUTE_VISIBILITY # pragma GCC visibility push(hidden) @@ -253,7 +254,19 @@ enum gomp_task_kind GOMP_TASK_TIED }; +struct gomp_task; struct gomp_taskgroup; +struct htab; + +struct gomp_task_depend_entry +{ + void *addr; + struct gomp_task_depend_entry *next; + struct gomp_task_depend_entry *prev; + struct gomp_task *task; + bool is_in; + bool redundant; +}; /* This structure describes a task to be run by a thread. */ @@ -268,6 +281,10 @@ struct gomp_task struct gomp_task *next_taskgroup; struct gomp_task *prev_taskgroup; struct gomp_taskgroup *taskgroup; + struct gomp_task **dependers; + struct htab *depend_hash; + size_t depend_count; + size_t num_dependees; struct gomp_task_icv icv; void (*fn) (void *); void *fn_data; @@ -277,6 +294,7 @@ struct gomp_task bool final_task; bool copy_ctors_done; gomp_sem_t taskwait_sem; + struct gomp_task_depend_entry depend[]; }; struct gomp_taskgroup @@ -286,6 +304,7 @@ struct gomp_taskgroup bool in_taskgroup_wait; bool cancelled; gomp_sem_t taskgroup_sem; + size_t num_children; }; /* This structure describes a team of threads. These are the threads @@ -525,6 +544,8 @@ extern void gomp_barrier_handle_tasks (g static void inline gomp_finish_task (struct gomp_task *task) { + if (__builtin_expect
Re: [gomp4] Library side of depend clause support
On 09/26/2013 11:36 AM, Jakub Jelinek wrote: +struct gomp_task; struct gomp_taskgroup; +struct htab; + +struct gomp_task_depend_entry +{ + void *addr; + struct gomp_task_depend_entry *next; + struct gomp_task_depend_entry *prev; + struct gomp_task *task; + bool is_in; + bool redundant; +}; I'm a bit confused about the combination of linked lists and reallocated arrays. When did you decide to use what? + if ((flags 8) thr-task thr-task-depend_hash) + { + struct gomp_task *parent = thr-task; + struct gomp_task_depend_entry elem, *ent = NULL; + size_t ndepend = (uintptr_t) depend[0]; + size_t nout = (uintptr_t) depend[1]; + size_t i; + gomp_mutex_lock (team-task_lock); + for (i = 0; i ndepend; i++) + { + elem.addr = depend[i + 2]; + ent = htab_find (parent-depend_hash, elem); + for (; ent; ent = ent-next) + if (i = nout ent-is_in) + continue; + else + break; I wonder if we ought always defer tasks with dependencies and skip this lock and search? Unless the taskgroup is truly weird, we *are* going to have dependencies between the tasks. Dunno what exactly to do with final_tasks that have unfulfilled dependencies... I also think it would significantly clean up the code to declare a struct with a variable tail for the depend argument. That would eliminate all of the casting to uintptr_t and give names to the first two entries. + if (tsk-dependers == NULL) + { + tsk-dependers + = gomp_malloc (8 * sizeof (struct gomp_task *)); + tsk-dependers[0] + = (struct gomp_task *) (uintptr_t) 1; + tsk-dependers[1] + = (struct gomp_task *) (uintptr_t) (8 - 2); + tsk-dependers[2] = task; + task-num_dependees++; + continue; Another place for which a struct with variable tail would significantly clean up the code. And here's where I wonder why you're using realloc'd arrays here as opposed to another linked list? Perhaps what we need are smaller linked list entries like struct gomp_task_depend_node { struct gomp_task *task; struct gomp_task_depend_node *next; struct gomp_task_depend_node *prev; }; and a different hash table entry like struct gomp_task_depend_head { void *addr; struct gomp_task_depend_node *outs; struct gomp_task_depend_node *ins; size_t n_ins; }; If we scan the ndepend entries twice, we can find out how many nodes we need, and allocate them with the task like you do now. Scanning the ndepends array twice can be sped by only looking up the hash table entries once -- allocate a local array of size ndepend entries and cache the lookups. I'd say we don't need a count of the n_outs because all of them on the list must be sequentially dependent. Thus any new task simply depends on the previous task in the outs list. Thus imo it makes sense to have ins/outs point to the tail of the list as opposed to the head. Is is really worthwhile to detect redundant dependencies? It seems just as easy to add multiple dependencies and let them just fall out naturally. OTOH, perhaps you should just go ahead with this patch and we can evolve it incrementally. I don't see anything technically wrong with it. r~
Re: [gomp4] Library side of depend clause support
On Thu, Sep 26, 2013 at 03:54:09PM -0700, Richard Henderson wrote: On 09/26/2013 11:36 AM, Jakub Jelinek wrote: +struct gomp_task; struct gomp_taskgroup; +struct htab; + +struct gomp_task_depend_entry +{ + void *addr; + struct gomp_task_depend_entry *next; + struct gomp_task_depend_entry *prev; + struct gomp_task *task; + bool is_in; + bool redundant; +}; I'm a bit confused about the combination of linked lists and reallocated arrays. When did you decide to use what? I initially wanted to use linked lists only, but while I can statically preallocate the chains for the hash table, for the depender - dependee chains where a task may depend on many other tasks that would mean having to allocate small structures (or pool allocate them, per team?). I wonder if we ought always defer tasks with dependencies and skip this lock and search? Unless the taskgroup is truly weird, we *are* going to have dependencies between the tasks. Dunno what exactly to do with final_tasks that have unfulfilled dependencies... I think final tasks aren't a problem, if the parent is a final task, then all the children are non-deferred, thus we never record any dependencies and the test for that will be cheap too (because parent-depend_hash will be NULL). The problem is if (0) tasks, the spec says that they must be non-deferred unless they depend on some earlier non-finished task. But the cost in that case is primarily in taking the lock/unlock; the search will stop on the first dependency found, if there aren't any, nothing will be recorded and we don't jump to the defer label, if there are some, as soon as we discover first we jump there. I also think it would significantly clean up the code to declare a struct with a variable tail for the depend argument. That would eliminate all of the casting to uintptr_t and give names to the first two entries. I agree if we keep using realloced vectors that flexible array would make it cleaner. Perhaps what we need are smaller linked list entries like struct gomp_task_depend_node { struct gomp_task *task; struct gomp_task_depend_node *next; struct gomp_task_depend_node *prev; }; The dependee - depender vectors resp. linked lists are just pushed to first (the only thing needed during insertion is to have a cheap check if the last inserted task is the current one, to avoid having the same task multiple times in the vector/linked list), and then just walked once when the dependee finishes, so no removal is needed there, it can be freed at once; thus, for linked list, it would be enough to use non-doubly linked list for that. For the hash table chains, unless we want to always lookup the hash table and walk the chains for removal, we need doubly linked list. and a different hash table entry like struct gomp_task_depend_head { void *addr; struct gomp_task_depend_node *outs; struct gomp_task_depend_node *ins; size_t n_ins; }; You mean that the hash table instead would contain the structures, or pointers to these structures? If the latter (not sure what n_ins would be for), then we'd again need to pool alloc them. If we scan the ndepend entries twice, we can find out how many nodes we need, and allocate them with the task like you do now. Scanning the ndepends array twice can be sped by only looking up the hash table entries once -- allocate a local array of size ndepend entries and cache the lookups. I'd say we don't need a count of the n_outs because all of them on the list must be sequentially dependent. Thus any new task simply depends on the previous task in the outs list. Thus imo it makes sense to have ins/outs point to the tail of the list as opposed to the head. Ah, haven't thought about it this way, yes, you're right that for out/inout dependencies it is enough to remember in the hash table the last one, because the dependencies will form a chain on the same address and serialize the tasks. Is is really worthwhile to detect redundant dependencies? It seems just as easy to add multiple dependencies and let them just fall out naturally. I just didn't want to have duplicates in the hash table chains, the redundant flag is just a sign that the entry doesn't need to be removed from the hash table chains. OTOH, perhaps you should just go ahead with this patch and we can evolve it incrementally. I don't see anything technically wrong with it. Perhaps. What if I do just minor cleanup (use flexible array members for the reallocated vectors, and perhaps keep only the last out/inout task in the hash table chains rather than all of them), retest, commit and then we can discuss/incrementally improve it? Jakub