Hi. Sorry for being late.

> Depends on what you mean by lazy task creation, gcc schedules
> tasks lazily if they aren't if (0), some data structure if created
> for them when encountering #pragma omp task directive, but I guess
> any implementation will do something like that.

By Lazy Task Generation, I mean the following implementation:

- 1 CPU core has 1 worker
- 1 worker has 1 deque (LIFO)
- 1 deque holds some tasks
- What a worker does:
  - Execute tasks from the head of its own deque
  - Steal a task from the tail of a deque on another core
  - When task A encounters a "#pragma omp task" directive, the worker
    creates a task and immediately executes it. The worker pushes A to
    the head of the deque. (A context switch occurs here.)
    This is an important point because A can move to other deques. (*)
  - Steal a task from a deque on another core when the deque on its
    own core is empty

A senior colleague of mine has already written a library that implements
the scheduling algorithm above. (It is called `MassiveThreads', but the
paper about it is written in Japanese.)
MassiveThreads has shown that this algorithm makes things like OpenMP
tasks fast.

With this implementation:

- Nested `task' directives are processed naturally
- Suppose task A is in deque D, and task A1 is created in D when A
  encounters a `task' directive (see (*)). A1 runs as soon as it is
  created. So even if A goes on to execute functions that take a long
  time to finish, that work can be done after A has been stolen into a
  deque other than D.

Anyway, I'd like to read the materials that were referred to when the
current libgomp `task' was implemented (so that I can read the code
smoothly). Do you know of any?

> What your testcase shows is not whether tasks are created lazily or not, but
> how good/poor #pragma omp taskwait implementation is.  And, for your testcase
> libgomp/task.c (GOMP_taskwait) definitely could be improved.  Currently it
> only tries to schedule in children that will be awaited by the current tasks
> and if there are no such children, goes to sleep, waiting for them to
> complete.

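As a rough illustration of the deque discipline described above, here is a minimal single-threaded sketch in C. It is not taken from MassiveThreads or libgomp: the names `task_t`, `deque_t`, `push_head`, `pop_head`, `steal_tail`, `spawn` and `next_task` are hypothetical, there is no locking, and the real context switch at the point marked (*) is left out. It only shows which end of the deque each operation touches.

```c
#include <assert.h>
#include <stddef.h>

#define DEQUE_CAP 64

/* Hypothetical task record; a real one would also hold a saved context. */
typedef struct { int id; } task_t;

/* One deque per worker. Slots [tail, head) hold tasks.
   The owner pushes and pops at head; thieves take from tail. */
typedef struct {
    task_t *slots[DEQUE_CAP];
    size_t head;   /* one past the newest task */
    size_t tail;   /* index of the oldest task */
} deque_t;

static void deque_init(deque_t *d) { d->head = d->tail = 0; }

static int deque_empty(const deque_t *d) { return d->head == d->tail; }

static void push_head(deque_t *d, task_t *t)
{
    assert(d->head < DEQUE_CAP);   /* no wrap-around in this sketch */
    d->slots[d->head++] = t;
}

/* Owner side: take the newest task (LIFO). */
static task_t *pop_head(deque_t *d)
{
    return deque_empty(d) ? NULL : d->slots[--d->head];
}

/* Thief side: take the oldest task from another worker's deque. */
static task_t *steal_tail(deque_t *victim)
{
    return deque_empty(victim) ? NULL : victim->slots[victim->tail++];
}

/* Running task `parent` hit a task directive: the suspended parent goes
   to the head of the worker's own deque (where it can later be stolen),
   and the child is returned so the worker runs it immediately. */
static task_t *spawn(deque_t *own, task_t *parent, task_t *child)
{
    push_head(own, parent);
    return child;
}

/* Pick the next task: own deque first, otherwise steal from `victim`. */
static task_t *next_task(deque_t *own, deque_t *victim)
{
    task_t *t = pop_head(own);
    return t ? t : steal_tail(victim);
}
```

In this sketch, if a worker calls `spawn(&d0, &a, &a1)`, it gets `a1` back to run right away, while `a` sits in `d0` and can be picked up by an idle worker through `next_task(&d1, &d0)`.
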
> Scheduling in random unrelated tasks is problematic, because the unrelated
> task might take too long to complete and delay the taskwait for way too long
> (note, gcc doesn't have untied tasks, all tasks are tied once they are
> scheduled onto some particular tasks - setcontext/swapcontext is quite
> fragile thing to do).
> But it is true it could very well schedule tasks that are taskwaited by tasks
> taskwaited by current task, and transitively further.  Plus, be able to
> temporarily awake such a sleeping thread if there are tasks it can
> transitively taskwait for, as if those don't complete, the current taskwait
> won't return.
>
>         Jakub

--
Sho Nakatani