There is a serious bottleneck in the existing TaskManager implementation
when, under load, the typical task has at least one runAfter dependency
on an older task at the time it is added.
I plan to remove this bottleneck, but I think understanding it may help
in some decisions that need to be made about the new implementation, so
I'm going to describe it.
The constructor comment block says: "A new thread is created if the
total number of runnable tasks (both active and pending) exceeds the
number of threads times the loadFactor, and the maximum number of
threads has not been reached."
Achieving that requires a check whenever the number of runnable tasks
increases. Similarly, a thread that is waiting for a task to run needs
to be notified when a runnable task becomes available.
There are two ways a Task x can become runnable:
1. At the time x is added, it does not need to run after any older task.
2. Another task y has just completed, x did need to run after y, and x
does not need to run after any other task that still exists.
Condition 1 is being handled correctly, with both a check for increasing
the number of tasks and a notify on add of an immediately runnable task.
Condition 2 is completely ignored.
At least one thread will always be created, because the first task to be
added must be immediately runnable. Even if a second thread gets created
because a task happens to be immediately runnable, there is a danger of
a thread timing out waiting for work, because it will not get notified
until another immediately runnable task is added. For a workload with
dependencies, the single threading will tend to increase the queue
length, making added tasks less likely to be immediately runnable
because they have more tasks to conflict with.
I believe the problem is a consequence of the data structure design,
which makes asking "Did this task completion make any other task or
tasks runnable?" very expensive. The design I'm working on will make
answering this question a very fast side effect of work that needs to be
done anyway.
Empirically, I have tested a workload with dependencies that should
alternate between 5 parallel tasks and a single task, mean three tasks
at a time, but runs no faster permitted up to 100 threads than when
limited to one thread. I measured on a computer with two Xeon
processors, each dual core and dual threaded.
Patricia