Hi @mratsim — I may have been unclear in my last response: I didn't mean to imply that I didn't believe there are realistic workloads that would result in recursive tasks or benefit from task serialization/squashing/stealing techniques. Just that the computations and users that have found their way to Chapel typically haven't exercised that style, so it hasn't been an area of focus for our team (combined with our choice to make Chapel tasks more arbitrary/unpredictable than most task-based programming systems).
Also, I should note that it's still possible to write recursive or dynamically balanced tasking programs in Chapel, simply that we typically express those patterns within the language or library rather than the implementation. For example, our libraries have iterators that can dynamically distribute iterations either locally across a compute node's cores or globally across multiple compute nodes and their cores. As another example, if I rewrite my naive Fibonacci computation in a way that manually throttles the creation of tasks rather than naively creating exponentially many of them, I can reduce the fib(40) case to just over a second on my 4-core laptop (I'll include my fib codes at the bottom of this for reference). My colleage pointed out something from your earlier post that I didn't pick up on (and I imagine other readers may not have either): the Nemesis and Sherwood lines in the graphs you showed above are scheduler options in the Sandia Qthreads library that Chapel targets by default, so are representative of what's possible in Chapel today. Qthreads defaults to using the Sherwood scheduler, but Chapel chooses to use the Nemesis scheduler by default instead (though the user can override this default). We found that while the Sherwood scheduler was highly tuned for UTS (Unbalanced Tree Search) and great for highly unbalanced workloads with many many tasks, it had too much overhead for well-balanced operations, which is what we tend to care most about. In contrast, the Nemesis scheduler avoids those overheads and has much better performance for balanced workloads, but gives up work-stealing. Ultimately we hope to have the best of both worlds. We had started some work with the Qthreads team to create a distrib scheduler that would do that ([https://github.com/Qthreads/qthreads/issues/39](https://github.com/Qthreads/qthreads/issues/39) ), but the effort is currently stalled out. Wrapping up, here's my naive Fibonacci in Chapel: config const n = 10; writeln("fib(",n,") = ", fib(n)); proc fib(n): int { if n < 2 then return n; else { var i, j: int; sync { begin with (ref i) i = fib(n-1); j = fib(n-2); } return i+j; } } Here's my manually throttled version: config const n = 10; writeln("fib(",n,") = ", fib(n)); proc fib(n): int { if n < 2 then return n; else { if (here.runningTasks() >= here.numPUs()) then return fib(n-1) + fib(n-2); else { var i, j: int; sync { begin with (ref i) i = fib(n-1); j = fib(n-2); } return i+j; } } } And here's a very simple example of the bounded buffer pattern I was mentioning. We've never really treated this as a benchmark, more of a demonstration of how easy task parallelism can be in Chapel, where the key points are (a) task creation is simple and (b) sync variables avoid the need to worry about the typical overflow/underflow conditions: config const buffsize = 10, n = 101; var A: [0..#n] sync int; cobegin { producer(); consumer(); } proc producer() { for i in 0..#n do // write n values to the buffer A[i%buffsize] = i; A[n%buffsize] = -1; // write a "done" sentinel } proc consumer() { var pos = 0; do { const val = A[pos]; // read value if val != -1 then writeln("got: ", val); pos += 1; // advance cursor pos %= buffsize; } while (val != -1); } A more full-blown version of this pattern that we use as an exercise for students in hands-on exercises (where their goal is to add support for multiple producers and consumers on the same buffer) can be found here (with sample solutions in the subdirectory): [https://github.com/chapel-lang/chapel/tree/master/test/exercises/boundedBuffer](https://github.com/chapel-lang/chapel/tree/master/test/exercises/boundedBuffer)