Probably best to bring this to loom-dev, as there has been some exploration in this area, but we decided not to expose any APIs at this time.

-Alan

On 09/07/2024 19:50, Louis Wasserman wrote:
My understanding of the structured concurrency APIs now in preview is that when a subtask is forked, exceptions thrown in that subtask will have stack traces going up to the beginning of that subtask, not e.g. up the structured concurrency task tree.  (My tests suggest this is the case for simple virtual threads without structured concurrency.)  Most concurrency frameworks on the JVM that I’ve encountered share the property that stack traces for exceptions don’t trace through the entire causal chain – and, not unrelatedly, that developers struggle to debug concurrent applications, especially with stack traces from production and not full debuggers attached.
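
A minimal sketch of that observation, using a plain `java.lang.Thread` so it runs on any recent JDK (on Java 21+, `Thread.ofVirtual().start(...)` could be swapped in). The class and method names here are hypothetical and exist only for illustration; the point is that the captured trace contains the task's own frames but not the frames of the method that started the thread.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

class TruncatedTraceDemo {
    // Stand-in for a subtask that fails.
    static void failingTask() {
        throw new IllegalStateException("boom");
    }

    // Hypothetical "forking" caller: runs the task on another thread
    // and hands back whatever exception the task threw.
    static Throwable forkAndCollect() {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread t = new Thread(() -> {
            try {
                failingTask();
            } catch (RuntimeException e) {
                failure.set(e);
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failure.get();
    }

    // True if any frame in the throwable's trace is a method with exactly this name.
    static boolean traceMentions(Throwable t, String methodName) {
        return Arrays.stream(t.getStackTrace())
                .anyMatch(frame -> frame.getMethodName().equals(methodName));
    }

    public static void main(String[] args) {
        Throwable t = forkAndCollect();
        System.out.println("has failingTask: " + traceMentions(t, "failingTask"));
        System.out.println("has forkAndCollect: " + traceMentions(t, "forkAndCollect"));
    }
}
```

The trace starts at the new thread's entry point, so the frames that led up to the fork are not recoverable from the exception alone.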

In some cases, like chained CompletableFutures, this seems necessary to ensure that executing what amounts to a loop does not result in stack traces that grow linearly with the number of chained futures.  But when structured concurrency is involved, it seems more plausible to me that the most useful possible stack traces would go up the tree of tasks – that is, whenever a task was forked, the stack trace would look roughly as if it were a normal/sequential/direct invocation of the task.  This could conceivably cause stack overflows where they didn’t happen before, but only for code that violates the expectations we have around normal sequential code: you can’t recurse unboundedly; use iteration instead.

I’m curious if there are ways we could make the upcoming structured concurrency APIs give those stack traces all the way up the tree, or provide hooks to enable you to do that yourself.  Last year’s JVMLS talk on Continuations Under the Covers demonstrated how stacks were redesigned in ways that frequently and efficiently snapshot the stack itself – not just the trace, but the thing that includes all the variables in use.  There’s a linked list of StackChunks, and all but maybe the top of the stack has those elements frozen, etc, and the top of the stack gets frozen when the thread is yielded.  Without certainty about how stack traces are managed in the JVM today, I would imagine you could possibly do something similar – you’d add a way to cheaply snapshot a reference to the current stack trace that can be traversed later.  If you’re willing to hold on to all the references currently on the stack – which might be acceptable for the structured concurrency case in particular, where you might be able to assume you’ll return to the parent task and its stack at some point – you might be able to do this by simply wrapping the existing StackChunks.  Then, each `fork` or `StructuredTaskScope` creation might snapshot the current call stack, and you’d stitch together the stack traces later…somewhere.  That part is a little more open ended: would you add a new variant of `fillInStackTrace`?  Would it only apply to exceptions that bubbled up to the task scope?  Or would we be adding new semantics to what happens when you throw an exception or walk the stack in general?  The most plausible vision I have at this point is an API that spawns a virtual thread which receives a stack trace of some sort – or perhaps snapshots the current stack trace – and prepends that trace to all stack traces within the virtual thread’s execution.

I suppose this is doable today if you’re willing to pay the performance cost of explicitly getting the current stack trace every time you fork a task or start a scope.  That is kind of antithetical to the point of virtual threads – making forking tasks very efficient – but it’s something you might be willing to turn on during testing.
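
A rough sketch of that "pay the cost at fork time" approach, under stated assumptions: `withForkTrace` is a hypothetical helper, not part of any JDK or structured concurrency API, and a plain `ExecutorService` stands in for `fork`. It eagerly captures the forking thread's frames and, if the task throws, appends them to the exception's own trace with `Throwable.setStackTrace`.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StitchedFork {
    // Hypothetical helper: wraps a task so that exceptions it throws
    // also carry the stack of the code that forked it.
    static <T> Callable<T> withForkTrace(Callable<T> task) {
        // Eager capture on the forking thread; this is the per-fork cost.
        StackTraceElement[] forkSite = new Throwable().getStackTrace();
        return () -> {
            try {
                return task.call();
            } catch (Exception e) {
                // Task frames first, then the forker's frames, mimicking
                // a direct sequential call. Errors are not handled here.
                StackTraceElement[] own = e.getStackTrace();
                StackTraceElement[] stitched =
                        Arrays.copyOf(own, own.length + forkSite.length);
                System.arraycopy(forkSite, 0, stitched, own.length, forkSite.length);
                e.setStackTrace(stitched);
                throw e;
            }
        };
    }

    // Stand-in for a subtask that fails.
    static Integer failing() {
        throw new IllegalStateException("boom");
    }

    // Forks the failing task and returns the exception it threw.
    static Throwable runDemo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> wrapped = withForkTrace(StitchedFork::failing);
            Future<Integer> result = executor.submit(wrapped);
            result.get();
            return null;
        } catch (ExecutionException e) {
            return e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    // True if any frame in the throwable's trace is a method with exactly this name.
    static boolean mentions(Throwable t, String methodName) {
        return Arrays.stream(t.getStackTrace())
                .anyMatch(frame -> frame.getMethodName().equals(methodName));
    }

    public static void main(String[] args) {
        Throwable t = runDemo();
        for (StackTraceElement frame : t.getStackTrace()) {
            System.out.println("  at " + frame);
        }
    }
}
```

With this wrapper the printed trace includes both `failing` and the forking method `runDemo`. Nested forks would compose only if every level is wrapped, and `setStackTrace` has no effect on throwables created with a non-writable stack trace, so this is a testing aid rather than a general solution.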

Right now, my inspiration for this question is attempting to improve the stack trace situation with Kotlin coroutines, where Google production apps have complained about the difficulty of debugging with the current stack traces.  But this is something I'd expect to apply equally well to all JVM languages: the ability to snapshot and string together stack trace causal chains like this in production could significantly improve the experience of debugging concurrent code.

--
Louis Wasserman
