Probably best to bring this to loom-dev, as there has been some
exploration into this, but we decided not to expose any APIs at this time.
-Alan
On 09/07/2024 19:50, Louis Wasserman wrote:
My understanding of the structured concurrency APIs now in preview is
that when a subtask is forked, exceptions thrown in that subtask
will have stack traces going up to the beginning of that subtask, not
e.g. up the structured concurrency task tree. (My tests suggest this
is the case for simple virtual threads without structured
concurrency.) Most concurrency frameworks on the JVM that I’ve
encountered share the property that stack traces for exceptions don’t
trace through the entire causal chain – and, not unrelatedly, that
developers struggle to debug concurrent applications, especially when
working from production stack traces without a full debugger attached.
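A minimal sketch of that observation, assuming nothing beyond the standard library. The class and method names (`ChildTraceDemo`, `runAndCapture`, `failingSubtask`, `mentions`) are hypothetical, and it uses a plain `Thread` so it runs on older JDKs; on JDK 21+ the same check could be repeated with `Thread.ofVirtual().unstarted(...)`.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

final class ChildTraceDemo {
    // Hypothetical helper: runs the task on a new thread and returns what it threw.
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> thrown = new AtomicReference<>();
        Thread child = new Thread(() -> {
            try {
                task.run();
            } catch (Throwable t) {
                thrown.set(t);
            }
        });
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return thrown.get();
    }

    static void failingSubtask() {
        throw new IllegalStateException("boom");
    }

    // True if any frame of the throwable's trace is in a method with this name.
    static boolean mentions(Throwable t, String methodName) {
        return Arrays.stream(t.getStackTrace())
                .anyMatch(frame -> frame.getMethodName().equals(methodName));
    }

    public static void main(String[] args) {
        Throwable t = runAndCapture(ChildTraceDemo::failingSubtask);
        // The trace starts at the child thread's entry point, so the
        // forking method ("main") does not appear in it.
        System.out.println("has failingSubtask frame: " + mentions(t, "failingSubtask")); // true
        System.out.println("has main frame: " + mentions(t, "main")); // false
    }
}
```

The trace records where the exception was thrown, but nothing about which code forked the thread.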

In some cases, like chained CompletableFutures, this seems necessary
to ensure that executing what amounts to a loop does not result in
stack traces that grow linearly with the number of chained futures.
But when structured concurrency is involved, it seems more plausible
to me that the most useful possible stack traces would go up the tree
of tasks – that is, whenever a task was forked, the stack trace would
look roughly as if it were a normal/sequential/direct invocation of
the task. This could conceivably cause stack overflows where they
didn’t happen before, but only for code that violates the expectations
we have around normal sequential code: you can’t recurse unboundedly;
use iteration instead.

I’m curious if there are ways we could make the upcoming structured
concurrency APIs give those stack traces all the way up the tree, or
provide hooks to enable you to do that yourself. Last year’s JVMLS
talk on Continuations Under the Covers demonstrated how stacks were
redesigned in ways that frequently and efficiently snapshot the stack
itself – not just the trace, but the thing that includes all the
variables in use. There’s a linked list of StackChunks in which all
but perhaps the top of the stack is already frozen, and the top of the
stack gets frozen when the thread yields. Without
certainty about how stack traces are managed in the JVM today, I would
imagine you could possibly do something similar – you’d add a way to
cheaply snapshot a reference to the current stack trace that can be
traversed later. If you’re willing to hold on to all the references
currently on the stack – which might be acceptable for the structured
concurrency case in particular, where you might be able to assume
you’ll return to the parent task and its stack at some point – you
might be able to do this by simply wrapping the existing StackChunks.

Then, each `fork` or `StructuredTaskScope` creation might snapshot the
current call stack, and you’d stitch together the stack traces
later…somewhere. That part is a little more open ended: would you add
a new variant of `fillInStackTrace`? Would it only apply to
exceptions that bubbled up to the task scope? Or would we be adding
new semantics to what happens when you throw an exception or walk the
stack in general? The most plausible vision I have at this point is
an API that spawns a virtual thread which receives a stack trace of
some sort – or perhaps snapshots the current stack trace – and
prepends that trace to all stack traces within the virtual thread’s
execution.

I suppose this is doable today if you’re willing to pay the
performance cost of explicitly getting the current stack trace every
time you fork a task or start a scope. That is kind of antithetical
to the point of virtual threads – making forking tasks very efficient
– but it’s something you might be willing to turn on during testing.
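A rough sketch of that explicit approach, under stated assumptions: `ForkSiteTraces`, `withForkSite`, `handleRequest` and `failingSubtask` are hypothetical names, and a plain `ExecutorService` stands in for the preview `StructuredTaskScope` so the example runs without preview flags. It captures the caller's trace eagerly at fork time and appends it to any exception the task throws.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ForkSiteTraces {
    // Hypothetical wrapper: snapshots the forking thread's trace now and
    // appends it to the trace of any exception the task throws later.
    static <T> Callable<T> withForkSite(Callable<T> task) {
        // Eager capture: this is the per-fork cost mentioned above.
        StackTraceElement[] forkSite = new Throwable().getStackTrace();
        return () -> {
            try {
                return task.call();
            } catch (Exception e) {
                StackTraceElement[] own = e.getStackTrace();
                StackTraceElement[] stitched =
                        Arrays.copyOf(own, own.length + forkSite.length);
                System.arraycopy(forkSite, 0, stitched, own.length, forkSite.length);
                // Has no effect on throwables created with a non-writable stack trace.
                e.setStackTrace(stitched);
                throw e;
            }
        };
    }

    static int failingSubtask() {
        throw new IllegalStateException("boom");
    }

    // Hypothetical stand-in for application code that forks a subtask.
    static Throwable handleRequest() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            executor.submit(withForkSite(ForkSiteTraces::failingSubtask)).get();
            return null;
        } catch (ExecutionException e) {
            return e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        // The printed trace lists the subtask's frames first, then the
        // frames of the code that forked it.
        handleRequest().printStackTrace();
    }
}
```

Nesting works the same way if every fork goes through the wrapper, since each level appends its own fork site. The cost is one full stack capture per fork, which is why this may only be acceptable in testing.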

Right now, my inspiration for this question is attempting to improve
the stack trace situation with Kotlin coroutines, where Google
production apps have complained about the difficulty of debugging with
the current stack traces. But this is something I'd expect to apply
equally well to all JVM languages: the ability to snapshot and string
together stack trace causal chains like this in production could
significantly improve the experience of debugging concurrent code.
--
Louis Wasserman