I will also chime in to say that managing output streams and errors for Babel is a major new feature that I am interested in. The issue, as Tim points out, is that a lot of complexity lurks here: languages differ fundamentally in their capabilities, in how they handle (or do not handle) errors, and in how they run code (on arbitrary hosts) in the first place.
What works for one will almost certainly not work for another. Take ob-lisp, for example, where error handling is already built into Emacs itself. Compare that with Python, where someone would likely need to implement a special PYTHONBREAKPOINT entry point or something similar, if it is possible at all.

I have had a draft of a document on what I call "babel regularization" for well over a year now, but it is not in a state that would be productive to share, because of the sheer number of ob-langs and systems affected and the need to clearly catalog and articulate the diversity of existing behaviors.

If you dig through old conversations on this list you will find a discussion of whether ob-shell should default to :results value or :results output; we were barely able to agree on which principles should guide the decision. In that case we were lucky that users already had a way to set their desired behavior in their init file, in a setup file, or in the file itself. How to handle errors will be much more complex, in part because it touches on what ob-lang implementations are able to override and/or must provide in order to function at all. At the moment there are practically no constraints.

Lots of work to do here, so I am grateful for a report on the variability in the behavior of the existing system.

Best!
Tom