From: Allison Randal <[EMAIL PROTECTED]>
Date: Tue, 18 Apr 2006 15:07:56 -0700
. . . HLL exception handlers, on the other hand, are likely to be
written as independent subroutines, much like the current signal
handlers in Perl 5. An exception handler is closer to an event handler
than it is to a return continuation. (The design choice is between
having exception handlers that are complete compilation units, or just
code segments. Both are valid options. And it may be that we want to
support both.)

I see three possibilities:

1. Compilation units only;
2. Continuations only; and
3. Compilation units that (may) invoke continuations.

The third is an inclusive interpretation of "both" -- I will argue below
that this is the best choice. Presumably, the first two could be
implemented in terms of the third?

The "error-prone" comment has to do with control flow. The effect of the
current implementation is that when the interpreter catches an
exception, it dumps control flow at the label that was captured in the
continuation. Any control flow after that is the responsibility of the
developer, and it's easy to get it wrong.

Seems to me that this is unavoidable. Exceptions are useful mainly
because they allow these drastic changes to normal control flow. Writers
of HLL code are relieved of at least some of this responsibility by
their compiler, but writers of PIR are exposed to the full complexity of
nonlocal transfer of control.

It might be more helpful if the continuation taken was a return
continuation: where to return to if an exception is caught and
successfully handled.

I would tend to agree. But something has to decide which handler gets to
catch the exception before the continuation is invoked. So that would
mean dividing the current Parrot notion of exception handler into a
tester sub, which can be invoked in the dynamic context of the error,
and the actual "what to do" code, which is reached via the continuation.
Reading ahead, this does seem to be what you have in mind; am I right?
But is this much of a change really on the table?
I had thought that PIR-visible semantic changes are frowned on these
days?

>> =item *
>>
>> C<pushaction> pushes a subroutine object onto the control stack. If
>> the control stack is unwound due to an exception (or C<popmark>, or
>> subroutine return), the subroutine is invoked with an integer
>> argument: C<0> means a normal return; C<1> means an exception has
>> been raised. [Seems like there's lots of room for dangerous
>> collisions here.]
>
> I'm not sure what you mean by "collisions" here, nor why you think
> they would be dangerous.

Specifically, because the control stack is used for multiple different
things, it's easy to get into a situation where the thing you're popping
off the stack isn't what you meant to pop off the stack. It's one of the
reasons we aren't using stack-based control flow through most of Parrot.

Do you have a specific example of such a situation? For compiled
languages (AFAIK), the features that use the control stack have
well-defined lexical "enter" and "exit" points, which makes it easy for
a compiler to generate correct code; that's the reasoning behind the
"behaves like a stack" argument below. Of course, that's not the case
for hand-written PIR, but the only remedy I can think of -- giving each
dynamic construct its own private stack -- seems like it would add a lot
of complexity for (IMO) an obscure benefit.

> Arguably, C<pushaction> is too simplistic; it doesn't provide for such
> things as the repeated exit-and-reenter behavior of coroutines, and
> there is no mechanism to specify a thunk that gets called when
> *entering* a dynamic context . . .

That too. I'm working on mods to actions as part of my (long overdue)
dynamic binding implementation proposal [1]. I think we also need a
C<popaction> for consistency, and should probably support "enter"
actions as well as "exit" actions. Another thing that may need
clarification is the environment in which the action runs.
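Returning to the "collisions" point for a moment: the hazard can be
illustrated with a toy shared stack (Python used purely for
illustration; none of this is Parrot's actual API). Because handlers and
actions share one stack, an unbalanced pop silently removes an entry
that belongs to a different feature:

```python
# Hypothetical model of a shared control stack.
control_stack = []

def push_handler(h):
    control_stack.append(("handler", h))

def push_action(a):
    control_stack.append(("action", a))

def pop_handler():
    kind, payload = control_stack.pop()
    if kind != "handler":
        raise RuntimeError("expected a handler on top, found: " + kind)
    return payload

push_handler("EH1")
push_action("cleanup")   # pushed later, and forgotten

err = None
try:
    pop_handler()        # means to pop EH1, actually pops the action
except RuntimeError as exc:
    err = str(exc)
print(err)  # prints "expected a handler on top, found: action"
```

A real shared stack has no such type check, of course; the mismatched
pop would simply corrupt the dynamic state, which is what makes the
sharing "dangerous" for hand-written PIR.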
Since actions are kept on the control stack, and since the current
implementation calls them just after they are popped, they see exactly
the dynamic context in effect at C<pushaction> time. This is true even
when throwing to an outer exception handler. For instance, if A calls B
calls C calls D, and A pushed EH1, B pushed EH2, C pushed action C1, and
D throws to EH1 in A, the current implementation calls C1 with both EH1
and EH2 still in scope. I think this is correct; otherwise, the
programmer can't count on the dynamic state of the cleanup action. One
could make a case that both handlers, or at least EH2, should be popped
first, but this seems wrong.

>> =head1 IMPLEMENTATION
>>
>> [I'm not convinced the control stack is the right way to handle
>> exceptions. Most of Parrot is based on the continuation-passing style
>> of control, shouldn't exceptions be based on it too? See bug #38850.]
>
> Seems to me there isn't any real choice. Exception handlers are part
> of the dynamic context, and dynamic contexts nest in such a way as to
> behave like a stack. Even pure CPS implementations that want to
> maintain dynamic state have to create an explicit stack in a global
> variable somewhere.

"Dynamic contexts nest in such a way as to behave like a stack" is true,
but not necessarily the same thing as storing all exception handlers on
a single global stack that's also used for primitive control flow.

By "primitive control flow" do you mean C<bsr/ret>? I would agree that's
pretty primitive -- and might be better off with its own stack (see
below). Otherwise, keeping handlers on the same stack with actions and
(some day) temporizations makes it convenient to peel them back in the
right order. Actions need to be executed in the right dynamic binding
and handler context, for one thing.

Let's take the example of something that recently came up: asynchronous
I/O with exceptions.
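Before getting to the async example, the unwinding order described above
(C1 running with EH1 and EH2 still in scope) can be sketched as follows.
This is an illustrative Python model, not Parrot's implementation:

```python
stack = []           # shared control stack; innermost entry last
seen_at_action = []  # the handlers C1 observed when it ran

def throw_to(target):
    # Unwind entries pushed after `target`, running actions just as
    # they are popped -- before any outer handler is removed.
    while stack[-1] != ("handler", target):
        kind, payload = stack.pop()
        if kind == "action":
            payload()
    stack.pop()  # finally remove the target handler itself

# A pushes EH1, calls B; B pushes EH2, calls C; C pushes action C1;
# then D throws to EH1.
stack.append(("handler", "EH1"))
stack.append(("handler", "EH2"))
stack.append(("action",
              lambda: seen_at_action.extend(
                  name for kind, name in stack if kind == "handler")))
throw_to("EH1")
print(seen_at_action)  # prints ['EH1', 'EH2']
```

The action observes both handlers because the unwinder pops and runs it
before touching EH2 or EH1, which is the behavior being defended above.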
The current implementation says: push a global exception handler onto
the stack, call the routine that might throw an exception, then pop the
exception handler off the stack. But with asynchronous I/O, the
exception handler is likely to be popped off the stack long before the
async call throws an exception. Or, if you delay popping off the
exception handler until the async callback is called, then you may have
other exception handlers pushed onto the stack in the mean time
(possibly exception handlers for other async calls). Or the original
handler may catch something it wasn't supposed to.

Excellent point. In theory, the return continuation maintains the state
of the caller's control stack, so you can invoke return continuations up
the CPS chain until you reach a dynamic context where the exception is
handled. But where does control flow go after you handle an exception
from an async op?

My kneejerk reaction is that maybe each asynchronous I/O operation
requires its own coroutine (or something very like it) so that the user
can set up a different dynamic state than the main line that spawned it.
Off the top of my head, if you yield to the async coro before it's
ready, nothing happens, but if an exception was pending, the exception
happens in the coro environment. But this is all very half-baked; the
"coro" would look more like a lightweight thread. And I'm not even
certain how one would want an asynchronous I/O API to look, having never
played with one (and having largely ignored the thread).

>> Other opcodes respond to an C<errorson> setting . . .
>
> This have-your-cake-and-eat-it-too (HYCAEIT?) strategy sounds good in
> theory, but may be dangerous in practice. Which style of error
> handling a given piece of code uses is a static property of the way
> the code is written. On the other hand, C<errorson> is dynamic and
> global.
> If one of the modules you use wants to do error handling by checking
> return values, but another module doesn't check returns because it
> expects errors to be signalled, then no C<errorson> setting will
> satisfy both, regardless of how you want to design *your* code.

Maybe we need a non-global equivalent of these options.

I was trying to argue that any such option that acts globally would be
too much trouble to support. A global option might work if it could be
temporized, but temporizing around method calls to objects that want to
handle errors differently would be tedious and error-prone. It also
requires the programmer to be aware of the "preferred" setting of
C<errorson> for all modules used, which is also error-prone. Enforcing a
single model, or at the very least having it lexically "compiled in,"
seems much more tractable.

> I personally prefer exception-based error handling, since it scales
> better. I have been acting on this when the opportunity arises,
> changing internal_exception calls to real_exception when it makes
> sense, and when I'm mucking around in that code anyway. (A good
> example of this is "No exception to pop", come to think of it.) It is
> also helpful to get a backtrace when something fails.

Backtracing can be enabled without exceptions.

Really? Even where internal_exception is called?

>> =head2 Excerpt
>>
>> [Excerpt from "Perl 6 and Parrot Essentials" to seed discussion.
>> Out-of-date in some ways, and in others it was simply speculative.]

For everything below this point, keep in mind that the text was written
in 2004.

Sorry; I didn't mean to be pedantic.

>> process continues until some exception handler deals with the
>> exception and returns normally, or until there are no more exception
>> handlers on the control stack. When the system finds no installed
>> exception handlers it defaults to a final action, which normally
>> means it prints an appropriate message and terminates the program.
> Currently it also prints a backtrace, which is really nice. Alas, the
> backtrace is only from the point of the final rethrow by the oldest
> (bottommost) exception handler. This is the greatest weakness with the
> current Parrot exception-handling design: By the time you find out
> that a given exception is unhandled, the dynamic environment of the
> C<throw> has been destroyed by the very process of searching for a
> willing handler. This makes it extremely difficult to write a debugger
> that can do anything useful about uncaught exceptions.

Exception handler tracing is a useful feature, and is worth adding if it
doesn't cost too much (in terms of implementation complexity, execution
speed, etc.).

I would agree, but I wasn't just talking about tracing (and tracing
exceptions, rather than handlers). I was talking about allowing an
interactive debugger, as in "perl -d", to take control at the point
where the uncaught exception is signaled, so that I can figure out why
it wasn't caught. (For the record, I've never actually needed to use
"perl -d" to debug a Perl 5 program with hairy eval/die logic, but I bet
it's no picnic.) However, as I've already hinted, I think a workable
solution is within reach . . .

>> When the system installs an exception handler, it creates a return
>> continuation with a snapshot of the current interpreter context. If

> This is confusing; I assume you are talking about the
> Exception_Handler itself and not a RetContinuation.

In this context, no. It really meant a return continuation.

Hmm. A RetContinuation recycles the leaving context, but in the case of
an exception, we don't know the identity of the leaving context until
the exception is invoked, which makes it hard to decide whether this is
safe/appropriate. So, unless I am still misunderstanding you, I don't
think this works with the current codebase (though it ought to work if
the "lightweight RetContinuation" proposal [2] is ever implemented).

> Hmm.
> It seems that an exception is "cleanly caught" only if it is not
> rethrown. It is therefore not possible to tell by looking at the
> exception itself whether or not it is "cleanly caught" or if it is
> still in the process of being signalled.

I think I now understand how you mean to do this.

> You seem to want to say that unhandled exceptions are ignored. Is
> that correct? If so, I see several problems:
>
> 1. What is "the exception handler function" and how is it
> distinguished from the function that established the exception
> handler? [It sounds like you are expecting the exception handler to
> behave more like a closure than a continuation . . . ]

An "exception handler function" would be an exception handler that is a
complete compilation unit rather than just a code segment inside some
other compilation unit.

Great; got it.

> 2. The previous paragraph says that if "the exception handler just
> returns", that means that "the exception is cleanly caught". Unless
> you want to propose a new mechanism, the only way a handler can
> decline to handle an exception is by rethrowing it, which precludes
> the possibility of resuming.

I now realize that you *were* proposing a new mechanism (new to me in
any case), using an "exception handler function" that "just returns."
So never mind.

The current prototype implementation doesn't support resumable
exceptions, it's true . . .

"Prototype"?? That implies a lot more flexibility to change the way
Parrot exceptions work than I had thought would be allowed . . .

But, resumable exceptions are a useful feature, and one that we
originally planned for Parrot. Before we throw out the baby with the
bath water, we need to first look at what it will take to build in
resumable exceptions. It's possible that an architecture that supports
resumable exceptions may be a better architecture overall.

I certainly agree that versatile error recovery is a big plus.
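As a concrete (if toy) model of what "resumable" could mean, here is a
sketch of restart-style recovery, loosely after the Common Lisp
condition system. Everything here is illustrative Python with
hypothetical names, not Parrot design: the low-level code offers named
restarts, and a handler bound at a higher level picks one.

```python
handler_stack = []  # dynamically bound handlers, innermost last

def signal(condition, restarts):
    """Offer the named `restarts` (name -> thunk) for `condition`.
    Each handler may pick a restart by name; the first valid choice
    wins, and the chosen restart runs at the point of the error."""
    for handler in reversed(handler_stack):
        choice = handler(condition, sorted(restarts))
        if choice in restarts:
            return restarts[choice]()
    raise RuntimeError("unhandled: %r" % (condition,))

def parse_int(token):
    # Low-level code offers the restarts; only a higher-level handler
    # can know whether skipping or substituting is appropriate.
    if token.isdigit():
        return int(token)
    return signal("bad-token", {"use-zero": lambda: 0,
                                "skip": lambda: None})

# The "HLL" level decides the recovery policy:
handler_stack.append(lambda condition, offered: "use-zero")
values = [v for v in (parse_int(t) for t in ["1", "x", "3"])
          if v is not None]
print(values)  # prints [1, 0, 3]
```

Note how the recovery *policy* lives far from the recovery *mechanism*:
`parse_int` knows what can be done, while the handler decides what
should be done, which is the sense in which resuming is an
HLL-programmer-defined concept.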
In fact, it's one of the things I like about Common Lisp, in which
debuggers typically present a menu of corrective actions for an
unhandled error along with the error message. But thinking about this
has made me realize the nature of my problem with the following
statement:

    Exceptions thrown by standard Parrot opcodes (like the one thrown
    by C<find_global> above or by the C<throw> opcode) are always
    resumable, so when the exception handler function returns normally
    it continues execution at the opcode immediately after the one that
    threw the exception.

When I think of "resumable" errors, I think of being able to "skip" and
"retry" as the two main possibilities that apply to most situations,
with "substitute some other value" and possibly other corrective actions
as additional possibilities that depend on the operation. Right there,
the handler would need to do more than "just return" in order to select
the right possibility.

But these possibilities really apply to operations that are much higher
level than instructions, such as compiling a file or sending an email.
For the most part, there is no way for an outside agent to determine
whether it is appropriate (or even safe) to skip or retry an opcode;
indeed, that may not even be apparent to the person who wrote the HLL
code from which it was compiled.

In other words, I think "resuming" makes sense only in terms of
HLL-programmer-defined concepts. In which case, there may be a whole
slew of restart alternatives that are available in the current dynamic
context, and there need to be mechanisms for finding out what they are,
and invoking a particular one. If you like (and assuming that you don't
think I'm on the wrong track), I can try to design something for Parrot
based on the Common Lisp model [3].

In this vein, it occurs to me that the current design doesn't specify
what other actions a handler is allowed to take.
To quote the relevant paragraph (the "previous paragraph" mentioned
above):

    When the system installs an exception handler, it creates a return
    continuation with a snapshot of the current interpreter context. If
    the exception handler just returns (that is, if the exception is
    cleanly caught) the return continuation restores the control stack
    back to its state when the exception handler was called, cleaning
    up the exception handler and any other changes that were made in
    the process of handling the exception.

To paraphrase, each exception handler function has an associated
continuation. If the handler "just returns," Parrot invokes the
associated continuation, and the exception is thereby handled. Have I
got this right? If so, how does a handler *decline* to handle the
exception? By rethrowing? And is it acceptable for the handler to take
other action, e.g. by making a non-local exit via some other
continuation? Because, besides being useful in its own right, that is
the logical way for a handler to invoke a restart.

Allow me to propose an answer:

1. When an exception handler function is called during a C<throw>, the
   handler is allowed to do pretty much anything, with the caveat that
   it is running in the dynamic context of the code that is throwing,
   modified temporarily such that the handler itself is not bound, i.e.
   the handler can resignal the same condition (or another of the same
   class) without invoking itself [4].

2. If the handler returns, then it has declined to handle the
   exception, and Parrot goes on to try the next most recently bound
   handler.

3. If the handler decides to handle the exception, it does so by
   effecting a non-local exit. This could be by calling a continuation,
   presumably to return to some point in the context that bound the
   handler, by invoking a restart, or by throwing a new exception.
   It may also make sense to rethrow the same exception, which (for
   non-fatal exceptions) gives older handlers a chance to run first,
   making the inner handler in effect a default handler.

4. If no handler takes up the challenge, then do nothing, continuing
   after the signaling instruction in an appropriate way. Languages
   that want some other behavior (such as "exit(255)" or entering a
   debugger) must arrange to wrap the necessary handler around their
   main program.

Note that the code internal to C<throw> that is invoking the handlers
doesn't even need to know about the continuations that are used; they
would be used directly by the handlers, where presumably they would be
kept in closure variables.

IMHO, this would be a great improvement; it would solve the debugger
problem discussed above. Also (though I almost hesitate to mention it
[5]), this is compatible with the Common Lisp "condition" system design
of "signaling" [6], though I've left out a few subtleties. On the down
side, it makes it more difficult to mark exceptions as "handled", since
the very act of handling them transfers control to somewhere else.

> 3. Shouldn't unhandled exceptions either enter the debugger if
> interactive, else die? Ignoring the fact that an opcode failed, like
> ignoring the fact that anything else failed, seems dangerous . . .
>
>     new P10, Exception               # create new Exception object
>     set P10["_message"], "I die"     # set message attribute
>     throw P10                        # throw it

There are different levels of severity in exceptions. Some are
necessarily fatal. Some aren't. For example, some languages treat the
"end of file" condition as a non-fatal exception.

And other languages will require that a fatal (or at least "serious")
exception be signaled. In CL, for example, unhandled EOF errors are
defined in such a way as to enter the debugger by default. Dealing with
this seems to require the following:

1. Define mechanisms for non-fatal exceptions.
   C<throw> could just fall through to the next instruction, but it
   might be useful to have one op that might return if the error is
   unhandled and another that never returns [7], for the sake of code
   optimization. Then again, maybe this should depend solely on the
   exception class.

2. Define a "generic" EOF exception which is non-fatal, and arrange to
   signal it when an EOF is detected. If it returns, then the code sets
   up the appropriate EOF return value(s).

3. Languages that require a fatal EOF bind a handler around the dynamic
   scope of their code that intercepts the generic EOF and signals the
   right language-appropriate exception. Such a binding would not be
   easy to undo if the "strict EOF language" calls into a "non-strict
   EOF language", so it might be better to choose the exception class
   based on the HLL from the start.

Which brings up another issue. The description of C<die> implies that
exception type and severity are separate:

    C<die> throws an exception. It takes two arguments, one for the
    severity of the exception and one for the type of exception.

Shouldn't the severity be defined by the exception class? Specifically,
by the taxonomy of exception classes?

>> Exceptions are designed to work with the Parrot calling conventions.
>> Since the return addresses of C<bsr> subroutine calls and exception
>> handlers are both pushed onto the control stack, it's generally a
>> bad idea to combine the two.
>
> How about replacing this with the following:
>
>     . . . exception handlers are both pushed onto the control stack,
>     care must be taken to nest them properly, i.e. by removing error
>     handlers established after C<bsr> before the corresponding
>     C<ret>.
>
> After all, it works as long as the user plays by the rules.

We can define any set of rules for exceptions (or calling conventions,
or any other Parrot subsystem) and expect users to follow them, but some
sets of rules are more prone to user error than others.
Our job as designers and implementors is to examine the options and
choose the set of rules that is most stable, robust, maintainable, and
(as much as possible) user-friendly.

Allison

All very true. And C<ret> addresses are unique among the denizens of the
control stack in only pertaining to the context that pushed them; they
don't actually affect the dynamic context of called subs. So one could
certainly make a case that each context deserves its own "stacklet"
expressly to contain C<ret> addresses. Then again, is this worth it? It
does make PIR slightly more "user-friendly" in this regard, but I can't
imagine ever needing C<bsr/ret> in the first place.

Sorry it took me so long to get my thoughts together.

                                        -- Bob

[1] You didn't ask, but there's a draft up at
    http://rgrjr.dyndns.org/perl/dynbind-proposal-v2.html . A key work
    deadline has passed, so I expect to have more time to work on it.

[2] See the "RetContinuation promotion, closures, and context leakage"
    post of Sat, 04 Feb 2006 13:06:46 -0800
    (http://www.mail-archive.com/perl6-internals@perl.org/msg31219.html).

[3] CL calls them "restarts"; see
    http://www.lispworks.com/documentation/HyperSpec/Body/09_adb.htm
    if you're curious.

[4] This modification of the dynamic state may argue in favor of
    putting exception handlers in their own dynamic stack, though.

[5] I mentioned this in a "Re: [RFC] Dynamic binding patch" post on
    Tue, 3 Jan 2006 23:43:50 -0500 in response to Larry's reply (post 6
    of http://xrl.us/ji2r). But I got warnocked, so I don't know what
    Larry (or anyone else) thinks.

[6] http://www.lispworks.com/documentation/HyperSpec/Body/09_ada.htm

[7] FWIW, CL calls these SIGNAL and ERROR, respectively.