From: Allison Randal <[EMAIL PROTECTED]>
Date: Tue, 18 Apr 2006 15:07:56 -0700
. . . HLL exception handlers, on the other hand, are likely to be
written as independent subroutines, much like the current signal
handlers in Perl 5. An exception handler is closer to an event handler
than it is to a return continuation. (The design choice is between
having exception handlers that are complete compilation units, or just
code segments. Both are valid options. And it may be that we want to
support both.)

I see three possibilities:

1. Compilation units only;
2. Continuations only; and
3. Compilation units that (may) invoke continuations.

The third is an inclusive interpretation of "both" -- I will argue below
that this is the best choice. Presumably, the first two could be
implemented in terms of the third?

The "error-prone" comment has to do with control flow. The effect of the
current implementation is that when the interpreter catches an
exception, it dumps control flow at the label that was captured in the
continuation. Any control flow after that is the responsibility of the
developer, and it's easy to get it wrong.

Seems to me that this is unavoidable. Exceptions are useful mainly
because they allow these drastic changes to normal control flow. Writers
of HLL code are relieved of at least some of this responsibility by
their compiler, but writers of PIR are exposed to the full complexity of
nonlocal transfer of control.

It might be more helpful if the continuation taken was a return
continuation: where to return to if an exception is caught and
successfully handled.

I would tend to agree. But something has to decide which handler gets to
catch the exception before the continuation is invoked. So that would
mean dividing the current Parrot notion of exception handler into a
tester sub, which can be invoked in the dynamic context of the error,
and the actual "what to do" code, which is reached via the continuation.
Reading ahead, this does seem to be what you have in mind; am I right?
But is this much of a change really on the table?
I had thought that PIR-visible semantic changes are frowned on these
days?

>> =item *
>>
>> C<pushaction> pushes a subroutine object onto the control stack. If
>> the control stack is unwound due to an exception (or C<popmark>, or
>> subroutine return), the subroutine is invoked with an integer
>> argument: C<0> means a normal return; C<1> means an exception has
>> been raised. [Seems like there's lots of room for dangerous
>> collisions here.]
>
> I'm not sure what you mean by "collisions" here, nor why you think
> they would be dangerous.

Specifically, because the control stack is used for multiple different
things, it's easy to get into a situation where the thing you're popping
off the stack isn't what you meant to pop off the stack. It's one of the
reasons we aren't using stack-based control flow through most of Parrot.

Do you have a specific example of such a situation? For compiled
languages (AFAIK), the features that use the control stack have
well-defined lexical "enter" and "exit" points, which makes it easy for
a compiler to generate correct code; that's the reasoning behind the
"behaves like a stack" argument below. Of course, that's not the case
for hand-written PIR, but the only remedy I can think of -- giving each
dynamic construct its own private stack -- seems like it would add a lot
of complexity for (IMO) an obscure benefit.

> Arguably, C<pushaction> is too simplistic; it doesn't provide for such
> things as the repeated exit-and-reenter behavior of coroutines, and
> there is no mechanism to specify a thunk that gets called when
> *entering* a dynamic context . . .

That too. I'm working on mods to actions as part of my (long overdue)
dynamic binding implementation proposal [1]. I think we also need a
C<popaction> for consistency, and should probably support "enter"
actions as well as "exit" actions. Another thing that may need
clarification is the environment in which the action runs.
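Returning to the "collisions" point for a moment: the hazard can be
illustrated with a toy shared stack (Python used purely for
illustration; none of this is Parrot's actual API). Because handlers and
actions share one stack, an unbalanced pop silently removes an entry
that belongs to a different feature:

```python
# Hypothetical model of a shared control stack.
control_stack = []

def push_handler(h):
    control_stack.append(("handler", h))

def push_action(a):
    control_stack.append(("action", a))

def pop_handler():
    kind, payload = control_stack.pop()
    if kind != "handler":
        raise RuntimeError("expected a handler on top, found: " + kind)
    return payload

push_handler("EH1")
push_action("cleanup")   # pushed later, and forgotten

err = None
try:
    pop_handler()        # means to pop EH1, actually pops the action
except RuntimeError as exc:
    err = str(exc)
print(err)  # prints "expected a handler on top, found: action"
```

A real shared stack has no such type check, of course; the mismatched
pop would simply corrupt the dynamic state, which is what makes the
sharing "dangerous" for hand-written PIR.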
Since actions are kept on the control stack, and since the current
implementation calls them just after they are popped, they see exactly
the dynamic context in effect at C<pushaction> time. This is true even
when throwing to an outer exception handler. For instance, if A calls B
calls C calls D, and A pushed EH1, B pushed EH2, C pushed action C1, and
D throws to EH1 in A, the current implementation calls C1 with both EH1
and EH2 still in scope. I think this is correct; otherwise, the
programmer can't count on the dynamic state of the cleanup action. One
could make a case that both handlers, or at least EH2, should be popped
first, but this seems wrong.

>> =head1 IMPLEMENTATION
>>
>> [I'm not convinced the control stack is the right way to handle
>> exceptions. Most of Parrot is based on the continuation-passing style
>> of control, shouldn't exceptions be based on it too? See bug #38850.]
>
> Seems to me there isn't any real choice. Exception handlers are part
> of the dynamic context, and dynamic contexts nest in such a way as to
> behave like a stack. Even pure CPS implementations that want to
> maintain dynamic state have to create an explicit stack in a global
> variable somewhere.

"Dynamic contexts nest in such a way as to behave like a stack" is true,
but not necessarily the same thing as storing all exception handlers on
a single global stack that's also used for primitive control flow.

By "primitive control flow" do you mean C<bsr/ret>? I would agree that's
pretty primitive -- and might be better off with its own stack (see
below). Otherwise, keeping handlers on the same stack with actions and
(some day) temporizations makes it convenient to peel them back in the
right order. Actions need to be executed in the right dynamic binding
and handler context, for one thing.

Let's take the example of something that recently came up: asynchronous
I/O with exceptions.
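Before getting to the async example, the unwinding order described above
(C1 running with EH1 and EH2 still in scope) can be sketched as follows.
This is an illustrative Python model, not Parrot's implementation:

```python
stack = []           # shared control stack; innermost entry last
seen_at_action = []  # the handlers C1 observed when it ran

def throw_to(target):
    # Unwind entries pushed after `target`, running actions just as
    # they are popped -- before any outer handler is removed.
    while stack[-1] != ("handler", target):
        kind, payload = stack.pop()
        if kind == "action":
            payload()
    stack.pop()  # finally remove the target handler itself

# A pushes EH1, calls B; B pushes EH2, calls C; C pushes action C1;
# then D throws to EH1.
stack.append(("handler", "EH1"))
stack.append(("handler", "EH2"))
stack.append(("action",
              lambda: seen_at_action.extend(
                  name for kind, name in stack if kind == "handler")))
throw_to("EH1")
print(seen_at_action)  # prints ['EH1', 'EH2']
```

The action observes both handlers because the unwinder pops and runs it
before touching EH2 or EH1, which is the behavior being defended above.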
The current implementation says: push a global exception handler onto
the stack, call the routine that might throw an exception, then pop the
exception handler off the stack. But with asynchronous I/O, the
exception handler is likely to be popped off the stack long before the
async call throws an exception. Or, if you delay popping off the
exception handler until the async callback is called, then you may have
other exception handlers pushed onto the stack in the mean time
(possibly exception handlers for other async calls). Or the original
handler may catch something it wasn't supposed to.

Excellent point. In theory, the return continuation maintains the state
of the caller's control stack, so you can invoke return continuations up
the CPS chain until you reach a dynamic context where the exception is
handled. But where does control flow go after you handle an exception
from an async op?

My kneejerk reaction is that maybe each asynchronous I/O operation
requires its own coroutine (or something very like it) so that the user
can set up a different dynamic state than the main line that spawned it.
Off the top of my head, if you yield to the async coro before it's
ready, nothing happens, but if an exception was pending, the exception
happens in the coro environment. But this is all very half-baked; the
"coro" would look more like a lightweight thread. And I'm not even
certain how one would want an asynchronous I/O API to look, having never
played with one (and having largely ignored the thread).

>> Other opcodes respond to an C<errorson> setting . . .
>
> This have-your-cake-and-eat-it-too (HYCAEIT?) strategy sounds good in
> theory, but may be dangerous in practice. Which style of error
> handling a given piece of code uses is a static property of the way
> the code is written. On the other hand, C<errorson> is dynamic and
> global.
> If one of the modules you use wants to do error handling by checking
> return values, but another module doesn't check returns because it
> expects errors to be signalled, then no C<errorson> setting will
> satisfy both, regardless of how you want to design *your* code.

Maybe we need a non-global equivalent of these options.

I was trying to argue that any such option that acts globally would be
too much trouble to support. A global option might work if it could be
temporized, but temporizing around method calls to objects that want to
handle errors differently would be tedious and error-prone. It also
requires the programmer to be aware of the "preferred" setting of
C<errorson> for all modules used, which is also error-prone. Enforcing a
single model, or at the very least having it lexically "compiled in,"
seems much more tractable.

> I personally prefer exception-based error handling, since it scales
> better. I have been acting on this when the opportunity arises,
> changing internal_exception calls to real_exception when it makes
> sense, and when I'm mucking around in that code anyway. (A good
> example of this is "No exception to pop", come to think of it.) It is
> also helpful to get a backtrace when something fails.

Backtracing can be enabled without exceptions.

Really? Even where internal_exception is called?

>> =head2 Excerpt
>>
>> [Excerpt from "Perl 6 and Parrot Essentials" to seed discussion.
>> Out-of-date in some ways, and in others it was simply speculative.]

For everything below this point, keep in mind that the text was written
in 2004.

Sorry; I didn't mean to be pedantic.

>> process continues until some exception handler deals with the
>> exception and returns normally, or until there are no more exception
>> handlers on the control stack. When the system finds no installed
>> exception handlers it defaults to a final action, which normally
>> means it prints an appropriate message and terminates the program.
> Currently it also prints a backtrace, which is really nice. Alas, the
> backtrace is only from the point of the final rethrow by the oldest
> (bottommost) exception handler. This is the greatest weakness with the
> current Parrot exception-handling design: By the time you find out
> that a given exception is unhandled, the dynamic environment of the
> C<throw> has been destroyed by the very process of searching for a
> willing handler. This makes it extremely difficult to write a debugger
> that can do anything useful about uncaught exceptions.

Exception handler tracing is a useful feature, and is worth adding if it
doesn't cost too much (in terms of implementation complexity, execution
speed, etc.).

I would agree, but I wasn't just talking about tracing (and tracing
exceptions, rather than handlers). I was talking about allowing an
interactive debugger, as in "perl -d", to take control at the point
where the uncaught exception is signaled, so that I can figure out why
it wasn't caught. (For the record, I've never actually needed to use
"perl -d" to debug a Perl 5 program with hairy eval/die logic, but I bet
it's no picnic.) However, as I've already hinted, I think a workable
solution is within reach . . .

>> When the system installs an exception handler, it creates a return
>> continuation with a snapshot of the current interpreter context. If

> This is confusing; I assume you are talking about the
> Exception_Handler itself and not a RetContinuation.

In this context, no. It really meant a return continuation.

Hmm. A RetContinuation recycles the leaving context, but in the case of
an exception, we don't know the identity of the leaving context until
the exception is invoked, which makes it hard to decide whether this is
safe/appropriate. So, unless I am still misunderstanding you, I don't
think this works with the current codebase (though it ought to work if
the "lightweight RetContinuation" proposal [2] is ever implemented).

> Hmm.
> It seems that an exception is "cleanly caught" only if it is not
> rethrown. It is therefore not possible to tell by looking at the
> exception itself whether or not it is "cleanly caught" or if it is
> still in the process of being signalled.

I think I now understand how you mean to do this.

> You seem to want to say that unhandled exceptions are ignored. Is
> that correct? If so, I see several problems:
>
> 1. What is "the exception handler function" and how is it
> distinguished from the function that established the exception
> handler? [It sounds like you are expecting the exception handler to
> behave more like a closure than a continuation . . . ]

An "exception handler function" would be an exception handler that is a
complete compilation unit rather than just a code segment inside some
other compilation unit.

Great; got it.

> 2. The previous paragraph says that if "the exception handler just
> returns", that means that "the exception is cleanly caught". Unless
> you want to propose a new mechanism, the only way a handler can
> decline to handle an exception is by rethrowing it, which precludes
> the possibility of resuming.

I now realize that you *were* proposing a new mechanism (new to me in
any case), using an "exception handler function" that "just returns."
So never mind.

The current prototype implementation doesn't support resumable
exceptions, it's true . . .

"Prototype"?? That implies a lot more flexibility to change the way
Parrot exceptions work than I had thought would be allowed . . .

But, resumable exceptions are a useful feature, and one that we
originally planned for Parrot. Before we throw out the baby with the
bath water, we need to first look at what it will take to build in
resumable exceptions. It's possible that an architecture that supports
resumable exceptions may be a better architecture overall.

I certainly agree that versatile error recovery is a big plus.
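As a concrete (if toy) model of what "resumable" could mean, here is a
sketch of restart-style recovery, loosely after the Common Lisp
condition system. Everything here is illustrative Python with
hypothetical names, not Parrot design: the low-level code offers named
restarts, and a handler bound at a higher level picks one.

```python
handler_stack = []  # dynamically bound handlers, innermost last

def signal(condition, restarts):
    """Offer the named `restarts` (name -> thunk) for `condition`.
    Each handler may pick a restart by name; the first valid choice
    wins, and the chosen restart runs at the point of the error."""
    for handler in reversed(handler_stack):
        choice = handler(condition, sorted(restarts))
        if choice in restarts:
            return restarts[choice]()
    raise RuntimeError("unhandled: %r" % (condition,))

def parse_int(token):
    # Low-level code offers the restarts; only a higher-level handler
    # can know whether skipping or substituting is appropriate.
    if token.isdigit():
        return int(token)
    return signal("bad-token", {"use-zero": lambda: 0,
                                "skip": lambda: None})

# The "HLL" level decides the recovery policy:
handler_stack.append(lambda condition, offered: "use-zero")
values = [v for v in (parse_int(t) for t in ["1", "x", "3"])
          if v is not None]
print(values)  # prints [1, 0, 3]
```

Note how the recovery *policy* lives far from the recovery *mechanism*:
`parse_int` knows what can be done, while the handler decides what
should be done, which is the sense in which resuming is an
HLL-programmer-defined concept.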
In fact, it's one of the things I like about Common Lisp, in which
debuggers typically present a menu of corrective actions for an
unhandled error along with the error message. But thinking about this
has made me realize the nature of my problem with the following
statement:

    Exceptions thrown by standard Parrot opcodes (like the one thrown
    by C<find_global> above or by the C<throw> opcode) are always
    resumable, so when the exception handler function returns normally
    it continues execution at the opcode immediately after the one that
    threw the exception.

When I think of "resumable" errors, I think of being able to "skip" and
"retry" as the two main possibilities that apply to most situations,
with "substitute some other value" and possibly other corrective actions
as additional possibilities that depend on the operation. Right there,
the handler would need to do more than "just return" in order to select
the right possibility.

But these possibilities really apply to operations that are much higher
level than instructions, such as compiling a file or sending an email.
For the most part, there is no way for an outside agent to determine
whether it is appropriate (or even safe) to skip or retry an opcode;
indeed, that may not even be apparent to the person who wrote the HLL
code from which it was compiled.

In other words, I think "resuming" makes sense only in terms of
HLL-programmer-defined concepts. In which case, there may be a whole
slew of restart alternatives that are available in the current dynamic
context, and there need to be mechanisms for finding out what they are,
and invoking a particular one. If you like (and assuming that you don't
think I'm on the wrong track), I can try to design something for Parrot
based on the Common Lisp model [3].

In this vein, it occurs to me that the current design doesn't specify
what other actions a handler is allowed to take.
To quote the relevant paragraph (the "previous paragraph" mentioned
above):

    When the system installs an exception handler, it creates a return
    continuation with a snapshot of the current interpreter context. If
    the exception handler just returns (that is, if the exception is
    cleanly caught) the return continuation restores the control stack
    back to its state when the exception handler was called, cleaning
    up the exception handler and any other changes that were made in
    the process of handling the exception.

To paraphrase, each exception handler function has an associated
continuation. If the handler "just returns," Parrot invokes the
associated continuation, and the exception is thereby handled. Have I
got this right? If so, how does a handler *decline* to handle the
exception? By rethrowing? And is it acceptable for the handler to take
other action, e.g. by making a non-local exit via some other
continuation? Because, besides being useful in its own right, that is
the logical way for a handler to invoke a restart.

Allow me to propose an answer:

1. When an exception handler function is called during a C<throw>, the
   handler is allowed to do pretty much anything, with the caveat that
   it is running in the dynamic context of the code that is throwing,
   modified temporarily such that the handler itself is not bound, i.e.
   the handler can resignal the same condition (or another of the same
   class) without invoking itself [4].

2. If the handler returns, then it has declined to handle the
   exception, and Parrot goes on to try the next most recently bound
   handler.

3. If the handler decides to handle the exception, it does so by
   effecting a non-local exit. This could be by calling a continuation,
   presumably to return to some point in the context that bound the
   handler, by invoking a restart, or by throwing a new exception.
   It may also make sense to rethrow the same exception, which (for
   non-fatal exceptions) gives older handlers a chance to run first,
   making the inner handler in effect a default handler.

4. If no handler takes up the challenge, then do nothing, continuing
   after the signaling instruction in an appropriate way. Languages
   that want some other behavior (such as "exit(255)" or entering a
   debugger) must arrange to wrap the necessary handler around their
   main program.

Note that the code internal to C<throw> that is invoking the handlers
doesn't even need to know about the continuations that are used; they
would be used directly by the handlers, where presumably they would be
kept in closure variables.

IMHO, this would be a great improvement; it would solve the debugger
problem discussed above. Also (though I almost hesitate to mention it
[5]), this is compatible with the Common Lisp "condition" system design
of "signaling" [6], though I've left out a few subtleties. On the down
side, it makes it more difficult to mark exceptions as "handled", since
the very act of handling them transfers control to somewhere else.

> 3. Shouldn't unhandled exceptions either enter the debugger if
> interactive, else die? Ignoring the fact that an opcode failed, like
> ignoring the fact that anything else failed, seems dangerous . . .
>
>     new P10, Exception               # create new Exception object
>     set P10["_message"], "I die"     # set message attribute
>     throw P10                        # throw it

There are different levels of severity in exceptions. Some are
necessarily fatal. Some aren't. For example, some languages treat the
"end of file" condition as a non-fatal exception.

And other languages will require that a fatal (or at least "serious")
exception be signaled. In CL, for example, unhandled EOF errors are
defined in such a way as to enter the debugger by default. Dealing with
this seems to require the following:

1. Define mechanisms for non-fatal exceptions.
   C<throw> could just fall through to the next instruction, but it
   might be useful to have one op that might return if the error is
   unhandled and another that never returns [7], for the sake of code
   optimization. Then again, maybe this should depend solely on the
   exception class.

2. Define a "generic" EOF exception which is non-fatal, and arrange to
   signal it when an EOF is detected. If it returns, then the code sets
   up the appropriate EOF return value(s).

3. Languages that require a fatal EOF bind a handler around the dynamic
   scope of their code that intercepts the generic EOF and signals the
   right language-appropriate exception. Such a binding would not be
   easy to undo if the "strict EOF language" calls into a "non-strict
   EOF language", so it might be better to choose the exception class
   based on the HLL from the start.

Which brings up another issue. The description of C<die> implies that
exception type and severity are separate:

    C<die> throws an exception. It takes two arguments, one for the
    severity of the exception and one for the type of exception.

Shouldn't the severity be defined by the exception class? Specifically,
by the taxonomy of exception classes?

>> Exceptions are designed to work with the Parrot calling conventions.
>> Since the return addresses of C<bsr> subroutine calls and exception
>> handlers are both pushed onto the control stack, it's generally a
>> bad idea to combine the two.
>
> How about replacing this with the following:
>
>     . . . exception handlers are both pushed onto the control stack,
>     care must be taken to nest them properly, i.e. by removing error
>     handlers established after C<bsr> before the corresponding
>     C<ret>.
>
> After all, it works as long as the user plays by the rules.

We can define any set of rules for exceptions (or calling conventions,
or any other Parrot subsystem) and expect users to follow them, but some
sets of rules are more prone to user error than others.
Our job as designers and implementors is to examine the options and
choose the set of rules that is most stable, robust, maintainable, and
(as much as possible) user-friendly.

Allison

All very true. And C<ret> addresses are unique among the denizens of the
control stack in only pertaining to the context that pushed them; they
don't actually affect the dynamic context of called subs. So one could
certainly make a case that each context deserves its own "stacklet"
expressly to contain C<ret> addresses. Then again, is this worth it? It
does make PIR slightly more "user-friendly" in this regard, but I can't
imagine ever needing C<bsr/ret> in the first place.

Sorry it took me so long to get my thoughts together.

                                        -- Bob

[1] You didn't ask, but there's a draft up at
    http://rgrjr.dyndns.org/perl/dynbind-proposal-v2.html . A key work
    deadline has passed, so I expect to have more time to work on it.

[2] See the "RetContinuation promotion, closures, and context leakage"
    post of Sat, 04 Feb 2006 13:06:46 -0800
    (http://www.mail-archive.com/perl6-internals@perl.org/msg31219.html).

[3] CL calls them "restarts"; see
    http://www.lispworks.com/documentation/HyperSpec/Body/09_adb.htm
    if you're curious.

[4] This modification of the dynamic state may argue in favor of
    putting exception handlers in their own dynamic stack, though.

[5] I mentioned this in a "Re: [RFC] Dynamic binding patch" post on
    Tue, 3 Jan 2006 23:43:50 -0500 in response to Larry's reply (post 6
    of http://xrl.us/ji2r). But I got warnocked, so I don't know what
    Larry (or anyone else) thinks.

[6] http://www.lispworks.com/documentation/HyperSpec/Body/09_ada.htm

[7] FWIW, CL calls these SIGNAL and ERROR, respectively.