This is an attempt to summarize my thinking about the instruction interface to dynamic binding and its interaction with the other dynamically-scoped bits of Parrot. I am hoping to get feedback before diving further into the implementation details.
Please let me know what you think. TIA,

-- Bob Rogers
   http://rgrjr.dyndns.org/

------------------------------------------------------------------------

0. Table of contents.

   1. The control stack is really the dynamic stack.
   2. There ought to be a popaction instruction.
   3. Dynamic binding state also belongs on the control stack.
   4. Dynamic binding needs bind_global, bind_location, and unbind_n ops.
   5. Implementation should be done in phases.
   6. Notes.

1. The control stack is really the dynamic stack.

Currently, the control stack is used to store the following things:

   1. Exception handlers. These are manipulated by the push_eh and clear_eh instructions.

   2. Cleanup actions. These are pushed by pushaction, but there is currently no way to pop these, save indirectly via popmark (or returning from the sub).

   3. Stack marks. These are manipulated by pushmark and popmark, and allow multiple control stack entries to be removed at once.

   4. Local return addresses. These are pushed by the bsr and jsr instructions, and popped by ret.

Of these operations, the language features that require the first two are lexically determined. That is, they are determined by HLL constructs that are lexical features of the program [1], despite having dynamic scope, and hence the start and end of these features' lifetimes can be precisely located in the program text. We always know when and what we have to pop.

Furthermore, and particularly in the case of exception handlers and return addresses, all four are meaningful only while that calling context is active [2], and so must necessarily be popped when the sub returns.

Note that bsr/jsr don't quite fit this model in principle. However, they are meaningful only within a single context, and it isn't possible to bsr to a place that does a push_eh and returns, so in practice bsr/jsr must be used as if they were lexically determined.

2. There ought to be a popaction instruction.
As a related issue, it is something of an annoyance that there is no way to pop an action other than to push/pop a mark. True, having a popaction instruction is not strictly necessary, but by the same token, neither is clear_eh.

The pushaction instruction is useful for implementing 'leave' blocks. The following Perl6 code:

    {
        ...
        leave {
            do_some_cleanup($lexical_reference);
        }
    }

would compile into something like

    cleanup = newclosure cleanup_sub
    pushmark 42
    pushaction cleanup
    ...
    popmark 42

The pushmark/popmark could only be omitted if the block were in tail position, i.e. if it returned immediately. If there were a popaction instruction, on the other hand, it could look like this:

    cleanup = newclosure cleanup_sub
    pushaction cleanup
    ...
    popaction 1

This is one instruction shorter and takes up slightly less control stack space, but most usefully it allows us to pass a boolean flag to popaction that tells whether to suppress automatic execution of the action. With this, since we still have the closure lying around, we can also decide to call the action differently on normal exit [3]:

    ...
    popaction 0
    cleanup(other, args)

Consequently, I would like to suggest the following changes:

=item B<pushaction>(in PMC)

Push the given Sub PMC $1 onto the control stack. If the control stack is unwound due to a C<popmark> or normal subroutine return, the subroutine will be invoked with a single integer argument of 0. If the control stack is unwound due to an exception, the subroutine will be invoked with a single integer argument of 1.

An action on the top of the control stack can also be removed explicitly via the C<popaction> instruction, which takes an argument that specifies whether or not to invoke the action.

=item B<popaction>(in INT)

Pops the action at the top of the control stack.
The boolean argument $1 tells whether the action sub should be invoked; if true, the action is invoked with a single integer argument of 0 (to denote normal return), and if $1 is false, the action is discarded without being invoked. An exception is raised if the top item on the control stack is something other than an action.

[The change to the pushaction description is mostly just clarification of existing semantics.]

Interestingly, if popaction is implemented, then stack marking becomes strictly unnecessary, as the control stack could always be popped explicitly; there is no other Parrot feature that strictly requires popmark [4]. However, it ought to be more efficient to use a single popmark at the end of a block to get rid of three or more dynamic state entries. Since there may be lots and lots of these little things at the end of any given Perl6 block, it seems worth keeping these ops.

3. Dynamic binding state also belongs on the control stack.

To the list of uses for the control stack, I would like to add dynamic binding. I think this is a natural place to store dynamic binding state because (a) dynamic binding is also lexically determined by HLL syntax (AFAIK), and (b) having a single stack for all of the dynamically-scoped features that are affected by rezipping greatly simplifies the implementation.

Note that this does increase the level of constraint on the dynamic binding stack, i.e. you can't "ret" or "clear_eh" if there are dynamic bindings in the way. I hope (and expect) that nobody will care.

4. Dynamic binding needs bind_global, bind_location, and unbind_n ops.

I would therefore like to propose the following instruction interface to dynamic binding.
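To make the single-stack arrangement of sections 2 and 3 concrete, here is a toy model in Python. It is a hypothetical sketch, not Parrot's implementation: the class names, the `_MISSING` sentinel, and the use of shallow binding (storing the new value in a global table and saving the old value on the stack) are assumptions made for illustration, and coroutines and rezipping are ignored entirely. It shows popaction with both flag values, a single popmark sweeping away an action and a binding together, and popaction refusing to pop a binding.

```python
# Toy model of one control stack shared by marks, actions, and dynamic
# bindings. Hypothetical Python for illustration, not Parrot's C code.

_MISSING = object()  # a global that had no value before it was bound


class Mark:
    def __init__(self, tag):
        self.tag = tag


class Action:
    def __init__(self, sub):
        self.sub = sub  # called with 0 on normal exit, 1 on exception


class Binding:
    def __init__(self, name, old):
        self.name, self.old = name, old  # old value to restore


class ControlStack:
    def __init__(self, env):
        self.env = env  # global symbol table, shallow-bound
        self.entries = []

    def pushmark(self, tag):
        self.entries.append(Mark(tag))

    def pushaction(self, sub):
        self.entries.append(Action(sub))

    def bind_global(self, name, value):
        self.entries.append(Binding(name, self.env.get(name, _MISSING)))
        self.env[name] = value

    def _undo(self, entry, exceptional):
        # Run an action, or restore the value a binding shadowed; marks
        # need no undo work
        if isinstance(entry, Action):
            entry.sub(1 if exceptional else 0)
        elif isinstance(entry, Binding):
            if entry.old is _MISSING:
                self.env.pop(entry.name, None)
            else:
                self.env[entry.name] = entry.old

    def popaction(self, invoke):
        # Only an action may be popped; anything else is an error
        if not self.entries or not isinstance(self.entries[-1], Action):
            raise RuntimeError("popaction: top entry is not an action")
        action = self.entries.pop()
        if invoke:
            action.sub(0)  # 0 denotes normal return

    def popmark(self, tag):
        # Find the mark first, so a missing mark leaves the stack intact
        for i in range(len(self.entries) - 1, -1, -1):
            entry = self.entries[i]
            if isinstance(entry, Mark) and entry.tag == tag:
                break
        else:
            raise RuntimeError(f"popmark: no mark {tag}")
        while len(self.entries) > i + 1:
            self._undo(self.entries.pop(), exceptional=False)
        self.entries.pop()  # the mark itself

    def unwind(self, exceptional):
        # Sub return (exceptional=False) or exception (exceptional=True)
        while self.entries:
            self._undo(self.entries.pop(), exceptional)


log = []
env = {"$x": 1}
cs = ControlStack(env)

# popaction 0: discard the action; the caller will call the closure itself
cs.pushaction(lambda flag: log.append(("leave", flag)))
cs.popaction(0)

# One popmark sweeps away a binding and an action together
cs.pushmark(42)
cs.pushaction(lambda flag: log.append(("cleanup", flag)))
cs.bind_global("$x", 2)
cs.popmark(42)

# A binding on top blocks popaction, as it would block ret or clear_eh
cs.bind_global("$x", 3)
try:
    cs.popaction(1)
except RuntimeError:
    log.append("blocked")
cs.unwind(exceptional=True)  # exception unwind restores $x
```

After running it, `log` holds `[("cleanup", 0), "blocked"]` and `env["$x"]` is back to 1. Note that popmark undoes entries strictly in LIFO order, so the binding is restored before the cleanup action runs.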
In a nutshell, this adds:

   (a) a bind_location instruction that takes an explicit location object and a new value and establishes the binding on the control stack;

   (b) a bind_global instruction with two variants that handles the important special case of global variables by creating a VariableLocation object and binding its PMC arg to that; and

   (c) an unbind_n instruction (corresponding to unbind_globals in the patch posted 30-Dec-05) that explicitly pops a specified number of either kind of dynamic binding.

=item B<bind_global>(in STR, in PMC)

Bind the PMC $2 as the value of the global symbol $1 in the current dynamic context. If $2 is a Null PMC, then the global is effectively made locally unbound.

The newly-created dynamic binding will be used by C<find_global> and C<store_global> in the current dynamic environment only, i.e. this call and all calls made from it. The lifetime of a dynamic binding lasts until either (a) it is popped explicitly by C<unbind_n> or C<popmark>; or (b) control exits from the context where the binding was made. Note that the second case includes tail calling; all dynamic bindings in the current context are undone before the tail-called sub starts execution.

Note that there is no C<bind_global_p_s_p> op (i.e. corresponding to C<store_global_p_s_p>, where the first "p" is a namespace), as dynamic binding only makes sense with respect to an execution context.

=item B<bind_global>(in STR, in STR, in PMC)

Bind the PMC $3 as the value of the symbol $2 of namespace $1 in the current dynamic context. The binding is created whether or not namespace $1 exists already; if it does not, binding to a symbol in it does not actually create the namespace.

=item B<bind_location>(in PMC, in PMC)

Given a location PMC in $1 (i.e. something derived from the C<Location> class), bind it dynamically to the value in $2, with identical scope and lifetime as for C<bind_global>.
If $2 is a Null PMC, then the location is effectively made locally unbound, if the location supports that. During the dynamic lifetime of the binding and within the dynamic scope of the binding sub, this location will appear to have a different value than outside the dynamic scope (e.g. in coroutines created before the binding), though that value may change during the lifetime if the location is modified by other means. See C<Location> and its derived classes for specifics.

=item B<unbind_n>(in INT)

Pop zero or more dynamic bindings for symbols or locations from the control stack, with the count specified as $1, restoring their original values. There must be at least $1 bindings on the top of the control stack, or an exception is raised before any of the bindings are popped.

Note that an explicit C<unbind_n> is not always needed, as all of a sub's dynamic bindings are automatically undone when the sub returns (see C<bind_global> for details). However, C<unbind_n> is useful when the dynamic binding lifetime ends before the exit from the sub (but see also C<popmark>).

This naming scheme also leaves room for a bind_global op with an "(in PMC, in PMC)" signature where $1 is a symbol table object, in case Parrot ever defines such a thing.

5. Implementation should be done in phases.

Here's an outline of subsequent work, which also serves as a summary of where I need feedback:

   1. If the popaction proposal is acceptable, implement that. This is orthogonal to dynamic binding, so it could be skipped, but logically it seems to belong as part of the control stack semantic cleanup.

   2. If the instruction interface to dynamic binding is acceptable, finish the detailed design. This is a matter of defining what a Location object is [5], and how rezipping works in detail, but that's not trivial.

   3. Once the final design is accepted, implement rezipping as it applies to current uses of the control stack, i.e. without dynamic bindings.
      This is worthwhile on its own, as it should also fix some current bugs relating to incorrect rezipping.

   4. If all goes well, implement dynamic binding. This could even be broken further into two stages, one for implementing Location and GlobalLocation (to handle variable binding), and a separate stage for StructureLocation (to handle binding of arbitrary arrays or hashes).

6. Notes.

[1] I would be very interested to hear of exceptions [6]. Tcl, perhaps?

[2] Stack marking shares this constraint only because it operates on the control stack. One could design a marking mechanism for the user stack that didn't have this limitation. In fact, there is a bug in popmark that is equivalent to the one I found yesterday in clear_eh, because popmark doesn't properly limit itself to its own context. I was expecting to argue that pushmark/popmark should be eliminated instead of fixed, but I've changed my mind, though the fix should still be part of the rezipping implementation.

[3] Admittedly, this is not strictly necessary. In order to be useful, the action almost has to be a closure, so one can always change the behavior of the action by tweaking a lexical variable. This also allows one to disable execution just by returning immediately, but it seems cleaner to say what you mean directly.

[4] This is not strictly true; you could implement a sort of "PASM longjmp" by doing a popmark (to get rid of intervening bsr return addresses) followed by a goto. But would you really want to?

[5] I imagine Location objects will also be useful for representing lvalues internally in Perl6, for cases where they need to be created in one place and stored into in another, so I should at least survey the field in order to ensure that the model is adequate.

[6] Pun intended.
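As a rough illustration of the section 4 interface, here is a second toy model, again in Python and again hypothetical: `Interp`, its `call` method, the per-context `frames` list, and the `show_depth`/`outer` subs are invented for the sketch. It uses shallow binding with no coroutines, so it cannot show how rezipping would behave. It covers a callee seeing its caller's binding, store_global changing only the bound value, automatic unbinding when a context exits, and unbind_n checking its count before popping anything.

```python
# Toy model of the proposed bind_global / unbind_n semantics.
# Hypothetical Python; uses shallow binding and ignores coroutines
# and rezipping entirely.

_MISSING = object()  # a global that had no value before it was bound


class Interp:
    def __init__(self):
        self.globals = {}
        self.frames = [[]]  # bindings made by each active context

    def find_global(self, name):
        return self.globals[name]

    def store_global(self, name, value):
        # Writes whichever binding is currently innermost
        self.globals[name] = value

    def bind_global(self, name, value):
        self.frames[-1].append((name, self.globals.get(name, _MISSING)))
        self.globals[name] = value

    def unbind_n(self, n):
        bindings = self.frames[-1]
        # Check the count before popping anything, as the proposal requires
        if n < 0 or n > len(bindings):
            raise RuntimeError(f"unbind_n: only {len(bindings)} bindings")
        for _ in range(n):
            name, old = bindings.pop()
            if old is _MISSING:
                self.globals.pop(name, None)
            else:
                self.globals[name] = old

    def call(self, sub, *args):
        # A new context; all of its bindings are undone when it exits
        self.frames.append([])
        try:
            return sub(self, *args)
        finally:
            self.unbind_n(len(self.frames[-1]))
            self.frames.pop()


def show_depth(interp):
    return interp.find_global("$depth")


def outer(interp):
    interp.bind_global("$depth", 2)
    seen = interp.call(show_depth)    # the callee sees the binding
    interp.store_global("$depth", 3)  # changes only the bound value
    return seen


interp = Interp()
interp.store_global("$depth", 1)
result = interp.call(outer)           # outer 1 is restored on exit

interp.bind_global("$a", 10)
try:
    interp.unbind_n(2)                # only one binding: nothing is popped
except RuntimeError:
    pass
```

Shallow binding is used here only because it is the shortest thing that runs. A single shared table would presumably let coroutines created before a binding see the new value, which the bind_location description rules out; handling that correctly is the rezipping question left open in section 5.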