[RFC] Dynamic binding design, part I: Interface

Bob Rogers Sat, 07 Jan 2006 17:36:59 -0800

   This is an attempt to summarize my thinking about the instruction
interface to dynamic binding and its interaction with the other
dynamically-scoped bits of Parrot.  I am hoping to get feedback before
diving further into the implementation details.


   Please let me know what you think.  TIA,

                                        -- Bob Rogers
                                           http://rgrjr.dyndns.org/

------------------------------------------------------------------------

0.  Table of contents.

   1.  The control stack is really the dynamic stack.
   2.  There ought to be a popaction instruction.
   3.  Dynamic binding state also belongs on the control stack.
   4.  Dynamic binding needs bind_global, bind_location, and unbind_n ops.
   5.  Implementation should be done in phases.
   6.  Notes.

1.  The control stack is really the dynamic stack.

   Currently, the control stack is used to store the following things:

   1.  Exception handlers.  These are manipulated by the push_eh and
clear_eh instructions.

   2.  Cleanup actions.  These are pushed by pushaction, but there is
currently no way to pop these, save indirectly via popmark (or returning
from the sub).

   3.  Stack marks.  These are manipulated by pushmark and popmark, and
allow multiple control stack entries to be removed at once.

   4.  Local return addresses.  These are pushed by the bsr and jsr
instructions, and popped by ret.

Of these, the language features that require the first two
are lexically determined.  That is, they are determined by HLL
constructs that are lexical features of the program [1], despite having
dynamic scope, and hence the start and end of these features' lifetimes
can be precisely located in the program text.  We always know when and
what we have to pop.

   Furthermore, and particularly in the case of exception handlers and
return addresses, all four are meaningful only while the calling
context that pushed them is active [2], and so must necessarily be
popped when the sub returns.

   Note that bsr/jsr don't quite fit this model in principle.  However,
they are meaningful only within a single context, and it isn't possible
to bsr to a place that does a push_eh and returns, so in practice
bsr/jsr must be used as if they were lexically determined.
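
   As a rough illustration of the bookkeeping described in this section,
here is a toy Python model of the control stack.  It is not Parrot code:
the ControlStack class and its (kind, payload) tuples are invented for
this sketch, and popmark here only removes entries; it does not run any
actions it passes on the way down.

```python
# Toy model of the control stack.  Entry kinds follow the list in this
# section; the class and method names are illustrative assumptions, not
# Parrot internals.

class ControlStack:
    def __init__(self):
        self.entries = []              # each entry is a (kind, payload) tuple

    def push_eh(self, handler):
        self.entries.append(("handler", handler))

    def pushaction(self, action):
        self.entries.append(("action", action))

    def pushmark(self, mark):
        self.entries.append(("mark", mark))

    def bsr(self, return_address):
        self.entries.append(("return", return_address))

    def popmark(self, mark):
        """Remove every entry above the most recent matching mark, plus
        the mark itself, and return the removed entries (top first)."""
        if ("mark", mark) not in self.entries:
            raise LookupError("mark %r not found" % (mark,))
        removed = []
        while True:
            entry = self.entries.pop()
            if entry == ("mark", mark):
                return removed
            removed.append(entry)


stack = ControlStack()
stack.push_eh("outer_handler")
stack.pushmark(42)
stack.push_eh("inner_handler")
stack.pushaction("cleanup")
removed = stack.popmark(42)     # drops the action, the inner handler, the mark
```

After the popmark, only the outer handler remains on the stack, which is
the "remove multiple entries at once" behavior that makes marks useful.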

2.  There ought to be a popaction instruction.

   As a related issue, it is something of an annoyance that there is no
way to pop an action other than to push/pop a mark.  True, having a
popaction instruction is not strictly necessary, but by the same token,
neither is clear_eh.

   The pushaction instruction is useful for implementing 'leave' blocks.
The following Perl6 code:

        {
            ...
            leave { do_some_cleanup($lexical_reference); }
        }

would compile into something like

        cleanup = newclosure cleanup_sub
        pushmark 42
        pushaction cleanup
        ...
        popmark 42

The pushmark/popmark could only be omitted if the block was in tail
position, i.e. it returned immediately.  If there were a popaction
instruction, on the other hand, it could look like this:

        cleanup = newclosure cleanup_sub
        pushaction cleanup
        ...
        popaction 1

This is one instruction shorter and takes up slightly less control stack
space, but most usefully it allows us to pass a boolean flag to
popaction that tells whether the action should be run as it is popped
or simply discarded.  With this, since we still have the closure lying
around, we can also decide to call an action differently for normal
exit [3]:

        ...
        popaction 0
        cleanup(other, args)

Consequently, I would like to suggest the following changes:

    =item B<pushaction>(in PMC)

    Push the given Sub PMC $1 onto the control stack.  If the control stack
    is unwound due to a C<popmark> or normal subroutine return, the
    subroutine will be invoked with a single integer argument of 0.
    If the control stack is unwound due to an exception, the
    subroutine will be invoked with a single integer argument of 1.
    An action on the top of the control stack can also be removed
    explicitly via the C<popaction> instruction, which takes an argument
    that specifies whether or not to invoke the action.

    =item B<popaction>(in INT)

    Pops the action at the top of the control stack.  The boolean
    argument $1 tells whether the action sub should be invoked; if true,
    the action is invoked with a single integer argument of 0 (to denote
    normal return), and if $1 is false, the action is discarded without
    being invoked.  An exception is raised if the top item on the
    control stack is something other than an action.

[The change to the pushaction description is mostly just clarification
of existing semantics.]
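
   To make the proposed semantics concrete, here is a small Python
model of pushaction and popaction as described above.  It is only a
sketch under my own assumptions: the ControlStack class, the
ControlStackError exception, and the unwind_all method are hypothetical
names, and actions are modeled as Python callables taking the integer
flag.

```python
# Toy model of the proposed pushaction/popaction semantics.
# All names here are illustrative assumptions, not Parrot internals.

class ControlStackError(Exception):
    pass


class ControlStack:
    def __init__(self):
        self.entries = []              # (kind, payload) tuples

    def pushmark(self, mark):
        self.entries.append(("mark", mark))

    def pushaction(self, sub):
        self.entries.append(("action", sub))

    def popaction(self, invoke):
        """Pop the action on top; run it with 0 only if 'invoke' is true."""
        if not self.entries or self.entries[-1][0] != "action":
            raise ControlStackError("top of control stack is not an action")
        _, sub = self.entries.pop()
        if invoke:
            sub(0)                     # 0 denotes normal exit

    def popmark(self, mark):
        """Unwind to the mark, running intervening actions with 0."""
        if ("mark", mark) not in self.entries:
            raise ControlStackError("mark not found")
        while True:
            kind, payload = self.entries.pop()
            if kind == "mark" and payload == mark:
                return
            if kind == "action":
                payload(0)

    def unwind_all(self, by_exception):
        """Unwind everything, as on sub return (0) or exception (1)."""
        flag = 1 if by_exception else 0
        while self.entries:
            kind, payload = self.entries.pop()
            if kind == "action":
                payload(flag)


log = []
stack = ControlStack()
stack.pushaction(lambda flag: log.append(("a", flag)))
stack.popaction(1)                     # "a" runs with argument 0
stack.pushaction(lambda flag: log.append(("b", flag)))
stack.popaction(0)                     # "b" is discarded without running
stack.pushaction(lambda flag: log.append(("c", flag)))
stack.unwind_all(by_exception=True)    # "c" runs with argument 1
```

In this model the log ends up recording only "a" (with 0) and "c" (with
1), which is the distinction between "popaction 1", "popaction 0", and
an exceptional unwind.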

   Interestingly, if popaction is implemented, then stack marking
becomes strictly unnecessary, as the control stack could always be
popped explicitly; there is no other Parrot feature that strictly
requires popmark [4].  However, it ought to be more efficient to use a
single popmark at the end of a block to get rid of three or more dynamic
state entries.  Since there may be lots and lots of these little things
at the end of any given Perl6 block, it seems worth keeping these ops.

3.  Dynamic binding state also belongs on the control stack.

   To the list of uses for the control stack, I would like to add
dynamic binding.  I think this is a natural place to store dynamic
binding state because (a) dynamic binding is also lexically determined
by HLL syntax (AFAIK), and (b) having a single stack for all of the
dynamically-scoped features that are affected by rezipping greatly
simplifies the implementation.

   Note that this does increase the level of constraint on the dynamic
binding stack, i.e. you can't "ret" or "clear_eh" if there are dynamic
bindings in the way.  I hope (and expect) that nobody will care.

4.  Dynamic binding needs bind_global, bind_location, and unbind_n ops.

   I would therefore like to propose the following instruction interface
to dynamic binding.  In a nutshell, this adds (a) a bind_location
instruction that takes an explicit location object and a new value and
establishes the binding on the control stack; (b) a bind_global
instruction with two variants that handles the important special case of
global variables by creating a VariableLocation object and binding its
PMC arg to that; and (c) an unbind_n instruction (corresponding to
unbind_globals in the patch posted 30-Dec-05) that explicitly pops a
specified number of either kind of dynamic binding.

    =item B<bind_global>(in STR, in PMC)

    Bind the PMC $2 as the value of the global symbol $1 in the current
    dynamic context.  If $2 is a Null PMC, then the global is effectively
    made locally unbound.  The newly-created dynamic binding will be used
    by C<find_global> and C<store_global> in the current dynamic
    environment only, i.e. this call and all calls made from it.

    The lifetime of a dynamic binding lasts until either (a) it is
    popped explicitly by C<unbind_n> or C<popmark>; or (b) control exits
    from the context where the binding was made.  Note that the second
    case includes tail calling; all dynamic bindings in the current
    context are undone before the tail-called sub starts execution.

    Note that there is no C<bind_global_p_s_p> op (i.e. corresponding to
    C<store_global_p_s_p>, where the first "p" is a namespace), as dynamic
    binding only makes sense with respect to an execution context.

    =item B<bind_global>(in STR, in STR, in PMC)

    Bind the PMC $3 as the value of the symbol $2 of namespace $1 in the
    current dynamic context.  The binding is created whether or not
    namespace $1 exists already; if it does not, binding to a symbol in
    it does not actually create the namespace.

    =item B<bind_location>(in PMC, in PMC)

    Given a location PMC in $1 (i.e. something derived from the
    C<Location> class), bind it dynamically to the value in $2, with
    identical scope and lifetime as for C<bind_global>.  If $2 is a Null
    PMC, then the location is effectively made locally unbound, if that is
    supported by the location.  During the dynamic lifetime of the
    binding and within the dynamic scope of the binding sub, this
    location will appear to have a different value than outside the
    dynamic scope (e.g. in coroutines created before the binding),
    though that value may change during the lifetime if the location is
    modified by other means.  See C<Location> and its derived classes
    for specifics.

    =item B<unbind_n>(in INT)

    Pop zero or more dynamic bindings for symbols or locations from the
    control stack, with the count specified as $1, restoring their
    original values.  There must be at least $1 bindings on the top of
    the control stack, or an exception is raised before any of the
    bindings are popped.

    Note that an explicit C<unbind_n> is not always needed, as all of a
    sub's dynamic bindings are automatically undone when the sub returns
    (see C<bind_global> for details).  However, C<unbind_n> is useful
    when the dynamic binding lifetime ends before the exit from the sub
    (but see also C<popmark>).

This also leaves room for a bind_global op with an "(in PMC, in PMC)"
signature where $1 is a symbol table object, in case Parrot ever defines
such a thing.
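
   The intended behavior of bind_global and unbind_n can be sketched
with a toy Python model.  This is only an illustration under stated
assumptions: it uses simple save-and-restore of the old value in a
single thread of control, it ignores namespaces, Location objects, and
rezipping entirely, and the Interp class, the BindingError exception,
and the UNBOUND sentinel (standing in for a Null PMC) are all
hypothetical names.

```python
# Toy model of the proposed bind_global / unbind_n ops.
# Names are illustrative assumptions; this is not the real design.

UNBOUND = object()      # stand-in for a Null PMC, i.e. "no value"


class BindingError(Exception):
    pass


class Interp:
    def __init__(self):
        self.globals = {}              # symbol name -> current value
        self.control_stack = []        # bindings are ("binding", name, saved)

    def find_global(self, name):
        return self.globals[name]      # KeyError if the symbol is unbound

    def store_global(self, name, value):
        self.globals[name] = value

    def _set(self, name, value):
        if value is UNBOUND:
            self.globals.pop(name, None)
        else:
            self.globals[name] = value

    def bind_global(self, name, value):
        """Save the old value on the control stack, then install the new."""
        saved = self.globals.get(name, UNBOUND)
        self.control_stack.append(("binding", name, saved))
        self._set(name, value)

    def unbind_n(self, count):
        """Pop 'count' bindings, checking all of them before popping any."""
        stack = self.control_stack
        if count < 0 or count > len(stack):
            raise BindingError("not enough entries on the control stack")
        top = stack[len(stack) - count:]
        if any(entry[0] != "binding" for entry in top):
            raise BindingError("top entries are not all dynamic bindings")
        for _ in range(count):
            _, name, saved = stack.pop()
            self._set(name, saved)


interp = Interp()
interp.store_global("x", 1)
interp.bind_global("x", 2)
interp.bind_global("y", 3)             # "y" had no value before
inner = (interp.find_global("x"), interp.find_global("y"))
interp.unbind_n(2)
outer_x = interp.find_global("x")      # back to the original value
```

Note that unbind_n validates the top of the stack first, so a failed
call leaves every binding in place, matching the "before any of the
bindings are popped" wording above.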

5.  Implementation should be done in phases.

   Here's an outline of subsequent work, which also serves as a summary
of where I need feedback:

   1.  If the popaction proposal is acceptable, implement that.  This is
orthogonal to dynamic binding, so it could be skipped, but logically it
seems to belong as part of the control stack semantic cleanup.

   2.  If the instruction interface to dynamic binding is acceptable,
finish the detailed design.  This is a matter of defining what a Location
object is [5], and how rezipping works in detail, but that's not
trivial.

   3.  Once the final design is accepted, implement rezipping as it
applies to current uses of the control stack, i.e. without dynamic
bindings.  This is worthwhile on its own, as it should also fix some
current bugs relating to incorrect rezipping.

   4.  If all goes well, implement dynamic binding.  This could even be
broken further into two stages, one for implementing Location and
GlobalLocation (to handle variable binding), and a separate stage for
StructureLocation (to handle binding of arbitrary arrays or hashes).

6.  Notes.

[1]  I would be very interested to hear of exceptions [6].  Tcl,
     perhaps?

[2]  Stack marking shares this constraint only because it operates on
     the control stack.  One could design a marking mechanism for the
     user stack that didn't have this limitation.  In fact, there is a
     bug in popmark that is equivalent to the one I found yesterday in
     clear_eh, because popmark doesn't properly limit itself to its own
     context.  I was expecting to argue that pushmark/popmark should be
     eliminated instead of fixed, but I've changed my mind, though the
     fix should still be part of the rezipping implementation.

[3]  Admittedly, this is not strictly necessary.  In order to be useful,
     the action almost has to be a closure, so one can always change the
     behavior of the action by tweaking a lexical variable.  This also
     allows one to disable execution just by returning immediately, but
     it seems cleaner to say what you mean directly.

[4]  This is not strictly true; you could implement a sort of a "PASM
     longjmp" by doing a popmark (to get rid of intervening bsr return
     addresses) followed by a goto -- but would you really want to?

[5]  I imagine Location objects will also be useful for representing
     lvalues internally in Perl6, for cases where they need to be
     created in one place and stored into in another, so I should at
     least survey the field in order to ensure that the model is
     adequate.

[6]  Pun intended.
