[REBOL] Re: About CONTINUATIONS

Holger Kruse Sun, 24 Feb 2002 19:32:10 -0800

On Sun, Feb 24, 2002 at 03:27:39PM -0500, [EMAIL PROTECTED] wrote:
> Whilst continuations are a tricky concept to come to terms with initially and joe 
>average user won't bother to learn about them it doesn't mean that they are not a 
>desirable language feature. In my opinion they ARE definitely on a list of desirable 
>language features.
> Do I wish they were still in REBOL? probably YES.


Ok, since you are not going to stop complaining until someone from RT makes a
statement, I hereby do so :-).

Continuations were not "removed" by us. They became a victim of a complete
reimplementation of REBOL, by necessity. There was never an option of keeping
continuations and still advance REBOL in the way it has happened, and there is
still no such option.

In order to understand how continuations fit into a language interpreter we
first need to examine the different ways an interpreter can be constructed.
There are three fundamental ways of doing this:

1) Using a Stack Machine (SM). This is what REBOL 2.x does. In a stack
machine-based interpreter the calling path through the interpreted program
somewhat corresponds to the calling path within the interpreter, i.e. whenever
a function is called in the program, the interpreter also calls a function
within the interpreter, to evaluate the function. Similarly for expressions
etc. Recursive interpretation is handled by recursively calling interpreter
functions, i.e. such functions have to be reentrant.

This means that in an SM-based interpreter a significant portion of the "state"
of the interpreter is kept implicitly, on the stack or in local variables, CPU
registers etc., i.e. it is not available at the language level. Because of that
continuations (which require all state information to be explicit, to allow it
to be manipulated at the language level) cannot be implemented in a purely
SM-based interpreter.

2) Using a Finite State Machine (FSM). This is what REBOL 1.x did. In an
FSM-based interpreter all interpreter state is stored explicitly. The CPU stack
is hardly used at all, and the interpreter does not recurse on itself.  Rather
recursive evaluation is performed by extending the size of the state structure.
This can allow the interpreter stack to become a first-class datatype at the
language level.

3) A combination. See below.


The distinction between FSM-based and SM-based systems is not unique to
interpreters. You will find a similar distinction in other places.  For example
in an application (top-down or event-driven, i.e. bottom-up), and in different
pieces of operating systems (filesystem, network stack, GUI engine etc.), so
anyone who has some non-trivial experience in software development should be
familiar with the significance of this (and the problems that occur when both
models "clash" in some way).


So what are the advantages and disadvantages of both methods, when applied to
interpreters ?

FSM-based interpreters tend to be more flexible. Some features, such as
continuations, are only available in FSM-based interpreters, because they
require explicit access to the interpreter state. Other features, such as
threading and coroutines are much easier to implement in FSM-based
interpreters.

FSM-based interpreters integrate better with environments which are FSM-based
as well. For instance if the OS is FSM-based, such as PalmOS, then having an
FSM-based interpreter tends to allow better integration, smoother user
interaction etc.

On the other hand SM-based interpreters integrate better with operating systems
that are SM-based as well, which pretty much covers all modern operating
systems, with the exception of parts of the GUI engine. In particular operating
systems which rely heavily on a "top down with callbacks" model, like Windows,
integrate better with SM-based interpreters, if those callbacks are supposed to
be exposed at the language level in some way.

FSM-based interpreters usually cannot adequately expose OS features that are
SM-based at the language level. For instance, REBOL exposes OS library calls in
/Pro and /Command, with callbacks in the soon-to-be-released new View/Pro and
Command versions. This would be pretty much impossible to do with an FSM-based
environment. To understand why, just imagine that within the callback a new
continuation chain branches off, the main branch returns, through the OS
function, and then the program decides to continue within the continuation, and
finally return through the OS function again. Obviously that is impossible,
because the stack frame of the OS function no longer exists. A large number of
features of contemporary operating systems rely on a stack frame, in one way or
another, i.e. exposing those features at the language level is only possible by
interpreters which correlate stack frame to interpreter state. An FSM-based
interpreter does not do that, so it cannot expose those features.

FSM-based interpreters are inherently slower and usually require more memory: C
compilers, CPUs etc. have been designed with SMs in mind and highly optimize
SM-related operations, which are for the most part standardized for each
platform these days (stack frames, calling conventions etc.). On the other hand
FSM designs always require explicit FSM code, more elaborate memory management
etc., and cannot make use of the existing resources as well. For instance where
an SM-based interpreter can implicitly store the result of a condition in a CPU
flag and act on it immediately (conditional jump), an FSM-based interpreter
needs to allocate a data structure for it and store the result as a flag, then
loop back in the interpreter, examine the state, and finally examine the flag,
remove it from the state, and act on it. That is an order of magniture more
work.

FSM-based interpreters are usually much more difficult to write and maintain,
just like event-driven programs tend to be more difficult to work with than
procedural programs (with few exceptions, like class-based GUI engines). That
is probably what Joe Marshall's comments regarding qualifications and
requirements for developers comes from. The side effect of this is that
FSM-based interpreters tend to be less agile: development and testing take more
planning and effort. Changes happen more slowly.

3) Finally, here is that third method I mentioned above: It is possible for an
interpreter to use a "hybrid" method of FSM and SM, by using a stack machine
most of the time, and capturing pending information within an on-the-fly
continuation structure. The main advantage of this method is that most of the
time, performance is about as good as you can expect from an SM-based
interpreter.  The problem is that this is usually not practical, at least not
in non-trivial interpreters. That's because there are only two ways of doing
it: explicitly, by unrolling and recreating function calls, or by "hacking it
in", using assembly language stubs that analyze and synthesize stack frames.
Even though some languages have successfully used both methods in some limited
scope and on selected platforms, they are not suitable for REBOL.  Explicit
unrolling is infeasible because it would have to be supported in too many
places. REBOL/Core has around 100 natives. Add to that actions and ops, and it
becomes several hundred. Adding code to serialize the state to all of those
functions would take months to implement and probably years to test. The "hack"
solution is easier to do on some platforms, but very platform-specific, i.e.
not suitable for REBOL. Besides, even though continuation could be supported
either way, this would still not solve the problem that continuations are
fundamentally incompatible with stack-based OS features, such as library calls
with callbacks.

> Continuations make loads of intersting and advanced things possible without 
>detracting anything away from the user / programmer who doesn't use or understand 
>them.

Yes, and they also make loads of interesting and advanced things IMpossible.
In an error!-based error handling scheme such as REBOL it is possible to safely
handle error conditions even with libraries and callbacks, and with
event-driven programs. With continuations this would be impossible. You would
end up with an environment which offers multiple features, but without the
ability to use them in combination. From a modularity point of view that is
much worse than offering a smaller set of features with full orthogonality.

> Scheme and Common Lisp both have them and so does Stackless Python and ML. There is 
>loads of interesting Comp Sci literature on continuations and they are a well 
>understood and used feature for advanced programming in the above languages, 
>especially in the field of modelling concurrency and parallel programming.

Yes, that's the point, really. Continuations are a somewhat theoretical
concept.  They severely clash with how operating systems and runtime systems
are designed. This is why they are only implemented in languages which are
academic in nature. When a language matures to the point that it becomes
interesting beyond the academic realm, other considerations, such as
performance, integration with existing environments etc. become more important.
This kind of maturing is exactly what happened in the transition from REBOL 1.x
to 2.x.

> From what I can see they didn't fully explore the continuation model to it's full 
>extent, Joe Marshall states as such. Iam sure there is a LOT they could have learned 
>from advanced Scheme & ML implementations which STILL have these features. These 
>languages are not SO fundamentally different from REBOL which is at it's core a 
>prefix functional language.

That's completely besides the point. Continuations have nothing to do with the
language model, only with the interpreter model.

> This is all theoretical but raises some interesting issues regarding language design 
>and implementation.

Yes, exactly. Implementation more than design. The only aspect interesting for
design is whether continuations should be first class.

> I was a bit disturbed that Joe Marshall stated that with his implementation "nobody 
>else in the company could extend the interpreter.." SURELY NOT? What about Carl 
>Sassenrath?
> Also "You need a degree in Comp. Sci from a few select places to really understand 
>all this.." IS THAT A PROBLEM? Is there anything wrong with that?

Don't take these comments too literally :-). Joe was just trying to make a
point. The current interpreter is an order of magnitude easier to maintain,
extend and test.

> The solutions *ARE* there if you take the time to learn and understand them.

Sort of. Solutions are not universal. They only apply to certain domains. For
instance if you sacrifice OS interaction, or graphics, or networking or certain
other things, then this may allow you to make certain implementation changes
that allow the implementation of other features. Java demonstrates this nicely:
it has one interesting and important feature: operational safety and isolation
of the virtual machine from its environment. It pays a high price though: no
pointers, no unions, no arrays of structures, very limited casting, extremely
high numbers of run-time tests, and thus overall bad performance. Some say, in
retrospect, that this was too high a price.

What this illustrates is that just because a solution is theoretically possible
and even has been shown to work well in one domain does not mean that is is
even applicable in another one, and some choices are mutually exclusive. That's
why there IS more than one language you can choose from. Keeping continuations
in REBOL would have prevented REBOL from entering certain domains. For instance
/View would have been impossible, for performance reasons. There is the
library/callback issue. There is agility, integration with existing software
packages and libraries. There are other issues...


And to answer the inevitable second question, regarding tail recursion
optimization (or more generally, tail call optimization):

First of all, we need to distinguish between the interpreter and compiler case,
because the way they deal with tail call optimization is quite different:

In a compiler the optimization is not performed at run time, but at compile
time, and it involves unrolling the stack frame before making the final call.
Sometimes that is as easy as replacing "jsr...; rts" with "jmp", but arguments
make it more tricky. In any case, there is generally NO reason NOT to do it.
The only price you pay is, perhaps, slightly longer compilation time. The
resulting code is usually shorter, quicker and needs less memory, i.e. it is
"better".

In an interpreter the situation is different. The interpreter needs to
determine at run time whether a call is a "tail call" and allows optimization.
Because of that there is a trade-off: the benefit of successful tail call
optimization has to be weighed against the additional effort to determine,
dynamically, whether such optimization is possible. If that determination on
the average takes up more CPU time or memory than the actual optimization
saves, then such an optimization should not be implemented. Compilers obviously
do not have that problem.

I do not have any hard data regarding that particular trade-off, but my
impression is that in "normal" programs (i.e. programs that were not written
particularly to prove a point, demonstrate a concept etc., in the academic
domain), it is very rare to have tail calls that benefit from optimization,
so I tend to suspect that adding tail call optimization may, in most cases,
have a detrimental effect.

It would certainly be possible to add tail call optimization back in, although
it is somewhat more difficult to do in an SM-based interpreter than an
FSM-based interpreter (because it is not enough to just prune the state
structure -- the stack has to be unrolled, too). However I could think of
probably dozens of feature requests for REBOL that I would consider more
important, benefitting more users, and resulting in more significant
improvements, so don't count on tail call optimization to be very high on our
priority list :-).

-- 
Holger Kruse
[EMAIL PROTECTED]
-- 
To unsubscribe from this list, please send an email to
[EMAIL PROTECTED] with "unsubscribe" in the 
subject, without the quotes.

[REBOL] Re: About CONTINUATIONS

Reply via email to