Hi,
I would like to float a proposal (or three :-), regarding "greenlets" in
Rust.  For those unfamiliar with greenlets, they are a tool for writing
concurrent code, similar to Rust's tasks, but much more light-weight in
terms of memory consumption (especially now that segmented stacks are no
more).

I think there are some scenarios where low memory consumption per-task is
still important, 64-bit address spaces notwithstanding.   A typical one
would be a pub-sub server, which needs to maintain a massive number of
simple I/O workflows, where I/O channels are idle most of the time.

So here we go (in the order of increasing craziness):

1. Recently I've learned how Python
greenlets<http://greenlet.readthedocs.org/>are
implemented <http://stackoverflow.com/a/17447308>.  I believe that the same
approach could work in Rust:

Basically, greenlets are spawned using the same stack as the parent
greenlet, just like a normal function call.  When a greenlet is suspended,
it copies the portion of the stack used up since its' spawning to the
heap.  When one is re-activated, the saved memory is copied back where it
came from (having first saved stack of the previous active greenlet,- if
they overlap).

Since greenlets don't need to save "red zone" of the stack, the amount of
data per instance is precisely what is actually used.

There are also downsides, of course:
- greenlets are bound to the thread that spawned them,
- two memcpy's are needed when switching between them.

In the case of Python, though, there's one further optimization: since
Python's stack frames live on the heap, in most cases, there nothing on the
hardware stack that a greenlet needs saving!   As a bonus, it can now be
resumed at any stack position, so no saving of previous greenlet's stack is
needed.  The only time when a full save occurs is when there are foreign
stack frames on the stack.


2. Well, can Rust do the same?   What if we came up with an attribute, say,
[#stackless], which causes a function to allocate it's stack frame on the
heap and put all local vars there?    The only things on the actual
hardware stack would then be the function's arguments, the return address,
the saved base pointer and the pointer to that heap-alllocated frame.
With the exception of base pointers, all these things are
position-independent, I believe.   And base pointer chain can be easily
fixed up if/when stack is moved.

So if we had that, and the whole greenlet's stack consisted of such
functions, and there was a way for the "switch_to_greenlet()" function to
detect that, then such greenlet's stack would be relocatable and could be
resumed at any position in the thread's stack (or even in another thread!)
with minimal memory copying, just like in Python.

Of course, the [#stackless] functions would be slower than the normal ones,
but in the scenario I've outlined in the beginning, it shouldn't be a
problem.


3.  Unfortunately, in order for the above scheme to work, all I/O methods,
(which are typically where yields happen), would need to be marked as
[#stackless]...  This would affect the performance of "normal" code using
the same API, which is undesirable.

Okay, but usually there are not *that *many things that point into the
stack in a typical program.  I can think of only three things: references
to stack-allocated buffers, base pointer chains and references to
caller-allocated return values.
- The first one can be lived without - just allocate buffers on the heap.
- The second one - see above.
- The last one is more tricky, but for the sake of argument, let's assume
that we restricted function signatures such that only register-allocated
types can be returned.

Let's say we came up with a way to mark up functions that may yield to
another greenlet, and also with a way to prohibit taking address of
stack-allocated variables for the duration of calls to yielding functions.
These restrictions would be annoying, but not overly so, as long as you had
to obey them only in functions that are intended to be run in a greenlet.
On the plus side, the hardware stack contents would now be relocatable.

In this setup, everything could proceed as usual, using the hardware stack,
until execution came to a  blocking I/O call.   At that point, the
scheduler would check if running inside a greenlet, copy greenlet's stack
to the heap and switch off to another greenlet.  Otherwise, it would do the
same thing it does now, i.e. switch tasks.


So, what do you say, rustafarians?   Does any of that make any sense?

Vadim


On Tue, Nov 5, 2013 at 9:18 AM, Patrick Walton <[email protected]> wrote:

> On 11/5/13 8:32 AM, David Piepgrass wrote:
>
>> Segmented stacks aren't the only solution though.
>>
>> If the concern is many tasks that block for a long time, I imagine a
>> mechanism to bundle a bunch of small, dormant stacks into a single page
>> so that the original pages could be released to the OS.
>>
>>
>>
>> *If stacks were additionally relocatable (which requires similar
>> machinery as precise moving GC, if I'm not mistaken) *
>
>
> This is correct. It's conceivable (although I can't make any promises)
> that if and when LLVM supports this, we could experiment with doing what Go
> does.
>
> Patrick
> _______________________________________________
> Rust-dev mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/rust-dev
>
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev

Reply via email to