Well the problem as I understand it is that you want to transfer control
without pushing anything onto the stack.

Currently you generate one big function and use goto to move between
states.  However, these big functions aren't handled well by C++ compilers
that expect functions to be small and use mostly structural control flow.

Trampoling means that control flow is implemented by returning a
continuation closure in order to transfer control.  This approach of
returning a closure can be used instead of "goto" and you'll have many
small functions instead of one giant one.

The cost of returning a closure and then calling it is more than the cost
of a direct goto.  Replacing each and every goto you currently have with a
"return / call" sequence could be very costly in space and time.

So, I thought perhaps there's a hybrid approach where you use the current
"goto" approach until the function gets "too big" - then you can split it
and have some goto's return a continuation closure that is automatically
called by the parent to transfer control to another function.

There are some C and C++ fiber libraries out there that use different
approaches than a flat stack, such as stack copying or the "Cheney on the
MTA" thing where it just uses the stack as the first generation of the
collector and never pops the stack until a collection is triggered.

Unless you are very concerned about supporting 32-bit architectures, it may
be wise to forget about fibers for the time being.  64-bit is the way of
the future, and it's not a big problem to support tens of thousands of OS
threads on a single machine.  A very simple test on my machine shows I can
start up 160,000 threads without difficulty in a few seconds.  The actual
work being done by those threads would probably dominate their resource
usage in most cases.

For those few cases that this approach can't handle (scientific computing
simulations or some video games or whatever) you really need a real-world
test case to figure out if your proposed system is actually going to solve
the problem.  Otherwise you're designing something with no way of testing
the effectiveness of it.  You can easily produce a system that fails to
meet the needs of the intended audiences of that feature while also making
every other user of the system pay a price for it in terms of compile time,
language semantics, etc..






On Tue, Nov 27, 2012 at 6:37 AM, john skaller <skal...@users.sourceforge.net
> wrote:

>
> On 27/11/2012, at 2:17 PM, Dobes Vandermeer wrote:
>
> > One interesting approach to stack handling they suggested in the "Cheney
> on the MTA" paper was using "alloca" with a negative parameter to
> *deallocate* stack space.  In their case it was relatively safe because
> they know that all the stack frames are dead, but in your case you might be
> able to make a similar assertion, perhaps for some functions you'd know
> that the current stack frame is always dead and you will always use values
> from the "parent" by passing a pointer to all the desired state into that
> function.
> >
> > Another portable approach to tail calls in C/C++ is trampolining, but
> that is costly in runtime performance because a trampolined call is far
> more expensive than a regular call.
> >
> > A hybrid approach with trampolining might be OK, though - perhaps
> procedures can be "sharded" into smaller functions and the trampolining
> technique is used to move between shards.  Then maybe most calls don't have
> to use the trampolines.  Procedures that never yield control could be
> converted into regular functions, too, which might alleviate the problem a
> bit.
>
> I'm not sure I fully understand. The problem is that logically to exchange
> control
> one needs to swap stacks. Felix does this the easy way: by requiring the
> stacks
> to be empty, stack swapping is a NOP.
>
> In C, you might get away with simply assembler hacks to swap stacks.
> In C++, with exception handling and the like, you probably can't.
>
> The only portable way in C/C++ to swap stacks is to use threads.
> The operating system then swaps the stacks for you. This is perfectly
> good logically, it's easy enough to "lock out pre-emptions" and force
> swapping to go the way you want.
>
> The problem is that the cost is huge: threads use a lot of kernel
> memory,  context switches are slow, and scheduling is also slow,
> compared to a trivial register exchange.
>
> The other problem is that linear address machines are unsuitable for
> handling many threads. Linear addressing only works for that
> if you have a massive address space. This is because threads
> all need a huge address space for the stack to ensure they
> don't run out of address space, because stack addresses
> cannot be changed. (Segmented architectures do not have this problem).
>
> You cannot run many threads on a 32 bit linear address machine:
> so it doesn't matter if you can swap stacks or not. By today's standards
> you can run a modest number of threads on a 64 bit machine.
> Just for perspective .. nVidia Kepler GPU is designed to run
> 10,000 threads. It's in the shops now.. It has 3000 cores,
> memory is slow, context switching is lightning fast, so its designed
> so you oversaturate it with threads (it actually swaps threads whilst
> waiting for memory, the same way a desktop swaps threads
> waiting for disk).
>
> With these kinds of demands, linear stacks aren't an option.
> Which is why Felix uses a heap allocated spaghetti stack.
> Other modern systems use a linear allocation scheme with
> a copying collector: as fast as the machine stack but without
> the huge cost in address space. Ocaml does this (but it uses
> the stack too).
>
> Ideally, we have to get rid of the machine stack.
>
> Trampolines are a hack to implement closures for stupid
> languages like C where a function only has a single address
> instead of two (one for the program counter and one for
> the data pointer). The binding is dynamic so it needs
> writable memory which is also executable, usually done
> with two mappings of the same physical memory. I'm not
> sure why that would be slow.
>
> Anyhow our problem here is that most CPU's can handle
> the control structures required, but most higher level
> languages cannot.
>
>
> --
> john skaller
> skal...@users.sourceforge.net
> http://felix-lang.org
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Felix Language" group.
> To post to this group, send email to felix-langu...@googlegroups.com.
> To unsubscribe from this group, send email to
> felix-language+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/felix-language?hl=en.
>
>
------------------------------------------------------------------------------
Keep yourself connected to Go Parallel: 
DESIGN Expert tips on starting your parallel project right.
http://goparallel.sourceforge.net
_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language

Reply via email to