On 28/11/2012, at 7:52 AM, Dobes Vandermeer wrote:

> Trampoling means that control flow is implemented by returning a continuation 
> closure in order to transfer control.  This approach of returning a closure 
> can be used instead of "goto" and you'll have many small functions instead of 
> one giant one.

Ah, I see. What you're talking about is continuation passing style (CPS).

Felix procedures use something closely related: its technically called 
resumptions.
Basically, the top level driver invokes a procedural continuation which
returns another continuation which the driver then invokes, and so on.

Procedural continuations are represented by C++ class object "this" pointer
and a "program counter" in each object. In ISO C++ the PC is just
an integer which is used to drive a switch. In g++ we use a computed
jump instead (much faster).  Nonlocal gotos in g++ are simple:
they're computed jumps. Non-local gotos in ISO C++ are a serious
PITA!

Note that these things are not really restartable like functional continuations:
they have mutable state.

It's not CPS because a procedure doesn't resume itself, it doesn't 
chain to "the rest of the program" by calling a continuation passed in
as an argument, it chains to the rest of the program by *returning*
the continuation representing the rest of the program to the scheduler.

This is more powerful that using a tail call because the scheduler
can decide to defer execution of the "rest of the program" and
instead execute some pieces of another "rest of the program".
Which it does, only its "rest of the fibre" not program.
That's how schannel I/O works.

This method also does not depend on tail-call to jump optimisation.


> 
> The cost of returning a closure and then calling it is more than the cost of 
> a direct goto.  Replacing each and every goto you currently have with a 
> "return / call" sequence could be very costly in space and time.
> 
> So, I thought perhaps there's a hybrid approach where you use the current 
> "goto" approach until the function gets "too big" - then you can split it and 
> have some goto's return a continuation closure that is automatically called 
> by the parent to transfer control to another function.

Actually, the inliner uses a "too big" rule to stop inlining, which creates 
separate C++
functions

The problem is that "too big" is ridiculously small. Clang -O2 is basically 
useless.
The default is now -O1 on clang.


> 
> There are some C and C++ fiber libraries out there that use different 
> approaches than a flat stack, such as stack copying

Stack copying is obviously too slow in general, and its hard to believe it can 
work
with exceptions, its certainly not portable.


> or the "Cheney on the MTA" thing where it just uses the stack as the first 
> generation of the collector and never pops the stack until a collection is 
> triggered.

Also cannot be portable and isn't C/C++ object model compatible.
Its hard to see how a copying collector could ever work with C++.

For Felix, being a precise collector, it is actually possible to move
data objects around and fixup all references, however it cannot be done
conservatively, i.e. the machine stack has to be empty. 

I actually had a compactor working at one stage but it was so slow
I threw it out :)

> 
> Unless you are very concerned about supporting 32-bit architectures, it may 
> be wise to forget about fibers for the time being.  64-bit is the way of the 
> future,

Do not be so sure .. don't forget mobile and embedded devices.

> and it's not a big problem to support tens of thousands of OS threads on a 
> single machine.  A very simple test on my machine shows I can start up 
> 160,000 threads without difficulty in a few seconds.

What machine? You certainly cannot do that on Linux. Last I looked the default 
thread limit was 500 or something.
[Some time ago though :]

>  The actual work being done by those threads would probably dominate their 
> resource usage in most cases.

Maybe not. Consider a game where EVERYTHING is a thread.
All the actors, sprites, everything. All the time, not just when visible.
[Simulation].

Most of those objects need almost no state and use no resources.

> 
> For those few cases that this approach can't handle (scientific computing 
> simulations or some video games or whatever) you really need a real-world 
> test case to figure out if your proposed system is actually going to solve 
> the problem.

True. But remember the original design goal of Felix was a telco environment.
Felix ran at 500K trx/sec. Where trx is creating a thread or sending one a 
message with
zero content. (Sending C++ strings slows it down a lot due to copying).

That was on a 1990's PC. Half a million transactions a second on a PC.
Show me a Solaris machine that can even run that many threads, let
alone context switch at that speed.

Now, its true that today on a single desktop machine, threading is
a bit more viable. But ultimately as you know people push the limits
of computing resources. So ultimtely the superior performance
of synchronous threads will be worthwhile. Remember you need
locks to communicate with pthreads (actually pchannels might
use "lock free queues" although at the moment they dont).

Synchronous channels don't need locks. On the other hand the
current technology Felix uses only works to interleave control
on a single CPU: it doesn't scale to multiple cores.
After all, its basically a control exchange: just like a subroutine
call except caller and callee are peers, not master/slave.


--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org




------------------------------------------------------------------------------
Keep yourself connected to Go Parallel: 
DESIGN Expert tips on starting your parallel project right.
http://goparallel.sourceforge.net
_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language

Reply via email to