On 28/11/2012, at 7:52 AM, Dobes Vandermeer wrote: > Trampoling means that control flow is implemented by returning a continuation > closure in order to transfer control. This approach of returning a closure > can be used instead of "goto" and you'll have many small functions instead of > one giant one.
Ah, I see. What you're talking about is continuation passing style (CPS). Felix procedures use something closely related: its technically called resumptions. Basically, the top level driver invokes a procedural continuation which returns another continuation which the driver then invokes, and so on. Procedural continuations are represented by C++ class object "this" pointer and a "program counter" in each object. In ISO C++ the PC is just an integer which is used to drive a switch. In g++ we use a computed jump instead (much faster). Nonlocal gotos in g++ are simple: they're computed jumps. Non-local gotos in ISO C++ are a serious PITA! Note that these things are not really restartable like functional continuations: they have mutable state. It's not CPS because a procedure doesn't resume itself, it doesn't chain to "the rest of the program" by calling a continuation passed in as an argument, it chains to the rest of the program by *returning* the continuation representing the rest of the program to the scheduler. This is more powerful that using a tail call because the scheduler can decide to defer execution of the "rest of the program" and instead execute some pieces of another "rest of the program". Which it does, only its "rest of the fibre" not program. That's how schannel I/O works. This method also does not depend on tail-call to jump optimisation. > > The cost of returning a closure and then calling it is more than the cost of > a direct goto. Replacing each and every goto you currently have with a > "return / call" sequence could be very costly in space and time. > > So, I thought perhaps there's a hybrid approach where you use the current > "goto" approach until the function gets "too big" - then you can split it and > have some goto's return a continuation closure that is automatically called > by the parent to transfer control to another function. Actually, the inliner uses a "too big" rule to stop inlining, which creates separate C++ functions The problem is that "too big" is ridiculously small. Clang -O2 is basically useless. The default is now -O1 on clang. > > There are some C and C++ fiber libraries out there that use different > approaches than a flat stack, such as stack copying Stack copying is obviously too slow in general, and its hard to believe it can work with exceptions, its certainly not portable. > or the "Cheney on the MTA" thing where it just uses the stack as the first > generation of the collector and never pops the stack until a collection is > triggered. Also cannot be portable and isn't C/C++ object model compatible. Its hard to see how a copying collector could ever work with C++. For Felix, being a precise collector, it is actually possible to move data objects around and fixup all references, however it cannot be done conservatively, i.e. the machine stack has to be empty. I actually had a compactor working at one stage but it was so slow I threw it out :) > > Unless you are very concerned about supporting 32-bit architectures, it may > be wise to forget about fibers for the time being. 64-bit is the way of the > future, Do not be so sure .. don't forget mobile and embedded devices. > and it's not a big problem to support tens of thousands of OS threads on a > single machine. A very simple test on my machine shows I can start up > 160,000 threads without difficulty in a few seconds. What machine? You certainly cannot do that on Linux. Last I looked the default thread limit was 500 or something. [Some time ago though :] > The actual work being done by those threads would probably dominate their > resource usage in most cases. Maybe not. Consider a game where EVERYTHING is a thread. All the actors, sprites, everything. All the time, not just when visible. [Simulation]. Most of those objects need almost no state and use no resources. > > For those few cases that this approach can't handle (scientific computing > simulations or some video games or whatever) you really need a real-world > test case to figure out if your proposed system is actually going to solve > the problem. True. But remember the original design goal of Felix was a telco environment. Felix ran at 500K trx/sec. Where trx is creating a thread or sending one a message with zero content. (Sending C++ strings slows it down a lot due to copying). That was on a 1990's PC. Half a million transactions a second on a PC. Show me a Solaris machine that can even run that many threads, let alone context switch at that speed. Now, its true that today on a single desktop machine, threading is a bit more viable. But ultimately as you know people push the limits of computing resources. So ultimtely the superior performance of synchronous threads will be worthwhile. Remember you need locks to communicate with pthreads (actually pchannels might use "lock free queues" although at the moment they dont). Synchronous channels don't need locks. On the other hand the current technology Felix uses only works to interleave control on a single CPU: it doesn't scale to multiple cores. After all, its basically a control exchange: just like a subroutine call except caller and callee are peers, not master/slave. -- john skaller skal...@users.sourceforge.net http://felix-lang.org ------------------------------------------------------------------------------ Keep yourself connected to Go Parallel: DESIGN Expert tips on starting your parallel project right. http://goparallel.sourceforge.net _______________________________________________ Felix-language mailing list Felix-language@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/felix-language