Pthreads are not working right. tools/launch crashes, and this program fails to do what I expect:
    // pthread test
    noinline proc mkfibre(p:int, f:int) {
      spawn_fthread {
        for k in 1 upto 10 do
          eprint$ "Thr " + str p + " fibre " + str f + " step " + str k + "\n";
          //Faio::sleep(sys_clock, 0.01 + f.double / 5.0);
        done
        eprint$ "Thr " + str p + " fibre " + str f + " DEAD " + "="*20+"\n";
      };
    }

    noinline proc mkthread (x:int) {
      spawn_pthread {
        for var j in 1 upto 10 do
          eprint$ "Thr " + str x + " step " + str j + "\n";
          //Faio::sleep(sys_clock, 0.1 + x.double / 2.0);
          for var f in 0 upto 10 do
            mkfibre(x, f);
          done
        done
        print$ "Thread " + str x + " done" + "*"*20+"\n";
      };
    }

    for var k in 1 upto 10 do
      mkthread k;
    done;

I will start by covering two issues that can be dealt with.

1) Due to the way the optimiser works, it doesn't recognise threads: neither pthreads nor fthreads. Without the "noinline" keyword it will happily inline the procedures above, and this causes the fibres and threads to refer to the current values of their parameters in the main procedure, instead of a private copy of each one. "noinline" stops this, so each fibre/pthread is bound to a distinct stack frame object.

This is NOT a bug in the optimiser. In fact exactly the same thing happens if the spawned thread is replaced by a closure which is returned and executed later. The closure is bound to its parent by a pointer, and if that parent is inlined into some other object, the binding is to that other object instead of a separate one. "noinline" prevents this.

The difficulty is in the semantic specification. Currently both behaviours are correct unless noinline is specified, which means inlining actually has semantics: it isn't just an optimisation hint. This is no different to the parameter passing rule, where "val" parameters can be evaluated either eagerly or lazily unless you specify one or the other with a "var" or closure type respectively. The design is misleading but intentional, to permit optimisations which otherwise would not be possible, at least without much more difficult analysis.
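To make the binding issue in 1) concrete, here is a small model in Python rather than Felix. It is only an analogy: Python closures capture variables by reference, which roughly mimics a Felix closure bound to its parent frame by a pointer. The names spawn_inlined, make_task and spawn_noinline are made up for this sketch and say nothing about how the Felix optimiser is actually implemented.

```python
# Model only: this is Python, not Felix. A closure that shares the
# enclosing frame sees the *current* value of the loop variable; a closure
# created inside a separate procedure gets its own frame, which is the
# effect "noinline" is described as having.

def spawn_inlined():
    """Body 'inlined' into the loop: every closure shares one frame."""
    tasks = []
    for k in range(1, 4):
        tasks.append(lambda: k)   # bound to the loop's frame, not to a copy
    return [t() for t in tasks]   # run later, after the loop has finished


def make_task(x):
    """Separate procedure (the 'noinline' case): x lives in its own frame."""
    return lambda: x


def spawn_noinline():
    tasks = [make_task(k) for k in range(1, 4)]
    return [t() for t in tasks]


inlined = spawn_inlined()
separate = spawn_noinline()
print(inlined)    # [3, 3, 3]  every closure sees the final value of k
print(separate)   # [1, 2, 3]  each closure has a private copy
```

The first result is the "inlined" behaviour (all tasks see the same variable), the second is the "distinct stack frame" behaviour.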
2) Pchannels don't work as I expected. They block the pthread. This isn't wrong, it just isn't what I expected, which was to block the containing fthread only, just like other async ops such as sleeping or waiting on a socket.

3) So now the REAL problem to be addressed here: if you run the program above, some of the loop instances of the inner fthread just don't run at all. In fact, the pthreads seem to die prematurely.

To explain I will start back a bit!

Fibres work by a flat (stackless) piece of code creating a new fibre as a heap object which is added to a scheduler wait list. The current fibre can yield, which it does by storing the "next" address in a variable and returning. When it is resumed, a jump to the "next" address is done, so control continues "where it left off before".

Normally, fibres just don't yield like this. What they can do is read or write an schannel. When a write is first done, a pointer to the writing fibre is stored in the schannel and then the next fibre on the schedule list is run. The writing fibre is NOT added to the wait list. When a fibre reads, it grabs data from the writer fibre, and the writer is unlinked from the schannel and put on the wait list.

A waiting fibre is not garbage collected if it's in the wait list. If it's attached to a channel, then the channel will usually be stored in a variable in some reachable fibre, so the channel is reachable and thus the waiting writer is also reachable. If there are no "owners" of the channel other than the writer, the channel and writer aren't reachable and they get reaped by the collector. If there's a "deadlock" on two channels, both fibres become unreachable and suicide, eliminating the deadlock: fibres cannot deadlock, they just commit suicide.

Ok, that's the SIMPLE case.
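The simple case can be modelled in a few lines. This is a toy in Python, not the Felix RTL: generators stand in for fibres, a deque stands in for the synchronous schedule list, and SChannel, run and the "read"/"write" request tuples are all names invented for the sketch. It only shows the hand-off: a fibre that reads or writes with no partner is parked on the channel and is NOT on the schedule list, and a fibre parked on a channel nobody else owns simply never runs again.

```python
from collections import deque


class SChannel:
    """Toy synchronous channel: holds fibres parked waiting for a partner."""
    def __init__(self):
        self.waiting = deque()   # entries are (kind, fibre, value)


def run(fibres):
    """Toy scheduler: runs until the synchronous schedule list is empty."""
    ready = deque((f, None) for f in fibres)
    while ready:
        fibre, resume_value = ready.popleft()
        try:
            op, chan, value = fibre.send(resume_value)   # run to next request
        except StopIteration:
            continue                                     # fibre finished
        partner = "read" if op == "write" else "write"
        if chan.waiting and chan.waiting[0][0] == partner:
            _, other, other_value = chan.waiting.popleft()
            data = value if op == "write" else other_value
            writer, reader = (fibre, other) if op == "write" else (other, fibre)
            ready.append((writer, None))   # writer goes back on the list
            ready.append((reader, data))   # reader resumes with the data
        else:
            chan.waiting.append((op, fibre, value))      # park on the channel


log = []

def writer(chan):
    for i in range(3):
        yield ("write", chan, i)
    log.append("writer done")

def reader(chan):
    for _ in range(3):
        v = yield ("read", chan, None)
        log.append(f"read {v}")
    log.append("reader done")

def orphan(chan):
    yield ("read", chan, None)    # nobody else owns this channel
    log.append("never reached")

ch = SChannel()
run([writer(ch), reader(ch), orphan(SChannel())])
print(log)
```

The orphan fibre and its private channel refer only to each other once run returns, so Python's collector is free to reap them, which is the analogue of the "suicide" described above.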
Here's the harder one: when a fibre does a supported "blocking" operation (socket I/O or sleeping on a Faio timer clock), the fibre is made a GC root to ensure it isn't reaped by the collector, and it is moved into an async sleep queue. This is distinct from the synchronous scheduling queue. When the system "poll" thread finds an event that the fibre is waiting on, it wakes the fibre up by unrooting it and putting it back on the synchronous scheduling queue. For sockets, it does the I/O first, so the woken fibre will resume with its request serviced.

OK, so now the hard part! Termination!

Felix (should) terminate a thread when (a) there are no fibres on the synchronous schedule queue and (b) there are no fibres on the asynchronous sleep queue, and should terminate the program when all pthreads have completed. If you run a bunch of fthreads that wait on the clock a bit, you'll find the program will not exit until all the fthreads are completed. If you run several pthreads, you'll find that the program will not terminate until all the pthreads have completed.

The "main" thread is special, because it (a) creates the garbage collector, (b) creates the "thread frame", (c) starts the code with standard file parameters, etc., and (d) won't complete until all child pthreads finish. There is special code for this main thread. It sucks a bit. The asymmetry is ugly.

Now you must also note there is only ONE thread doing the timer and socket servicing! All the pthreads use a single asynchronous event monitor thread. Access to the event queue therefore has to be serialised. The requests going to this thread pass a pointer to the object to be woken up, and that object is always going to be rescheduled in its original thread (which is never the thread that reschedules it: that's the event service thread).

NOW: what I think is happening is: the spawned pthreads are returning whilst there are fibres waiting on that thread's sleep queue. They're just objects.
When the pthread returns, the sleep queues for that thread just get lost. In the program above this doesn't cause a crash, but in the launch program .. well, the event handler is rescheduling a fibre from a pthread queue that is deleted, onto a synchronous wait queue that is also deleted.

In other words .. the pthread is being killed when it hits the "return" statement at the end of its procedure. Unlike the main thread, it isn't waiting for asynch events. It should.

That is my THEORY: roughly, "spawn_pthread" is not calling the right RTL routine.

--
john skaller
skal...@users.sourceforge.net