Re: Start of thread proposal
Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 3:08 PM +0100 1/24/04, Leopold Toetsch wrote: >>Fianlizers and incremental DOD don't play together. The DOD must run to >>end to be sure, that the objects isn't referenced any more. > Finalizers and incremental DOD work just fine together. At some point > the incremental DOD will figure out that something's dead, just as > the stop-the-world DOD will. It just may take a bit longer. I wanted to say: "Finalizers & destructors of PMCs that need timely destruction ...". In the case of dead objects at scope exit. leo
Re: Start of thread proposal
At 3:08 PM +0100 1/24/04, Leopold Toetsch wrote: Dan Sugalski <[EMAIL PROTECTED]> wrote: At 5:56 PM +0100 1/23/04, Leopold Toetsch wrote: Stopping all interpreters seems to be cheaper. The rwlock will sooner or later stop all interpreters anyway (on first PMC access), so we can omit the price for the rwlock and just hold the world(s). The rwlock only stops all the interpreters when the DOD runs. Sure. But that would still need to aquire a readers rwlock for each PMC access. This is more expensive then a mutex. During DOD any PMC access will halt the interpreter, so we can do that explicitely too and save the cost for the rwlock overhead. Albeit I can imagine, that aggregates will need a rwlock anyway. Well... only the mutating vtable entries need to get the lock, so that reduces the expense somewhat. Still, I agree, it may be untenably expensive. [ incremental GC ] If we have the facilities to do incremental DOD runs then this is definitely a possibility except for finalizers. Finalizers make things interesting, though if the background thread doing the DOD is a member of the interpreter pool then it'd work out OK. Fianlizers and incremental DOD don't play together. The DOD must run to end to be sure, that the objects isn't referenced any more. Finalizers and incremental DOD work just fine together. At some point the incremental DOD will figure out that something's dead, just as the stop-the-world DOD will. It just may take a bit longer. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Start of thread proposal
Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 5:56 PM +0100 1/23/04, Leopold Toetsch wrote: >>Stopping all interpreters seems to be cheaper. The rwlock will sooner or >>later stop all interpreters anyway (on first PMC access), so we can omit >>the price for the rwlock and just hold the world(s). > The rwlock only stops all the interpreters when the DOD runs. Sure. But that would still need to aquire a readers rwlock for each PMC access. This is more expensive then a mutex. During DOD any PMC access will halt the interpreter, so we can do that explicitely too and save the cost for the rwlock overhead. Albeit I can imagine, that aggregates will need a rwlock anyway. [ incremental GC ] > If we have the facilities to do incremental DOD runs then this is > definitely a possibility except for finalizers. Finalizers make > things interesting, though if the background thread doing the DOD is > a member of the interpreter pool then it'd work out OK. Fianlizers and incremental DOD don't play together. The DOD must run to end to be sure, that the objects isn't referenced any more. leo
Re: Start of thread proposal
On Friday, January 23, 2004, at 11:09 , Dan Sugalski wrote: Ah, OK, I see. The problem comes in where we've got an object in the transient root set (basically the processor stack and registers) that gets anchored into the base root set (stash, pads, or whatever) after the DOD has traced where it's going into and falls out of the transient root set before the DOD traces over to it. (Worse than that. It could come from any untraced location—or possibly even be brand new, depending upon memory allocation details.) — Gordon Henriksen [EMAIL PROTECTED]
Re: Start of thread proposal
At 5:56 PM +0100 1/23/04, Leopold Toetsch wrote: Dan Sugalski <[EMAIL PROTECTED]> wrote: At 12:04 AM -0500 1/20/04, Gordon Henriksen wrote: C was not marked reachable (although it was) and was thus erroneously collected, leaving a dangling pointer. This problem applies equally to copying and mark-sweep collectors. Ah, OK, I see. That is well knowm in the literature as "Tri-Color Invariant": Black are the already marked (live) PMCs, grey the PMCs on the next_for_GC list, white the not yet reached PMCs. The strong tri-color invariants states that no black object may point to a white object, the weak invariant states, that at least one path from the black to a white object must contain a grey one. This can be handled by either stop the world GCs or by intercepting each read or write access that would change the color of an object and update the color accordingly. This is e.g. used for incremental GC. As soon as we have a thread in the background that runs GC, we have to cope with these issues. Yeah, point. And since we want to be able to have an incremental DOD at some point we need to get support for it in now. > ... or have some way to force a low-overhead rendezvous. Okay, we're going to mandate that we have read/write locks, each interpreter pool has one, and mutating vtable entries must get a read lock on the pool read/write lock. Pricey (ick) but isolated to mutators. The DOD gets a write lock on it, which'll block all read/write access so no mutators can be in process while the pool DOD runs. Stopping all interpreters seems to be cheaper. The rwlock will sooner or later stop all interpreters anyway (on first PMC access), so we can omit the price for the rwlock and just hold the world(s). The rwlock only stops all the interpreters when the DOD runs. Anything that mutates a PMC gets a *read* lock so that they don't interfere with each other, and only pause if the DOD is running. The DOD getting a *write* lock will block any read lock attempts, so when the DOD is running no mutation can take place. Since mutation doesn't require any global exclusion it doesn't need a write lock -- the read lock is sufficient. An alternative would be real background incremental GC, *when* running multiple threads. I estimate the overhead to be in regions of a rwlock (with no contention of course). If we have the facilities to do incremental DOD runs then this is definitely a possibility except for finalizers. Finalizers make things interesting, though if the background thread doing the DOD is a member of the interpreter pool then it'd work out OK. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Start of thread proposal
Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 12:04 AM -0500 1/20/04, Gordon Henriksen wrote: >>C was not marked reachable (although it was) and was thus >>erroneously collected, leaving a dangling pointer. This problem >>applies equally to copying and mark-sweep collectors. > Ah, OK, I see. That is well knowm in the literature as "Tri-Color Invariant": Black are the already marked (live) PMCs, grey the PMCs on the next_for_GC list, white the not yet reached PMCs. The strong tri-color invariants states that no black object may point to a white object, the weak invariant states, that at least one path from the black to a white object must contain a grey one. This can be handled by either stop the world GCs or by intercepting each read or write access that would change the color of an object and update the color accordingly. This is e.g. used for incremental GC. As soon as we have a thread in the background that runs GC, we have to cope with these issues. > That means we're going to have to have either a really forgiving DOD > system that takes multiple passes before it collects up a PMC or > buffer (which still isn't safe) Alas not an alternative, it doesn't work. > ... or have some way to force a > low-overhead rendezvous. > Okay, we're going to mandate that we have read/write locks, each > interpreter pool has one, and mutating vtable entries must get a read > lock on the pool read/write lock. Pricey (ick) but isolated to > mutators. The DOD gets a write lock on it, which'll block all > read/write access so no mutators can be in process while the pool DOD > runs. Stopping all interpreters seems to be cheaper. The rwlock will sooner or later stop all interpreters anyway (on first PMC access), so we can omit the price for the rwlock and just hold the world(s). An alternative would be real background incremental GC, *when* running multiple threads. I estimate the overhead to be in regions of a rwlock (with no contention of course). leo
Re: Start of thread proposal
At 12:04 AM -0500 1/20/04, Gordon Henriksen wrote: On Monday, January 19, 2004, at 06:37 , Gordon Henriksen wrote: Dan Sugalski wrote: For a copying collector to work, all the mutators must be blocked, and arguably all readers should be blocked as well. True of non-moving collectors, too. [...] Some of what I've written up addresses why. [...] I'll send that section when I get out of the office. Consider this simple object graph: C was not marked reachable (although it was) and was thus erroneously collected, leaving a dangling pointer. This problem applies equally to copying and mark-sweep collectors. Ah, OK, I see. The problem comes in where we've got an object in the transient root set (basically the processor stack and registers) that gets anchored into the base root set (stash, pads, or whatever) after the DOD has traced where it's going into and falls out of the transient root set before the DOD traces over to it. Race condition. Dammit. Okay, I'd not wrapped my brain around that possibility, which will make for some interesting DOD tracing, especially on SMP systems. I was *really* hoping a single lock on the arena allocation system that the DOD held onto while tracing would be sufficient, but I see that it isn't. That means we're going to have to have either a really forgiving DOD system that takes multiple passes before it collects up a PMC or buffer (which still isn't safe) or have some way to force a low-overhead rendezvous. The obvious rendezvous point is the arena lock, but that's going to see a lot of contention anyway, and we'd as soon not have a single one for speed reasons. Feh. Okay, we're going to mandate that we have read/write locks, each interpreter pool has one, and mutating vtable entries must get a read lock on the pool read/write lock. Pricey (ick) but isolated to mutators. The DOD gets a write lock on it, which'll block all read/write access so no mutators can be in process while the pool DOD runs. I think that'll work. The .1j thread spec requires r/w locks, and we can fake them on platforms that don't implement them. Hopefully Nigel's got the Windows scoop so we can see if Win32 has anything like this (which I'd expect it does) -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Start of thread proposal
At 10:40 PM +0100 1/22/04, Leopold Toetsch wrote: Dan Sugalski <[EMAIL PROTECTED]> wrote: The only tricky bit comes in with the examination of the root set of other threads--accessing the hardware register contents of another running thread may be... interesting. (Which, I suppose, argues for some sort of volatile marking of the temp variables) You'll provide the "interesting" part, that is: use Psi::Estimate::CPU_Register_Changes_in_Future_till_mark_is_done; Nah, no need for that one. I need to go back and recheck the stuff that Gordon posted in case I missed something, but if you put a lock on the arena allocator this isn't an issue. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Start of thread proposal
Dan Sugalski <[EMAIL PROTECTED]> wrote: > The only tricky bit comes in with the examination of the root set of > other threads--accessing the hardware register contents of another > running thread may be... interesting. (Which, I suppose, argues for > some sort of volatile marking of the temp variables) You'll provide the "interesting" part, that is: use Psi::Estimate::CPU_Register_Changes_in_Future_till_mark_is_done; SCNR, leo
Re: Start of thread proposal
At 4:59 PM -0800 1/19/04, Dave Whipp wrote: "Dan Sugalski" <[EMAIL PROTECTED]> wrote: =head2 Guarantees Maybe this isn't strictly a threading thing, but are we going to make any guarantees about event orderings? For example, will we guarantee that a sequence of events send from one thread to another will always be received in the order they are sent? Hrm. I suppose we really ought to, so we will. If we prioritize events (and I'm still torn) then they'll be in priority then send order. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
RE: Start of thread proposal
[Note to everyone -- I'm digging through my mail so be prepared for a potential set of responses to things that're already answered...] At 6:37 PM -0500 1/19/04, Gordon Henriksen wrote: Dan Sugalski wrote: For a copying collector to work, all the mutators must be blocked, and arguably all readers should be blocked as well. True of non-moving collectors, too. Or, let me put it this way: non- copying *GC* (the sweep or copy phase) can be threadsafe, but the mark phase is never threadsafe. The method in which marking is not threadsafe is a bit more pathological (i.e., it's not the common case as it is with the copying collector), but a standard tracing DOD cannot be correct when competeting with mutators. It WILL collect non- garbage (those are MUTATORS, remember), and the result WILL be Heizenbugs and crashes. Some of what I've written up addresses why. It's pretty simple to demonstrate a single case to prove the point, but I don't feel like re-creating the ASCII art right now. :) I'll send that section when I get out of the office. parrot will have to be able to suspend all threads in the environment. Unfortunate, but really quite unavoidable. I'm not sure that's the case. What we need to do is suspend metadata mutation--that is, buffers can't be resized while a gc run is in progress. Other than that, if we have guarantees of aligned atomic pointer writes (so we don't have word shearing to deal with) we don't have to stop the mutation of the contents of the buffers themselves. The only tricky bit comes in with the examination of the root set of other threads--accessing the hardware register contents of another running thread may be... interesting. (Which, I suppose, argues for some sort of volatile marking of the temp variables) -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Start of thread proposal
At 12:15 AM +0100 1/22/04, Leopold Toetsch wrote: Dan Sugalski <[EMAIL PROTECTED]> wrote: [ No Guarantees WRT data access } ... seems to indicate that even whole ops like add P,P,P are atomic. Yep. They have to be, because they need to guarantee the integrity of the pmc structures and the data hanging off them (which includes buffer and string stuff) But isn't that a contradiction? Or better: When even an opcode like above is atomic, that an access to a shared PerlNum should be guaranteed being atomic too. Sure, but there's a *lot* more to user data integrity than atomic access to individual pieces. That's the least of the problem. The user data issue is one where you have multiple pieces being updated, or one piece being updated multiple times--that's the stuff we're not guaranteeing. So, while we will make sure that storing a single value into a hash happens atomically, we won't guarantee that a series of stores into the hash, or a combination of loads and stores, or even a combination of reads and writes to a scalar, will happen atomically. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Start of thread proposal
On Wed, Jan 21, 2004 at 01:14:46PM -0500, Dan Sugalski wrote: > >... seems to indicate that even whole ops like add P,P,P are atomic. > > Yep. They have to be, because they need to guarantee the integrity of > the pmc structures and the data hanging off them (which includes > buffer and string stuff) Personally, I think it would be better to use corruption-resistant buffer and string structures, and avoid locking during basic data access. While there are substantial differences in VM design--PMCs are much more complicated than any JVM data type--the JVM does provide a good example that this can be done, and done efficiently. Failing this, it would be worth investigating what the real-world performance difference is between acquiring multiple locks per VM operation (current Parrot proposal) vs. having a single lock controlling all data access (Python) or jettisoning OS threads entirely in favor of VM-level threading (Ruby). This forfeits the ability to take advantage of multiple CPUs--but Leopold's initial timing tests of shared PMCs were showing a potential 3-5x slowdown from excessive locking. I've seen software before that was redesigned to take advantage of multiple CPUs--and then required no less than four CPUs to match the performance of the older, single-CPU version. The problem was largely attributed to excessive locking of mostly-uncontested data structures. - Damien
Re: Start of thread proposal
Dan Sugalski <[EMAIL PROTECTED]> wrote: [ No Guarantees WRT data access } >>... seems to indicate that even whole ops like add P,P,P are atomic. > Yep. They have to be, because they need to guarantee the integrity of > the pmc structures and the data hanging off them (which includes > buffer and string stuff) But isn't that a contradiction? Or better: When even an opcode like above is atomic, that an access to a shared PerlNum should be guaranteed being atomic too. leo
Re: Start of thread proposal
At 10:25 AM +0100 1/21/04, Leopold Toetsch wrote: Dan Sugalski <[EMAIL PROTECTED]> wrote: One more question: =head2 Guarantees ... doesn't have anything about user data integrity. So when 2 threads access a PerlNum, they could get a mixture of the typically 2 involved words. Potentially, yeah, though it's really unlikely. But: =item All shared PMCs must have a threadsafe vtable The first thing that any vtable function of a shared PMC must do is to aquire the mutex of the PMCs in its parameter list ... seems to indicate that even whole ops like add P,P,P are atomic. Yep. They have to be, because they need to guarantee the integrity of the pmc structures and the data hanging off them (which includes buffer and string stuff) And how does user level locking play together with that? I've not decided -- That was something I thought we might hash out as we abused the first half of the design doc. Personally I'm all for the user and low-level lock being the same thing, but that doesn't have to be the case. There are advantages and disadvantages to either way of doing things. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Start of thread proposal
21/01/2004 02:12:09, Gordon Henriksen <[EMAIL PROTECTED]> wrote: > This is false. The mark phase will still need to run over the entire > process, else it cannot detect all references into the pool. > If by reference, you mean address, then that is true. If when a reference is taken, the address of the referent is stored in arbitrary other data structures, then all memory be scanned by the GC, However, if references were not addresses, but an index into a table of addresses, then only the table need be scanned. When a reference, or the thing holding it, is deleted, then the indexed slot in the address table is blanked and subsequent GC passes looking for references to a referent will no longer find the reference in the table. With the addition of a reference count to the table, all references to a single entity can share the same index, but now I'm groping back in the direction of reference counting, which is not flavour of the month:) Nigel.
Re: Start of thread proposal
Dan Sugalski <[EMAIL PROTECTED]> wrote: One more question: >=head2 Guarantees ... doesn't have anything about user data integrity. So when 2 threads access a PerlNum, they could get a mixture of the typically 2 involved words. But: >=item All shared PMCs must have a threadsafe vtable > The first thing that any vtable function of a shared PMC must do is to > aquire the mutex of the PMCs in its parameter list ... seems to indicate that even whole ops like add P,P,P are atomic. And how does user level locking play together with that? leo
Re: Start of thread proposal
[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > I believe that parrot already has the concept of memory pools in it's > memory management. The idea is that by allocating similarly sized objects The problem is not in the fixed sized header pools, its with the *memory* pool used e.g for string memory. During GC we are walking the *header* pools, and if we find its buffer memory being used, we move that buffer memory to a new store, thereby compacting it. The old memory store(s) are freed then. So string memory can move around beyond your code. > Nigel/ leo
Re: Start of thread proposal
20/01/2004 13:29:35, Gordon Henriksen <[EMAIL PROTECTED]> wrote: >On Monday, January 19, 2004, at 07:58 , [EMAIL PROTECTED] >wrote: > >> Is there any likelyhood that memory allocation will be hidden behind >> macros at two levels: >> - ALLOCPOOL() for allocating large chunks of memory (ppols) that are >>later sub-allocated (and managed) by: >> >> - SUBALLOC() for sub allocating within a pool > >Are you wanting something akin to Apache 2 pools, which are hierarchical >and designed to reduce path length when freeing blocks of objects? For >instance, all objects and sub-pools allocated during an HTTP request >cycle can be deallocated just by free()'ing the top-level request pool. > Nothing to do with Apache memory pools. I believe that parrot already has the concept of memory pools in it's memory management. The idea is that by allocating similarly sized objects from separate (large) allocations, you can reduce the fragmentation of chunks and reduce the incidences where the memory need to be GC'd and compacted. Allocating an 8 byte chunk from a common memory pool is quite likely to nip a little off from a previously freed large chunk. When it comes time reallocate another chunk the same size as that large, freed chunk, although there is enough room in the over all freespace chain to accommodate it, the largest available chunk is now 8 bytes or so too small for the requirement. That induces either a compaction cycle or the need to extend the memory pool by the size of the large request. Allocating all small requests from the same pool, and large from another pool means that you are less likely to fragment the memory and more likely to be able to re-use an existing slot in the free-space chain for any given request. If the allocation of pools, and the allocation of bit-of-a-pool, are macroised, it makes it possible for OS's where there are multiple APIs for memory allocation to bypass the CRT memory allocation routines and use which ever native APis are best suited for the type of allocation. Personally, I would like to see memory allocation for each class type be managed by the class constructor itself. This would theoretically allow each class that has a fixed instance size to manage it's own pool on OS's where that makes sense. The class would allocate a pool for itself when loaded and then allocate instances from that pool on new() and deallocate upon DESTROY. If it's memory pool was exhausted when new was called, it would invoke the GC on *it's pool only*. This separation would mean that each run of the GC would have a much smaller pool of memory to compact and garbage collect when it was invoked. It would also be less likely to be called, as each allocation from a pool of fixed sized sub allocations will only ever need to call the GC when it's pool is entirely exhausted. But that is a radical departure :), so if would just like to see separate calls for pool allocation/reallocation and element allocation/reallocation, rather than having calls to malloc() scattered through out the codebase. > >I don't think parrot could make use of that model, since it can't very >well guarantee that the user cannot retain references past the lifetime >of the pool. Apache trusts modules not to make such errors; parrot can't >trust the byte-code it's executing any further than it can throw it. A >generational collector is a more likely means by which parrot might >reduce memory-related overhead. > > > >Gordon Henriksen >[EMAIL PROTECTED] > > Nigel/
Re: Start of thread proposal
On Tuesday, January 20, 2004, at 07:56 , [EMAIL PROTECTED] wrote: I believe that parrot already has the concept of memory pools in it's memory management. The idea is that by allocating similarly sized objects from separate (large) allocations, you can reduce the fragmentation of chunks and reduce the incidences where the memory need to be GC'd and compacted. Don't presume that. General purpose memory allocators tend to handle this scenario... generically. Use of custom allocators is usually a premature optimization; the best generic memory allocators are very, very good. But the ultimate effect of the overuse of custom allocators or pools is to move memory fragmentation to a higher level, preventing the generic allocator from addressing it even when the memory allocation patterns would otherwise allow it. If the allocation of pools, and the allocation of bit-of-a-pool, are macroised, it makes it possible for OS's where there are multiple APIs for memory allocation to bypass the CRT memory allocation routines and use which ever native APis are best suited for the type of allocation. Would much rather see them abstracted, as they are now, beneath an API. This reduces the number of ways in which compiled parrot extensions can be incompatible with the parrot core. Personally, I would like to see memory allocation for each class type be managed by the class constructor itself. This would theoretically allow each class that has a fixed instance size to manage it's own pool on OS's where that makes sense. The class would allocate a pool for itself when loaded and then allocate instances from that pool on new() and deallocate upon DESTROY. If it's memory pool was exhausted when new was called, it would invoke the GC on *it's pool only*. A PMC class is free to do precisely that. Only the PMC header cannot be thus managed, and that's already pooled. This separation would mean that each run of the GC would have a much smaller pool of memory to compact and garbage collect when it was invoked. This is false. The mark phase will still need to run over the entire process, else it cannot detect all references into the pool. (Unless you can provide a type-safety guarantee that your type can only be referenced by other instances of itself. In which case, your type is necessarily "garbage" and the best optimization strategy would be dead-code elimination. :) I predict the best overall throughput would be with the sweep or copy phase run immediately after the mark phase, across the entire process: It would be wasteful to do all that marking and yet leave known garbage uncollected. It would also be less likely to be called, as each allocation from a pool of fixed sized sub allocations will only ever need to call the GC when it's pool is entirely exhausted. Statistically, multiple pools will individually exhaust themselves *MORE* frequently than a global pool. The ideal case for collection frequency is that there is only one pool, or that all pools rise to capacity concurrently. In these ideal cases, the segregated pools will exhaust themselves precisely *AS* frequently as a global pool. In any case, there is no possibility for a decrease in collection events through the use of pooled memory. As per above, collection events will also not become less expensive. Thus, expect garbage collection to have a negative net impact on performance if pooled memory is used. But that is a radical departure :), so if would just like to see separate calls for pool allocation/reallocation and element allocation/reallocation, rather than having calls to malloc() scattered through out the codebase. Uh. See memory.c and smallobject.c... Digression: Most copying collectors avoid free space fragmentation by design. (That's one of their benefits.) An allocator as might be employed by a standard compacting, generational system (Java, MS CLR), might resemble the following: #define SYS_ALIGN sizeof(double) /* for instance */ struct generation { volatile void *next; void *top; void *base; } struct generation *generation_init(size_t capacity) { struct generation *g = (struct generation *) malloc(sizeof(struct generation)); g->next = g->base = malloc(capacity); g->top = g->base + capacity; } void* generation_alloc(struct generation* p, size_t len) { len = (len + SYS_ALIGN - 1) & ~(SYS_ALIGN - 1); /* round up */ do { void *ptr = ATOMIC_ADD(&p->next, len); if (ptr <= p->top) { return ptr - len; /* break the loop */ } gc(); } while (true); /* try again */ } + No free space fragmentation at all. + Super-duper fast. + Threadsafe. + Lock-free unless ATOMIC_ADD cannot be implemented without a mutex. - There is copying overhead when the generation is exhausted. - One could say that the generation is fragmented by garbage. + It is no more fragmented by garbage than a GC system which uses a freelist allocator. — Gordon He
Re: Start of thread proposal
Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 5:16 PM +0100 1/20/04, Leopold Toetsch wrote: >>What about temporary PMCs (or strings)? > Those won't get marked as shared unless we're unconditionally marking > things as shared. (Though we may just give 'em a mutex anyway) This needs either one check per PMC, if its shared or not, or additional costs for locking temps. Both is rather expensive, compared to the raw working functionality of a vtable. >> Evaluating a non-trivial >>expression can have lot of these. Each new temp would need a lock on the >>memory sub-system. > Only on allocation. Yes: "Each *new* temp .." > ... We could have a per-thread freelist if we wanted. > Wouldn't be unreasonable. That already smells like separate memory sub-systems. The freelist has to be filled first. During DOD runs, it has to be refilled. To achieve some locality of the temp PMCs, you can't just give these to arbitrary intpreters. Separate free lists seem also to imply separate DOD runs to cleanup the temps. >>i.e. typically 3 mutexes, it could be that the vtable of a shared PMC >>should just grab one interpreter lock (of the owner of that PMC) and >>that not all is shared (especially not the temps). > Yeah, but since PMCs aren't owned by any one interpreter in the pool the owner is for me the interpreter, that did allocate the arena. > ... but are rather pool-wide you run into either the global-lock problem > (which kills performance on SMP systems) or interpreters potentially > deadlocking on unrelated data access. Could be. But the performance is the overall throughput. When a lot of fine grained locks (plus memory subsystem locks for temps) take much longer then one global lock, then that scheme can nethertheless be slower on SMP machines. It would scale better though for more CPUs. Deadlocks shouldn't be a problem, if exactly one vtable (like in SharedRef) just grabs one and only one lock. leo
Re: Start of thread proposal
At 5:16 PM +0100 1/20/04, Leopold Toetsch wrote: Dan Sugalski <[EMAIL PROTECTED]> wrote: =item Interpreter pools will share allocation pools All the interpreters in an interpreter pool will share header and memory allocation pools. What about temporary PMCs (or strings)? Those won't get marked as shared unless we're unconditionally marking things as shared. (Though we may just give 'em a mutex anyway) Evaluating a non-trivial expression can have lot of these. Each new temp would need a lock on the memory sub-system. Only on allocation. We could have a per-thread freelist if we wanted. Wouldn't be unreasonable. Combined with the cost of: =item All shared PMCs must have a threadsafe vtable The first thing that any vtable function of a shared PMC must do is to aquire the mutex of the PMCs in its parameter list i.e. typically 3 mutexes, it could be that the vtable of a shared PMC should just grab one interpreter lock (of the owner of that PMC) and that not all is shared (especially not the temps). Yeah, but since PMCs aren't owned by any one interpreter in the pool but are rather pool-wide you run into either the global-lock problem (which kills performance on SMP systems) or interpreters potentially deadlocking on unrelated data access. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Start of thread proposal
Dan Sugalski <[EMAIL PROTECTED]> wrote: >=item Interpreter pools will share allocation pools > All the interpreters in an interpreter pool will share header and > memory allocation pools. What about temporary PMCs (or strings)? Evaluating a non-trivial expression can have lot of these. Each new temp would need a lock on the memory sub-system. Combined with the cost of: >=item All shared PMCs must have a threadsafe vtable > The first thing that any vtable function of a shared PMC must do is to > aquire the mutex of the PMCs in its parameter list i.e. typically 3 mutexes, it could be that the vtable of a shared PMC should just grab one interpreter lock (of the owner of that PMC) and that not all is shared (especially not the temps). leo
Re: Start of thread proposal
At 4:24 PM +0100 1/20/04, Leopold Toetsch wrote: Dan Sugalski <[EMAIL PROTECTED]> wrote: At 1:46 PM +0100 1/20/04, Leopold Toetsch wrote: The term "process id" is really misleading. Nope. It's the process ID. Nothing misleading there, unless you've done threading work under linux, since for a while it gave each thread a separate PID. $ ps | grep [p]arrot 17472 pts/000:00:00 parrot 17473 pts/000:00:00 parrot 17474 pts/000:00:00 parrot So the unless applies ;) This is a single parrot interpreter, with main, thread-manager, and event-handler thread. 3 PIDs. Right. Linux is, as I said, arguably broken here, but I don't want to argue that point. That's why I specified that the PID is the process id of whatever instantiated parrot -- in this case, no matter which thread you're in, when you ask for the pid from parrot you should get 17472. (In this case, at least) Parrot needs to grab the pid at global initialization time and store it away for later pid queries. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Start of thread proposal
At 16:24 +0100 1/20/04, Leopold Toetsch wrote: Dan Sugalski <[EMAIL PROTECTED]> wrote: At 1:46 PM +0100 1/20/04, Leopold Toetsch wrote: The term "process id" is really misleading. Nope. It's the process ID. Nothing misleading there, unless you've done threading work under linux, since for a while it gave each thread a separate PID. $ ps | grep [p]arrot 17472 pts/000:00:00 parrot 17473 pts/000:00:00 parrot 17474 pts/000:00:00 parrot So the unless applies ;) This is a single parrot interpreter, with main, thread-manager, and event-handler thread. 3 PIDs. This _all_ depends on which version of Linux you're using. Older versions don''t do it that way, and newer versions don't do it either (the specific versions escape me at the moment, but I know RH9 does _not_ have pids for threads). I know, because my Thread::Signal module depends on threads having pids. But fewer and fewer Linux systems "support" it (and Linux was the only one who worked this way to begin with). ;-( Liz
Re: Start of thread proposal
Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 1:46 PM +0100 1/20/04, Leopold Toetsch wrote: >>The term "process id" is really misleading. > Nope. It's the process ID. Nothing misleading there, unless you've > done threading work under linux, since for a while it gave each > thread a separate PID. $ ps | grep [p]arrot 17472 pts/000:00:00 parrot 17473 pts/000:00:00 parrot 17474 pts/000:00:00 parrot So the unless applies ;) This is a single parrot interpreter, with main, thread-manager, and event-handler thread. 3 PIDs. leo
Re: Start of thread proposal
At 1:46 PM +0100 1/20/04, Leopold Toetsch wrote: Dan Sugalski <[EMAIL PROTECTED]> wrote: =item SHARED THREAD A thread that's part of a group of threads sharing a common interpreter environment. I presume that this "group of threads" is one thread pool or interpreter pool. Could you expand the definition to cover "pool". Ah, damn, I needed to fix that up before sending it out. Should've been "SHARED INTERPRETER". A shared interpreter is one that's in an interpreter pool. An interpreter pool is a set of interpreters that share common resources. They're essentially threads in the OS sense, and they have shared access to pretty much everything. The memory pools, arenas, and global namespace are shared--pretty much everything except what's in the interpreter structure itself. >=item Each interpreter show the same process id The term "process id" is really misleading. Nope. It's the process ID. Nothing misleading there, unless you've done threading work under linux, since for a while it gave each thread a separate PID. Again I presume, that one pool is ment here: All the interpreters within a process will share a process ID. Nope, again, that's really process. (Again, because of Linux' past bad behavior reporting prorcess IDs) -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Start of thread proposal
On Monday, January 19, 2004, at 07:58 , [EMAIL PROTECTED] wrote: Is there any likelyhood that memory allocation will be hidden behind macros at two levels: - ALLOCPOOL() for allocating large chunks of memory (ppols) that are later sub-allocated (and managed) by: - SUBALLOC() for sub allocating within a pool Are you wanting something akin to Apache 2 pools, which are hierarchical and designed to reduce path length when freeing blocks of objects? For instance, all objects and sub-pools allocated during an HTTP request cycle can be deallocated just by free()'ing the top-level request pool. I don't think parrot could make use of that model, since it can't very well guarantee that the user cannot retain references past the lifetime of the pool. Apache trusts modules not to make such errors; parrot can't trust the byte-code it's executing any further than it can throw it. A generational collector is a more likely means by which parrot might reduce memory-related overhead. — Gordon Henriksen [EMAIL PROTECTED]
Re: Start of thread proposal
Dan Sugalski <[EMAIL PROTECTED]> wrote: >=item SHARED THREAD > A thread that's part of a group of threads sharing a common > interpreter environment. I presume that this "group of threads" is one thread pool or interpreter pool. Could you expand the definition to cover "pool". >=item Each interpreter show the same process id The term "process id" is really misleading. Again I presume, that one pool is ment here: > All the interpreters within a process will share a process ID. I'd vote for keeping a process ID what it is and use "pool id" or such here. leo
Re: Start of thread proposal
> =item MUTEX > > This is a low level, under the hood, not exposed to users, thing that > can be locked. They're non-recursive, non-read/write, exclusive > things. When a thread gets a mutex, any other attempt to get that > mutex will block until the owning thread releases the mutex. The > platform-native lock construct will be used for this. > Will this be macroised to hide the platform native implementation from the main body of the code? > > The "sleep until something pings me" construct. Useful for queue > construction. Not conceptually associated with a MUTEX. (POSIX > threads require this, so we're going to hide it there behind macros > and/or functions) > Could you elaborate on the nature of what would constitute a "ping"? > > =item POSIX threads > > The threading scheme must be sufficient to support a POSIX > share-everything style of threading, such as is used in perl 5's > "pthread" model, as well as the thread models for Ruby and Python. > Thankyou:) > > =item "Process-type' threads > > The scheme must also support the perl 5 "iThreads" threading > model. In this model no data is shared implicitly, and all sharing > must be done on purpose and explicitly. It very much resembles the > Unix fork-process-with-shared-memory-segment model, not a surprise as > it was originally developed with > Pseudo forks? An interesting extract from the Windows Services for Unix docs. "The Interix subsystem shows improvements in all aspects of performance—ranging from 30 percent improvements in fork and exec performance..." It's sales pitch, but I'm trying to verify it (and build parrot using it). > > > =item System memory allocation routines are threadsafe > > We are assuming that the memory allocation system of the base OS is > threadsafe. While some of the C runtime libraries are notoriously > thread dangerous, memory allocation code generally is threadsafe, and > we'll assume that on all platforms. (Though we will, in general, > manage our own memory) > Is there any likelyhood that memory allocation will be hidden behind macros at two levels: - ALLOCPOOL() for allocating large chunks of memory (ppols) that are later sub-allocated (and managed) by: - SUBALLOC() for sub allocating within a pool ? > > =item Each interpreter will have a unique id > > This ID will be independent of the process or OS thread, and will be > constant across the lifetime of the interpreter. Interpreter IDs > I be reused as interpreters are destroyed and recreated, and as > such are only guaranteed valid while an interpreter is in use. > The provision of method(s) to obtain the native TIDs/HANDLES would make life for those writing implementation specific extensions easier. > >[SNIP] - Too much to understand before commenting... > > > --"it's like this"--- > Dan Sugalski even samurai > [EMAIL PROTECTED] have teddy bears and even >teddy bears get drunk Nigel
Re: Start of thread proposal
"Dan Sugalski" <[EMAIL PROTECTED]> wrote: > =head2 Guarantees Maybe this isn't strictly a threading thing, but are we going to make any guarantees about event orderings? For example, will we guarantee that a sequence of events send from one thread to another will always be received in the order they are sent? (Obviously no guarantees about interleaving of events from other threads). This is usually only important in distributed environments where multiple paths exists between a pair of endpoints, but it would be nice for user-code to be able to rely on it. Dave.
Re: Start of thread proposal
On Monday, January 19, 2004, at 06:37 , Gordon Henriksen wrote: Dan Sugalski wrote: For a copying collector to work, all the mutators must be blocked, and arguably all readers should be blocked as well. True of non-moving collectors, too. [...] Some of what I've written up addresses why. [...] I'll send that section when I get out of the office. Consider this simple object graph: Root set = { &A, &B } [ A=NULL] [ B=&C ] ---> [ C=] Collection begins in thread 1 and marks A as reachable before being preempted by thread 2: [xA=NULL] [ B=&C ] ---> [ C=] Thread 2 sets A to &C, and nullifies B before being preempted again by thread 1: [xA=&C ] ---> [ C=] [ B=NULL] Thread 1 marks B as reachable: [xA=&C ] ---> [ C=] [xB=NULL] The mark phase complete, thread 1 sweeps: [ A=&???] ---> ??? [ B=NULL] C was not marked reachable (although it was) and was thus erroneously collected, leaving a dangling pointer. This problem applies equally to copying and mark-sweep collectors. — Gordon Henriksen [EMAIL PROTECTED]
Re: Start of thread proposal
At 12:58 AM + 1/20/04, [EMAIL PROTECTED] wrote: > =item MUTEX This is a low level, under the hood, not exposed to users, thing that can be locked. They're non-recursive, non-read/write, exclusive things. When a thread gets a mutex, any other attempt to get that mutex will block until the owning thread releases the mutex. The platform-native lock construct will be used for this. Will this be macroised to hide the platform native implementation from the main body of the code? Yes. > The "sleep until something pings me" construct. Useful for queue construction. Not conceptually associated with a MUTEX. (POSIX threads require this, so we're going to hide it there behind macros and/or functions) Could you elaborate on the nature of what would constitute a "ping"? POSIX has a function cond_signal (and cond_broadcast, which is similar). What happens is a thread grabs the condition variable and goes to sleep, and sleeps until another thread does a cond_signal, which then wakes it up. If there are multiple threads sleeping on the condition variable, only one is woken. (cond_broadcast wakes them all up) > =item POSIX threads The threading scheme must be sufficient to support a POSIX share-everything style of threading, such as is used in perl 5's "pthread" model, as well as the thread models for Ruby and Python. Thankyou:) No problem. It's the model I prefer. :) > =item "Process-type' threads The scheme must also support the perl 5 "iThreads" threading model. In this model no data is shared implicitly, and all sharing must be done on purpose and explicitly. It very much resembles the Unix fork-process-with-shared-memory-segment model, not a surprise as it was originally developed with Pseudo forks? Yeah. On Win32, when Perl forks it starts a new thread and clones the interpreter and its contents. If you later do an exec in that thread it starts a new process and grabs hold of it. Definitely a clever trick. An interesting extract from the Windows Services for Unix docs. "The Interix subsystem shows improvements in all aspects of performanceranging from 30 percent improvements in fork and exec performance..." It's sales pitch, but I'm trying to verify it (and build parrot using it). If it works, cool. We can add environment probing at configure time. I worry some about the stability and evil involved there (I've a good idea how much work it is to weld fork into a process with a fundamentally different process model) but that's Microsoft's problem, not ours. > =item System memory allocation routines are threadsafe We are assuming that the memory allocation system of the base OS is threadsafe. While some of the C runtime libraries are notoriously thread dangerous, memory allocation code generally is threadsafe, and we'll assume that on all platforms. (Though we will, in general, manage our own memory) Is there any likelyhood that memory allocation will be hidden behind macros at two levels: - ALLOCPOOL() for allocating large chunks of memory (ppols) that are later sub-allocated (and managed) by: - SUBALLOC() for sub allocating within a pool We'll generally hide behind the memory.c routines, if for no other reason than to allow the embedder to override the memory functions. We need to define that at some point... > =item Each interpreter will have a unique id This ID will be independent of the process or OS thread, and will be constant across the lifetime of the interpreter. Interpreter IDs I be reused as interpreters are destroyed and recreated, and as such are only guaranteed valid while an interpreter is in use. The provision of method(s) to obtain the native TIDs/HANDLES would make life for those writing implementation specific extensions easier. TIDs, definitely. It'll be a sysinfo or interpinfo function. There's the issue of interpreters binding to different threads, but we'll deal with that. I'm not sure what the HANDLE is, but explain it and we can work something out. :) -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Start of thread proposal
On Tue, 20 Jan 2004 10:45, Dan Sugalski wrote; > Yes. The recommendation I've always seen for deadlock avoidance is > to always have all your code grab its mutexes in some fixed order. Yes; otherwise, you need to back off and start again, if one lock acquisition fails. Consider these functions; for the purpose of this example, lexical lock ordering is implied; func1($, $, $, $, $); func2($, $, $, $, $); func3($, $); So long as the locks are ramped up from the lowest to the highest, there is no chance of func1 acquiring a lock to be held by func2 (eg, $), if that other function already has one of the shared dependancies (eg, $). All well and good. But, what about recursive locks? ie sub func1($one is locked, $two is locked, $three is locked) { for my $x ($one, $two, $three) { func2($x.property) if $x.property; } } sub func2($eins is locked) { if ($eins.property) { func2($eins.property, { }); } } $aaa.property = { }; $bbb.property.property = $aaa; $ccc = { }; if (thork()) { # fork a thread # thread A func1($aaa, $bbb, $ccc); } else { # thread B func2($bbb.property); } OK, the execution order that causes the deadlock is: 1. Thread B - func2() acquires a lock on the $bbb.property PMC. 2. Thread A - func() acquires a lock on $aaa, $bbb, $ccc. 3. Thread A - func2() acquires a lock on $aaa.property, which returns quickly 4. Thread A - func2() blocks waiting for a lock on $bbb.property, held by func2() in thread B 5. Thread B - func2() blocks waiting for a lock on $bbb.property.property, held by func() in thread A. So, there are still possibilities for deadlocks, as the non-linear nature of subroutine calls screws up your nice lock ramping. In summary, acquiring mutexes in a defined order as a means to avoid deadlocks only works when you are acquiring them all up front. If you are acquiring any later, and you detect a deadlock (ie, a loop of threads blocking on locks held by each other), you *must* be able to tell one of them to "ramp off" to holding no locks at all. ie, ROLLBACK :). The bugger is, winding back execution state automatically to when you last started acquiring locks is probably an intractable problem from an interpreter's perspective... Sounds like a job for an exception to me ;-). for (1..50) { eval { func_that_acquires_first_lock($args); }; last unless $@ and $@ =~ m/mutex deadlock/i; } -- Sam Vilain, [EMAIL PROTECTED] When I sell liquor, its called bootlegging; when my patrons serve it on Lake Shore Drive, its called hospitality. AL CAPONE
RE: Start of thread proposal
Dan Sugalski wrote: > For a copying collector to work, all the mutators must be blocked, > and arguably all readers should be blocked as well. True of non-moving collectors, too. Or, let me put it this way: non- copying *GC* (the sweep or copy phase) can be threadsafe, but the mark phase is never threadsafe. The method in which marking is not threadsafe is a bit more pathological (i.e., it's not the common case as it is with the copying collector), but a standard tracing DOD cannot be correct when competeting with mutators. It WILL collect non- garbage (those are MUTATORS, remember), and the result WILL be Heizenbugs and crashes. Some of what I've written up addresses why. It's pretty simple to demonstrate a single case to prove the point, but I don't feel like re-creating the ASCII art right now. :) I'll send that section when I get out of the office. parrot will have to be able to suspend all threads in the environment. Unfortunate, but really quite unavoidable. > Oh, arguably it can't avoid deadlock at all, what with vtable methods > having access to the full environment. I can live with deadlocks, > only because there's no real alternative. But PMC implementations have to fall inside of the trusted environment, so that's not really a failure. Of course uncooperative code can break a cooperative algorithm. :) -- Gordon Henriksen IT Manager ICLUBcentral Inc. [EMAIL PROTECTED]
RE: Start of thread proposal
At 5:32 PM -0500 1/19/04, Gordon Henriksen wrote: Dan Sugalski wrote: [A] copying collector is generally untenable in a threaded environment. Can you elaborate upon the reasoning behind this statement? Sure. For a copying collector to work, all the mutators must be blocked, and arguably all readers should be blocked as well. (I think they need to be, otherwise you may be accessing reclaimed and reused memory) That means if we keep a copying collector we need to have everything that accesses strings or pmcs to get a lock on the GC before every access of every string or PMC. A touch excessive, I think. :) > The first thing that any vtable function of a shared PMC must do is to aquire the mutex of the PMCs in its parameter list, in ascending address order. When the mutexes are released they are not required to be released in any order. Note that this locking strategy cannot avoid deadlock if the user is allowed to acquire these locks--HLL locks must be altogether different beasts from automatic PMC locks. That's okay. Just a design consequence worth noting for everyone. Oh, arguably it can't avoid deadlock at all, what with vtable methods having access to the full environment. I can live with deadlocks, only because there's no real alternative. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
RE: Start of thread proposal
Dan Sugalski wrote: > [A] copying collector is generally untenable in a threaded environment. Can you elaborate upon the reasoning behind this statement? > The first thing that any vtable function of a shared PMC must do is to > aquire the mutex of the PMCs in its parameter list, in ascending > address order. When the mutexes are released they are not required to > be released in any order. Note that this locking strategy cannot avoid deadlock if the user is allowed to acquire these locks--HLL locks must be altogether different beasts from automatic PMC locks. That's okay. Just a design consequence worth noting for everyone. -- Gordon Henriksen IT Manager ICLUBcentral Inc. [EMAIL PROTECTED]
Re: Start of thread proposal
At 2:44 PM -0700 1/19/04, Luke Palmer wrote: Dan's thread proposal mentions: =item Automatic PMC sharing will be provided When a PMC is placed into a container which is shared (including lexical pads and global namespaces) then that PMC will automatically be marked as shared. It is acceptable for this to trigger an exception if for some reason a PMC should not be shared between interpreters. PMCs are, by default, not shared. This avoids sharing overhead for PMCs which are only used as temporaries and not shared. (Note that this is dangerous, and may end up not being done, due to the sharing of continuations) I don't know why this is dangerous. A continuation is a data structure just like an array or a hash. When you share it, everything "inside" it gets shared, too. For a continuation, this could be an awful lot of stuff, but it's the safe way. True, but the danger part is if we don't mark everything grabbed by a continuation as shared by default. If we do, we might as well mark everything as shared, as there'll be less overhead. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Start of thread proposal
At 4:37 PM -0500 1/19/04, Uri Guttman wrote: > "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: DS> =item All shared PMCs must have a threadsafe vtable DS> The first thing that any vtable function of a shared PMC must do is to DS> aquire the mutex of the PMCs in its parameter list, in ascending DS> address order. When the mutexes are released they are not required to DS> be released in any order. why the ascending address order to grab mutexes? is this to help solve deadlocks? Yes. The recommendation I've always seen for deadlock avoidance is to always have all your code grab its mutexes in some fixed order. With source, it's generally recommended that you grab mutex variables in lexical order. Since we're all binary we need a different order, and ascending addresses are reasonably simple to do. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Start of thread proposal
Dan's thread proposal mentions: > =item Automatic PMC sharing will be provided > > When a PMC is placed into a container which is shared (including > lexical pads and global namespaces) then that PMC will automatically > be marked as shared. It is acceptable for this to trigger an > exception if for some reason a PMC should not be shared between > interpreters. > > PMCs are, by default, not shared. This avoids sharing overhead for > PMCs which are only used as temporaries and not shared. (Note that > this is dangerous, and may end up not being done, due to the sharing > of continuations) I don't know why this is dangerous. A continuation is a data structure just like an array or a hash. When you share it, everything "inside" it gets shared, too. For a continuation, this could be an awful lot of stuff, but it's the safe way. Luke
Re: Start of thread proposal
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes: DS> =item All shared PMCs must have a threadsafe vtable DS> The first thing that any vtable function of a shared PMC must do is to DS> aquire the mutex of the PMCs in its parameter list, in ascending DS> address order. When the mutexes are released they are not required to DS> be released in any order. why the ascending address order to grab mutexes? is this to help solve deadlocks? uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: Start of thread proposal
At 3:16 PM -0500 1/19/04, Dan Sugalski wrote: I've not gotten into the technical bits yet. That's next, but rip this apart first. First rip -- I left a section unfinished. It should be: =item "Process-type" threads The scheme must also support the perl 5 "iThreads" threading model. In this model no data is shared implicitly, and all sharing must be done on purpose and explicitly. It very much resembles the Unix fork-process-with-shared-memory-segment model, not a surprise as it was originally developed with emulation of Unix's fork system in mind. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Start of thread proposal
I've not gotten into the technical bits yet. That's next, but rip this apart first. =head1 DEFINITIONS So we can all talk about things the same way, the following definitons apply. Some of these are drawn from the POSIX thread spec, and as such we should have a translation section at the end. =over 4 =item THREAD An OS level thread. If that makes no sense, neither will any of the rest of this, in which case I recommend picking up "Programming with POSIX Threads" by Dave Butenhof, and coming back when you have. =item MUTEX This is a low level, under the hood, not exposed to users, thing that can be locked. They're non-recursive, non-read/write, exclusive things. When a thread gets a mutex, any other attempt to get that mutex will block until the owning thread releases the mutex. The platform-native lock construct will be used for this. =item LOCK This is an exposed-to-HLL-code thing that can be locked. Only PMCs can be locked, and the lock may or may not be recursive or read/write. =item CONDITION VARIABLE The "sleep until something pings me" construct. Useful for queue construction. Not conceptually associated with a MUTEX. (POSIX threads require this, so we're going to hide it there behind macros and/or functions) =item RENDEZVOUS POINT A HLL version of a condition variable. =item INTERPRETER Those bits of the Parrot_Interp structure that are absolutely required to be thread-specific. This includes the current register sets and stack pointers, as well as security context information. Basically if a continuation captures it, it's the interpreter. =item INTERPRETER ENVIRONMENT Those bits of the Parrot_Interp structure that aren't required to be thread-specific (though I'm not sure there are any) I anything pointed to that doesn't have to be thread-specific. The environment includes the global namespaces, pads, stack chunks, memory allocation regions, arenas, and whatnots. Just because the pointer to the current pad is thread-specific doesn't mean the pad I has to be. It can be shared. =item INDEPENDENT THREAD A thread that has no contact I with the internal data of any other thread in the current process. Independent threads need no synchronization for anything other than what few global things we have. And the fewer the better, though alas we can't have none at all. Note that independent threads may still communicate back and forth by passing either atomic things (ints, floats, and pointers) or static buffers that can become the property of the destination thread. =item SHARED THREAD A thread that's part of a group of threads sharing a common interpreter environment. =back =head1 REQUIREMENTS =head2 Supported Models =over 4 =item POSIX threads The threading scheme must be sufficient to support a POSIX share-everything style of threading, such as is used in perl 5's "pthread" model, as well as the thread models for Ruby and Python. =item "Process-type' threads The scheme must also support the perl 5 "iThreads" threading model. In this model no data is shared implicitly, and all sharing must be done on purpose and explicitly. It very much resembles the Unix fork-process-with-shared-memory-segment model, not a surprise as it was originally developed with =back =head2 Guarantees =over 4 =item No Crashes The interpreter guarantees that no user program errors of any sort will crash the interpreter. This includes threading problems. As such, synchronization issues (where multiple interpreters are accessing the same shared data) must not crash the interpreter or corrupt its internal state. =back =head2 Assumptions =over 4 =item System memory allocation routines are threadsafe We are assuming that the memory allocation system of the base OS is threadsafe. While some of the C runtime libraries are notoriously thread dangerous, memory allocation code generally is threadsafe, and we'll assume that on all platforms. (Though we will, in general, manage our own memory) =back =head1 Proposal The proposal is as follows: =over 4 =item All global state shall be protected by mutexes Straightforward enough. This allows for independent threads to coexist without threatening the state of the proces. =item Multiple independent interpreters will be allowed Once again, straightforward enough. With threadsafe protected global state, there's no issue here. =item Only one OS thread in an interpreter at once While there is no requirement that any interpreter be tied to an underlying OS thread, under no circumstances may multiple OS threads use a single interpreter simultaneously. =item A Stop-and-copy communication method will be provided Parrot will provide a function to make a call into another interpreter and wait for that call to complete. This call may pass in data and have data returned to it. The interpreter making the call will block until the call is complete. The data passed in as parameters will be copied into the called interpreter, and any return values will be copied back into the calli