Re: Start of thread proposal

2004-01-30 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 3:08 PM +0100 1/24/04, Leopold Toetsch wrote:

>>Finalizers and incremental DOD don't play together. The DOD must run to
>>the end to be sure that the object isn't referenced any more.

> Finalizers and incremental DOD work just fine together. At some point
> the incremental DOD will figure out that something's dead, just as
> the stop-the-world DOD will. It just may take a bit longer.

I wanted to say: "Finalizers & destructors of PMCs that need timely
destruction ...". In the case of dead objects at scope exit.

leo


Re: Start of thread proposal

2004-01-30 Thread Dan Sugalski
At 3:08 PM +0100 1/24/04, Leopold Toetsch wrote:
Dan Sugalski <[EMAIL PROTECTED]> wrote:
 At 5:56 PM +0100 1/23/04, Leopold Toetsch wrote:

Stopping all interpreters seems to be cheaper. The rwlock will sooner or
later stop all interpreters anyway (on first PMC access), so we can skip
the price of the rwlock and just stop the world(s).

 The rwlock only stops all the interpreters when the DOD runs.
Sure. But that would still need to acquire a reader's rwlock for each PMC
access. This is more expensive than a mutex. During DOD any PMC access
will halt the interpreter, so we can do that explicitly too and save
the cost of the rwlock overhead. Although I can imagine that aggregates
will need a rwlock anyway.
Well... only the mutating vtable entries need to get the lock, so 
that reduces the expense somewhat. Still, I agree, it may be 
untenably expensive.

[ incremental GC ]

 If we have the facilities to do incremental DOD runs then this is
 definitely a possibility except for finalizers. Finalizers make
 things interesting, though if the background thread doing the DOD is
 a member of the interpreter pool then it'd work out OK.
Finalizers and incremental DOD don't play together. The DOD must run to
the end to be sure that the object isn't referenced any more.
Finalizers and incremental DOD work just fine together. At some point 
the incremental DOD will figure out that something's dead, just as 
the stop-the-world DOD will. It just may take a bit longer.
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Start of thread proposal

2004-01-24 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 5:56 PM +0100 1/23/04, Leopold Toetsch wrote:

>>Stopping all interpreters seems to be cheaper. The rwlock will sooner or
>>later stop all interpreters anyway (on first PMC access), so we can skip
>>the price of the rwlock and just stop the world(s).

> The rwlock only stops all the interpreters when the DOD runs.

Sure. But that would still need to acquire a reader's rwlock for each PMC
access. This is more expensive than a mutex. During DOD any PMC access
will halt the interpreter, so we can do that explicitly too and save
the cost of the rwlock overhead. Although I can imagine that aggregates
will need a rwlock anyway.

[ incremental GC ]

> If we have the facilities to do incremental DOD runs then this is
> definitely a possibility except for finalizers. Finalizers make
> things interesting, though if the background thread doing the DOD is
> a member of the interpreter pool then it'd work out OK.

Finalizers and incremental DOD don't play together. The DOD must run to
the end to be sure that the object isn't referenced any more.

leo


Re: Start of thread proposal

2004-01-23 Thread Gordon Henriksen
On Friday, January 23, 2004, at 11:09 , Dan Sugalski wrote:

Ah, OK, I see. The problem comes in where we've got an object in the 
transient root set (basically the processor stack and registers) that 
gets anchored into the base root set (stash, pads, or whatever) after 
the DOD has traced the spot it's going into, and falls out of the transient 
root set before the DOD traces over to it.
(Worse than that. It could come from any untraced location—or possibly 
even be brand new, depending upon memory allocation details.)

—

Gordon Henriksen
[EMAIL PROTECTED]


Re: Start of thread proposal

2004-01-23 Thread Dan Sugalski
At 5:56 PM +0100 1/23/04, Leopold Toetsch wrote:
Dan Sugalski <[EMAIL PROTECTED]> wrote:
 At 12:04 AM -0500 1/20/04, Gordon Henriksen wrote:
C was not marked reachable (although it was) and was thus
erroneously collected, leaving a dangling pointer. This problem
applies equally to copying and mark-sweep collectors.

 Ah, OK, I see.
That is well known in the literature as the "Tri-Color Invariant": black are
the already marked (live) PMCs, grey the PMCs on the next_for_GC list,
white the not yet reached PMCs. The strong tri-color invariant states
that no black object may point to a white object; the weak invariant
states that at least one path from a black to a white object must
contain a grey one.
This can be handled either by stop-the-world GC or by intercepting each
read or write access that would change the color of an object and updating
the color accordingly. This is used e.g. for incremental GC. As soon as
we have a thread in the background that runs GC, we have to cope with
these issues.
Yeah, point. And since we want to be able to have an incremental DOD 
at some point we need to get support for it in now.

 > ... or have some way to force a
 low-overhead rendezvous.

 Okay, we're going to mandate that we have read/write locks, each
 interpreter pool has one, and mutating vtable entries must get a read
 lock on the pool read/write lock. Pricey (ick) but isolated to
 mutators. The DOD gets a write lock on it, which'll block all
 read/write access so no mutators can be in process while the pool DOD
 runs.
Stopping all interpreters seems to be cheaper. The rwlock will sooner or
later stop all interpreters anyway (on first PMC access), so we can skip
the price of the rwlock and just stop the world(s).
The rwlock only stops all the interpreters when the DOD runs. 
Anything that mutates a PMC gets a *read* lock, so mutators don't 
interfere with each other and only pause if the DOD is running. The 
DOD getting a *write* lock will block any read lock attempts, so when 
the DOD is running no mutation can take place. Since mutation doesn't 
require any global exclusion it doesn't need a write lock -- the read 
lock is sufficient.
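
To make that concrete, here is a minimal sketch of the scheme with POSIX
rwlocks; the pool structure and function names are illustrative, not
Parrot's actual API:

    #include <pthread.h>

    /* Hypothetical pool structure; the rwlock is initialised with
     * pthread_rwlock_init() when the pool is created. */
    typedef struct interp_pool {
        pthread_rwlock_t gc_lock;   /* one per interpreter pool */
    } interp_pool;

    /* A mutating vtable entry takes the lock SHARED: mutators never
     * block each other and only wait while the DOD holds the lock. */
    void pmc_mutate(interp_pool *pool /*, PMC *self, ... */) {
        pthread_rwlock_rdlock(&pool->gc_lock);
        /* ... modify the PMC ... */
        pthread_rwlock_unlock(&pool->gc_lock);
    }

    /* The DOD takes the lock EXCLUSIVE, draining in-flight mutators
     * and keeping new ones out until the run completes. */
    void pool_dod_run(interp_pool *pool) {
        pthread_rwlock_wrlock(&pool->gc_lock);
        /* ... trace and sweep ... */
        pthread_rwlock_unlock(&pool->gc_lock);
    }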

An alternative would be real background incremental GC, *when* running
multiple threads. I estimate the overhead to be in the region of a rwlock
(with no contention, of course).
If we have the facilities to do incremental DOD runs then this is 
definitely a possibility except for finalizers. Finalizers make 
things interesting, though if the background thread doing the DOD is 
a member of the interpreter pool then it'd work out OK.
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Start of thread proposal

2004-01-23 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 12:04 AM -0500 1/20/04, Gordon Henriksen wrote:
>>C was not marked reachable (although it was) and was thus
>>erroneously collected, leaving a dangling pointer. This problem
>>applies equally to copying and mark-sweep collectors.

> Ah, OK, I see.

That is well known in the literature as the "Tri-Color Invariant": black are
the already marked (live) PMCs, grey the PMCs on the next_for_GC list,
white the not yet reached PMCs. The strong tri-color invariant states
that no black object may point to a white object; the weak invariant
states that at least one path from a black to a white object must
contain a grey one.

This can be handled either by stop-the-world GC or by intercepting each
read or write access that would change the color of an object and updating
the color accordingly. This is used e.g. for incremental GC. As soon as
we have a thread in the background that runs GC, we have to cope with
these issues.
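
For illustration, the usual way to keep the strong invariant intact while
mutators run is a write barrier on pointer stores. A sketch, with a
hypothetical PMC layout and grey list (not Parrot's actual structures):

    typedef enum { WHITE, GREY, BLACK } gc_color;

    typedef struct PMC {
        gc_color color;
        struct PMC *next_for_GC;    /* grey list linkage */
        /* ... */
    } PMC;

    extern PMC *grey_list;          /* assumed global grey-list head */

    /* Write barrier: a black object must never point to a white one,
     * so grey the target before the store. The collector will then
     * still visit it, and no reachable object is missed. */
    void pmc_store_ptr(PMC *container, PMC **slot, PMC *value) {
        if (container->color == BLACK && value != NULL
                && value->color == WHITE) {
            value->color = GREY;
            value->next_for_GC = grey_list;
            grey_list = value;
        }
        *slot = value;
    }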

> That means we're going to have to have either a really forgiving DOD
> system that takes multiple passes before it collects up a PMC or
> buffer (which still isn't safe)

Alas, that's not an alternative; it doesn't work.

> ... or have some way to force a
> low-overhead rendezvous.

> Okay, we're going to mandate that we have read/write locks, each
> interpreter pool has one, and mutating vtable entries must get a read
> lock on the pool read/write lock. Pricey (ick) but isolated to
> mutators. The DOD gets a write lock on it, which'll block all
> read/write access so no mutators can be in process while the pool DOD
> runs.

Stopping all interpreters seems to be cheaper. The rwlock will sooner or
later stop all interpreters anyway (on first PMC access), so we can skip
the price of the rwlock and just stop the world(s).

An alternative would be real background incremental GC, *when* running
multiple threads. I estimate the overhead to be in the region of a rwlock
(with no contention, of course).

leo


Re: Start of thread proposal

2004-01-23 Thread Dan Sugalski
At 12:04 AM -0500 1/20/04, Gordon Henriksen wrote:
On Monday, January 19, 2004, at 06:37 , Gordon Henriksen wrote:

Dan Sugalski wrote:

For a copying collector to work, all the mutators must be blocked,
and arguably all readers should be blocked as well.
True of non-moving collectors, too. [...]

Some of what I've written up addresses why. [...] I'll send that 
section when I get out of the office.
Consider this simple object graph:

C was not marked reachable (although it was) and was thus 
erroneously collected, leaving a dangling pointer. This problem 
applies equally to copying and mark-sweep collectors.
Ah, OK, I see. The problem comes in where we've got an object in the 
transient root set (basically the processor stack and registers) that 
gets anchored into the base root set (stash, pads, or whatever) after 
the DOD has traced the spot it's going into, and falls out of the 
transient root set before the DOD traces over to it.

Race condition. Dammit.

Okay, I'd not wrapped my brain around that possibility, which will 
make for some interesting DOD tracing, especially on SMP systems. I 
was *really* hoping a single lock on the arena allocation system that 
the DOD held onto while tracing would be sufficient, but I see that 
it isn't.

That means we're going to have to have either a really forgiving DOD 
system that takes multiple passes before it collects up a PMC or 
buffer (which still isn't safe) or have some way to force a 
low-overhead rendezvous.

The obvious rendezvous point is the arena lock, but that's going to 
see a lot of contention anyway, and we'd as soon not have a single 
one for speed reasons. Feh.

Okay, we're going to mandate that we have read/write locks, each 
interpreter pool has one, and mutating vtable entries must get a read 
lock on the pool read/write lock. Pricey (ick) but isolated to 
mutators. The DOD gets a write lock on it, which'll block all 
read/write access so no mutators can be in process while the pool DOD 
runs.

I think that'll work. The .1j thread spec requires r/w locks, and we 
can fake them on platforms that don't implement them. Hopefully 
Nigel's got the Windows scoop so we can see if Win32 has anything 
like this (which I'd expect it does).
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Start of thread proposal

2004-01-23 Thread Dan Sugalski
At 10:40 PM +0100 1/22/04, Leopold Toetsch wrote:
Dan Sugalski <[EMAIL PROTECTED]> wrote:

 The only tricky bit comes in with the examination of the root set of
 other threads--accessing the hardware register contents of another
 running thread may be... interesting. (Which, I suppose, argues for
 some sort of volatile marking of the temp variables)
You'll provide the "interesting" part, that is:

use Psi::Estimate::CPU_Register_Changes_in_Future_till_mark_is_done;
Nah, no need for that one. I need to go back and recheck the stuff 
that Gordon posted in case I missed something, but if you put a lock 
on the arena allocator this isn't an issue.
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Start of thread proposal

2004-01-22 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:

> The only tricky bit comes in with the examination of the root set of
> other threads--accessing the hardware register contents of another
> running thread may be... interesting. (Which, I suppose, argues for
> some sort of volatile marking of the temp variables)

You'll provide the "interesting" part, that is:

use Psi::Estimate::CPU_Register_Changes_in_Future_till_mark_is_done;

SCNR, leo


Re: Start of thread proposal

2004-01-22 Thread Dan Sugalski
At 4:59 PM -0800 1/19/04, Dave Whipp wrote:
"Dan Sugalski" <[EMAIL PROTECTED]> wrote:
 =head2 Guarantees
Maybe this isn't strictly a threading thing, but are we going to make any
guarantees about event orderings? For example, will we guarantee that a
sequence of events sent from one thread to another will always be received
in the order they are sent?
Hrm. I suppose we really ought to, so we will. If we prioritize 
events (and I'm still torn) then they'll be in priority order, then 
send order.
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


RE: Start of thread proposal

2004-01-22 Thread Dan Sugalski
[Note to everyone -- I'm digging through my mail so be prepared for a 
potential set of responses to things that're already answered...]
At 6:37 PM -0500 1/19/04, Gordon Henriksen wrote:
Dan Sugalski wrote:

 For a copying collector to work, all the mutators must be blocked,
 and arguably all readers should be blocked as well.
True of non-moving collectors, too. Or, let me put it this way: non-
copying *GC* (the sweep or copy phase) can be threadsafe, but the mark
phase is never threadsafe. The method in which marking is not
threadsafe is a bit more pathological (i.e., it's not the common case
as it is with the copying collector), but a standard tracing DOD
cannot be correct when competing with mutators. It WILL collect non-
garbage (those are MUTATORS, remember), and the result WILL be
Heisenbugs and crashes.
Some of what I've written up addresses why. It's pretty simple to
demonstrate a single case to prove the point, but I don't feel like
re-creating the ASCII art right now. :) I'll send that section when I
get out of the office.
parrot will have to be able to suspend all threads in the environment.
Unfortunate, but really quite unavoidable.
I'm not sure that's the case. What we need to do is suspend metadata 
mutation--that is, buffers can't be resized while a GC run is in 
progress. Other than that, if we have guarantees of aligned atomic 
pointer writes (so we don't have word shearing to deal with) we don't 
have to stop the mutation of the contents of the buffers themselves.
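
In later terms (C11 atomics postdate this discussion), the guarantee being
assumed is just this: a racing reader sees either the old or the new
pointer, never half of each. A sketch with a purely illustrative buffer
layout:

    #include <stdatomic.h>
    #include <stddef.h>

    typedef struct buffer {
        void * _Atomic data;    /* naturally aligned pointer */
        size_t len;
    } buffer;

    /* A reader racing with this store sees either the old or the new
     * pointer in full -- never a half-written ("sheared") word. */
    void buffer_set_data(buffer *b, void *p) {
        atomic_store_explicit(&b->data, p, memory_order_release);
    }

    void *buffer_get_data(buffer *b) {
        return atomic_load_explicit(&b->data, memory_order_acquire);
    }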

The only tricky bit comes in with the examination of the root set of 
other threads--accessing the hardware register contents of another 
running thread may be... interesting. (Which, I suppose, argues for 
some sort of volatile marking of the temp variables)
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Start of thread proposal

2004-01-22 Thread Dan Sugalski
At 12:15 AM +0100 1/22/04, Leopold Toetsch wrote:
Dan Sugalski <[EMAIL PROTECTED]> wrote:

[ No Guarantees WRT data access ]

... seems to indicate that even whole ops like add P,P,P are atomic.

 Yep. They have to be, because they need to guarantee the integrity of
 the pmc structures and the data hanging off them (which includes
 buffer and string stuff)
But isn't that a contradiction? Or better: when even an opcode like the
one above is atomic, then access to a shared PerlNum should be guaranteed
to be atomic too.
Sure, but there's a *lot* more to user data integrity than atomic 
access to individual pieces. That's the least of the problem. The 
user data issue is one where you have multiple pieces being updated, 
or one piece being updated multiple times--that's the stuff we're not 
guaranteeing.

So, while we will make sure that storing a single value into a hash 
happens atomically, we won't guarantee that a series of stores into 
the hash, or a combination of loads and stores, or even a combination 
of reads and writes to a scalar, will happen atomically.
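
For illustration, a sketch of that distinction; hash_store/hash_fetch
stand in for any individually atomic operations, and the user-level lock
is the part the runtime won't supply automatically:

    #include <pthread.h>

    extern void hash_store(const char *key, long value); /* atomic alone */
    extern long hash_fetch(const char *key);             /* atomic alone */

    static pthread_mutex_t user_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Each store is atomic by itself, but the two-step sequence is
     * not: without the user-level lock, another thread can observe or
     * interleave with the state between the two stores. */
    void transfer(long amount) {
        pthread_mutex_lock(&user_lock);
        hash_store("from", hash_fetch("from") - amount);
        hash_store("to",   hash_fetch("to")   + amount);
        pthread_mutex_unlock(&user_lock);
    }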
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Start of thread proposal

2004-01-21 Thread Damien Neil
On Wed, Jan 21, 2004 at 01:14:46PM -0500, Dan Sugalski wrote:
> >... seems to indicate that even whole ops like add P,P,P are atomic.
> 
> Yep. They have to be, because they need to guarantee the integrity of 
> the pmc structures and the data hanging off them (which includes 
> buffer and string stuff)

Personally, I think it would be better to use corruption-resistant
buffer and string structures, and avoid locking during basic data
access.  While there are substantial differences in VM design--PMCs
are much more complicated than any JVM data type--the JVM does provide
a good example that this can be done, and done efficiently.

Failing this, it would be worth investigating what the real-world
performance difference is between acquiring multiple locks per VM
operation (current Parrot proposal) vs. having a single lock
controlling all data access (Python) or jettisoning OS threads
entirely in favor of VM-level threading (Ruby).  This forfeits the
ability to take advantage of multiple CPUs--but Leopold's initial
timing tests of shared PMCs were showing a potential 3-5x slowdown
from excessive locking.

I've seen software before that was redesigned to take advantage of
multiple CPUs--and then required no less than four CPUs to match
the performance of the older, single-CPU version.  The problem was
largely attributed to excessive locking of mostly-uncontested data
structures.

- Damien


Re: Start of thread proposal

2004-01-21 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:

[ No Guarantees WRT data access ]

>>... seems to indicate that even whole ops like add P,P,P are atomic.

> Yep. They have to be, because they need to guarantee the integrity of
> the pmc structures and the data hanging off them (which includes
> buffer and string stuff)

But isn't that a contradiction? Or better: when even an opcode like the
one above is atomic, then access to a shared PerlNum should be guaranteed
to be atomic too.

leo


Re: Start of thread proposal

2004-01-21 Thread Dan Sugalski
At 10:25 AM +0100 1/21/04, Leopold Toetsch wrote:
Dan Sugalski <[EMAIL PROTECTED]> wrote:

One more question:

=head2 Guarantees
... doesn't have anything about user data integrity. So when 2 threads
access a PerlNum, they could get a mixture of the typically 2 involved
words.
Potentially, yeah, though it's really unlikely.

But:

=item All shared PMCs must have a threadsafe vtable

 The first thing that any vtable function of a shared PMC must do is to
 acquire the mutex of the PMCs in its parameter list
... seems to indicate that even whole ops like add P,P,P are atomic.
Yep. They have to be, because they need to guarantee the integrity of 
the pmc structures and the data hanging off them (which includes 
buffer and string stuff)

And how does user level locking play together with that?
I've not decided -- that was something I thought we might hash out as 
we abused the first half of the design doc. Personally I'm all for 
the user and low-level lock being the same thing, but that doesn't 
have to be the case. There are advantages and disadvantages to either 
way of doing things.
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Start of thread proposal

2004-01-21 Thread nigelsandever
21/01/2004 02:12:09, Gordon Henriksen <[EMAIL PROTECTED]> wrote:

> This is false. The mark phase will still need to run over the entire 
> process, else it cannot detect all references into the pool.
>

If by reference, you mean address, then that is true.

If, when a reference is taken, the address of the referent is
stored in arbitrary other data structures, then all memory must
be scanned by the GC.

However, if references were not addresses, but indexes into a 
table of addresses, then only the table need be scanned. 

When a reference, or the thing holding it, is deleted, then 
the indexed slot in the address table is blanked and subsequent
GC passes looking for references to a referent will no longer 
find the reference in the table. 

With the addition of a reference count to the table, all references
to a single entity can share the same index, but now I'm groping
back in the direction of reference counting, which is not flavour
of the month. :)
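
A sketch of the table idea (the names and fixed table size are
hypothetical): references become indices, the collector scans the one
table instead of all memory, and blanking a slot hides the referent from
later passes.

    #define TABLE_SIZE 1024            /* illustrative fixed size */

    static struct {
        void *addr;                    /* NULL once the slot is blanked */
        int   refcount;                /* shared-index refcount extension */
    } handle_table[TABLE_SIZE];

    typedef int handle;                /* a "reference" is just an index */

    void *handle_deref(handle h) {
        return handle_table[h].addr;
    }

    /* Deleting a reference: when the last one goes, blank the slot so
     * subsequent GC passes no longer find the referent via the table. */
    void handle_release(handle h) {
        if (--handle_table[h].refcount == 0)
            handle_table[h].addr = NULL;
    }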

Nigel.




Re: Start of thread proposal

2004-01-21 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:

One more question:

>=head2 Guarantees

... doesn't have anything about user data integrity. So when 2 threads
access a PerlNum, they could get a mixture of the typically 2 involved
words.

But:

>=item All shared PMCs must have a threadsafe vtable

> The first thing that any vtable function of a shared PMC must do is to
> acquire the mutex of the PMCs in its parameter list

... seems to indicate that even whole ops like add P,P,P are atomic.

And how does user level locking play together with that?

leo


Re: Start of thread proposal

2004-01-21 Thread Leopold Toetsch
[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> I believe that parrot already has the concept of memory pools in its
> memory management. The idea is that by allocating similarly sized objects

The problem is not in the fixed-sized header pools, it's with the
*memory* pool used e.g. for string memory.

During GC we are walking the *header* pools, and if we find a header's buffer
memory being used, we move that buffer memory to a new store, thereby
compacting it. The old memory store(s) are then freed.

So string memory can move around behind your code's back.
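
In outline, the compaction pass looks something like the sketch below;
the header fields are illustrative, not the real structures from the
resource code. Headers stay put, only their buffer memory moves, which is
why raw bufstart pointers held across a GC run go stale.

    #include <string.h>

    typedef struct buffer_header {
        void  *bufstart;
        size_t buflen;
        int    live;                       /* set by the mark phase */
    } buffer_header;

    /* Walk a header pool, copy each live buffer into the new store,
     * and repoint the header at the copy. */
    void compact_pool(buffer_header *hdr, int n, char *new_store) {
        char *cursor = new_store;
        int i;
        for (i = 0; i < n; i++) {
            if (hdr[i].live && hdr[i].bufstart) {
                memcpy(cursor, hdr[i].bufstart, hdr[i].buflen);
                hdr[i].bufstart = cursor;
                cursor += hdr[i].buflen;
            }
        }
        /* the old memory store(s) can now be freed */
    }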

> Nigel/

leo


Re: Start of thread proposal

2004-01-20 Thread nigelsandever
20/01/2004 13:29:35, Gordon Henriksen <[EMAIL PROTECTED]> wrote:

>On Monday, January 19, 2004, at 07:58 , [EMAIL PROTECTED] 
>wrote:
>
>> Is there any likelihood that memory allocation will be hidden behind 
>> macros at two levels:
>>  - ALLOCPOOL() for allocating large chunks of memory (pools) that are
>>later sub-allocated (and managed) by:
>>
>>  - SUBALLOC() for sub allocating within a pool
>
>Are you wanting something akin to Apache 2 pools, which are hierarchical 
>and designed to reduce path length when freeing blocks of objects? For 
>instance, all objects and sub-pools allocated during an HTTP request 
>cycle can be deallocated just by free()'ing the top-level request pool.
>
Nothing to do with Apache memory pools.

I believe that parrot already has the concept of memory pools in its
memory management. The idea is that by allocating similarly sized objects 
from separate (large) allocations, you can reduce the fragmentation of 
chunks and reduce the incidence of cases where the memory needs to be GC'd and 
compacted. 

Allocating an 8 byte chunk from a common memory pool is quite likely to 
nip a little off from a previously freed large chunk. When it comes time to 
reallocate another chunk the same size as that large, freed chunk, although 
there is enough room in the overall freespace chain to accommodate it, the 
largest available chunk is now 8 bytes or so too small for the requirement.

That induces either a compaction cycle or the need to extend the memory pool 
by the size of the large request.

Allocating all small requests from the same pool, and large from another 
pool means that you are less likely to fragment the memory and more likely 
to be able to re-use an existing slot in the free-space chain for any
given request.

If the allocation of pools, and the allocation of bit-of-a-pool, are 
macroised, it makes it possible for OS's where there are multiple APIs
for memory allocation to bypass the CRT memory allocation routines and
use whichever native APIs are best suited for the type of allocation.

Personally, I would like to see memory allocation for each class type
be managed by the class constructor itself. This would theoretically allow
each class that has a fixed instance size to manage its own pool on OS's
where that makes sense. The class would allocate a pool for itself when 
loaded and then allocate instances from that pool on new() and deallocate 
upon DESTROY. If its memory pool was exhausted when new was called,
it would invoke the GC on *its pool only*. 

This separation would mean that each run of the GC would have a much smaller
pool of memory to compact and garbage collect when it was invoked. It would 
also be less likely to be called, as each allocation from a pool of 
fixed-sized sub-allocations will only ever need to call the GC when its pool is
entirely exhausted. 

But that is a radical departure :), so I would just like to see separate calls
for pool allocation/reallocation and element allocation/reallocation, rather 
than having calls to malloc() scattered throughout the codebase.

>
>I don't think parrot could make use of that model, since it can't very 
>well guarantee that the user cannot retain references past the lifetime 
>of the pool. Apache trusts modules not to make such errors; parrot can't 
>trust the byte-code it's executing any further than it can throw it. A 
>generational collector is a more likely means by which parrot might 
>reduce memory-related overhead.
>
>
>
>Gordon Henriksen
>[EMAIL PROTECTED]
>
>

Nigel/





Re: Start of thread proposal

2004-01-20 Thread Gordon Henriksen
On Tuesday, January 20, 2004, at 07:56 , [EMAIL PROTECTED] 
wrote:

I believe that parrot already has the concept of memory pools in its 
memory management. The idea is that by allocating similarly sized 
objects from separate (large) allocations, you can reduce the 
fragmentation of chunks and reduce the incidence of cases where the memory needs 
to be GC'd and compacted.
Don't presume that. General purpose memory allocators tend to handle 
this scenario... generically. Use of custom allocators is usually a 
premature optimization; the best generic memory allocators are very, 
very good.

But the ultimate effect of the overuse of custom allocators or pools is 
to move memory fragmentation to a higher level, preventing the generic 
allocator from addressing it even when the memory allocation patterns 
would otherwise allow it.

If the allocation of pools, and the allocation of bit-of-a-pool, are 
macroised, it makes it possible for OS's where there are multiple APIs 
for memory allocation to bypass the CRT memory allocation routines and 
use whichever native APIs are best suited for the type of allocation.
Would much rather see them abstracted, as they are now, beneath an API. 
This reduces the number of ways in which compiled parrot extensions can 
be incompatible with the parrot core.

Personally, I would like to see memory allocation for each class type 
be managed by the class constructor itself. This would theoretically 
allow each class that has a fixed instance size to manage its own pool 
on OS's where that makes sense. The class would allocate a pool for 
itself when loaded and then allocate instances from that pool on new() 
and deallocate upon DESTROY. If its memory pool was exhausted when new 
was called, it would invoke the GC on *its pool only*.
A PMC class is free to do precisely that. Only the PMC header cannot be 
thus managed, and that's already pooled.

This separation would mean that each run of the GC would have a much 
smaller pool of memory to compact and garbage collect when it was 
invoked.
This is false. The mark phase will still need to run over the entire 
process, else it cannot detect all references into the pool. (Unless you 
can provide a type-safety guarantee that your type can only be 
referenced by other instances of itself. In which case, your type is 
necessarily "garbage" and the best optimization strategy would be 
dead-code elimination. :)

I predict the best overall throughput would be with the sweep or copy 
phase run immediately after the mark phase, across the entire process: 
It would be wasteful to do all that marking and yet leave known garbage 
uncollected.

It would also be less likely to be called, as each allocation from a 
pool of fixed-sized sub-allocations will only ever need to call the GC 
when its pool is entirely exhausted.
Statistically, multiple pools will individually exhaust themselves 
*MORE* frequently than a global pool. The ideal case for collection 
frequency is that there is only one pool, or that all pools rise to 
capacity concurrently. In these ideal cases, the segregated pools will 
exhaust themselves precisely *AS* frequently as a global pool. In any 
case, there is no possibility for a decrease in collection events 
through the use of pooled memory. As per above, collection events will 
also not become less expensive. Thus, expect garbage collection to have 
a negative net impact on performance if pooled memory is used.

But that is a radical departure :), so I would just like to see 
separate calls for pool allocation/reallocation and element 
allocation/reallocation, rather than having calls to malloc() scattered 
throughout the codebase.
Uh. See memory.c and smallobject.c...

Digression:

Most copying collectors avoid free space fragmentation by design. 
(That's one of their benefits.) An allocator such as might be employed by a 
standard compacting, generational system (Java, MS CLR) might resemble 
the following:

#include <stdlib.h>  /* malloc */

#define SYS_ALIGN sizeof(double) /* for instance */

struct generation {
    char * volatile next;   /* bump pointer */
    char *top;
    char *base;
};

struct generation *generation_init(size_t capacity) {
    struct generation *g =
        (struct generation *) malloc(sizeof(struct generation));
    g->next = g->base = malloc(capacity);
    g->top = g->base + capacity;
    return g;
}

void *generation_alloc(struct generation *p, size_t len) {
    len = (len + SYS_ALIGN - 1) & ~(SYS_ALIGN - 1); /* round up */
    do {
        /* ATOMIC_ADD is assumed to return the incremented value */
        char *ptr = ATOMIC_ADD(&p->next, len);
        if (ptr <= p->top) {
            return ptr - len; /* break the loop */
        }
        gc();
    } while (1); /* try again */
}
+ No free space fragmentation at all.
+ Super-duper fast.
+ Threadsafe.
+ Lock-free unless ATOMIC_ADD cannot be implemented without a mutex.
- There is copying overhead when the generation is exhausted.
- One could say that the generation is fragmented by garbage.
+ It is no more fragmented by garbage than a GC system which uses a 
freelist allocator.

—

Gordon Henriksen

Re: Start of thread proposal

2004-01-20 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 5:16 PM +0100 1/20/04, Leopold Toetsch wrote:
>>What about temporary PMCs (or strings)?

> Those won't get marked as shared unless we're unconditionally marking
> things as shared. (Though we may just give 'em a mutex anyway)

This needs either one check per PMC, whether it's shared or not, or additional
costs for locking temps. Both are rather expensive, compared to the raw
working functionality of a vtable.

>>  Evaluating a non-trivial
>>expression can have lot of these. Each new temp would need a lock on the
>>memory sub-system.

> Only on allocation.

Yes: "Each *new* temp .."

> ... We could have a per-thread freelist if we wanted.
> Wouldn't be unreasonable.

That already smells like separate memory sub-systems. The freelist has
to be filled first. During DOD runs, it has to be refilled. To achieve
some locality of the temp PMCs, you can't just give these to arbitrary
interpreters. Separate free lists also seem to imply separate DOD runs to
clean up the temps.

>>i.e. typically 3 mutexes, it could be that the vtable of a shared PMC
>>should just grab one interpreter lock (of the owner of that PMC) and
>>that not all is shared (especially not the temps).

> Yeah, but since PMCs aren't owned by any one interpreter in the pool

The owner is, for me, the interpreter that allocated the arena.

> ... but are rather pool-wide you run into either the global-lock problem
> (which kills performance on SMP systems) or interpreters potentially
> deadlocking on unrelated data access.

Could be. But what counts is the overall throughput. When a lot of
fine-grained locks (plus memory subsystem locks for temps) take much
longer than one global lock, that scheme can nevertheless be
slower on SMP machines. It would scale better for more CPUs, though.

Deadlocks shouldn't be a problem, if exactly one vtable (like in
SharedRef) just grabs one and only one lock.

leo


Re: Start of thread proposal

2004-01-20 Thread Dan Sugalski
At 5:16 PM +0100 1/20/04, Leopold Toetsch wrote:
Dan Sugalski <[EMAIL PROTECTED]> wrote:

=item Interpreter pools will share allocation pools

 All the interpreters in an interpreter pool will share header and
 memory allocation pools.
What about temporary PMCs (or strings)?
Those won't get marked as shared unless we're unconditionally marking 
things as shared. (Though we may just give 'em a mutex anyway)

 Evaluating a non-trivial
expression can have a lot of these. Each new temp would need a lock on the
memory sub-system.
Only on allocation. We could have a per-thread freelist if we wanted. 
Wouldn't be unreasonable.
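
A sketch of what such a per-thread freelist might look like;
arena_batch_alloc and pmc_free_next are assumed helpers, not existing
Parrot functions. The fast path takes no lock; the arena lock is taken
only on refill.

    #define REFILL_COUNT 64            /* illustrative batch size */

    typedef struct PMC PMC;            /* opaque here */

    /* Assumed helper: takes the arena lock once and returns a linked
     * batch of free headers. */
    extern PMC *arena_batch_alloc(int count);
    /* Assumed freelist linkage accessor. */
    extern PMC *pmc_free_next(PMC *p);

    struct thread_state {
        PMC *free_list;                /* thread-private: no lock needed */
    };

    PMC *pmc_new(struct thread_state *t) {
        PMC *p;
        if (t->free_list == NULL)
            t->free_list = arena_batch_alloc(REFILL_COUNT);
        p = t->free_list;
        t->free_list = pmc_free_next(p);
        return p;
    }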

 Combined with the cost of:

=item All shared PMCs must have a threadsafe vtable

 The first thing that any vtable function of a shared PMC must do is to
 acquire the mutex of the PMCs in its parameter list
I.e. typically 3 mutexes. It could be that the vtable of a shared PMC
should just grab one interpreter lock (of the owner of that PMC) and
that not everything is shared (especially not the temps).
Yeah, but since PMCs aren't owned by any one interpreter in the pool 
but are rather pool-wide you run into either the global-lock problem 
(which kills performance on SMP systems) or interpreters potentially 
deadlocking on unrelated data access.
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Start of thread proposal

2004-01-20 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:

>=item Interpreter pools will share allocation pools

> All the interpreters in an interpreter pool will share header and
> memory allocation pools.

What about temporary PMCs (or strings)? Evaluating a non-trivial
expression can have a lot of these. Each new temp would need a lock on the
memory sub-system. Combined with the cost of:

>=item All shared PMCs must have a threadsafe vtable

> The first thing that any vtable function of a shared PMC must do is to
> acquire the mutex of the PMCs in its parameter list

I.e. typically 3 mutexes. It could be that the vtable of a shared PMC
should just grab one interpreter lock (of the owner of that PMC) and
that not everything is shared (especially not the temps).

leo


Re: Start of thread proposal

2004-01-20 Thread Dan Sugalski
At 4:24 PM +0100 1/20/04, Leopold Toetsch wrote:
Dan Sugalski <[EMAIL PROTECTED]> wrote:
 At 1:46 PM +0100 1/20/04, Leopold Toetsch wrote:
The term "process id" is really misleading.

 Nope. It's the process ID. Nothing misleading there, unless you've
 done threading work under linux, since for a while it gave each
 thread a separate PID.
$ ps | grep [p]arrot
17472 pts/0    00:00:00 parrot
17473 pts/0    00:00:00 parrot
17474 pts/0    00:00:00 parrot
So the unless applies ;) This is a single parrot interpreter, with main,
thread-manager, and event-handler thread. 3 PIDs.
Right. Linux is, as I said, arguably broken here, but I don't want to 
argue that point. That's why I specified that the PID is the process 
id of whatever instantiated parrot -- in this case, no matter which 
thread you're in, when you ask for the pid from parrot you should get 
17472. (In this case, at least)

Parrot needs to grab the pid at global initialization time and store 
it away for later pid queries.
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Start of thread proposal

2004-01-20 Thread Elizabeth Mattijsen
At 16:24 +0100 1/20/04, Leopold Toetsch wrote:
Dan Sugalski <[EMAIL PROTECTED]> wrote:
 At 1:46 PM +0100 1/20/04, Leopold Toetsch wrote:
The term "process id" is really misleading.

 Nope. It's the process ID. Nothing misleading there, unless you've
 done threading work under linux, since for a while it gave each
 thread a separate PID.
$ ps | grep [p]arrot
17472 pts/0    00:00:00 parrot
17473 pts/0    00:00:00 parrot
17474 pts/0    00:00:00 parrot
So the unless applies ;) This is a single parrot interpreter, with main,
thread-manager, and event-handler thread. 3 PIDs.
This _all_ depends on which version of Linux you're using.  Older 
versions don't do it that way, and newer versions don't do it either 
(the specific versions escape me at the moment, but I know RH9 does 
_not_ have pids for threads).

I know, because my Thread::Signal module depends on threads having 
pids.  But fewer and fewer Linux systems "support" it (and Linux was 
the only one that worked this way to begin with).

;-(

Liz


Re: Start of thread proposal

2004-01-20 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 1:46 PM +0100 1/20/04, Leopold Toetsch wrote:
>>The term "process id" is really misleading.

> Nope. It's the process ID. Nothing misleading there, unless you've
> done threading work under linux, since for a while it gave each
> thread a separate PID.

$ ps | grep [p]arrot
17472 pts/0    00:00:00 parrot
17473 pts/0    00:00:00 parrot
17474 pts/0    00:00:00 parrot

So the unless applies ;) This is a single parrot interpreter, with main,
thread-manager, and event-handler thread. 3 PIDs.

leo


Re: Start of thread proposal

2004-01-20 Thread Dan Sugalski
At 1:46 PM +0100 1/20/04, Leopold Toetsch wrote:
Dan Sugalski <[EMAIL PROTECTED]> wrote:

=item SHARED THREAD

 A thread that's part of a group of threads sharing a common
 interpreter environment.
I presume that this "group of threads" is one thread pool or
interpreter pool. Could you expand the definition to cover "pool".
Ah, damn, I needed to fix that up before sending it out. Should've 
been "SHARED INTERPRETER". A shared interpreter is one that's in an 
interpreter pool.

An interpreter pool is a set of interpreters that share common 
resources. They're essentially threads in the OS sense, and they have 
shared access to pretty much everything. The memory pools, arenas, 
and global namespace are shared--pretty much everything except what's 
in the interpreter structure itself.

 >=item Each interpreter shows the same process id

The term "process id" is really misleading.
Nope. It's the process ID. Nothing misleading there, unless you've 
done threading work under linux, since for a while it gave each 
thread a separate PID.

 Again I presume that one
pool is meant here:
 All the interpreters within a process will share a process ID.
Nope, again, that's really process. (Again, because of Linux' past 
bad behavior reporting process IDs)
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Start of thread proposal

2004-01-20 Thread Gordon Henriksen
On Monday, January 19, 2004, at 07:58 , [EMAIL PROTECTED] 
wrote:

Is there any likelihood that memory allocation will be hidden behind 
macros at two levels:
 - ALLOCPOOL() for allocating large chunks of memory (pools) that are
   later sub-allocated (and managed) by:

 - SUBALLOC() for sub allocating within a pool
Are you wanting something akin to Apache 2 pools, which are hierarchical 
and designed to reduce path length when freeing blocks of objects? For 
instance, all objects and sub-pools allocated during an HTTP request 
cycle can be deallocated just by free()'ing the top-level request pool.

I don't think parrot could make use of that model, since it can't very 
well guarantee that the user cannot retain references past the lifetime 
of the pool. Apache trusts modules not to make such errors; parrot can't 
trust the byte-code it's executing any further than it can throw it. A 
generational collector is a more likely means by which parrot might 
reduce memory-related overhead.

—

Gordon Henriksen
[EMAIL PROTECTED]


Re: Start of thread proposal

2004-01-20 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:

>=item SHARED THREAD

> A thread that's part of a group of threads sharing a common
> interpreter environment.

I presume that this "group of threads" is one thread pool or
interpreter pool. Could you expand the definition to cover "pool".

>=item Each interpreter shows the same process id

The term "process id" is really misleading. Again I presume that one
pool is meant here:

> All the interpreters within a process will share a process ID.

I'd vote for keeping a process ID what it is and use "pool id" or such
here.

leo


Re: Start of thread proposal

2004-01-20 Thread nigelsandever
> =item MUTEX
> 
> This is a low level, under the hood, not exposed to users, thing that
> can be locked. They're non-recursive, non-read/write, exclusive
> things. When a thread gets a mutex, any other attempt to get that
> mutex will block until the owning thread releases the mutex. The
> platform-native lock construct will be used for this.
>
 
Will this be macroised to hide the platform native implementation from 
the main body of the code?

> 
> The "sleep until something pings me" construct. Useful for queue
> construction. Not conceptually associated with a MUTEX. (POSIX
> threads require this, so we're going to hide it there behind macros
> and/or functions)
>
Could you elaborate on the nature of what would constitute a "ping"?
>
> =item POSIX threads
> 
> The threading scheme must be sufficient to support a POSIX
> share-everything style of threading, such as is used in perl 5's
> "pthread" model, as well as the thread models for Ruby and Python.
> 

Thank you :)

>
> =item "Process-type' threads
> 
> The scheme must also support the perl 5 "iThreads" threading
> model. In this model no data is shared implicitly, and all sharing
> must be done on purpose and explicitly. It very much resembles the
> Unix fork-process-with-shared-memory-segment model, not a surprise as
> it was originally developed with
>

Pseudo forks? 

An interesting extract from the Windows Services for Unix docs.

"The Interix subsystem shows improvements in all aspects of 
performance—ranging from 30 percent improvements in fork and 
exec performance..."

It's a sales pitch, but I'm trying to verify it (and build parrot 
using it).

>
> 
> =item System memory allocation routines are threadsafe
> 
> We are assuming that the memory allocation system of the base OS is
> threadsafe. While some of the C runtime libraries are notoriously
> thread dangerous, memory allocation code generally is threadsafe, and
> we'll assume that on all platforms. (Though we will, in general,
> manage our own memory)
>

Is there any likelihood that memory allocation will be hidden behind 
macros at two levels:
 - ALLOCPOOL() for allocating large chunks of memory (pools) that are 
   later sub-allocated (and managed) by:

 - SUBALLOC() for sub allocating within a pool

?
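
One possible shape for those two levels, entirely hypothetical; a
platform port could redefine ALLOCPOOL to a native API (e.g. VirtualAlloc
on Win32) instead of the CRT:

    #include <stdlib.h>

    /* Level one: a pool is one large chunk from the system. */
    #ifndef ALLOCPOOL
    #  define ALLOCPOOL(size)  malloc(size)
    #  define FREEPOOL(mem)    free(mem)
    #endif

    typedef struct pool {
        char  *mem;
        size_t used, size;
    } pool;

    /* Level two: sub-allocate within a pool (a trivial bump allocator
     * here), so small requests never touch the system allocator. */
    static void *SUBALLOC(pool *p, size_t n) {
        void *slot;
        if (p->used + n > p->size)
            return NULL;               /* caller grows the pool or runs GC */
        slot = p->mem + p->used;
        p->used += n;
        return slot;
    }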

>
> =item Each interpreter will have a unique id
> 
> This ID will be independent of the process or OS thread, and will be
> constant across the lifetime of the interpreter. Interpreter IDs
> may be reused as interpreters are destroyed and recreated, and as
> such are only guaranteed valid while an interpreter is in use.
>

The provision of method(s) to obtain the native TIDs/HANDLES would
make life for those writing implementation specific extensions easier.

>
>[SNIP] - Too much to understand before commenting...
>

>
> --"it's like this"---
> Dan Sugalski  even samurai
> [EMAIL PROTECTED] have teddy bears and even
>teddy bears get drunk

Nigel




Re: Start of thread proposal

2004-01-20 Thread Dave Whipp
"Dan Sugalski" <[EMAIL PROTECTED]> wrote:
> =head2 Guarantees

Maybe this isn't strictly a threading thing, but are we going to make any
guarantees about event orderings? For example, will we guarantee that a
sequence of events sent from one thread to another will always be received
in the order they are sent? (Obviously no guarantees about interleaving of
events from other threads). This is usually only important in distributed
environments where multiple paths exist between a pair of endpoints, but it
would be nice for user-code to be able to rely on it.


Dave.




Re: Start of thread proposal

2004-01-19 Thread Gordon Henriksen
On Monday, January 19, 2004, at 06:37 , Gordon Henriksen wrote:

Dan Sugalski wrote:

For a copying collector to work, all the mutators must be blocked,
and arguably all readers should be blocked as well.
True of non-moving collectors, too. [...]

Some of what I've written up addresses why. [...] I'll send that 
section when I get out of the office.
Consider this simple object graph:

Root set = { &A, &B }

[ A=NULL]
[ B=&C  ] ---> [ C=]
Collection begins in thread 1 and marks A as reachable before being 
preempted by thread 2:

[xA=NULL]
[ B=&C  ] ---> [ C=]
Thread 2 sets A to &C, and nullifies B before being preempted again by 
thread 1:

[xA=&C  ] ---> [ C=]
[ B=NULL]
Thread 1 marks B as reachable:

[xA=&C  ] ---> [ C=]
[xB=NULL]
The mark phase complete, thread 1 sweeps:

[ A=&???] ---> ???
[ B=NULL]
C was not marked reachable (although it was) and was thus erroneously 
collected, leaving a dangling pointer. This problem applies equally to 
copying and mark-sweep collectors.

—

Gordon Henriksen
[EMAIL PROTECTED]


Re: Start of thread proposal

2004-01-19 Thread Dan Sugalski
At 12:58 AM + 1/20/04, [EMAIL PROTECTED] wrote:
 > =item MUTEX
 This is a low level, under the hood, not exposed to users, thing that
 can be locked. They're non-recursive, non-read/write, exclusive
 things. When a thread gets a mutex, any other attempt to get that
 mutex will block until the owning thread releases the mutex. The
 platform-native lock construct will be used for this.
Will this be macroised to hide the platform native implementation from
the main body of the code?
Yes.

 >
 The "sleep until something pings me" construct. Useful for queue
 construction. Not conceptually associated with a MUTEX. (POSIX
 threads require this, so we're going to hide it there behind macros
 and/or functions)
Could you elaborate on the nature of what would constitute a "ping"?
POSIX has a function cond_signal (and cond_broadcast, which is 
similar). What happens is a thread grabs the condition variable and 
goes to sleep, and sleeps until another thread does a cond_signal, 
which then wakes it up. If there are multiple threads sleeping on the 
condition variable, only one is woken. (cond_broadcast wakes them all 
up.)
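
A minimal queue-style sketch of that construct with the POSIX calls
(variable names are illustrative); POSIX ties the wait to a mutex, which
is one of the details to hide behind the macros:

    #include <pthread.h>

    static pthread_mutex_t q_mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  q_cond  = PTHREAD_COND_INITIALIZER;
    static int q_items = 0;

    void consumer_wait(void) {
        pthread_mutex_lock(&q_mutex);
        while (q_items == 0)           /* guards against spurious wakeups */
            pthread_cond_wait(&q_cond, &q_mutex);  /* sleep until pinged */
        q_items--;
        pthread_mutex_unlock(&q_mutex);
    }

    void producer_ping(void) {
        pthread_mutex_lock(&q_mutex);
        q_items++;
        pthread_cond_signal(&q_cond);  /* wakes one sleeper; use
                                          pthread_cond_broadcast for all */
        pthread_mutex_unlock(&q_mutex);
    }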

 > =item POSIX threads
 The threading scheme must be sufficient to support a POSIX
 share-everything style of threading, such as is used in perl 5's
 "pthread" model, as well as the thread models for Ruby and Python.
Thank you :)
No problem. It's the model I prefer. :)

 > =item "Process-type' threads
 The scheme must also support the perl 5 "iThreads" threading
 model. In this model no data is shared implicitly, and all sharing
 must be done on purpose and explicitly. It very much resembles the
 Unix fork-process-with-shared-memory-segment model, not a surprise as
 it was originally developed with
Pseudo forks?
Yeah. On Win32, when Perl forks it starts a new thread and clones the 
interpreter and its contents. If you later do an exec in that thread 
it starts a new process and grabs hold of it. Definitely a clever 
trick.

An interesting extract from the Windows Services for Unix docs.

"The Interix subsystem shows improvements in all aspects of
performance--ranging from 30 percent improvements in fork and
exec performance..."
It's a sales pitch, but I'm trying to verify it (and build parrot
using it).
If it works, cool. We can add environment probing at configure time. 
I worry some about the stability and evil involved there (I've a good 
idea how much work it is to weld fork into a process with a 
fundamentally different process model) but that's Microsoft's 
problem, not ours.

 > =item System memory allocation routines are threadsafe
 We are assuming that the memory allocation system of the base OS is
 threadsafe. While some of the C runtime libraries are notoriously
 thread dangerous, memory allocation code generally is threadsafe, and
 we'll assume that on all platforms. (Though we will, in general,
 manage our own memory)
Is there any likelihood that memory allocation will be hidden behind
macros at two levels:
 - ALLOCPOOL() for allocating large chunks of memory (pools) that are
   later sub-allocated (and managed) by:
 - SUBALLOC() for sub allocating within a pool
We'll generally hide behind the memory.c routines, if for no other 
reason than to allow the embedder to override the memory functions. 
We need to define that at some point...

 >
 =item Each interpreter will have a unique id

 This ID will be independent of the process or OS thread, and will be
 constant across the lifetime of the interpreter. Interpreter IDs
 may be reused as interpreters are destroyed and recreated, and as
 such are only guaranteed valid while an interpreter is in use.
The provision of method(s) to obtain the native TIDs/HANDLES would
make life for those writing implementation specific extensions easier.
TIDs, definitely. It'll be a sysinfo or interpinfo function. There's 
the issue of interpreters binding to different threads, but we'll 
deal with that. I'm not sure what the HANDLE is, but explain it and 
we can work something out. :)
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Start of thread proposal

2004-01-19 Thread Sam Vilain
On Tue, 20 Jan 2004 10:45, Dan Sugalski wrote:

  > Yes. The recommendation I've always seen for deadlock avoidance is
  > to always have all your code grab its mutexes in some fixed order.

Yes; otherwise, you need to back off and start again, if one lock
acquisition fails.

Consider these functions; for the purpose of this example, lexical
lock ordering is implied;

  func1($, $, $, $, $);
  func2($, $, $, $, $);
  func3($, $);

So long as the locks are ramped up from the lowest to the highest,
there is no chance of func1 acquiring a lock to be held by func2 (eg,
$), if that other function already has one of the shared
dependencies (eg, $).

All well and good.  But, what about recursive locks?

ie

 sub func1($one is locked, $two is locked, $three is locked) {
 for my $x ($one, $two, $three) {
 func2($x.property) if $x.property;
 }
 }

 sub func2($eins is locked) {
 if ($eins.property) {
 func2($eins.property, { });
 }
 }

 $aaa.property = { };
 $bbb.property.property = $aaa;
 $ccc = { };
 
 if (thork()) {   # fork a thread
# thread A
func1($aaa, $bbb, $ccc);
 } else {
# thread B
func2($bbb.property);
 }

OK, the execution order that causes the deadlock is:

  1. Thread B - func2() acquires a lock on the $bbb.property PMC.
  2. Thread A - func1() acquires a lock on $aaa, $bbb, $ccc.
  3. Thread A - func2() acquires a lock on $aaa.property, which
 returns quickly
  4. Thread A - func2() blocks waiting for a lock on $bbb.property,
 held by func2() in thread B
  5. Thread B - func2() blocks waiting for a lock on
 $bbb.property.property, held by func1() in thread A.

So, there are still possibilities for deadlocks, as the non-linear
nature of subroutine calls screws up your nice lock ramping.

In summary, acquiring mutexes in a defined order as a means to avoid
deadlocks only works when you are acquiring them all up front.  If you
are acquiring any later, and you detect a deadlock (ie, a loop of
threads blocking on locks held by each other), you *must* be able to
tell one of them to "ramp off" to holding no locks at all.  ie,
ROLLBACK :).

The bugger is, winding back execution state automatically to when you
last started acquiring locks is probably an intractable problem from
an interpreter's perspective...

Sounds like a job for an exception to me ;-).

  for (1..50) {
 eval {
func_that_acquires_first_lock($args);
 };
 last unless $@ and $@ =~ m/mutex deadlock/i;
  }

-- 
Sam Vilain, [EMAIL PROTECTED]

  When I sell liquor, it's called bootlegging; when my patrons serve it
on Lake Shore Drive, it's called hospitality.
AL CAPONE



RE: Start of thread proposal

2004-01-19 Thread Gordon Henriksen
Dan Sugalski wrote:

> For a copying collector to work, all the mutators must be blocked, 
> and arguably all readers should be blocked as well.

True of non-moving collectors, too. Or, let me put it this way: non-
copying *GC* (the sweep or copy phase) can be threadsafe, but the mark
phase is never threadsafe. The method in which marking is not
threadsafe is a bit more pathological (i.e., it's not the common case
as it is with the copying collector), but a standard tracing DOD 
cannot be correct when competing with mutators. It WILL collect non-
garbage (those are MUTATORS, remember), and the result WILL be 
Heisenbugs and crashes.

Some of what I've written up addresses why. It's pretty simple to
demonstrate a single case to prove the point, but I don't feel like 
re-creating the ASCII art right now. :) I'll send that section when I 
get out of the office.

parrot will have to be able to suspend all threads in the environment.
Unfortunate, but really quite unavoidable.


> Oh, arguably it can't avoid deadlock at all, what with vtable methods 
> having access to the full environment. I can live with deadlocks, 
> only because there's no real alternative.

But PMC implementations have to fall inside of the trusted environment,
so that's not really a failure. Of course uncooperative code can break
a cooperative algorithm. :)

-- 

Gordon Henriksen
IT Manager
ICLUBcentral Inc.
[EMAIL PROTECTED]



RE: Start of thread proposal

2004-01-19 Thread Dan Sugalski
At 5:32 PM -0500 1/19/04, Gordon Henriksen wrote:
Dan Sugalski wrote:

 [A] copying collector is generally untenable in a threaded
environment.

Can you elaborate upon the reasoning behind this statement?

Sure. For a copying collector to work, all the mutators must be 
blocked, and arguably all readers should be blocked as well. (I think 
they need to be, otherwise you may be accessing reclaimed and reused 
memory.) That means if we keep a copying collector we need to have 
everything that accesses strings or PMCs get a lock on the GC 
before every access of every string or PMC. A touch excessive, I 
think. :)

 > The first thing that any vtable function of a shared PMC must do is to

 acquire the mutex of the PMCs in its parameter list, in ascending
 address order. When the mutexes are released they are not required to
 be released in any order.
Note that this locking strategy cannot avoid deadlock if the user is
allowed to acquire these locks--HLL locks must be altogether different
beasts from automatic PMC locks. That's okay. Just a design consequence
worth noting for everyone.
Oh, arguably it can't avoid deadlock at all, what with vtable methods 
having access to the full environment. I can live with deadlocks, 
only because there's no real alternative.
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


RE: Start of thread proposal

2004-01-19 Thread Gordon Henriksen
Dan Sugalski wrote:

> [A] copying collector is generally untenable in a threaded
environment.

Can you elaborate upon the reasoning behind this statement?


> The first thing that any vtable function of a shared PMC must do is to

> acquire the mutex of the PMCs in its parameter list, in ascending 
> address order. When the mutexes are released they are not required to 
> be released in any order.

Note that this locking strategy cannot avoid deadlock if the user is 
allowed to acquire these locks--HLL locks must be altogether different
beasts from automatic PMC locks. That's okay. Just a design consequence
worth noting for everyone.

-- 

Gordon Henriksen
IT Manager
ICLUBcentral Inc.
[EMAIL PROTECTED]



Re: Start of thread proposal

2004-01-19 Thread Dan Sugalski
At 2:44 PM -0700 1/19/04, Luke Palmer wrote:
Dan's thread proposal mentions:
 =item Automatic PMC sharing will be provided

 When a PMC is placed into a container which is shared (including
 lexical pads and global namespaces) then that PMC will automatically
 be marked as shared. It is acceptable for this to trigger an
 exception if for some reason a PMC should not be shared between
 interpreters.
 PMCs are, by default, not shared. This avoids sharing overhead for
 PMCs which are only used as temporaries and not shared. (Note that
 this is dangerous, and may end up not being done, due to the sharing
 of continuations)
I don't know why this is dangerous.  A continuation is a data structure
just like an array or a hash.  When you share it, everything "inside" it
gets shared, too.  For a continuation, this could be an awful lot of
stuff, but it's the safe way.
True, but the danger part is if we don't mark everything grabbed by a 
continuation as shared by default. If we do, we might as well mark 
everything as shared, as there'll be less overhead.
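
One plausible shape for that share-on-store rule, sketched with
invented names (PMC_SHARED, pmc_mark_shared, a NULL-terminated
contents list) rather than Parrot's real internals:

    typedef struct PMC PMC;
    struct PMC {
        int   flags;
        PMC **contents;  /* what this PMC references; NULL-terminated */
    };

    #define PMC_SHARED 0x01

    /* Mark a PMC shared, and transitively everything it can reach. For
       a continuation the recursion drags in pads, stacks, and whatever
       they reference -- the "awful lot of stuff" above. */
    void pmc_mark_shared(PMC *pmc)
    {
        PMC **p;
        if (pmc == NULL || (pmc->flags & PMC_SHARED))
            return;                /* already shared: stop recursing */
        pmc->flags |= PMC_SHARED;  /* or raise an exception here for
                                      PMCs that must not be shared */
        if (pmc->contents)
            for (p = pmc->contents; *p; p++)
                pmc_mark_shared(*p);
    }

    /* Called from the store vtable entry of any shared container. */
    void shared_store(PMC *aggregate, PMC *value)
    {
        pmc_mark_shared(value);
        /* ... then do the actual store under aggregate's mutex ... */
    }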
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Start of thread proposal

2004-01-19 Thread Dan Sugalski
At 4:37 PM -0500 1/19/04, Uri Guttman wrote:
 > "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes:

  DS> =item All shared PMCs must have a threadsafe vtable

  DS> The first thing that any vtable function of a shared PMC must do is to
  DS> acquire the mutex of the PMCs in its parameter list, in ascending
  DS> address order. When the mutexes are released they are not required to
  DS> be released in any order.
why the ascending address order to grab mutexes? is this to help solve
deadlocks?
Yes. The recommendation I've always seen for deadlock avoidance is to 
have all your code grab its mutexes in some fixed order. In source 
code it's generally recommended that you grab mutex variables in 
lexical order. Since we're all binary we need a different order, and 
ascending addresses are reasonably simple to do.
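
A minimal sketch of that discipline for the two-PMC case (the struct
and function names are invented for illustration):

    #include <pthread.h>
    #include <stdint.h>

    typedef struct PMC { pthread_mutex_t mutex; } PMC;

    /* Always lock the lower-addressed PMC first. If every thread does
       this, a blocked thread only ever holds mutexes at addresses below
       the one it's waiting for, so no cycle of waiters -- and therefore
       no deadlock -- can form. */
    void lock_two_pmcs(PMC *x, PMC *y)
    {
        PMC *lo = ((uintptr_t)x < (uintptr_t)y) ? x : y;
        PMC *hi = ((uintptr_t)x < (uintptr_t)y) ? y : x;

        pthread_mutex_lock(&lo->mutex);
        if (hi != lo)              /* same PMC passed twice */
            pthread_mutex_lock(&hi->mutex);
    }

    /* Release order doesn't matter, as the proposal says. */
    void unlock_two_pmcs(PMC *x, PMC *y)
    {
        pthread_mutex_unlock(&x->mutex);
        if (y != x)
            pthread_mutex_unlock(&y->mutex);
    }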
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Start of thread proposal

2004-01-19 Thread Luke Palmer
Dan's thread proposal mentions:
> =item Automatic PMC sharing will be provided
> 
> When a PMC is placed into a container which is shared (including
> lexical pads and global namespaces) then that PMC will automatically
> be marked as shared. It is acceptable for this to trigger an
> exception if for some reason a PMC should not be shared between
> interpreters.
> 
> PMCs are, by default, not shared. This avoids sharing overhead for
> PMCs which are only used as temporaries and not shared. (Note that
> this is dangerous, and may end up not being done, due to the sharing
> of continuations)

I don't know why this is dangerous.  A continuation is a data structure
just like an array or a hash.  When you share it, everything "inside" it
gets shared, too.  For a continuation, this could be an awful lot of
stuff, but it's the safe way.

Luke


Re: Start of thread proposal

2004-01-19 Thread Uri Guttman
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes:

  DS> =item All shared PMCs must have a threadsafe vtable

  DS> The first thing that any vtable function of a shared PMC must do is to
  DS> acquire the mutex of the PMCs in its parameter list, in ascending
  DS> address order. When the mutexes are released they are not required to
  DS> be released in any order.

why the ascending address order to grab mutexes? is this to help solve
deadlocks?

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org


Re: Start of thread proposal

2004-01-19 Thread Dan Sugalski
At 3:16 PM -0500 1/19/04, Dan Sugalski wrote:
I've not gotten into the technical bits yet. That's next, but rip 
this apart first.
First rip -- I left a section unfinished. It should be:

=item "Process-type" threads

The scheme must also support the perl 5 "iThreads" threading
model. In this model no data is shared implicitly, and all sharing
must be done on purpose and explicitly. It very much resembles the
Unix fork-process-with-shared-memory-segment model, not a surprise as
it was originally developed with emulation of Unix's fork system in
mind.
--
Dan
--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Start of thread proposal

2004-01-19 Thread Dan Sugalski
I've not gotten into the technical bits yet. That's next, but rip 
this apart first.

=head1 DEFINITIONS

So we can all talk about things the same way, the following
definitions apply. Some of these are drawn from the POSIX thread spec,
and as such we should have a translation section at the end.
=over 4

=item THREAD

An OS level thread. If that makes no sense, neither will any of the
rest of this, in which case I recommend picking up "Programming with
POSIX Threads" by Dave Butenhof, and coming back when you have.
=item MUTEX

This is a low level, under the hood, not exposed to users, thing that
can be locked. They're non-recursive, non-read/write, exclusive
things. When a thread gets a mutex, any other attempt to get that
mutex will block until the owning thread releases the mutex. The
platform-native lock construct will be used for this.
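
By way of illustration only, wrapping the platform-native construct
might look something like this; the names are guesses, not Parrot's
actual headers:

    #ifdef _WIN32
    #  include <windows.h>
       typedef CRITICAL_SECTION Parrot_mutex;
    #  define MUTEX_INIT(m)   InitializeCriticalSection(&(m))
    #  define MUTEX_LOCK(m)   EnterCriticalSection(&(m))
    #  define MUTEX_UNLOCK(m) LeaveCriticalSection(&(m))
    #else
    #  include <pthread.h>
       typedef pthread_mutex_t Parrot_mutex;
    #  define MUTEX_INIT(m)   pthread_mutex_init(&(m), NULL)
    #  define MUTEX_LOCK(m)   pthread_mutex_lock(&(m))
    #  define MUTEX_UNLOCK(m) pthread_mutex_unlock(&(m))
    #endif

(One wrinkle: Win32 critical sections are recursive, so the
non-recursive contract above would need to be enforced some other way
on that platform.)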
=item LOCK

This is an exposed-to-HLL-code thing that can be locked. Only PMCs can
be locked, and the lock may or may not be recursive or read/write.
=item CONDITION VARIABLE

The "sleep until something pings me" construct. Useful for queue
construction. Not conceptually associated with a MUTEX. (POSIX
threads require an associated mutex, so we're going to hide that
pairing behind macros and/or functions.)
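
A sketch of how the POSIX pairing could be hidden behind a single
construct (names invented; real consumers would also loop on a
predicate to cope with spurious wakeups):

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t mutex;  /* the mutex POSIX insists on pairing */
        pthread_cond_t  cond;
    } Parrot_condvar;

    /* "Sleep until something pings me." */
    void condvar_wait(Parrot_condvar *cv)
    {
        pthread_mutex_lock(&cv->mutex);
        pthread_cond_wait(&cv->cond, &cv->mutex);  /* atomically unlocks,
                                                      sleeps, relocks */
        pthread_mutex_unlock(&cv->mutex);
    }

    /* The ping. */
    void condvar_ping(Parrot_condvar *cv)
    {
        pthread_mutex_lock(&cv->mutex);
        pthread_cond_signal(&cv->cond);
        pthread_mutex_unlock(&cv->mutex);
    }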
=item RENDEZVOUS POINT

A HLL version of a condition variable.

=item INTERPRETER

Those bits of the Parrot_Interp structure that are absolutely required
to be thread-specific. This includes the current register sets and
stack pointers, as well as security context information. Basically if
a continuation captures it, it's the interpreter.
=item INTERPRETER ENVIRONMENT

Those bits of the Parrot_Interp structure that aren't required to be
thread-specific (though I'm not sure there are any), plus anything
pointed to that doesn't have to be thread-specific.
The environment includes the global namespaces, pads, stack chunks,
memory allocation regions, arenas, and whatnots. Just because the
pointer to the current pad is thread-specific doesn't mean the pad
itself has to be. It can be shared.
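
Roughly, the split being described, with all field names invented for
the sketch:

    typedef struct Parrot_Interp_Env {
        /* shared by every thread in the pool */
        struct Namespace *globals;
        struct Arena     *pmc_arenas;
        struct MemPool   *memory_pools;
    } Parrot_Interp_Env;

    typedef struct Parrot_Interp {
        /* per-thread: the bits a continuation captures */
        struct RegisterSet *registers;
        struct StackChunk  *stack_top;
        struct Pad         *current_pad;  /* the pointer is per-thread... */
        struct SecurityCtx *security;
        Parrot_Interp_Env  *env;          /* ...but what it points at can
                                             be shared */
    } Parrot_Interp;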
=item INDEPENDENT THREAD

A thread that has no contact at all with the internal data of any
other thread in the current process. Independent threads need no
synchronization for anything other than what few global things we
have. And the fewer the better, though alas we can't have none at all.
Note that independent threads may still communicate back and forth by
passing either atomic things (ints, floats, and pointers) or static
buffers that can become the property of the destination thread.
=item SHARED THREAD

A thread that's part of a group of threads sharing a common
interpreter environment.
=back

=head1 REQUIREMENTS

=head2 Supported Models

=over 4

=item POSIX threads

The threading scheme must be sufficient to support a POSIX
share-everything style of threading, such as is used in perl 5's
"pthread" model, as well as the thread models for Ruby and Python.
=item "Process-type' threads

The scheme must also support the perl 5 "iThreads" threading
model. In this model no data is shared implicitly, and all sharing
must be done on purpose and explicitly. It very much resembles the
Unix fork-process-with-shared-memory-segment model, not a surprise as
it was originally developed with emulation of Unix's fork system in
mind.
=back

=head2 Guarantees

=over 4

=item No Crashes

The interpreter guarantees that no user program errors of any sort
will crash the interpreter. This includes threading problems. As
such, synchronization issues (where multiple interpreters are
accessing the same shared data) must not crash the interpreter or
corrupt its internal state.
=back

=head2 Assumptions

=over 4

=item System memory allocation routines are threadsafe

We are assuming that the memory allocation system of the base OS is
threadsafe. While some of the C runtime libraries are notoriously
thread dangerous, memory allocation code generally is threadsafe, and
we'll assume that on all platforms. (Though we will, in general,
manage our own memory)
=back

=head1 Proposal

The proposal is as follows:

=over 4

=item All global state shall be protected by mutexes

Straightforward enough. This allows for independent threads to
coexist without threatening the state of the process.
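
Concretely, reusing the invented MUTEX_* wrappers sketched under
MUTEX above:

    static Parrot_mutex interp_list_mutex;  /* guards the global list of
                                               live interpreters; set up
                                               with MUTEX_INIT at start */

    void register_interpreter(Parrot_Interp *interp)
    {
        MUTEX_LOCK(interp_list_mutex);
        /* ... append interp to the global interpreter list ... */
        MUTEX_UNLOCK(interp_list_mutex);
    }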
=item Multiple independent interpreters will be allowed

Once again, straightforward enough. With threadsafe protected global
state, there's no issue here.
=item Only one OS thread in an interpreter at once

While there is no requirement that any interpreter be tied to an
underlying OS thread, under no circumstances may multiple OS threads
use a single interpreter simultaneously.
=item A Stop-and-copy communication method will be provided

Parrot will provide a function to make a call into another interpreter
and wait for that call to complete. This call may pass in data and
have data returned to it. The interpreter making the call will block
until the call is complete. The data passed in as parameters will be
copied into the called interpreter, and any return values will be
copied back into the calli