Re: Threads Design. A Win32 perspective.

Nigel Sandever Sat, 03 Jan 2004 14:39:32 -0800

On Sat, 3 Jan 2004 21:00:31 +0100, [EMAIL PROTECTED] (Leopold Toetsch) wrote:
> >> That's exactly, what a ParrotInterpreter is: "the entire state for a
> >> thread".
> 
> > This is only true if a thread == interpreter.
> > If a single interpreter can run 2 threads then that single interpreter
> > cannot represent the state of both threads safely.
> 
> Yep. So if a single interpreter (which is almost a thread state) should
> run two threads, you have to allocate and swap all.


When a kernel-level thread is spawned, no duplication of application memory 
is required, only a set of registers, a program counter and a stack. These 
represent the entire state of that thread.

If a VM thread mirrors this by duplicating the VM program counter, 
VM registers and VM stack, then this VM thread context can also
avoid the need to replicate the rest of the program data (interpreter).
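
To make the split concrete, here is a minimal sketch, in Python purely
for brevity. The names Interpreter, VMThreadContext and spawn_thread are
hypothetical and are not Parrot's API; the register count of 32 is
likewise just an assumption for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Interpreter:
    """Program-wide state, allocated once and shared by every VM thread."""
    bytecode: list
    globals_: dict = field(default_factory=dict)


@dataclass
class VMThreadContext:
    """Per-thread state only: program counter, registers and stack."""
    interp: Interpreter                     # a reference, not a copy
    pc: int = 0
    registers: list = field(default_factory=lambda: [0] * 32)
    stack: list = field(default_factory=list)


def spawn_thread(interp, entry_pc=0):
    """Create a new VM thread without cloning the interpreter."""
    return VMThreadContext(interp=interp, pc=entry_pc)


interp = Interpreter(bytecode=["noop"] * 4)
t1 = spawn_thread(interp)
t2 = spawn_thread(interp, entry_pc=2)
```

Both contexts point at the same Interpreter object, while each has its
own registers and stack, so spawning costs only the size of the context.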

> What should the
> advantage of such a solution be?

The avoidance of duplication. 
Transparent interlocking of VHLL fat structures performed automatically
by the VM itself. No need for :shared or lock().


> 
> > With 5005threads, multiple threads exist in a single interpreter.
> 
> These are obsolete.

ONLY because they couldn't be made to work properly. The reasons 
for that are entirely due to the architecture of P5.

Dan Sugalski suggested on this list back in 2001 that he would prefer
pthreads to ithreads. 

I've used both in p5, and pthreads are vastly more efficient, but flaky and
difficult to use well. These limitations are due to the architecture upon 
which they were built. My interest is in seeing the Parrot architecture
not exclude them.

> 
> > With ithreads, each thread is also a separate interpreter.
> 
> >  Spawning a new thread becomes a process of duplicating everything.
> >  The interpreter, the perl program, and all its existing data.
> 
> Partly yes. A new interpreter is created, the program, i.e. the opcode
> stream is *not* duplicated, but JIT or prederef information has to be
> rebuilt (on demand, if that run-core is running), and existing
> non-shared data items are cloned.
> 

Only duplicating shared data on demand (COW) may work well on systems
that support COW in the kernel. But on systems that don't, this has to be
emulated in user space, with all the inherent overhead that implies.

My desire was that the VM_Spawn_Thread, VM_Share_PMC and 
VM_Lock_PMC opcodes could be coded such that, on those platforms where
kernel-level COW and other native features make the ithreads-style
model of VMthread == kernel thread + interpreter the best way to go,
that would be the underlying implementation.

On those platforms where VMthread == kernel thread + VMthread context
is the best way, then that would be the underlying implementation.

For this to be possible, a certain level of support for both must be
ingrained in the design of the interpreter.

My (long) original post, with all the subjects covered and details given, 
was my attempt to describe the support required in the design for the 
latter. It would be necessary to consider all the elements, and the way 
they interact, and take these into consideration when implementing
Parrot's threading in order that this would be achievable.

Each element (the separation of the VM state from the interpreter state,
the atomisation of VM operations, the automated detection and locking of
concurrent access attempts, and the serialisation of the VM threads when 
such an attempt is detected) needs support at the highest level before it
may be implemented at the lowest (platform-specific) levels.

It simply isn't possible to implement them on one platform at the lowest
levels unless the upper levels of the design are constructed with the 
possibilities in mind.

> >  Sharing data between the threads/interpreters is implemented by
> >  tieing
> 
> Parrot != perl5.ithreads
> 
> > If Parrot has found a way of avoiding these costs and limitations
> > then everything I offered is a waste of time, because these are
> > the issues I was attempting to address.
> 
> I posted a very premature benchmark result, where an unoptimized Parrot
> build is 8 times faster than the equivalent perl5 code.
> 
> > And those who have tried to make use of ithreads
> > under p5 are all too aware that replicating them for Parrot would
> > be ..... [phrase deleted as too emotionally charged:)]
> 
> I don't know how ithreads are working internally WRT the relevant issues
> like object allocation and such. But threads at the OS level provide
> shared code and data segments. So at the VM level you have to unshare
> non-shared resources at thread creation. 

You only need to copy them if the two threads can attempt to modify
the contents of the objects concurrently. You can preclude this
possibility by atomising VM-thread-level operations: prevent a new VM
thread from being scheduled until any other VM thread completes its
current operation, and ensure that each VM thread's state is complete
and coherent before another VM thread can run. That avoids the need 
for the duplication.
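
A toy sketch of that scheduling rule, in Python and on a single kernel
thread only. The names op_append and run, and the dict-based thread
contexts, are hypothetical; a real implementation would have to enforce
the same switch points across kernel threads.

```python
from collections import deque


def op_append(shared, value):
    """One VM-level operation on a fat structure; it runs to completion."""
    shared.append(value)


def run(threads, shared):
    """Round-robin scheduler that switches VM threads only between ops."""
    ready = deque(threads)
    while ready:
        ctx = ready.popleft()
        if ctx["pc"] >= len(ctx["code"]):
            continue                    # this VM thread has finished
        op, arg = ctx["code"][ctx["pc"]]
        op(shared, arg)                 # never interrupted part-way through
        ctx["pc"] += 1                  # the thread's state is coherent again
        ready.append(ctx)               # only now may another VM thread run


shared = []                             # one copy, never cloned per thread
a = {"pc": 0, "code": [(op_append, "a1"), (op_append, "a2")]}
b = {"pc": 0, "code": [(op_append, "b1"), (op_append, "b2")]}
run([a, b], shared)
print(shared)
```

The two VM threads interleave at operation boundaries and both write to
the same list, yet no operation ever sees the list half-updated, so
nothing had to be copied at spawn time.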

> You can copy objects lazily and
This works well if the kernel will take care of the COW, but on 
kernels not supporting this, you have to add costly extra code to 
detect the writes (or the reads) and perform the duplication (and
possibly the allocation) in user code. This is what p5 ithreads do,
and the result is less than satisfactory.

> make 2 distinct items when writing, or you copy them in the first
> place. But you have these costs at thread start - and not later.

Duplicating everything at thread start is effectively how win32
emulates forking. The problem is that win32 does not support forking
natively, so complicated steps have to be taken in user-level code to
introspect the OS and to find and duplicate the appropriate data
segments. What a *nix kernel does as the result of a single syscall,
with access to all the appropriate kernel tables and data, costs
hugely when performed from user space.

In addition to that, there are several other *nix kernel facilities
associated with forking (SIGCHLD etc.) that the win32 kernel doesn't 
support. These have never been suitably emulated, and so forking as a
concept is still pretty unusable on win32. 

My desire is that *all* OS concepts used from the interpreter be
virtualised so that where they are native, they can be macro'd 
directly to the underlying OS syscalls. But where they are not so
supported, they can be implemented in isolation without the need to 
make wholesale changes throughout the sources.

> leo

Nigel.
