Re: Threads: Time to get the terminology straight

2004-01-04 Thread Sam Vilain
On Mon, 05 Jan 2004 15:43, Nigel Sandever wrote;

  > I accept that it may not be possible on all platforms, and it may
  > be too expensive on some others. It may even be undesirable in the
  > context of Parrot, but I have seen no argument that goes to
  > invalidate the underlying premise.

I think you missed this:

LT> Different VMs can run on different CPUs. Why should we make atomic
LT> instructions out of these? We have a JIT runtime performing at 1
LT> Parrot instruction per CPU instruction for native integers. Why
LT> should we slow down that by a magnitude of many tenths?

LT> We have to lock shared data, then you have to pay the penalty, but
LT> not for each piece of code.

and this:

LT> I think, that you are missing multiprocessor systems totally.

You are effectively excluding true parallelism by blocking other
processors from executing Parrot ops while one has the lock.  You may
as well skip the thread libraries altogether and multi-thread the ops
in a runloop like Ruby does.

But let's carry the argument through, restricting it to UP systems,
with hyperthreading switched off, and running Win32.  Is it even true
that masking interrupts is enough on these systems?

Win32 `Critical Sections' must be giving the scheduler hints not to
run other pending threads whilst a critical section is running.  Maybe
it uses the CPU sti/cli flags for that, to avoid the overhead of
setting a memory word somewhere (bad enough) or calling the system
(crippling).  In that case, setting STI/CLI might only incur a ~50%
performance penalty for integer operations.

but then there's this:

  NS> Other internal housekeeping operations, memory allocation, garbage
  NS> collection etc. are performed as "sysopcodes", performed by the VMI
  NS> within the auspices of the critical section, and thus secured.

UG> there may be times when a GC run needs to be initiated DURING a VM
UG> operation. if the op requires an immediate large chunk of ram it
UG> can trigger a GC pass or allocation request. you can't force those
UG> things to only happen between normal ops (which is what making
UG> them into ops does). so GC and allocation both need to be able to
UG> lock all shared things in their interpreter (and not just do a
UG> process global lock) so those things won't be modified by the
UG> other threads that share them.

I *think* this means that even if we *could* use critical sections for
each op, where this works and isn't terribly inefficient, GC throws a
spanner in the works.  This could perhaps be worked around.

In any case, it won't work on the fastest known threading
implementations (Solaris, Linux NPTL, etc), as they won't know to
block all the other threads in a given process just because one of
them set a CPU flag cycles before it was pre-empted.

So, in summary - it won't work on MP, and on UP, it couldn't possibly
be as overhead-free as the other solutions.

Clear as mud ?  :-)

[back to processors]
> Do these need to apply lock on every machine level entity that
> they access?

Yes, but the only resource that matters here is memory.  Locking
*does* take place inside the processor, but the locks are all close
enough to be inspected in under a cycle.  And misses incur a penalty
of several cycles - maybe dozens, depending on who has the memory
locked.

Registers are also "locked" by virtue of the fact that the
out-of-order execution and pipelining logic will not schedule/allow an
instruction to proceed until its data is ready.  Any CPU with
pipelining has this problem.

There is an interesting comparison to be drawn between the JIT
assembly that happens inside a hyperthreading processor, translating
the bytecode being executed (x86) into a RISC core machine language
(µ-ops), and Parrot's compiling of PASM to native machine code.  In
each case it is the µ-ops that are ordered to maximize performance
and fed into the execution units.

A hyperthreading processor has the luxury of knowing how long it
will take to check the necessary locks for each instruction
(probably under a cycle), so the µ-ops may scream along.

With Parrot, it might have to contact another host over an ethernet
controller to acquire a lock (eg, threads running in an OpenMOSIX
cluster).  This cannot happen for every instruction!
-- 
Sam Vilain, [EMAIL PROTECTED]

  The golden rule is that there are no golden rules
GEORGE BERNARD SHAW




Re: Threads: Time to get the terminology straight

2004-01-04 Thread Luke Palmer
Nigel Sandever writes:
> Whilst higher level objects, that the machine level objects are a
> part of, may have their state corrupted by two threads modifying
> things concurrently, the state of the threads themselves (register
> sets + stack) cannot be corrupted.

I'm going to ask for some clarification here.

I think you're saying that each thread gets its own register set and
register stacks (the call chain is somewhat hidden within the register
stacks).  Dan has been saying we'll do this all along.

But you're also saying that a particular PMC can only be loaded into one
register at a time, in any thread.  So if thread A has a string in its
P17, then if thread B tries to load that string into its, say, P22, it
will block until thread A releases it.  Is this correct?

This is a novel idea.  It reduces lock acquisition/release from every
opcode to only the opcodes that load PMCs into registers.  That's a win.

Sadly, it greatly increases the chances of deadlocks.  Take this
example:

my ($somestr, $otherstr);
sub A {
    $somestr .= "foo";
    $otherstr .= "bar";
}
sub B {
    $otherstr .= "foo";
    $somestr .= "bar";
}
Thread->new(\&A);
Thread->new(\&B);

This is a fairly trivial example, and it should work smoothly if we're
automatically locking for the user. 

But consider your scheme:  A loads $somestr into its P17, and performs
the concatenation.   B loads $otherstr into its P17, and performs the
concatenation.  A tries to load $otherstr into its P18, but blocks
because it's in B's P17.  B then tries to load $somestr into its P18,
but blocks because it's in A's P17.  Deadlock.
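This is the classic lock-ordering deadlock, and the standard cure is to make every thread acquire the two locks in one fixed global order, so the circular wait can never form. A minimal sketch in C with POSIX threads (the names and the integer "lengths" are invented stand-ins for the two strings; this is not Parrot code):

```c
#include <pthread.h>

/* Two shared "strings", each guarded by its own mutex --
   standing in for $somestr and $otherstr above. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static int somestr_len = 0, otherstr_len = 0;

/* Acquire both locks in one fixed global order (a, then b),
   regardless of which variable the thread touches first.  With a
   fixed order, no circular wait -- hence no deadlock -- is possible. */
static void append_both(int a_first)
{
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    if (a_first) { somestr_len += 3; otherstr_len += 3; }
    else         { otherstr_len += 3; somestr_len += 3; }
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
}

static void *thread_A(void *arg) { (void)arg; append_both(1); return NULL; }
static void *thread_B(void *arg) { (void)arg; append_both(0); return NULL; }

int run_demo(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, thread_A, NULL);
    pthread_create(&b, NULL, thread_B, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return somestr_len + otherstr_len;
}
```

The scenario above is exactly the unordered case: A holds $somestr's lock while wanting $otherstr's, and B the reverse. Automatic per-PMC locking acquires in whatever order the opcodes happen to execute, which is why it can deadlock.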

Did I accurately portray your scheme?  If not, could you explain what
yours does in terms of this example?

Luke

> Regards, Nigel


Re: Threads: Time to get the terminology straight

2004-01-04 Thread Nigel Sandever
05/01/04 01:22:32, Sam Vilain <[EMAIL PROTECTED]> wrote:

[STUFF] :)

In another post you mention Intel hyperthreading:
essentially, duplicate sets of registers within a single CPU.

Do these need to apply a lock on every machine level entity that
they access?

No.

Why not?

Because they can only modify an entity if it is loaded into a register,
and the logic behind hyperthreading won't allow both register sets
to load the same entity concurrently.

(I know this is a gross simplification of the interactions
between the on-board logic and L1/L2 caching!)

--- Not an advert or glorification of Intel. Just an example -

Hyper-Threading Technology provides thread-level parallelism (TLP)
on each processor, resulting in increased utilization of processor
execution resources. As a result, resource utilization yields higher
processing throughput. Hyper-Threading Technology is a form of
simultaneous multi-threading technology (SMT) where multiple
threads of software applications can be run simultaneously on one
processor.

This is achieved by duplicating the *architectural state* on each
processor, while *sharing one set of processor execution resources*.
--

The last paragraph is the salient one as far as I am concerned.

The basic premise of my original proposal was that multi-threaded,
machine level applications don't have to interlock on machine level 
entities, because each operation they perform is atomic. 

Whilst higher level objects, that the machine level objects are a
part of, may have their state corrupted by two threads modifying
things concurrently, the state of the threads themselves (register
sets + stack) cannot be corrupted.

This is because they have their own internally consistent state,
that only changes atomically, and that is completely separated,
each from the other. They only share common data (code is data
to the CPU, just as bytecode is data to a VM).

So, if you are going to emulate a (hyper)threaded CPU in a
register-based virtual machine interpreter, and allow for
concurrent threads of execution within that VMI, then one way
of ensuring that the internal state of the VMI is never
corrupted would be to have each thread keep its own copy of
the *architectural state* of the VM, whilst sharing
*one set of processor execution resources*.

For this to work, you would need to achieve the same opcode
atomicity at the VMI level. Interlocking the threads so that
one shared thread cannot start an opcode until another shared
thread has completed its current one gives this atomicity. The
penalty is that if the interlocking is done for every opcode,
shared threads end up with very long virtual timeslices. To
prevent that being the case (most of the time), the interlocking
should only come into effect *if* concurrent access to a VM level
entity is imminent.

As the VMI cannot access (modify) the state of a VM level
entity (PMC) until it has loaded it into a VM register, the
interlocking need only come into effect *if* the entity
whose reference is being loaded into a PMC register is
currently in use by (another) thread.

A PMC's in-useness can be flagged by a single bit in its
header. A shared thread can detect this when it loads the
reference into a PMC register; when the bit is set, that
shared thread waits on the single, shared mutex before
proceeding.

It is only when the combination of atomised VM opcodes
and lightweight in-use detection come together that the
need for a mutex per entity can be avoided.

If the mutex used is capable of handling SMP, NUMA,
clusters etc., then the mechanism will work.

If a lightweight bit test-and-set opcode isn't available,
then a heavyweight equivalent could be used, though the
advantages would be reduced.
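The single-bit in-use flag with a lightweight test-and-set can be sketched with C11 atomics. The `pmc_t` type and its fields are hypothetical, purely for illustration; Parrot's real PMC header is not shown here:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical PMC header carrying a one-bit "in-use" flag. */
typedef struct {
    atomic_flag in_use;   /* clear = free, set = owned by some thread */
    /* ... real PMC payload would follow ... */
} pmc_t;

static void pmc_init(pmc_t *p)
{
    atomic_flag_clear(&p->in_use);
}

/* Try to claim the PMC for the current thread.
   atomic_flag_test_and_set atomically sets the flag and returns its
   previous value, so exactly one of several racing threads sees
   `false` and wins ownership. */
static bool pmc_try_claim(pmc_t *p)
{
    return !atomic_flag_test_and_set(&p->in_use);
}

/* Release the PMC; a real VM would also signal the single shared
   mutex/condition that losing threads block on. */
static void pmc_release(pmc_t *p)
{
    atomic_flag_clear(&p->in_use);
}
```

`atomic_flag_test_and_set` is precisely the "bit test-and-set" primitive described above: it sets the bit and reports whether it was already set in one indivisible step.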


>Sam Vilain, [EMAIL PROTECTED]

I hope that clarifies my thinking and how I arrived at it.

I accept that it may not be possible on all platforms, and
it may be too expensive on some others. It may even be 
undesirable in the context of Parrot, but I have seen no
argument that goes to invalidate the underlying premise.

Regards, Nigel





Re: Threads: Time to get the terminology straight

2004-01-04 Thread Sam Vilain
On Mon, 05 Jan 2004 12:58, Nigel Sandever wrote;

  > Everything else, were my attempts at solving the requirements of
  > synchronisation that this would require, whilst minimising the
  > cost of that synchronisation, by avoiding the need for a mutex on
  > every shared entity, and the costs of attempting to aquire a mutex
  > except when two SHARED THREADS attempted concurrent access to a
  > shared entity.

This paragraph sounds like you're trying to solve an intractable problem.
Try posting some pseudocode to explain what you mean.

But it has given me an idea that could minimise the number of locks,
ie not require a mutex on each PMC, just each shared PMC.  10 points
to the person who finds a flaw in this approach :)

  Each object you share, could create a new (virtual) shared memory
  segment with its own semaphore.  This virtual memory segment is
  considered its own COW domain (ie, its own thread to the GC);
  references inserted back to non-shared memory will pull the
  structures into that virtual COW thread.

  Access to the entire structure is controlled via a multiple
  reader/single writer lock (close to what a semaphore is IIRC); locks
  for a thread are released when references to places inside the
  shared segment are no longer anywhere in any @_ on the locking
  thread's call stack, or in use by any opcode (is that good enough?),
  and are acquired for writing when anything needs to be changed.

  Virtual shared memory segments can then easily be cleaned up by
  normal GC.

The major problem I can see is that upgrading a lock from reading to
writing can't work if there are concurrent writes (and the read lock
to be upgraded cannot sensibly be released).  But that should be OK,
since operation signatures will mark variables that need changing as
read-write as early as possible.

For example, in this sort of code (sorry for P5 code);

  sub changer {
 my $shared_object = shift;
 $shared_object->{bar} = "baz";
  }

A read lock to the segment \$shared_object is in is acquired, then
released when it is `shifted' off.  As the next instruction has a
writable lvalue, it acquires a write lock.  But this code:

  sub changer {
 my $shared_object = shift;
 $shared_object->{bar} = &somefunc();
  }

Will hold the write lock on $shared_object open until &somefunc
runs.

My 2¢ :).  This discussion will certainly reach a dollar soon ;).
-- 
Sam Vilain, [EMAIL PROTECTED]

  Start every day with a smile and get it over with. 
W C FIELDS




Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Damien Neil
On Sun, Jan 04, 2004 at 12:17:33PM -0800, Jeff Clites wrote:
> What are these standard techniques? The JVM spec does seem to guarantee 
> that even in the absence of proper locking by user code, things won't 
> go completely haywire, but I can't figure out how this is possible 
> without actual locking. (That is, I'm wondering if Java is doing 
> something clever.) For instance, inserting something into a collection 
> will often require updating more than one memory location (especially 
> if the collection is out of space and needs to be grown), and I can't 
> figure out how this could be guaranteed not to completely corrupt 
> internal state in the absence of locking. (And if it _does_ require 
> locking, then it seems that the insertion method would in fact then be 
> synchronized.)

My understanding is that Java Collections are generally implemented
in Java.  Since the underlying Java bytecode does not permit unsafe
operations, Collections are therefore safe.  (Of course, unsynchronized
writes to a Collection will probably result in exceptions--but it
won't crash the JVM.)

For example, insertion into a list might be handled something like
this (apologies for rusty Java skills):

  void append(Object new_entry) {
    if (a.length <= size) {
      Object new_a[] = new Object[size * 2];
      for (int i = 0; i < size; i++) {
        new_a[i] = a[i];
      }
      a = new_a;
    }
    a[size++] = new_entry;
  }

If two threads call this function at the same time, they may well leave
the list object in an inconsistent state--but there is no way that the
above code can cause JVM-level problems.

The key decision in Java threading is to forbid modification of all
bytecode-level types that cannot be atomically modified.  For example,
the size of an array cannot be changed, and strings are constant.
If it WERE possible to resize arrays, the above code would require locks
to avoid potential JVM corruption--every access to 'a' would need a lock
against the possibility that another thread was in the process of resizing
it.

It's my understanding that Parrot has chosen to take the path of using
many mutable data structures at the VM level; unfortunately, this is
pretty much incompatible with a fast or elegant threading model.

- Damien


Re: Thread notes

2004-01-04 Thread Sam Vilain
On Mon, 05 Jan 2004 10:13, Dan Sugalski wrote;

  [...]
  > these things. It's a set of 8 4-processor nodes with a fast 
  > interconnect between them which functions as a 32 CPU system. The 
  > four processors in each node are in a traditional SMP setup with a 
  > shared memory bus, tightly coupled caches, and fight-for-the-bus 
  [...]

I know what a NUMA system is, I was just a little worried by the
combination of the terms SMP and NUMA in the same sentence :).

Normally "SMP" means "Shared Everything" - meaning Uniform Memory
Access.  If compared to the term MPP or AMP (in which different CPUs
are put to different tasks), it is true that each node in a NUMA
system could be put to any task.  So, the term "SMP" would seem to fit
partially; but the implication is with NUMA that there are clear
benefits to *not* having each processor doing *exactly* the same
thing, all the time.  "CPU affinity" & all that.

Groups of processors have shared a block of memory in all the NUMA
systems I've seen, too (SGI Origin/Onyx; Sun Enterprise servers *must*
be built this way, though they don't mention it!).  So I'd say that
the "SMP" is practically redundant, bordering on confusing.  Maybe a
term like "8 x 4MP NUMA" is better.

I did apologise at the beginning for being pedantic.  But hey, didn't
this digression serve to elaborate on the meaning of NUMA ?  :)

  > Given the increases in processor vs memory vs bus speeds, this
  > setup may not hold for that much longer, as it's only really
  > workable when a single CPU doesn't saturate the memory bus with
  > any regularity, which is getting harder and harder to
  > do. (backplane and memory speeds can be increased pretty
  > significantly with a sufficient application of cash, which is why
  > the mini and mainframe systems can actually do it, but there are
  > limits beyond which cash just won't get you)

Opteron, and Sparc IV (IIRC) both have 3 bi-directional high speed
(=core speed) interconnects, so these could `easily' be arranged into
NUMA configurations with SMP groups.  Also, some high-end processors
are going multicore, which presumably has different characteristics
again (especially if the two chips on the die share a cache!).

Then of course there's single-processor multi-threading (eg, Intel
HyperThreading).  These systems have twice the registers internally
and interleave instructions from each `thread' as the processor can
deal with them; using separate registers for them all helps keep the
execution units busy (kind of like what GCC does with unrolled loops
on a RISC system with more registers in the first place).  These
perform like little NUMAs, because the cache is `hotter' (ie, locks on
those memory pages are held) on the other virtual processor than other
CPUs on the motherboard.  If my understanding is correct, the Intel
implementation is not truly SMP, as the other virtual processor must
share code segments to run threads.  If that is true, doing JIT (or
otherwise changing the executable code, eg dlopen()) in a thread might
break Hyperthreading.  But then again, it might not.  Maybe someone
who gives a flying fork() would like to devise a test to see if this
is the case.

Apparently current dual Opteron systems are also effectively NUMA (as
each chip has its own memory controller), but at the moment, NUMA mode
with Linux is slower than straight SMP mode.  Presumably because it's
a bitch to code for ;-)

So these fun systems are here to stay!  :)
-- 
Sam Vilain, [EMAIL PROTECTED]

All things being equal, a fat person uses more soap than a thin
person.
 - anon.



Re: Threads: Time to get the terminology straight

2004-01-04 Thread Nigel Sandever
On Sun, 4 Jan 2004 15:47:35 -0500, [EMAIL PROTECTED] (Dan Sugalski) wrote:

> *) INTERPRETER - those bits of the Parrot_Interp structure that are 
> absolutely required to be thread-specific. This includes the current 
> register sets and stack pointers, as well as security context 
> information. Basically if a continuation captures it, it's the 
> interpreter.
> 
> *) INTERPRETER ENVIRONMENT - Those bits of the Parrot_Interp 
> structure that aren't required to be thread-specific (though I'm not 
> sure there are any) *PLUS* anything pointed to that doesn't have to 
> be thread-specific.
> 
> The environment includes the global namespaces, pads, stack chunks, 
> memory allocation regions, arenas, and whatnots. Just because the 
> pointer to the current pad is thread-specific doesn't mean the pad 
> *itself* has to be. It can be shared.
> 

> *) SHARED THREAD - A thread that's part of a group of threads sharing 
> a common interpreter environment.

Ignoring the implementation of the synchronisation required, the basic
premise of my long post was that each SHARED THREAD should have its
own INTERPRETER (a VM in my terms), and that these should share a
common INTERPRETER ENVIRONMENT.

Simplistically, 5005threads shared an INTERPRETER ENVIRONMENT
and a single INTERPRETER. Synchronising threaded access to the shared
INTERPRETER (rather than its environment) was the biggest headache.
(I *think*.)

With ithreads, each SHARED THREAD has its own INTERPRETER *and*
INTERPRETER ENVIRONMENT. This removes the contention for, and the
need to synchronise access to, the INTERPRETER, but requires the
duplication of shared elements of the INTERPRETER ENVIRONMENT and
the copy_on_read, with the inherent costs of the duplication at
start-up, and slow, indirect access to shared entities across the
duplicated INTERPRETER ENVIRONMENTS.

My proposal was that each SHARED THREAD should have a separate copy
of the INTERPRETER, but share a copy of the INTERPRETER ENVIRONMENT.

Everything else was my attempt at solving the synchronisation
requirements this would entail, whilst minimising the cost of that
synchronisation, by avoiding the need for a mutex on every shared
entity, and the cost of attempting to acquire a mutex except when
two SHARED THREADS attempted concurrent access to a shared entity.

I think that by having SHARED THREAD == INTERPRETER, sharing
a common INTERPRETER ENVIRONMENT, you can avoid some of
the problems associated with 5005threads but retain direct
access to shared entities.

This imposes its own set of requirements and costs, but I believe
that the ideas that underlie the mechanisms I offered as solutions
are sound. The specific implementation is a platform-specific detail
that could be pushed down to a lower level.
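In struct form, the proposal might look like this sketch in C (every field name here is invented for illustration; this is not the actual Parrot_Interp layout):

```c
/* Hypothetical sketch: one `interpreter` per SHARED THREAD, all
   pointing at one shared `interp_env`. */
typedef struct interp_env interp_env;

typedef struct {
    long        registers[32];  /* per-thread register set            */
    void       *stack_top;      /* per-thread stack pointer           */
    interp_env *env;            /* SHARED with sibling threads        */
} interpreter;

struct interp_env {
    int refcount;   /* how many interpreters share this environment */
    /* global namespaces, pads, stack chunks, arenas, ...           */
};

/* Spawning a shared thread copies the INTERPRETER but aliases the
   INTERPRETER ENVIRONMENT -- unlike ithreads, which duplicate both,
   and 5005threads, which shared both. */
interpreter spawn_shared(interpreter *parent)
{
    interpreter t = *parent;    /* fresh register/stack state */
    t.env = parent->env;        /* same environment, not a copy */
    t.env->refcount++;
    return t;
}
```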

> ...those bits of the Parrot_Interp 
> structure that aren't required to be thread-specific (though I'm not 
> sure there are any) 

This is where I have a different (and quite possibly incorrect) view.
My mental picture of the INTERPRETER ENVIRONMENT includes
both the implementation of all the classes in the process *plus* all
the memory of every instance of those classes.

I think your statement above implies that these would not be a part
of the INTERPRETER ENVIRONMENT per se, but would be allocated 
from global heap and only referenced from the bytecode that would live
in the INTERPRETER ENVIRONMENT? 

I realise that this is possible, and maybe even desirable, but the cost
of the GC walking a global heap, especially in the situation of a single
process that contains two entirely separate instances of the
INTERPRETER ENVIRONMENT, would be (I *think*) rather high.

I realise that this is a fairly rare occurrence on most platforms,
but in the Win32 situation of emulated forks, each pseudo-process
must have an entirely separate INTERPRETER ENVIRONMENT,
potentially with each having multiple SHARED THREADS.

If the memory for all entities in all pseudo-processes is allocated from
a (real) process-global heap, then the multiple GCs required by the
multiple pseudo-processes are going to be walking the same heap,
possibly concurrently.  I realise that this problem (if it is such)
does not occur on platforms that have real forks available, but it
would be useful if the high level design would allow for the use of
separate (virtual) heaps tied to the INTERPRETER ENVIRONMENTs,
which Win32 has the ability to do.


> 
>  Dan
>

Nigel.





Re: Thread Question and Suggestion -- Matt

2004-01-04 Thread Jeff Clites
On Jan 4, 2004, at 1:58 PM, Matt Fowles wrote:

Dave Mitchell wrote:

Why on earth would they be all one kernel-level thread?


Truth to tell I got the idea from Ruby.  As I said, it makes
synchronization easier, because the interpreter can dictate when
threads context switch, allowing them to only switch at safe points.
There are some tradeoffs to this though.  I had forgotten about
threads calling into C code.  Although the example of regular
expressions doesn't work, as I think those are supposed to compile to
byte code...

Ah yes, I think you are right about regexes, judging from 'perldoc
ops/rx.ops'. I was thinking of a Perl5-style regex engine, in which
regex application is a call into compiled code (I believe...).

Jeff



Re: Thread Question and Suggestion -- Matt

2004-01-04 Thread Matt Fowles
Dave Mitchell wrote:

On Sat, Jan 03, 2004 at 08:24:06PM -0500, Matt Fowles wrote:
 

All~

I have a naive question:

Why must each thread have its own interpreter?

I understand that this suggestion will likely be disregarded because of
the answer to the above question.  But here goes anyway...

Why not have the threads that share everything share interpreters?  We
can have these threads be within a single interpreter, thus
eliminating the need for complicated GC locking and resource sharing
complexity, because all of these threads will be one kernel level
thread...

Why on earth would they be all one kernel-level thread?

 

Truth to tell I got the idea from Ruby.  As I said, it makes
synchronization easier, because the interpreter can dictate when threads
context switch, allowing them to only switch at safe points.  There are
some tradeoffs to this though.  I had forgotten about threads calling
into C code.  Although the example of regular expressions doesn't work,
as I think those are supposed to compile to byte code...

Matt



Re: Thread notes

2004-01-04 Thread Dan Sugalski
At 10:01 AM +1300 1/5/04, Sam Vilain wrote:
On Sun, 04 Jan 2004 17:53, Dan Sugalski wrote;

  > Given that it's not a SMP, massively out of order NUMA system with
  > delayed writes... no. 'Fraid not.
Sorry to be pedantic, but I always thought that the NU in NUMA implied
a contradiction of the S in SMP!
"NUMA MP" or "SMP", what does it mean to have *both* ?
It means you've got loosely coupled clusters of SMP things. For an 
example, if you go buy an Alpha GS3200 32 processor system (assuming 
DEC^WCompaq^HP still knows how to sell the things) you have one of 
these things. It's a set of 8 4-processor nodes with a fast 
interconnect between them which functions as a 32 CPU system. The 
four processors in each node are in a traditional SMP setup with a 
shared memory bus, tightly coupled caches, and fight-for-the-bus 
access to the memory on that node. Access to memory on another node 
goes over a slower bus, though it still looks and acts like local 
memory.

Nearly all of the NUMA systems I know of act like this, because it's 
still feasible to have tightly coupled 2 or 4 CPU SMP systems. The 
global slowdown generally occurs past that point, so the NUMA systems 
usually group 2 or 4 CPU SMP systems together this way.

Given the increases in processor vs memory vs bus speeds, this setup 
may not hold for that much longer, as it's only really workable when 
a single CPU doesn't saturate the memory bus with any regularity, 
which is getting harder and harder to do. (backplane and memory 
speeds can be increased pretty significantly with a sufficient 
application of cash, which is why the mini and mainframe systems can 
actually do it, but there are limits beyond which cash just won't get 
you)
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Thread notes

2004-01-04 Thread Sam Vilain
On Sun, 04 Jan 2004 17:53, Dan Sugalski wrote;

  > Given that it's not a SMP, massively out of order NUMA system with 
  > delayed writes... no. 'Fraid not.

Sorry to be pedantic, but I always thought that the NU in NUMA implied
a contradiction of the S in SMP!

"NUMA MP" or "SMP", what does it mean to have *both* ?
-- 
Sam Vilain, [EMAIL PROTECTED]

  What would life be if we had no courage to attempt anything ?
VINCENT van GOGH



Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Dan Sugalski
At 9:27 AM +1300 1/5/04, Sam Vilain wrote:
On Sat, 03 Jan 2004 20:51, Luke Palmer wrote;

  > Parrot is platform-independent, but that doesn't mean we can't
  > take advantage of platform-specific instructions to make it faster
  > on certain machines.  Indeed, this is precisely what JIT is. 
  > But a lock on every PMC is still pretty heavy for those non-x86
  > platforms out there, and we should avoid it if we can.

So implement threading on architectures that don't support interrupt
masking with completely user-space threading (ie, runloop round-robin)
like Ruby does.  *That* is available on *every* platform.

Interrupt masking and a proper threading interface can be considered
a prerequisite for threads of any sort under Parrot, the same way an 
ANSI C89-compliant compiler is a requirement. Platforms that can't 
muster at least thread spawning, mutexes, and condition variables 
don't get threads, and don't have to be considered. (You can, it's 
just not required, and you'd be hard-pressed to find anything outside 
the embedded realm that doesn't support at least that level of 
functionality, and I'm OK if there are no threads on the Gameboy port)
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Dan Sugalski
At 3:17 PM -0500 1/4/04, Uri Guttman wrote:
 > "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes:

  DS> And don't forget the libraries that are picky about which thread calls
  DS> into them -- there are some that require that the thread that created
  DS> the handle for the library be the thread that calls into the library
  DS> with that handle. (Though luckily those are pretty rare) And of course
  DS> the non-reentrant libraries that require a global library lock for all
  DS> calls otherwise the library state gets corrupted.
  DS> Aren't threads fun? :)

hence my love for events and forked procs.

Forks make some of this worse. There are more libraries that don't
work with connections forked across processes than libraries that 
don't work with calls from a different thread.
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Threads: Time to get the terminology straight

2004-01-04 Thread Dan Sugalski
I think some of the massive back and forth that's going on is in part 
due to terminology problems, which are in part causing some 
conceptual problems. So, for the moment, let's agree on the following 
things:

*) MUTEX - this is a low level, under the hood, not exposed to users, 
thing that can be locked. They're non-recursive, non-read/write, 
exclusive things. When a thread gets a mutex, any other attempt to 
get that mutex will block until the owning thread releases the mutex. 
The platform-native lock construct will be used for this.

*) LOCK - This is an exposed-to-HLL-code thing that can be locked. 
Only PMCs can be locked, and the lock may or may not be recursive or 
read/write.

*) CONDITION VARIABLE - the "sleep until something pings me" 
construct. Useful for queue construction, always associated with a 
MUTEX.

*) RENDEZVOUS POINT - A HLL version of a condition variable. *not* 
associated with a lock -- these are standalone.

Note that the mutex/condition association's a POSIX limitation, and 
POSIX threads is what we have on some platforms. If you want to 
propose abstracting it away, go for it. The separation doesn't buy us 
anything, though it's useful in other circumstances.

*) INTERPRETER - those bits of the Parrot_Interp structure that are 
absolutely required to be thread-specific. This includes the current 
register sets and stack pointers, as well as security context 
information. Basically if a continuation captures it, it's the 
interpreter.

*) INTERPRETER ENVIRONMENT - Those bits of the Parrot_Interp 
structure that aren't required to be thread-specific (though I'm not 
sure there are any) *PLUS* anything pointed to that doesn't have to 
be thread-specific.

The environment includes the global namespaces, pads, stack chunks, 
memory allocation regions, arenas, and whatnots. Just because the 
pointer to the current pad is thread-specific doesn't mean the pad 
*itself* has to be. It can be shared.

*) INDEPENDENT THREAD - A thread that has no contact *AT ALL* with 
the internal data of any other thread in the current process. 
Independent threads need no synchronization for anything other than 
what few global things we have. And the fewer the better, though alas 
we can't have none at all.

Note that independent threads may still communicate back and forth by 
passing either atomic things (ints, floats, and pointers) or static 
buffers that can become the property of the destination thread.

*) SHARED THREAD - A thread that's part of a group of threads sharing 
a common interpreter environment.

Anyway, there's some terminology. It doesn't solve the design 
problem, but hopefully it'll help everyone talk the same language.

Remember that everything from the wrapped OS interface on up is up 
for grabs -- while we're not going to build our own mutexes or thread 
scheduler, everything that's been implemented or designed to date can 
be changed with sufficient good reason. (Though, again, the more you 
want to change the more spectacular the design has to be)
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Sam Vilain
On Sat, 03 Jan 2004 20:51, Luke Palmer wrote;

  > Parrot is platform-independent, but that doesn't mean we can't
  > take advantage of platform-specific instructions to make it faster
  > on certain machines.  Indeed, this is precisely what JIT is.  
  > But a lock on every PMC is still pretty heavy for those non-x86
  > platforms out there, and we should avoid it if we can.

So implement threading on architectures that don't support interrupt
masking with completely user-space threading (ie, runloop round-robin)
like Ruby does.  *That* is available on *every* platform.
-- 
Sam Vilain, [EMAIL PROTECTED]

Seeing a murder on television... can help work off one's antagonisms.
And if you haven't any antagonisms, the commercials will give you
some.
 -- Alfred Hitchcock



Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Uri Guttman
> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes:

  DS> And don't forget the libraries that are picky about which thread calls
  DS> into them -- there are some that require that the thread that created
  DS> the handle for the library be the thread that calls into the library
  DS> with that handle. (Though luckily those are pretty rare) And of course
  DS> the non-reentrant libraries that require a global library lock for all
  DS> calls otherwise the library state gets corrupted.

  DS> Aren't threads fun? :)

hence my love for events and forked procs. i even have a solution to
that very problem by forking the DBI (or other nonthreaded lib) process
and communicating to that via messages. but we still have to support
threads. just gonna be messy and i will prolly rarely use them
(especially since we will have a core event loop and async i/o (which
will prolly use kernel threads (according to dan) but not be parrot
threads ) ) (end of lisp text) :-/

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org


Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Jeff Clites
On Jan 3, 2004, at 8:59 PM, Gordon Henriksen wrote:

> On Saturday, January 3, 2004, at 04:32 , Nigel Sandever wrote:
>
>> Transparent interlocking of VHLL fat structures performed
>> automatically by the VM itself. No need for :shared or lock().
>
> Completely specious, and repeatedly proven unwise. Shouldn't even be
> pursued.
>
> Atomic guarantees on collections (or other data structures) are rarely
> meaningful; providing them is simply a waste of time. Witness the
> well-deserved death of Java's synchronized Vector class in favor of
> ArrayList. The interpreter absolutely shouldn't crash due to threading
> errors—it should protect itself using standard techniques—but it would
> be a mistake for parrot to mandate that all ops and PMCs be
> thread-safe.
What are these standard techniques? The JVM spec does seem to guarantee 
that even in the absence of proper locking by user code, things won't 
go completely haywire, but I can't figure out how this is possible 
without actual locking. (That is, I'm wondering if Java is doing 
something clever.) For instance, inserting something into a collection 
will often require updating more than one memory location (especially 
if the collection is out of space and needs to be grown), and I can't 
figure out how this could be guaranteed not to completely corrupt 
internal state in the absence of locking. (And if it _does_ require 
locking, then it seems that the insertion method would in fact then be 
synchronized.)

So my question is, how do JVMs manage to protect internal state in the 
absence of locking? Or do they?

JEff


Re: Problem during "make test"

2004-01-04 Thread Harry Jackson
Leopold Toetsch wrote:
> Harry Jackson <[EMAIL PROTECTED]> wrote:
>
>> Can someone tell me if there is an error in the code below.
>
> The code is fine.
>
>> it repeatedly from the command line it sometimes freezes ie it prints
>> the contents of the array and then just stops and I need to do a CTRL-C
>> to get back to the command line.
>
> You are sure that there is no hardware problem? Run memcheck for a
> couple of hours for example.

I managed to compile gcc which is a fairly good indication that my
hardware is ok but you never know. I will try memtest86 and see how it goes.

> They are the same. The first one is PASM syntax, the second is PIR
> syntax.
>
> E.g. running your "imc trouble" code
>
> $ parrot -o- hj.imc
> _MAIN:
> new P16, 31  # .PerlArray
> new P17, 36  # .PerlString
> set P16, 10
> set P16[0], "Zero"
> ...
>
> yields the generated PASM code (with variable names allocated to Parrot
> registers).

I tried that as well, it spits out identical PASM each time but on the
odd occasion I need to use CTRL-C to get back to the shell.

H



Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Dan Sugalski
At 12:05 PM -0800 1/4/04, Jeff Clites wrote:
> On Jan 4, 2004, at 5:47 AM, Leopold Toetsch wrote:
>
>> Elizabeth Mattijsen <[EMAIL PROTECTED]> wrote:
>>
>>> When you use an external library in Perl, such as e.g. libxml, you
>>> have Perl data-structures and libxml data-structures.  The Perl
>>> data-structures contain pointers to the libxml data-structures.
>>>
>>> In comes the starting of an ithread and Perl clones all of the Perl
>>> data-structures.  But it copies _only_ the things it knows about.
>>> And thus leaves the pointers to the libxml data-structures untouched.
>>> Now you have 2 Perl data-structures that point to the _same_ libxml
>>> data-structures.  Voila, instant sharing.
>>
>> I see. Our library loading code should take care of that. On thread
>> creation we call again the _init code, so that the external lib can
>> prepare itself to be used from multiple threads. But don't ask me about
>> details ;)
>
> But I think we'll never be able to make this work as the user would
> initially expect. For instance if we have a DBI implementation, and
> some PMC is holding an external reference to a database cursor for
> an open transaction, then we can't properly duplicate the necessary
> state to make the copy of the PMC work correctly (that is,
> independently). (And I'm not saying just that we can't do it from
> within parrot, I'm saying the native database libraries can't do
> this.)
And don't forget the libraries that are picky about which thread 
calls into them -- there are some that require that the thread that 
created the handle for the library be the thread that calls into the 
library with that handle. (Though luckily those are pretty rare) And 
of course the non-reentrant libraries that require a global library 
lock for all calls otherwise the library state gets corrupted.

Aren't threads fun? :)
--
Dan
--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Jeff Clites
On Jan 4, 2004, at 5:47 AM, Leopold Toetsch wrote:

> Elizabeth Mattijsen <[EMAIL PROTECTED]> wrote:
>
>> When you use an external library in Perl, such as e.g. libxml, you
>> have Perl data-structures and libxml data-structures.  The Perl
>> data-structures contain pointers to the libxml data-structures.
>>
>> In comes the starting of an ithread and Perl clones all of the Perl
>> data-structures.  But it copies _only_ the things it knows about.
>> And thus leaves the pointers to the libxml data-structures untouched.
>> Now you have 2 Perl data-structures that point to the _same_ libxml
>> data-structures.  Voila, instant sharing.
>
> I see. Our library loading code should take care of that. On thread
> creation we call again the _init code, so that the external lib can
> prepare itself to be used from multiple threads. But don't ask me about
> details ;)
But I think we'll never be able to make this work as the user would 
initially expect. For instance if we have a DBI implementation, and 
some PMC is holding an external reference to a database cursor for an 
open transaction, then we can't properly duplicate the necessary state 
to make the copy of the PMC work correctly (that is, independently). 
(And I'm not saying just that we can't do it from within parrot, I'm 
saying the native database libraries can't do this.)

So some objects such as this would always have to end up shared (or 
else non-functional in the new thread), which is bad for users because 
they have to be concerned with what objects are backed by native 
libraries and which ones cannot be made to conform to each of our 
thread styles.

That seems like a major caveat.

JEff



Re: NCI callback functions

2004-01-04 Thread Dan Sugalski
At 8:19 PM +0100 1/4/04, Leopold Toetsch wrote:
> It's a bit complicated and brain-mangling, all the more in the absence
> of any examples, but the current design in pdd16 seems to lack some
> flexibility and is IMHO missing the proper handling of the
> (library-provided) external_data. The latter will be passed to the
> Sub somehow, but what then?
Well...

The current system's simple on purpose, because making sure all the 
possible callback function signatures are supported would just be a 
massive pain in the neck. (Not that we're not going well into the 
nuts category with the current NCI setup--nci.o is 143K on my system 
right now--but there are limits even for me :)

Before we go extend things any more, let's get the current system 
fleshed out some (heck, let's get it working!) and then use it as a 
base to proceed. The first order of business is for me to get pdd16's 
examples a bit better fleshed out so folks know what I'm talking 
about, and then go from there.
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


[ANNOUNCE] Devel::Cover 0.32

2004-01-04 Thread Paul Johnson
This release fixes a few bugs and introduces the concept of runs in the
database whereby data for tests are stored separately and merged later
when needed.  I'm hoping this will speed things up somewhat.

 - Actually include do test.
 - Create run concept in database.
 - Belatedly remove check for Template.
 - Add branch_return_sub test.
 - Add finalise_conditions() to collect previously missed coverage.
 - Fix incorrect coverage results associated with "and" conditions.
 - Add all_versions utility script.
 - Put /usr/bin/perl on all shebang lines.

A couple of tests fail on Win32, but they are just rounding errors in
the percentages.  I'll look at that later.

Enjoy,

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net


Re: Problem during "make test"

2004-01-04 Thread Leopold Toetsch
Harry Jackson <[EMAIL PROTECTED]> wrote:

> Can someone tell me if there is an error in the code below.

The code is fine.

> it repeatedly from the command line it sometimes freezes ie it prints
> the contents of the array and then just stops and I need to do a CTRL-C
> to get back to the command line.

You are sure that there is no hardware problem? Run memcheck for a
couple of hours for example.

> ...I have noticed that

> set a[0], "one"

> or

> a[0] = "one"

> appear to do the same thing. I cannot confirm that they do due to the
> bug above.

They are the same. The first one is PASM syntax, the second is PIR
syntax.

E.g. running your "imc trouble" code
$ parrot -o- hj.imc
_MAIN:
new P16, 31  # .PerlArray
new P17, 36  # .PerlString
set P16, 10
set P16[0], "Zero"
...

yields the generated PASM code (with variable names allocated to Parrot
registers).

> Harry

leo


NCI callback functions

2004-01-04 Thread Leopold Toetsch
It's a bit complicated and brain-mangling, all the more in the absence of any
examples, but the current design in pdd16 seems to lack some flexibility 
and is IMHO missing the proper handling of the (library-provided) 
external_data. The latter will be passed to the Sub somehow, but what then?

Here is (I think) a more flexible approach:

1) A new opcode "callback" (or "register_cb") or such, which is working 
like the current dlfunc opcode:

  callback (out Pcb, in Psub, in Sig)

Pcb ... NCI function object for that callback function
Psub ... Parrot Sub PMC to be called on behalf of that C callback
Sig ... Signature of the C-callback function
Sig additionally allows one special signature char, "U" for user-data,
which is "Z" in pdd16, but I can remember user-data better ;)

So void (*PQnoticeProcessor)(void *, const char*) would have Sig "vUt" 
and call a Parrot function f(P, S). Pdd16 type C callback is e.g. "vpU".

2) Actually registering the callback.

  dlfunc (out Pfunc, in Plib, "func_with_cb", "vCU")
  .pcc_begin prototyped
  .arg Pcb
  .arg P_user_data
  .nci_call Pfunc
That is, instead of passing in the callback and the Parrot Sub ("CY" in 
pdd16), the PMC obtained from 1) is passed with signature "C". The 
calling signature matches again the C-function which we call.

When this function is then called, the action behind the scenes is the same: 
The passed user_data PMC is combined with the callback PMC obtained from 
1) and passed on to the C function. When the C function is doing the 
callback, the NCI-stub generated in 1) is called, which extracts the 
Parrot subroutine from the passed user data and passes on the original 
user PMC and finally calls the PASM callback function.
But as the generated NCI stub in 1) knows the callback signature, this 
scheme should be appropriate for all callback functions that have at 
least one "void *" user parameter to be passed on transparently.

Comments welcome,
leo


Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Dan Sugalski
At 11:59 PM -0500 1/3/04, Gordon Henriksen wrote:
> On Saturday, January 3, 2004, at 04:32 , Nigel Sandever wrote:
>
>> Transparent interlocking of VHLL fat structures performed
>> automatically by the VM itself. No need for :shared or lock().
>
> Completely specious, and repeatedly proven unwise. Shouldn't even be
> pursued.
Erm... that turns out not to be the case. A lot. (Yeah, I know, I 
said I wasn't paying attention)

An interpreter *must* lock any shared data structure, including PMCs, 
when accessing them. Otherwise they may be in an inconsistent state 
when being accessed, which will lead to data corruption or process 
crashing, which is unacceptable.

These locks do not have to correspond to HLL-level locks, though it'd 
be a reasonable argument that they share the same mutex.
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Problem during "make test"

2004-01-04 Thread Harry Jackson
Dan Sugalski wrote:
Let us know either way -- if upgrading gcc works then we're going to 
have to figure out how RH/GCC2.96 is breaking things so we can make it 
not happen. :(
I have now upgraded gcc to 3.3.2 and I am getting the same error. We are 
still freezing during test.

I have also noticed something that might be my crap "imc" or related to 
the problem.

Can someone tell me if there is an error in the code below. When I run 
it repeatedly from the command line it sometimes freezes ie it prints 
the contents of the array and then just stops and I need to do a CTRL-C 
to get back to the command line.

.pcc_sub _MAIN prototyped
.param pmc argv
.local PerlArray a
a = new PerlArray
.local PerlString s
s = new PerlString
a =  10
a[0] = "Zero"
a[1] = "One"
a[2] = "Two"
a[3] = "Three"
a[4] = "Four"
a[5] = "Five"
a[6] = "Six"
a[7] = "Seven"
a[8] = "Eight"
s =  a[2]
print "\n"
print s
print "\n"
end
.end
I have also tried the above code using the "set" syntax and I get the 
same problem.

Are there any recommended examples of IMC in the source tree and which 
docs are the most recent. I have noticed that there are a lot of 
different ways of doing things (typical perl). I am trying to pick it up 
from the FAQ, some examples and the docs but its an uphill struggle. For 
instance I have noticed that

set a[0], "one"

or

a[0] = "one"

appear to do the same thing. I cannot confirm that they do due to the 
bug above.

I have got to the point where I am trying to put rows from Postgres into 
arrays and this is slowing me down a bit.

Harry



Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Elizabeth Mattijsen
At 14:47 +0100 1/4/04, Leopold Toetsch wrote:
> Elizabeth Mattijsen <[EMAIL PROTECTED]> wrote:
>
>> When you use an external library in Perl, such as e.g. libxml, you
>> have Perl data-structures and libxml data-structures.  The Perl
>> data-structures contain pointers to the libxml data-structures.
>>
>> In comes the starting of an ithread and Perl clones all of the Perl
>> data-structures.  But it copies _only_ the things it knows about.
>> And thus leaves the pointers to the libxml data-structures untouched.
>> Now you have 2 Perl data-structures that point to the _same_ libxml
>> data-structures.  Voila, instant sharing.
>
> I see. Our library loading code should take care of that. On thread
> creation we call again the _init code, so that the external lib can
> prepare itself to be used from multiple threads. But don't ask me about
> details ;)
What you need, is basically being able to:

- register a class method to be called on cloning
- register an object method that is called whenever an _object_ is cloned
The CLONE sub that Perl5 has is the class method.  The object method 
is missing from Perl (Thread::Bless is a way to remedy this problem).

I don't know what the _init code does, but judging by its name, it's 
not giving enough info to be able to properly clone an object with 
external data structures.

Liz


Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Leopold Toetsch
Elizabeth Mattijsen <[EMAIL PROTECTED]> wrote:

> When you use an external library in Perl, such as e.g. libxml, you
> have Perl data-structures and libxml data-structures.  The Perl
> data-structures contain pointers to the libxml data-structures.

> In comes the starting of an ithread and Perl clones all of the Perl
> data-structures.  But it copies _only_ the things it knows about.
> And thus leaves the pointers to the libxml data-structures untouched.
> Now you have 2 Perl data-structures that point to the _same_ libxml
> data-structures.  Voila, instant sharing.

I see. Our library loading code should take care of that. On thread
creation we call again the _init code, so that the external lib can
prepare itself to be used from multiple threads. But don't ask me about
details ;)

> Liz

leo


Re: Extenders interface

2004-01-04 Thread Leopold Toetsch
Mattia Barbon <[EMAIL PROTECTED]> wrote:

>   AFAIR nothing but Parrot sources should #include parrot/parrot.h.
> The public interface is available through parrot/embed.h and
> parrot/extend.h. Correct?

Yep. But by far not all necessary interface functions & types are done.

> Mattia

leo


Re: Thread Question and Suggestion -- Matt

2004-01-04 Thread Leopold Toetsch
Matt Fowles <[EMAIL PROTECTED]> wrote:

> Why not have the threads that share everything share interpreters.  We
> can have these threads be within a single interpreter thus
> eliminating the need for complicated GC locking and resource sharing
> complexity.  Because all of these threads will be one kernel level
> thread, they will not actually run concurrently and there will be no
> need to lock them.  We will have to implement a rudimentary scheduler in
> the interpreter, but I don't think that is actually that hard.

Jeff already answered that. Above model is e.g. implemented in Ruby. But
we want preemptive threads that can take advantage of multiple
processors.

> Matt

leo


Re: Thread Question and Suggestion -- Matt

2004-01-04 Thread Leopold Toetsch
Jeff Clites <[EMAIL PROTECTED]> wrote:
> On Jan 3, 2004, at 5:24 PM, Matt Fowles wrote:

>> Why must each thread have its own interpreter?

> The short answer is that the bulk of the state of the virtual machine
> (including, and most importantly, its registers and register stacks)
> needs to be per-thread, since it represents the "execution context"
> which is logically thread-local.

Yep. A struct Parrot_Interp has all the information to run one thread of
execution. When you start a new VM thread, you need a new Parrot_Interp
to run the code.
But it depends on the thread type how this new Parrot_Interp is created.
The range runs from everything being new except the opcode stream (type
1, the nothing-shared thread) to only registers + stacks + some more
being distinct (type 4, the shared-everything case).

Perl5 doesn't have a real interpreter structure, it's mainly a bunch of
globals. But when compiled with threads enabled tons of macros convert
these to the thread context, which is then passed around as first
argument of API calls - mostly (that's at least how I understand the
src).
This thread context is our interpreter structure with all the necessary
information or state to run a piece of code as the only one or as a
distinct thread.

> That said, I do think we have a terminology problem, ...

> ... It would be clearer to say that we
> have two "threads" in one "interpreter", and just note that almost all
> of our state lives in the "thread" structure. (That would mean that the
> thing which is being passed into all of our API would be called the
> thread, not the interpreter,

Yep. But the thing passed around happens to be named interpreter, so
that's our thread state, if you run single-threaded or not doesn't
matter. A thread-enabled interpreter is created by filling one
additional structure "thread_data" with thread-specific items like
thread handle or thread ID. But anyway the state is called interpreter.

> JEff

leo


Re: Thread Question and Suggestion -- Matt

2004-01-04 Thread Dave Mitchell
On Sat, Jan 03, 2004 at 08:24:06PM -0500, Matt Fowles wrote:
> All~
> 
> I have a naive question:
> 
> Why must each thread have its own interpreter?
> 
> 
> I understand that this suggestion will likely be disregarded because of 
> the answer to the above question.  But here goes anyway...
> 
> Why not have the threads that share everything share interpreters.  We 
> can have these threads be within a single interpreter thus 
> eliminating the need for complicated GC locking and resource sharing 
> complexity.  Because all of these threads will be one kernel level 
> thread

Why on earth would they be all one kernel-level thread?

-- 
Monto Blanco... scorchio!


Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Gordon Henriksen
On Saturday, January 3, 2004, at 04:32 , Nigel Sandever wrote:

> Transparent interlocking of VHLL fat structures performed automatically
> by the VM itself. No need for :shared or lock().
Completely specious, and repeatedly proven unwise. Shouldn't even be 
pursued.

Atomic guarantees on collections (or other data structures) are rarely 
meaningful; providing them is simply a waste of time. Witness the 
well-deserved death of Java's synchronized Vector class in favor of 
ArrayList. The interpreter absolutely shouldn't crash due to threading 
errors—it should protect itself using standard techniques—but it would 
be a mistake for parrot to mandate that all ops and PMCs be thread-safe.

The details of threaded programming cannot be hidden from the 
programmer. It's tempting to come up with clever ways to try, but the 
engine really has to take a back seat here. Smart programmers will 
narrow the scope of potential conflicts by reducing sharing of data 
structures in their threaded programs. Having done so, any atomicity 
guarantees on individual objects proves to be wasted effort: It will be 
resented by parrot's users as needless overhead, not praised. Consider 
the potential usage cases.

1. All objects in a non-threaded program.
2. Unshared objects in a threaded program.
3. Shared objects in a threaded program.
The first two cases will easily comprise 99% of all usage. In only the 
third case are synchronized objects even conceivably useful, and even 
then the truth of the matter is that they are of extremely limited 
utility: Their guarantees are more often than not too fine-grained to 
provide the high-level guarantees that the programmer truly needs. In 
light of this, the acquisition of a mutex (even a mutex that's 
relatively cheap to acquire) to push an element onto an array, or to 
access a string's data—well, it stops looking so good.

That said, the interpreter can't be allowed to crash due to threading 
errors. It must protect itself. But should a PerlArray written to 
concurrently from 2 threads guarantee its state makes sense at the end of 
the program? I say no based upon precedent; the cost is too high.

—

Gordon Henriksen
[EMAIL PROTECTED]


Extenders interface

2004-01-04 Thread Mattia Barbon
  Hello,
I have some questions about which part of the actual headers are
internal, which ones are for embedders and which ones are for
extenders.

  AFAIR nothing but Parrot sources should #include parrot/parrot.h.
The public interface is available through parrot/embed.h and
parrot/extend.h. Correct?

* types
  parrot/embed.h uses Parrot_Interp, Parrot_String, ...
  parrot/extend.h uses Parrot_INTERP, Parrot_STRING. I like the
  former more (matches Parrot_Int, Parrot_Float), but the two
  interfaces must at least agree on type names, whatever they
  are.

* ParrotIO
  It is currently not available. I think that API functions
  (PIO_open, PIO_read, ...) should be available for embedders
  (maybe not all of them, but at least some ought to), while
  the layer stuff should be available to extenders. Correct?
  Should they be available as PIO_* or as Parrot_io_*/Parrot_IO_*?

* custom PMC and vtables
  I assume custom PMC are for extenders. If this is correct,
  the vtable structure needs to be available to extenders,
  together with APIs for accessing PMCs flags, data pointer, etc
  (should that use macros as in Parrot-provided PMCs (PMC_data),
  or function calls?)

* Various functions:
  accessing globals, Parrot_runops_fromc*, mem_sys_*, Parrot_load_bytecode and
  registering IMCC compilers are some examples of functions that (I think)
  ought to be available in one form or the other, but aren't. Should they
  be?

Thanks!
Mattia



Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Elizabeth Mattijsen
At 00:49 +0100 1/4/04, Leopold Toetsch wrote:
> Elizabeth Mattijsen <[EMAIL PROTECTED]> wrote:
>
>> Indeed.  But as soon as there is something special such as a
>> datastructure external to Perl between threads (which becomes
>> "shared" automatically, because Perl doesn't know about
>> the datastructure,
>
> Why is it shared automatically? Do you have an example for that?
When you use an external library in Perl, such as e.g. libxml, you 
have Perl data-structures and libxml data-structures.  The Perl 
data-structures contain pointers to the libxml data-structures.

In comes the starting of an ithread and Perl clones all of the Perl 
data-structures.  But it copies _only_ the things it knows about. 
And thus leaves the pointers to the libxml data-structures untouched. 
Now you have 2 Perl data-structures that point to the _same_ libxml 
data-structures.  Voila, instant sharing.

With disastrous results.  Because as soon as the thread ends, the 
cloned Perl object in the thread goes out of scope.  Perl then calls 
the DESTROY method on the object, which then frees up the libxml 
data-structures.  That's what it's supposed to do.  Meanwhile, back 
in the original thread, the pointers in the Perl object now point at 
freed memory, rather than a live libxml data-structure.  And chaos 
ensues sooner or later.  Of course, chaos could well ensue before the 
thread is ended, because both threads think they have exclusive 
access to the libxml data-structure.

Hope this explanation made sense.


>> ... so the cloned objects point to the same memory
>> address), then you're in trouble.  Simply because you now have
>> multiple DESTROYs called on the same external data-structure.  If the
>> function of the DESTROY is to free the memory of the external
>> data-structure, you're in trouble as soon as the first thread is
>> done.  ;-(
>
> Maybe that DOD/GC can help here. A shared object can and will be
> destroyed only when the last holder of that object has released it.

But do you see now how complicated this can become if thread === interpreter?



Liz


Re: Thread Question and Suggestion -- Matt

2004-01-04 Thread Harry Jackson
Matt Fowles wrote:
I understand if this suggestion is dismissed for violating the rules, 
but I would like an answer to the question simply because I do not know 
the answer.
The most admirable reason for asking a question, and I doubt it will be 
dismissed.

H