On 19/01/11 10:50, Stefan Marr wrote:
> On 18 Jan 2011, at 22:16, Sam Vilain wrote:
>> there doesn't seem to
>> be an interpreter under the sun which has successfully pulled off
>> threading with shared data.
> Could you explain what you mean with that statement?
>
> Sorry, but that's my topic, and the most well-known interpreters that 'pulled
> off' threading with shared data are for Java. The interpreter I am working on 
> is for manycore systems (running on a 64-core Tilera chip) and executes 
> Smalltalk (https://github.com/smarr/RoarVM).

You raise a very good point.  My statement is too broad and should
probably apply only to dynamic languages executed on reference-counted
VMs.  Look at some major ones - PHP, Python, Ruby, Perl, most JS engines
- none of them actually thread properly.  Well, Perl's "threading" does
run at full speed, but it copies every variable on the heap for each
new thread, massively bloating the process.

So the question is why should this be so, if C++ and Java, even
interpreted on a JVM, can do it?

In general, Java's basic types typically correspond with types that can
be dealt with atomically by processors, or are small enough to be passed
by value.  This already makes things a lot easier.

I've had another reason for the differences explained to me.  I'm not
sure I understand it fully enough to be able to re-explain it, but I'll
try anyway.  As I grasped the concept, the key to making VMs fully
threadable with shared state is to first allow reference addresses to
change, such as via generational garbage collection.  This allows you to
have much clearer "stack frames", perhaps even really stored on the
thread-local/C stack, as opposed to most dynamic language interpreters,
which barely use the C stack at all.  Then, when long-lived objects
are discovered at scope exit time, they can be safely moved into the next
memory pool, and access to "old" objects can be locked (or the objects
copied, in the case of Software Transactional Memory).  Access to
objects in your own frame can therefore be fast, and the number of locks
that have to be held is reduced.
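
As a rough illustration of the idea above, here is a toy model in Python.
It is not how any real VM or garbage collector is implemented; the names
OldGeneration, Frame, allocate and promote are made up for this sketch,
and whether an object "escapes" is simply declared by the caller rather
than discovered.  It only shows the locking pattern: objects in a
thread's own frame are touched without locks, and the single lock is
taken once, at scope exit, to move survivors into the shared pool.

```python
import threading


class OldGeneration:
    """Shared pool of long-lived objects; every access takes the lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._objects = []

    def promote(self, objs):
        with self._lock:
            self._objects.extend(objs)

    def snapshot(self):
        with self._lock:
            return list(self._objects)


class Frame:
    """Per-call nursery: only the owning thread touches it, so no lock."""

    def __init__(self, old_gen):
        self._old_gen = old_gen
        self._young = []      # temporaries that die with the frame
        self._escaping = []   # objects that outlive the frame

    def allocate(self, value, escapes=False):
        (self._escaping if escapes else self._young).append(value)
        return value

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Scope exit: one locked operation moves the survivors, the rest is dropped.
        self._old_gen.promote(self._escaping)
        self._young.clear()
        self._escaping.clear()
        return False


def worker(old_gen, thread_id, temporaries=100):
    with Frame(old_gen) as frame:
        for i in range(temporaries):
            frame.allocate(i)                     # lock-free, frame-local
        frame.allocate(thread_id, escapes=True)   # promoted at scope exit
```

Each thread here takes the shared lock exactly once per frame, however
many temporaries it allocates.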

Perhaps to support/refute this argument, in your VM, how do you handle:

- memory allocation: object references' timeline and garbage collection
- call stack frames and/or return continuations - the C stack or the heap?
- atomicity of functions (that's the "synchronized" keyword?)
- timely object destruction

I put it forward that the overall design of the interpreter, and
therefore what is possible in terms of threading, is highly influenced
by these factors.

When threading in C or C++ for instance (and this includes HipHop-TBB),
the call stack frame is on the C stack, so shared state is possible so
long as you pass heap pointers around and synchronise appropriately. 
The "virtual" machine is of a different nature, and it can work.  For
JVMs, as far as I know references are temporary and again the nature of
the execution environment is different.
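
The "pass heap pointers around and synchronise appropriately" pattern
can be sketched as follows.  Python is used here purely as notation (it
is one of the languages listed above as not threading properly, so this
shows the shape of the pattern, not a speed-up); SharedCounter and
count_in_parallel are hypothetical names for this example.

```python
import threading


class SharedCounter:
    """A heap object that several threads hold a reference to."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def add(self, amount):
        # Read-modify-write on shared state must happen under the lock.
        with self._lock:
            self.value += amount


def count_in_parallel(n_threads=4, per_thread=10_000):
    counter = SharedCounter()

    def worker():
        # The loop variable lives in this thread's own frame;
        # only the counter object is shared between threads.
        for _ in range(per_thread):
            counter.add(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place the result is always n_threads * per_thread.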

For VMs where there is basically nothing on the stack, and everything on
the heap, it becomes a lot harder.  To talk about a VM I know better,
Perl has five or so internal stacks, all represented on the heap: a
function call/return stack, a lexical scope stack to represent what is
in scope, a variable stack (the "tmps" stack) for variables declared in
those scopes and for timely destruction, a stack to implement
local($var) called the "save" stack, and a "mark" stack used for garbage
collection.  I think you get my point.  From my reading of the PHP
internals so far there is a similar set there too, so comparisons are
quite likely to be instructive.  It's a bit hard figuring out everything
that is going on internally (all these internal void* types don't help
either), and whether reference counting inherently rules out a shared
state model, or just makes it harder, is a question I'm not sure is
easy to answer.
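
To make the "stacks on the heap" point concrete, here is a minimal
sketch of a "save" stack in the spirit of local($var).  It is a toy, not
Perl's actual data layout; ToyInterpreter and its method names are
invented for this example.  Note that the interpreter's globals, save
stack and scope stack are all ordinary heap objects, so a second thread
would need either its own copy of all of them or a lock around every
operation.

```python
class ToyInterpreter:
    def __init__(self):
        self.globals = {}
        self.save_stack = []   # entries: (name, existed_before, old_value)
        self.scope_stack = []  # save-stack height recorded at each scope entry

    def enter_scope(self):
        self.scope_stack.append(len(self.save_stack))

    def local(self, name, value):
        # Remember the current value so it can be restored at scope exit.
        existed = name in self.globals
        self.save_stack.append((name, existed, self.globals.get(name)))
        self.globals[name] = value

    def leave_scope(self):
        # Unwind the save stack back to where this scope started.
        floor = self.scope_stack.pop()
        while len(self.save_stack) > floor:
            name, existed, old_value = self.save_stack.pop()
            if existed:
                self.globals[name] = old_value
            else:
                del self.globals[name]
```

Usage: set a global, call enter_scope(), override it with local(), and
the old value comes back when leave_scope() runs.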

In any case, full shared state is not required for a large set of useful
parallelism APIs, and in fact contains a number of pitfalls which are
difficult to explain, debug and fix.  I'm far more interested in simple
acceleration of tight loops - to make use of otherwise idle CPU cores
(perhaps virtual as in hyperthreading) to increase throughput - and APIs
like "map" express this well.  The idea is that the executor can start
up with no variables in scope, though hopefully shared code segments,
call some function on the data it is passed, pass the answers back to
the main thread, and then set about cleaning itself up.
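
A "map" of this kind might look like the sketch below.  It assumes a
POSIX system where Python's "fork" start method is available, and
parallel_map, _run_chunk and square are hypothetical names for this
example.  Forked workers are only an approximation of "no variables in
scope": each child gets its own copy of the parent's state rather than
an empty one, but nothing it changes is visible to the parent, and only
the results travel back.

```python
import multiprocessing as mp


def _run_chunk(fn, chunk, conn):
    # Runs in the child: apply fn to its own chunk, send results back, exit.
    try:
        conn.send([fn(item) for item in chunk])
    finally:
        conn.close()


def parallel_map(fn, items, workers=2):
    """Apply fn to every item using short-lived child processes."""
    ctx = mp.get_context("fork")  # assumption: POSIX, fork available
    items = list(items)
    if not items:
        return []
    size = -(-len(items) // workers)  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]

    jobs = []
    for chunk in chunks:
        recv_end, send_end = ctx.Pipe(duplex=False)
        proc = ctx.Process(target=_run_chunk, args=(fn, chunk, send_end))
        proc.start()
        send_end.close()  # the parent keeps only the receiving end
        jobs.append((proc, recv_end))

    results = []
    for proc, recv_end in jobs:
        results.extend(recv_end.recv())  # read before join so the child never blocks
        proc.join()                      # the worker has cleaned itself up
    return results


def square(n):
    return n * n
```

Calling parallel_map(square, range(10)) returns the squares in the
original order, because the chunks are contiguous and collected in the
order they were started.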

Sam

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
