This should be a reply to Daniel Ruoso's post above, but I cannot persuade
my nntp reader to reply to a post made before I subscribed here. Sorry.
On Wed, 12 May 2010 14:16:35 +0100, Daniel Ruoso <dan...@ruoso.com> wrote:
I have 3 main problems with your thinking.
1: You are conflating two fundamentally different views of the problem.
a) The Perl 6 programmers semantic view.
b) The P6 compiler (writers) implementation view.
These two views need to be kept cleanly separated, so that the reference
implementation does not define the *only possible* implementation.
But it is important that the semantic view is designed with considerable
regard for what /can/ be implemented.
2: You appear to be taking your references at face value.
For example, you've cited Erlang as one of your reference points.
And the Erlang docs describe the units of concurrency as "processes",
with "the parallelism is provided by Erlang and not the host operating
system."
But if I run one of the Erlang examples,
http://www.erlang.org/examples/small_examples/tetris.erl
it uses two OS processes: one with 13 OS threads and the other with 5,
even if I only run tetris:start(1).
Until recently, Erlang did not use OS threads, relying instead upon an
internal, user-space scheduler--green threading--though you may find some
denials of that by Erlangers, because of the unfavorable comparison with
Java's green threading in Java versions 1 through 4.
But recent versions have implemented multiple OS threads, each running a
coroutine scheduler. They had to do this in order to achieve SMP scaling.
Here is a little salient information:
"The Erlang VM without SMP support has 1 scheduler which runs in the
main process thread. The scheduler picks runnable Erlang processes and
IO-jobs from the run-queue and there is no need to lock data structures
since
there is only one thread accessing them.
The Erlang VM with SMP support can have 1 to many schedulers which are
run in 1 thread each. The schedulers pick runnable Erlang processes
and IO-jobs from one common run-queue. In the SMP VM all shared data
structures are protected with locks, the run-queue is one example of a
data structure protected with locks."
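The quoted design can be sketched in a few lines (Python here, purely for
illustration; the names RunQueue and run_smp are my own): several
scheduler threads pick runnable jobs from one shared run-queue, and the
queue's lock is exactly the shared-data-structure locking the quote
describes.

```python
import threading
from collections import deque

class RunQueue:
    """One shared run-queue; its lock is the locking the quote mentions."""
    def __init__(self, jobs):
        self._q = deque(jobs)
        self._lock = threading.Lock()

    def pick(self):
        # Every scheduler thread goes through this lock.
        with self._lock:
            return self._q.popleft() if self._q else None

def scheduler(run_queue, done):
    # Each scheduler runs in its own OS thread, pulling from the
    # common run-queue until it is empty.
    while (job := run_queue.pick()) is not None:
        done.append(job())

def run_smp(jobs, n_schedulers=4):
    rq, done = RunQueue(jobs), []
    threads = [threading.Thread(target=scheduler, args=(rq, done))
               for _ in range(n_schedulers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return done
```

The completion order is nondeterministic, which is why the results have
to be treated as a set rather than a sequence.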
Lock-free at the semantic level is a nice-to-have. But whenever you have
kernel threads talking to each other through shared memory--and you must,
if you are going to achieve SMP scalability--then some form of locking
will be required.
All talk of "message passing protocols" simply disguises the realities of
the implementation.
That is not a bad thing from the application programmer's point of
view--nor even the language designer's POV--but it still leaves the
problem to be dealt with by the language & system implementers.
Whilst lock-free queues are possible--there are implementations of these
available for Java 5 (which, of necessity, and to great effect, has now
moved away from green threads and gone the kernel-threading route)--they
are very, very hardware dependent, relying as they do upon CAS, which not
all processor architectures support and to which not all languages give
adequate access.
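To make the CAS dependence concrete, here is a sketch of the classic
lock-free (Treiber) stack in Python. Python has no hardware CAS, so the
_cas helper below emulates it with a lock purely to show the retry-loop
shape; on real hardware it would be a single atomic compare-and-swap
instruction, and the lock would vanish.

```python
import threading

class _Node:
    def __init__(self, value, nxt):
        self.value = value
        self.next = nxt

class TreiberStack:
    """Lock-free stack *shape*: the push/pop retry loops are the real
    algorithm; _cas only simulates the missing hardware primitive."""
    def __init__(self):
        self._top = None
        self._guard = threading.Lock()  # stands in for the CPU's CAS

    def _cas(self, expected, new):
        # Atomically: if top is still `expected`, swing it to `new`.
        with self._guard:
            if self._top is expected:
                self._top = new
                return True
            return False

    def push(self, value):
        while True:                     # the classic CAS retry loop
            old = self._top
            if self._cas(old, _Node(value, old)):
                return

    def pop(self):
        while True:
            old = self._top
            if old is None:
                return None             # stack is empty
            if self._cas(old, old.next):
                return old.value
```

If another thread wins the race, the CAS fails and the loser simply
re-reads the top pointer and retries--no thread ever blocks holding the
structure.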
For a very interesting, if rather long, insight into some of this, see
Cliff Click's video about
"Fast Wait-free Hashtables":
http://www.youtube.com/watch?v=WYXgtXWejRM&feature=player_embedded#!
One thing to note, if you watch it all the way through, is that your
claim (in an earlier revision?) that "shared memory doesn't scale" is
incorrect in the light of this video, where 786 SMP processors are using
a hash for caching at very high speed.
3: By conflating the POVs of the semantic design and the implementation,
you are in danger of reinventing several bad wheels, badly.
a) A green threading scheduler:
The Java guys spent a long time trying to get theirs right before
abandoning it.
The Erlang guys have taken a long time tuning theirs, but with Moore's
Law running out, they have had to bow to the inevitability of kernel
threading. And they are now having to go through the pain of
understanding, and addressing, how multiple event-driven and cooperative
schedulers running under the control of (various) preemptive scheduler(s)
interact.
Even Haskell has to use kernel threading:
"In GHC, threads created by forkIO are lightweight threads, and are
managed entirely by the
GHC runtime. Typically Haskell threads are an order of magnitude or two
more efficient
(in terms of both time and space) than operating system threads.
The downside of having lightweight threads is that only one can run at a
time, so if one thread
blocks in a foreign call, for example, the other threads cannot continue.
The GHC runtime works
around this by making use of full OS threads where necessary. When the
program is built with the
-threaded option (to link against the multithreaded version of the
runtime), a thread making a
safe foreign call will not block the other threads in the system; another
OS thread will take
over running Haskell threads until the original call returns. The runtime
maintains a pool of these
worker threads so that multiple Haskell threads can be involved in
external calls simultaneously."
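The GHC workaround described in that quote--hand a blocking foreign call
to a real OS worker thread so the lightweight threads keep running--looks
roughly like this (a Python sketch; blocking_call is a stand-in for a
safe foreign call, and the pool of four workers is an arbitrary choice):

```python
import concurrent.futures
import time

# A pool of real OS threads, analogous to GHC's pool of worker threads
# for foreign calls.
_workers = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def blocking_call(x):
    time.sleep(0.01)   # stands in for a blocking foreign call
    return x * 2

def run_tasks():
    # Submit every blocking call to the worker pool; the submitting
    # ("lightweight") side is free to keep doing other work until it
    # actually needs the results.
    futures = [_workers.submit(blocking_call, i) for i in range(4)]
    return [f.result() for f in futures]
```

The point is the division of labour: the blocking work lands on kernel
threads, while the caller's own scheduler is never stalled by it.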
b) Cooperative scheduling:
From memories of Windows 3/95/98/ME and Java versions 1.1 through 1.4,
co-operative (pseudo) multi-tasking is fraught with problems.
Besides the requirement for "good behavior", it leads to an extremely
unnatural and difficult way of programming. Unless the programming
environment takes care of maintaining application program state between
yields, this falls to the application programmer. And they usually get it
wrong.
CPS can alleviate this, but to my knowledge, as yet, only Scheme has
successfully adopted CPS.
You have to ask yourself: why is that?
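For contrast, here is a minimal round-robin cooperative scheduler (a
Python sketch; scheduler and counter are my own names). The generator
machinery preserves each task's local state across yields--precisely the
bookkeeping that, without such environment support, falls to the
application programmer:

```python
from collections import deque

def scheduler(tasks):
    """Round-robin: run each task until it yields, then requeue it."""
    ready = deque(tasks)
    out = []
    while ready:
        task = ready.popleft()
        try:
            out.append(next(task))   # run until the task yields
            ready.append(task)       # cooperative: back of the queue
        except StopIteration:
            pass                     # task finished; drop it
    return out

def counter(name, n):
    for i in range(n):
        # The local variable i survives each yield automatically.
        yield f"{name}:{i}"
```

A task that never yields still starves everyone else, of course--the
"good behavior" requirement does not go away; only the state-saving
burden does.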
c) Event loops:
Event loops only work for processing where events are generated
naturally. For GUIs and Web servers, they work well--until the response
to an event takes longer to produce than the target maximum response time
for the loop.
The image takes longer to render than the GUI's 1/10th-second response
time.
The genome takes longer to process than the Web user's 3-to-10-second
patience.
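That failure mode is easy to demonstrate with a toy event loop (a Python
sketch; the 0.1-second budget stands in for the GUI's response target):
handlers run inline, so any handler that overruns the budget stalls every
event queued behind it.

```python
import time

def event_loop(events, handlers, budget=0.1):
    """Dispatch events in order; report which handlers blew the budget."""
    late = []
    for name in events:
        start = time.monotonic()
        handlers[name]()                         # runs inline, blocking
        if time.monotonic() - start > budget:
            late.append(name)                    # missed the response target
    return late
```

Everything queued behind the slow handler simply waits--which is exactly
why long-running responses have to be pushed off the loop onto threads.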
----------
Currently, the most promising idea for reconciling Perl 6's ambitions
with the realities of the ongoing move to medium and large-scale
multi-core processors--compensating for the inability to keep ramping up
the processor clocks--is:
http://lamp.epfl.ch/~phaller/doc/haller07coord.pdf
(Note: I've been trying to digest and assimilate that paper (and several
of its precursors) for a while now.
I'm not yet entirely comfortable that it is a silver bullet for all of
Perl 6's concurrency issues.
But to date, it is the most promising, and it does at least offer some
experimental results.)
And at the core of that is the need for preemptive (kernel) threading and
shared memory.
These can (and should!) be hidden from the application programmer, through
the use of language and/or library
level abstractions, of which there are many promising [sic] candidates.
But fundamentally, they all require that:
1) Preemptive scheduling be utilised.
2) The core interpreter be fully reentrant.
3) The core runtime libraries be fully reentrant.
4) That the language distinguishes between, and handles appropriately,
- process-global entities: IO handles; environment; pwd etc.
- runtime stack-based(*) (lexical) entities: locals (my vars in Perl's
terms).
(*) Regardless of whether these actually live on the thread-owned(**)
processor stack or in a heap-based logical 'stack'.
(**) Even in a single-threaded process, the unit of (kernel) dispatch is
the thread, and the thread owns the processor stack.
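The distinction in (4) can be illustrated directly (a Python sketch;
threading.local stands in for per-thread lexicals, and tls.depth is my
own example name): each thread sees only its own tls.depth, while
os.getpid() is the same process-global value everywhere.

```python
import os
import threading

# Per-thread storage: each thread gets its own independent view.
tls = threading.local()

def worker(n, results):
    tls.depth = n                          # lexical-style: private to this thread
    results[n] = (tls.depth, os.getpid())  # pid is process-global

def demo():
    results = {}
    threads = [threading.Thread(target=worker, args=(i, results))
               for i in range(3)]
    for t in threads: t.start()
    for t in threads: t.join()
    return results
```

Every thread reports its own depth untouched by the others, yet all of
them share the single process-wide pid--the two categories the language
has to keep apart.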
Please note. Despite these criticisms of what you have so far, I applaud
you for picking this hot
potato up and running with it. It desperately needed to be brought out
into the light and addressed.
Thank you for being the one :)
In this reply I have not detailed any alternatives to your proposals--I'm
working on it furiously,
but like you, I keep conflating the semantics and implementation. I hope
to have something for you
and others to summarily dismiss within a few days. Subject to Jenson
winning in Monaco :)
Buk.
Em Ter, 2010-05-11 às 21:45 -0300, Daniel Ruoso escreveu:
The threading model topic still needs lots of thinking, so I decided to
try out some ideas.
After I sent the second version, I just realized I could make it simpler
by just assuming "one OS thread per Coroutine Group"... so here goes the
new version.
0 - No memory is shared between threads, so no locking is necessary.
At one level, Perl 6 pretty much gives you this for free.
As perl 6 does all it can to
1 - A value and a coroutine always belong to a thread, thus naturally
implementing synchronized access to data.
2 - Coroutines are, conceptually, the equivalent to green threads,
running in the same OS thread. Coroutines waiting for a return
value are "blocked".
3 - The interpreter implements a scheduler, which will pick one of the
"waiting" coroutines, it may also suspend a coroutine in order to
implement time-sharing.
4 - In order to implement inter-thread communication, there are:
4.1 - A "MessageQueue" works just like a Unix pipe, it looks like
a slurpy array. It has a configurable buffer size and
coroutines might block when trying to read and/or write to
it.
4.2 - A "RemoteInvocation" is an object that has an identifier, a
capture (which might, optionally, point to a "MessageQueue"
as input) and another "MessageQueue" to be used as output.
New coroutines are created in the target thread to execute
that invocation.
4.3 - An "InvocationQueue" is a special type of "MessageQueue"
that accepts only "RemoteInvocation" objects.
4.4 - A "RemoteValue" is an object that proxies requests to
another coroutine group through a "RemoteInvocation".
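The MessageQueue of 4.1 maps almost directly onto a bounded blocking
queue. Here is a sketch of those pipe-like semantics in Python
(make_channel and the None end-of-stream sentinel are my own conventions,
not part of Daniel's proposal):

```python
import queue
import threading

def make_channel(buffer_size=4):
    # Bounded, pipe-like channel: put() blocks when the buffer is
    # full, get() blocks when it is empty.
    return queue.Queue(maxsize=buffer_size)

def producer(chan, items):
    for item in items:
        chan.put(item)       # blocks if the buffer is full
    chan.put(None)           # sentinel: end of stream

def consume_all(chan):
    out = []
    while (item := chan.get()) is not None:  # blocks if empty
        out.append(item)
    return out
```

The configurable buffer size gives the back-pressure the proposal asks
for: a fast writer is forced to wait for a slow reader instead of
growing the queue without bound.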
5 - The thread group boundary is drawn by language constructs such
as async, the feed operator, junctions, hyper operators.
6 - A value might have its ownership transferred to another thread if
it can be detected that this value is in use only for that
invocation or return value, in order to reduce the amount of
"RemoteInvocation"s.
7 - A value might do a special "ThreadSafe" role if it is thread-safe
(such as implementing bindings to thread-safe native libraries) In
which case it is sent as-is to a different thread.
8 - A value might do a special "ThreadCloneable" role if it should be
cloned instead of being proxied through a "RemoteValue" when sent
in a "RemoteInvocation".
9 - Exception handling gets a bit hairy, since exceptions might only
be raised at the calling scope when the value is consumed.
10 - List assignment and Sink context might result in synchronized
behavior.
daniel
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/