This should be a reply to Daniel Ruoso's post above, but I cannot persuade my NNTP reader
to reply to a post made before I subscribed here. Sorry.

On Wed, 12 May 2010 14:16:35 +0100, Daniel Ruoso <dan...@ruoso.com> wrote:

I have 3 main problems with your thinking.

1: You are conflating two fundamentally different views of the problem.
  a) The Perl 6 programmer's semantic view.
  b) The Perl 6 compiler writer's implementation view.

These two views need to be kept cleanly separated so that the reference
implementation does not define the *only possible* implementation.

But it is important that the semantic view be designed with considerable
regard for what /can/ be implemented.

2: You appear to be taking your references at face value.

For example, you've cited Erlang as one of your reference points.
And the Erlang docs describe the units of concurrency as "processes",
where "the parallelism is provided by Erlang and not the host operating system."

But, if I run one of the Erlang examples,
    http://www.erlang.org/examples/small_examples/tetris.erl
it uses two processes: one with 13 OS threads and the other with 5 OS threads,
even if I only run tetris:start(1).

Until recently, Erlang did not use OS threads, relying instead upon an
internal, user-space scheduler--green threading, in other words--though you
may find some Erlangers denying that label because of the unfavorable
comparison with the green threading in Java versions 1 through 4.

But recent versions have implemented multiple OS threads, each running a
coroutine scheduler. They had to do this in order to achieve SMP scaling.
Here is a little salient information:

        "The Erlang VM without SMP support has 1 scheduler which runs in the
        main process thread. The scheduler picks runnable Erlang processes and
IO-jobs from the run-queue and there is no need to lock data structures since
        there is only one thread accessing them.

        The Erlang VM with SMP support can have 1 to many schedulers which are
        run in 1 thread each. The schedulers pick runnable Erlang processes
        and IO-jobs from one common run-queue. In the SMP VM all shared data
        structures are protected with locks, the run-queue is one example of a
        data structure protected with locks."
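
Stripped to its bones, that SMP arrangement amounts to the toy sketch below
(Java, purely for illustration; the names are mine and this is in no way
Erlang's actual code): N scheduler threads, one shared run-queue, and the
locking hidden inside the queue.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Toy model of the quoted SMP arrangement: several scheduler threads
    // all picking runnable "processes" from one common run-queue.
    // LinkedBlockingQueue performs, internally, exactly the kind of
    // locking the Erlang documentation describes.
    class ToySmpVm {
        private final BlockingQueue<Runnable> runQueue =
            new LinkedBlockingQueue<Runnable>();

        void submit(Runnable process) { runQueue.add(process); }

        void start(int schedulers) {
            for (int i = 0; i < schedulers; i++) {
                new Thread(new Runnable() {
                    public void run() {
                        try {
                            while (true) runQueue.take().run();
                        } catch (InterruptedException e) {
                            // interrupted == time to shut down
                        }
                    }
                }, "scheduler-" + i).start();
            }
        }
    }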

Lock-free at the semantic level is a nice-to-have. But whenever you have
kernel threads talking to each other through shared memory--and you have to
have that if you are going to achieve SMP scalability--some form of locking
will be required.

All talk of "message passing protocols" is simply disguising the realities of the implementation. That is not a bad thing from the applications programmer's point of view--nor even the language designer's POV--but it still leaves the problem to be dealt with by the language & system implementers.

Whilst lock-free queues are possible--there are implementations available for
Java 5 (which, of necessity, and to great effect, has now moved away from
green threads and gone the kernel threading route)--they are very, very
hardware dependent, relying as they do upon compare-and-swap (CAS), which not
all processor architectures support and to which not all languages give
adequate access.
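
To make the CAS dependence concrete, here is a minimal sketch of the classic
Treiber lock-free stack (Java, since that is where the Java 5 implementations
live; the production-quality structures are in java.util.concurrent, e.g.
ConcurrentLinkedQueue--this only shows the CAS retry loop that every such
structure is built around):

    import java.util.concurrent.atomic.AtomicReference;

    // Minimal Treiber stack: lock-free push/pop built on a CAS retry loop.
    class LockFreeStack<T> {
        private static final class Node<T> {
            final T value;
            Node<T> next;
            Node(T value) { this.value = value; }
        }

        private final AtomicReference<Node<T>> top =
            new AtomicReference<Node<T>>();

        public void push(T value) {
            Node<T> node = new Node<T>(value);
            Node<T> old;
            do {
                old = top.get();
                node.next = old;
                // CAS succeeds only if no other thread has moved 'top' in
                // the meantime; otherwise loop and retry. No lock is taken.
            } while (!top.compareAndSet(old, node));
        }

        public T pop() {
            Node<T> old;
            do {
                old = top.get();
                if (old == null) return null;   // empty stack
            } while (!top.compareAndSet(old, old.next));
            return old.value;
        }
    }

Note that real implementations must also defend against the ABA problem,
which is a large part of why getting these structures right is so hard.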

For a very interesting, if rather long, insight into some of this, see Cliff Click's video about
"Fast Wait-free Hashtables":

        http://www.youtube.com/watch?v=WYXgtXWejRM&feature=player_embedded#!

One thing to note, if you watch it all the way through, is that your claim
(in an earlier revision?) that "shared memory doesn't scale" is incorrect in
the light of this video, in which 786 SMP processors use a hash table for
caching at very high speed.

3: By conflating the POVs of the semantic design and the implementation, you
are in danger of reinventing several bad wheels, badly.

a) A green threading scheduler:
The Java guys spent a long time trying to get theirs right before
abandoning it.

The Erlang guys have taken a long time tuning theirs but, with Moore's Law
running out, they have had to bow to the inevitability of kernel threading.
And they are now having to go through the pain of understanding and
addressing how multiple event-driven and cooperative schedulers interact when
running under the control of (various) preemptive scheduler(s).

        Even Haskell has to use kernel threading:

"In GHC, threads created by forkIO are lightweight threads, and are managed entirely by the GHC runtime. Typically Haskell threads are an order of magnitude or two more efficient
        (in terms of both time and space) than operating system threads.

The downside of having lightweight threads is that only one can run at a time, so if one thread blocks in a foreign call, for example, the other threads cannot continue. The GHC runtime works around this by making use of full OS threads where necessary. When the program is built with the -threaded option (to link against the multithreaded version of the runtime), a thread making a safe foreign call will not block the other threads in the system; another OS thread will take over running Haskell threads until the original call returns. The runtime maintains a pool of these worker threads so that multiple Haskell threads can be involved in external calls simultaneously."

b) Cooperative scheduling:
        From memories of Windows 3/95/98/ME and Java versions 1.1 through
        1.4, co-operative (pseudo) multi-tasking is fraught with problems.

Besides the requirement for "good behavior", it leads to an extremely
unnatural and difficult way of programming. Unless the programming
environment takes care of maintaining application program state between
yields, that burden falls to the application programmer. And they usually get
it wrong.
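
To see what that means in practice, consider a long-running computation under
a cooperative scheduler (a hypothetical sketch, in Java purely for
illustration): the natural loop has to be turned inside-out into a state
machine, with all of its state hoisted into fields and preserved by hand
between yields.

    // A hypothetical cooperative task. With no preemption, the work must
    // be chopped into bounded slices, and the loop state (i, total) must
    // live in fields--preserved by hand across yields--instead of simply
    // sitting on the stack inside one natural loop.
    class SumTask {
        private long i = 0;        // loop counter, saved between yields
        private long total = 0;    // accumulator, saved between yields
        private final long limit;

        SumTask(long limit) { this.limit = limit; }

        // One "slice" of work; returns true while more remains. A
        // cooperative scheduler keeps calling step() on each task in
        // turn--and relies entirely on every task's good behavior.
        boolean step() {
            long end = Math.min(i + 10000, limit);
            while (i < end) total += i++;
            return i < limit;
        }

        long result() { return total; }
    }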

Continuation-passing style (CPS) can alleviate this but, to my knowledge, as
yet only Scheme has successfully adopted it.
        You have to ask yourself: why is that?

c) Event loops:
Event loops only work for processing where events are generated naturally.
For GUIs and Web servers, they work well--until the response to an event
takes longer to produce than the target maximum response time for the loop.

        The image takes longer to render than the GUI's 1/10-second response
        time.

        The genome takes longer to process than the Web user's 3-10 seconds
        of patience.
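
The usual escape is to hand the slow work off to a pool of kernel threads and
post the result back into the loop as a new event--which rather concedes the
point that you need preemptive threading underneath anyway. A toy sketch (the
names are mine, not any particular framework's):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;

    // A toy event loop that off-loads slow handlers to a kernel-thread
    // pool and receives their results back as ordinary events.
    class ToyEventLoop {
        private final BlockingQueue<Runnable> events =
            new LinkedBlockingQueue<Runnable>();
        private final ExecutorService workers =
            Executors.newFixedThreadPool(4);

        void post(Runnable event) { events.add(event); }

        // Run a slow job off the loop thread; post its result back.
        void offload(final Callable<String> slowJob) {
            workers.submit(new Runnable() {
                public void run() {
                    try {
                        final String result = slowJob.call();
                        post(new Runnable() {
                            public void run() {
                                System.out.println("done: " + result);
                            }
                        });
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }

        // The loop itself: every handler must complete quickly.
        void run() throws InterruptedException {
            while (true) events.take().run();
        }
    }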
        
----------

Currently, the most promising idea for reconciling Perl 6's ambitions with
the realities of the ongoing move to medium and large-scale multi-core
processors--the industry's compensation for its inability to keep ramping up
processor clocks--is:

        http://lamp.epfl.ch/~phaller/doc/haller07coord.pdf

(Note: I've been trying to digest and assimilate that paper (and several of
its precursors) for a while now. I'm not yet entirely comfortable that it is
a silver bullet for all of Perl 6's concurrency issues. But to date, it is
the most promising, and it does at least offer some experimental results.)

And at the core of that is the need for preemptive (kernel) threading and
shared memory.

These can (and should!) be hidden from the application programmer through the
use of language- and/or library-level abstractions, of which there are many
promising [sic] candidates.

But fundamentally, they all require that:

        1) Preemptive scheduling be utilised.
        2) The core interpreter be fully reentrant.
        3) The core runtime libraries be fully reentrant.
        4) The language distinguishes between, and handles appropriately:
                - process-global entities: IO handles; environment; pwd etc.
                - runtime stack-based(*) (lexical) entities: locals (my vars
                  in Perl's terms).
           (See the sketch below, after the footnotes.)

(*) Regardless of whether these actually live on the thread-owned(**)
    processor stack or in a heap-based logical 'stack'.
(**) Even in a single-threaded process, the unit of (kernel) dispatch is the
     thread, and the thread owns the processor stack.
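
In kernel-threading terms, the distinction in (4) is the familiar one below
(a hypothetical Java rendering, not a proposal for Perl 6 semantics):
process-globals are visible to every thread and so need synchronised access;
lexicals live on each thread's own stack and need none.

    // Process-global entities: one per process, visible to all threads,
    // so access must be synchronised (here, crudely, with a lock).
    class ProcessGlobals {
        private static String pwd = "/";      // like the process's cwd
        static synchronized void chdir(String dir) { pwd = dir; }
        static synchronized String cwd() { return pwd; }
    }

    class Worker implements Runnable {
        public void run() {
            // Lexical, stack-based entities: each thread gets its own
            // copy on its own processor stack; no locking is required.
            long total = 0;
            for (int i = 1; i <= 100; i++) total += i;
            System.out.println(Thread.currentThread().getName()
                + ": total=" + total + " cwd=" + ProcessGlobals.cwd());
        }
    }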

Please note: despite these criticisms of what you have so far, I applaud you
for picking this hot potato up and running with it. It desperately needed to
be brought out into the light and addressed.
Thank you for being the one :)

In this reply I have not detailed any alternatives to your proposals--I'm working on it furiously, but like you, I keep conflating the semantics and implementation. I hope to have something for you and others to summarily dismiss within a few days. Subject to Jenson winning in Monaco :)

Buk.



On Tue, 2010-05-11 at 21:45 -0300, Daniel Ruoso wrote:
The threading model topic still needs lots of thinking, so I decided to
try out some ideas.

After I sent the second version, I just realized I could make it simpler
by just assuming "one OS thread per Coroutine Group"... so here goes the
new version.

0 - No memory is shared between threads, so no locking is necessary.

At one level, Perl 6 pretty much gives you this for free.
As Perl 6 does all it can to


1 - A value and a coroutine always belong to a thread, thus naturally
    implementing synchronized access to data.

2 - Coroutines are, conceptually, the equivalent to green threads,
    running in the same OS thread. Coroutines waiting for a return
    value are "blocked".

3 - The interpreter implements a scheduler which will pick one of the
    "waiting" coroutines; it may also suspend a coroutine in order to
    implement time-sharing.

4 - In order to implement inter-thread communication, there are:

    4.1 - A "MessageQueue" works just like an Unix Pipe, it looks like
          a slurpy array. It has a configurable buffer size and
          coroutines might block when trying to read and/or write to
          it.

    4.2 - A "RemoteInvocation" is an object that has a identifier, a
          capture (which might, optionally, point to a "MessageQueue"
          as input) and another "MessageQueue" to be used as output.
          New coroutines are created in the target thread to execute
          that invocation.

    4.3 - An "InvocationQueue" is a special type of "MessageQueue"
          that accepts only "RemoteInvocation" objects.
   4.4 - A "RemoteValue" is an object that proxies requests to
          another coroutine group through a "RemoteInvocation".

5 - The thread group boundary is drawn by language constructs such
    as async, the feed operator, junctions, hyper operators.

6 - A value might have its ownership transferred to another thread if
    it can be detected that this value is in use only for that
    invocation or return value, in order to reduce the amount of
    "RemoteInvocation"s.

7 - A value might do a special "ThreadSafe" role if it is thread-safe
    (such as implementing bindings to thread-safe native libraries), in
    which case it is sent as-is to a different thread.

8 - A value might do a special "ThreadCloneable" role if it should be
    cloned instead of being proxied through a "RemoteValue" when sent
    in a "RemoteInvocation".

9 - Exception handling gets a bit hairy, since exceptions might only
     be raised at the calling scope when the value is consumed.

10 - List assignment and Sink context might result in synchronized
     behavior.


daniel


