RFC 1 (v4) Implementation of Threads in Perl

Perl6 RFC Librarian Sat, 30 Sep 2000 23:30:01 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Implementation of Threads in Perl

=head1 VERSION

  Maintainer: Bryan C. Warnock <[EMAIL PROTECTED]>
  Date: 1 Aug 2000
  Last Modified: 28 September 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 1
  Version: 4
  Status: Frozen

=head1 CHANGES

No significant changes.  Freezing.

=head1 ABSTRACT

Perl 6 should be built around threads from the beginning.

=head1 DESCRIPTION

Perl 5 attempted (with relatively good success) to implement threads
atop the current architecture.  It did, unfortunately, leave several gaps,
traps, and "features" in heavy concurrency uses.  These weaknesses could
be fixed if Perl was built with threading from the start.

All Perl programs are threaded.  Most just only have one.

=head1 MOTIVATORS

Impatience, Hubris, and Laziness, in that order.

=head1 IMPLEMENTATION

Attempt to build-in thread constructs for the internals, while allowing
a Thread module to safely and robustly add user thread constructs, while
not making things bad for the single-threaded folks.

=head2 SUMMARY OF IMPLEMENTATION

The summary is based on the current Perl 5 architecture.  As the internal
structure changes, like using vtables, the thread design will have to 
change.

There has been very little direct discussion about threads or this RFC.
The bulk of the discussion centered around whether thread-global or
program-global global variables should be the default.

There has been a bit of indirect discussion on threads, like how various
structures will fit into a multi-threaded program.  No attempt has been made
to cross-reference these fringe discussions.

=over 4

=item 1

Create an additional pseudo-global stash, one per thread created, that is
local to that thread.  This stash would be the default space for non-
lexical variables.  C<$main::foo> == C<$foo> within one thread, while 
C<$main::foo> != C<$main::foo> in different threads.  There need be no way to 
specify the particular thread-space, as it should be visible only to the 
owning thread.

Rationale: This allows the bulk of the modules, which don't rely on sharing 
data across multiple threads, to work as written for a single-threaded
program.

=item 2

The Thread module should add a C<global> keyword or function that explicitly
accesses a variable in the program-global stash.

    global $main::foo = $foo;  # Let another thread know what my $foo is.
    global $main::foo = \$foo; # Share my local foo.  Dangerous!
    $foo = global $main::foo;  # Localize this instance of $main::foo.

Rationale: There needs to be some way to distinguish between the program-
global stash and the thread-global stashes.  It should undeniably mark the
data that is being shared.

=item 3

The Thread module should, on inclusion, also set the optree flag that triggers
mutex locking on variables within the perl core itself.  (As differentiated
by a user-created and controlled mutex.)  This is to guarantee that the
above constructs will actually work - user created race conditions aside.

Rationale: We don't want to bog single-threaded programs with needless mutex
operations, let alone attempt to do so on platforms that have no such 
beastie.

=item 4

Populate the thread-space stash with the built-ins, vice the program global 
stash.  Very few of the built-ins are meaningless in this threaded construct,
most are truly independent, and those that aren't, like $^O, should probably
be read-only anyway.  

=item 5

Threads are a runtime construct only.  Lexical scoping and compile issues 
are independent of any thread boundaries.  The initial interpretation is
done by a single thread.  C<use Threads> may set up the thread constructs,
but threads will not be spawned until runtime.

Rationale: It would probably be too difficult to have the interpreter try to
figure out what thread is which, without actual threads.  A cart and horse
problem.

=item 6

Modules should be loaded into the thread-global space when C<use>d.  As this
occurs during the compile, there is only one thread in existence to receive 
the module.  Perl should continue to track these modules (as it currently
does to prevent multiple inclusions), although it may need its own bin.
Subsequent threads should then reslurp these modules back in on their start
up.

Discussion: This is a potential problem child.  Your main thread C<use>s a
module, but may have its own data.  It may or may not have changed the
module's data before a secondary thread was started.  The second thread
can't simply copy the first thread's space, because it may have thread-
specific information contained there.  But you can't blindly globalize
each module, because then you lose the ability for a standard module to
work without its own explicit and robust reentrent handling.  Therefore,
each thread needs to I<reuse> the original modules upon creation.

        #!/your/path/to/perl -w

        use English;  # Lexical feel.  Will work across all threads.
        use Threads;
        use Foo;
        use Bar;

        # the main thread has all four above in its arena

        my $thread2 = Threads->new(\&start_thread2);  
        ...

        sub start_thread2
        {
            ...
            # Before this sub was called, the second thread was created, and it
            # reused English, Threads, Foo, and Bar, pulling them into its spaces.
        }


This, of course, could lead to massive program bloat, as each of ten threads
may each have their own copy of the same 50 modules.  Here, however, I'm
hoping that better engineering will take place, and programmers won't
needlessly C<use> modules globally.  See item 7.

Rationale: This is consistent with items 1, 4, and 7.  

=item 7

Threads should be able to C<use> and C<no> their own modules, outside of
the global ones.  This allows each thread to only use the modules they need,
saving on the global system bloat described above, and giving each thread the
most control over its environment - such as letting two threads use two 
different versions of the same module.

Discussion: Threads are mainly used in one of two ways: parallel processing
of the same dataspace, such as loop processing with non-iteration-dependent
data members; or "assembly line" parallel processing, with each thread
doing a different function on a database, like thread signals conversion,
with a buffer read, byte/nybble/bit swapping, data conversion, and buffer
write split across multiple threads.  I won't pretend to know which is truly
more common, or more deserving of the threading model, but the second is
certainly easier to use from an end-user perspective, and the first can
be converted to the second with relative ease.  This model allows different
threads to do different things with little impact on the footprint of the
program, by allowing each thread to C<use> its own modules.

        use English;  # In main thread at compilation
        use Threads;  # In main thread at compilation

        Threads->use("Foo");  # Loads Foo into main thread at runtime.

        my $thread2 = Threads->new(\&start_thread2);  
        ...

        sub start_thread2
        {
            ...
            # Before this sub was called, the second thread was created, and it
            # reused English and Threads.
            Threads->use("Bar");  # Remember, this is this thread's Threads
        }


Rationale: Consistent with above, with some pleasant side-effects.  With
inclusion rules being the same, a thread won't C<use> something it already
received globally, and modules themselves can use the original, standard
syntax, since the inclusion is deferred to runtime, a C<use Modules> will
load into the thread-space of the thread that invoked its C<use> method.
The downside, of course, is that module inclusion is deffered to downside.
Perhaps a separate RFC on program invariant checking is necessary....

=item 8

C<BEGIN> and C<END> blocks will continue to be single-threaded compile time
constructs.  

=item 9

Automatic mutex locking will occur on variable manipulation only.  Anything
longer will require explicit locking.

=back

=head2 IMPACT

=over 4

=item *

Impact on Perl on a non-thread-supporting architecture.  None.  (The mutex
locking code would be no-opped out, the Thread module would fail on inclusion,
preventing any of the global semantics from being invoked.  The thread
space would appear to the program to be a standard global stash.)

=item *

Impact on Perl built for non-threaded use.  None.  Same as above.

=item *

Impact on a single-threaded program under a multi-threaded Perl.  None, most
likely, for the above reasons.  (There would be an additional flag check, 
vice, I believe, automatic mutex locking under the current scheme.)

=item *

Impact on multi-threaded scripts under a multi-threaded Perl.  Some.  Mutex
locking would occur much as it does today.  Current Perl scripts, without 
the knowledge of global versus thread space would find data-sharing broken.
Threads have been declared experimental, and I believe the benefits of 
simplifying threads in general outweigh the heartache of those (who would
benefit) that would have to change their programs.

=item *

Impact on Perl 5.  Possible mutual compatibility between Perl 5 and Perl 6, 
with the exception of C<use Thread> and the sematics it would add.  See the
notes below about module inclusion. (Obviously, other changes to the language 
notwithstanding.)

=back

=head2 UNKNOWNS

=over 4

=item *

Mutex locking of a hash or array, and the scalars they contain, and vice
versa?

=item *

Mutex locking of a reference and the referree.

=item *

Limitations or assumptions on threading schemes other than those in pthreads,
due to the author's lack of experience with anything but.

=back

=head1 REFERENCES

RFC 86: IPC Mailboxes for Threads and Signals

RFC 178: Lightweight Threads

RFC 185: Thread Programming Model

RFC 293: MT-Safe Autovariables in perl 5.005 Threading
RFC 1 (v4) Implementation of Threads in Perl

Reply via email to