clip

allison Tue, 18 Apr 2006 14:06:51 -0700

Author: allison
Date: Tue Apr 18 13:45:23 2006
New Revision: 12354

Added:
   trunk/docs/pdds/clip/pdd22_io.pod
   trunk/docs/pdds/clip/pdd23_exceptions.pod   (contents, props changed)
   trunk/docs/pdds/clip/pdd24_events.pod
   trunk/docs/pdds/clip/pdd25_threads.pod
Removed:
   trunk/docs/pdds/clip/pddXX_events.pod
   trunk/docs/pdds/clip/pddXX_exceptions.pod
   trunk/docs/pdds/clip/pddXX_io.pod
   trunk/docs/pdds/clip/pddXX_threads.pod


Changes in other areas also in this revision:
Modified:
   trunk/   (props changed)
   trunk/MANIFEST

Log:
Blessing the draft PDDs with numbers.

Added: trunk/docs/pdds/clip/pdd22_io.pod
==============================================================================
--- (empty file)
+++ trunk/docs/pdds/clip/pdd22_io.pod   Tue Apr 18 13:45:23 2006
@@ -0,0 +1,773 @@
+# Copyright: 2001-2006 The Perl Foundation.
+# $Id$
+
+=head1 NAME
+
+docs/pdds/clip/pdd22_io.pod - Parrot I/O
+
+=head1 ABSTRACT
+
+This document describes Parrot's I/O subsystem, covering the stream,
+filesystem, and network I/O opcodes.
+
+=head1 VERSION
+
+$Revision$
+
+=head1 SYNOPSIS
+
+    open P0, "data.txt", ">"
+    print P0, "sample data\n"
+    close P0
+
+    open P1, "data.txt", "<"
+    S0 = read P1, 12
+    P2 = getstderr
+    print P2, S0
+    close P1
+
+    ...
+
+=head1 DEFINITIONS
+
+A "stream" allows input or output operations on a source/destination
+such as a file, keyboard, or text console. Streams are also called
+"filehandles", though only some of them have anything to do with files.
+
+=head1 DESCRIPTION
+
+This is a draft document defining Parrot's I/O subsystem, for both
+streams and network I/O. Parrot has both synchronous and asynchronous
+I/O operations. This section describes the interface, and the
+L<IMPLEMENTATION> section provides more details on general
+implementation questions and error handling. 
+
+The signatures for the asynchronous operations are nearly identical to
+the synchronous operations, but the asynchronous operations take an
+additional argument for a callback, and the only return value from the
+asynchronous operations is a status object. The callbacks take the
+status object as their first argument, and any return values as their
+remaining arguments. 
+
+The listing below says little about whether the opcodes return error
+information. For now assume that they can either return a status object,
+or return nothing. Error handling is discussed more thoroughly in the
+implementation section.
+
+=head2 I/O Stream Opcodes
+
+=head3 Opening and closing streams
+
+=over 4
+
+=item *
+
+C<open> opens a stream object based on a string path. It takes an
+optional string argument specifying the mode of the stream (read, write,
+append, read/write, etc.), and returns a stream object. Currently the
+mode of the stream is set with a string argument similar to Perl 5
+syntax, but a set of defined constants, such as the following, may fit
+better with Parrot's general architecture:
+
+  0    PIOMODE_READ (default)
+  1    PIOMODE_WRITE
+  2    PIOMODE_APPEND
+  3    PIOMODE_READWRITE
+  4    PIOMODE_PIPE (read)
+  5    PIOMODE_PIPEWRITE
+
+The asynchronous version takes a PMC callback as an additional final
+argument. When the open operation is complete, it invokes the callback
+with two arguments: a status object and the opened stream object.
+
+=item *
+
+C<close> closes a stream object. It takes a single stream object
+argument and returns a status object.
+
+The asynchronous version takes an additional final PMC callback
+argument. When the close operation is complete, it invokes the callback,
+passing it a status object.
+
+=back
+
+=head3 Retrieving existing streams
+
+These opcodes do not have asynchronous variants.
+
+=over 4
+
+=item *
+
+C<getstdin>, C<getstdout>, and C<getstderr> return a stream object for
+standard input, standard output, and standard error, respectively.
+
+=item *
+
+C<fdopen> converts an existing and already open UNIX integer file
+descriptor into a stream object. It also takes a string argument to
+specify the mode.
+
+=back
+
+=head3 Writing to streams
+
+=over 4
+
+=item *
+
+C<print> writes an integer, float, string, or PMC value to a stream.  It
+writes to standard output by default, but optionally takes a PMC
+argument to select another stream to write to.
+
+The asynchronous version takes an additional final PMC callback
+argument. When the print operation is complete, it invokes the callback,
+passing it a status object.
+
+=item *
+
+C<printerr> writes an integer, float, string, or PMC value to standard
+error.
+
+There is no asynchronous variant of C<printerr>. [It's just a shortcut.
+Users who want an asynchronous version can use C<print> with the
+standard error stream.]
+
+=back
+
+=head3 Reading from streams
+
+=over 4
+
+=item *
+
+C<read> retrieves a specified number of bytes from a stream into a
+string. [Note this is bytes, not codepoints.] By default it reads from
+standard input, but it also takes an alternate stream object source as
+an optional argument.
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the read operation is
+complete, it invokes the callback, passing it a status object and a
+string of bytes.
+
+=item *
+
+C<readline> retrieves a single line from a stream into a string. Calling
+C<readline> flags the stream as operating in line-buffer mode (see
+C<pioctl> below).
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the readline operation
+is complete, it invokes the callback, passing it a status object and a
+string of bytes.
+
+=item *
+
+C<peek> retrieves the next byte from a stream into a string, but doesn't
+remove it from the stream. By default it reads from standard input, but
+it also takes a stream object argument for an alternate source.
+
+There is no asynchronous version of C<peek>. [Does anyone have a line
+of reasoning why one might be needed? The concept of "next byte" seems
+to be a synchronous one.]
+
+=back
+
+=head3 Retrieving and setting stream properties
+
+=over 4
+
+=item *
+
+C<seek> sets the current file position of a stream object to an integer
+byte offset from an integer starting position (0 for the start of the
+file, 1 for the current position, and 2 for the end of the file). It
+also has a 64-bit variant that sets the byte offset by two integer
+arguments (one for the first 32 bits of the 64-bit offset, and one for
+the second 32 bits). [The two-register emulation for 64-bit integers may
+be deprecated in the future.]
+
+The asynchronous version takes an additional final PMC callback
+argument. When the seek operation is complete, it invokes the callback,
+passing it a status object and the stream object it was called on.
+
+=item *
+
+C<tell> retrieves the current file position of a stream object.  It also
+has a 64-bit variant that returns the byte offset as two integers (one
+for the first 32 bits of the 64-bit offset, and one for the second 32
+bits). [The two-register emulation for 64-bit integers may be deprecated
+in the future.]
+
+No asynchronous version.
+
+=item *
+
+C<getfd> retrieves the UNIX integer file descriptor of a stream object.
+
+No asynchronous version.
+
+=item *
+
+C<pioctl> provides low-level access to the attributes of a stream
+object. It takes a stream object, an integer flag to select a command,
+and a single integer argument for the command. It returns an integer
+indicating the success or failure of the command.
+
+The following constants are defined for the commands that C<pioctl> can
+execute:
+
+  0    PIOCTL_CMDRESERVED
+           No documentation available.
+  1    PIOCTL_CMDSETRECSEP
+           Set the record separator. [This doesn't actually work at the
+           moment.]
+  2    PIOCTL_CMDGETRECSEP
+           Get the record separator.
+  3    PIOCTL_CMDSETBUFTYPE
+           Set the buffer type.
+  4    PIOCTL_CMDGETBUFTYPE
+           Get the buffer type.
+  5    PIOCTL_CMDSETBUFSIZE
+           Set the buffer size.
+  6    PIOCTL_CMDGETBUFSIZE
+           Get the buffer size.
+
+The following constants are defined as argument/return values for the
+buffer-type commands:
+
+  0    PIOCTL_NONBUF
+           Unbuffered I/O. Bytes are sent as soon as possible.
+  1    PIOCTL_LINEBUF
+           Line buffered I/O. Bytes are sent when a newline is
+           encountered.
+  2    PIOCTL_BLKBUF
+           Fully buffered I/O. Bytes are sent when the buffer is full.
+           [Called "BLKBUF" because bytes are sent as a block, but line
+           buffering also sends them as a block, so "FULBUF" might make
+           more sense.]
+
+[This opcode may be deprecated and replaced with methods on stream
+objects.]
+
+=item *
+
+C<poll> polls a stream or socket object for particular types of events
+(an integer flag) at a frequency set by seconds and microseconds (the
+final two integer arguments). [At least, that's what the documentation
+in src/io/io.c says. In actual fact, the final two arguments seem to be
+setting the timeout, exactly the same as the corresponding argument to
+the system version of C<poll>.]
+
+See the system documentation for C<poll> to see the constants for event
+types and return status.
+
+This opcode is inherently synchronous (poll is "synchronous I/O
+multiplexing"), but it can retrieve status information from a stream or
+socket object whether the object is being used synchronously or
+asynchronously.
+
+=back
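
The two-integer form of seek and tell described above amounts to
splitting a 64-bit byte offset into two 32-bit halves. The following is a
minimal Python sketch of that arithmetic, not Parrot code. It assumes the
first integer carries the upper 32 bits and the second the lower 32 bits
(the text above does not say which order is used), and the function names
are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split_offset(offset):
    """Split a non-negative 64-bit offset into (upper, lower) 32-bit halves.

    The upper-first ordering is an assumption made for this sketch.
    """
    if not 0 <= offset < 1 << 64:
        raise ValueError("offset must fit in 64 unsigned bits")
    return (offset >> 32) & MASK32, offset & MASK32


def join_offset(upper, lower):
    """Rebuild the 64-bit offset from its two 32-bit halves."""
    return ((upper & MASK32) << 32) | (lower & MASK32)


if __name__ == "__main__":
    upper, lower = split_offset(5 * 2**32 + 7)
    print(upper, lower)
    print(join_offset(upper, lower) == 5 * 2**32 + 7)
```

An offset below 4 GB always has an upper half of zero, which is why the
single-integer variants suffice for small files.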
+
+=head3 Deprecated opcodes
+
+=over
+
+=item *
+
+C<write> prints to standard output but it cannot select another stream.
+It only accepts a PMC value to write. This is redundant with the
+C<print> opcode, so it will be deprecated.
+
+=back
+
+=head2 Filesystem Opcodes
+
+=over 4
+
+=item *
+
+C<stat> retrieves information about a file on the filesystem. It takes a
+string filename or an integer argument of a UNIX file descriptor [or an
+already opened stream object?], and an integer flag for the type of
+information requested. It returns an integer containing the requested
+information.  The following constants are defined for the type of
+information requested (see F<runtime/parrot/include/stat.pasm>):
+
+  0    STAT_EXISTS
+           Whether the file exists.
+  1    STAT_FILESIZE
+           The size of the file.
+  2    STAT_ISDIR
+           Whether the file is a directory.
+  3    STAT_ISDEV
+           Whether the file is a device such as a terminal or a disk.
+  4    STAT_CREATETIME
+           The time the file was created.
+           (Currently just returns -1.)
+  5    STAT_ACCESSTIME
+           The last time the file was accessed.
+  6    STAT_MODIFYTIME
+           The last time the file data was changed.
+  7    STAT_CHANGETIME
+           The last time the file metadata was changed.
+  8    STAT_BACKUPTIME
+           The last time the file was backed up.
+           (Currently just returns -1.)
+  9    STAT_UID
+           The user ID of the file.
+  10   STAT_GID
+           The group ID of the file.
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the stat operation is
+complete, it invokes the callback, passing it a status object and an
+integer containing the status information.
+
+=item *
+
+C<unlink> deletes a file from the filesystem. It takes a single string
+argument of a filename (including the path).
+
+The asynchronous version takes an additional final PMC callback
+argument. When the unlink operation is complete, it invokes the
+callback, passing it a status object.
+
+=item *
+
+C<rmdir> deletes a directory from the filesystem if that directory is
+empty. It takes a single string argument of a directory name (including
+the path).
+
+The asynchronous version takes an additional final PMC callback
+argument. When the rmdir operation is complete, it invokes the callback,
+passing it a status object.
+
+=item *
+
+C<opendir> opens a stream object for a directory. It takes a single
+string argument of a directory name (including the path) and returns a
+stream object.
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the opendir operation
+is complete, it invokes the callback, passing it a status object and a
+newly created stream object.
+
+=item *
+
+C<readdir> reads a single item from an open directory stream object. It
+takes a single stream object argument and returns a string containing
+the path and filename/directory name of the current item. (i.e. the
+directory stream object acts as an iterator.)
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the readdir operation
+is complete, it invokes the callback, passing it a status object and the
+string result.
+
+=item *
+
+C<telldir> returns the current position of C<readdir> operations on a
+directory stream object.
+
+No asynchronous version.
+
+=item *
+
+C<seekdir> sets the current position of C<readdir> operations on a
+directory stream object. It takes a stream object argument and an
+integer for the position. [The system C<seekdir> requires that the
+position argument be the result of a previous C<telldir> operation.]
+
+The asynchronous version takes an additional final PMC callback
+argument. When the seekdir operation is complete, it invokes the
+callback, passing it a status object and the directory stream object it
+was called on.
+
+=item *
+
+C<rewinddir> sets the current position of C<readdir> operations on a
+directory stream object back to the beginning of the directory. It takes
+a stream object argument.
+
+No asynchronous version.
+
+=item *
+
+C<closedir> closes a directory stream object. It takes a single stream
+object argument.
+
+The asynchronous version takes an additional final PMC callback
+argument. When the closedir operation is complete, it invokes the
+callback, passing it a status object.
+
+=back
+
+=head2 Network I/O Opcodes
+
+Most of these opcodes conform to the standard UNIX interface, but the
+layer API allows alternate implementations for each.
+
+=over 4
+
+=item *
+
+C<socket> returns a new socket object from a given address family,
+socket type, and protocol number (all integers). The socket object's
+boolean value can be tested for whether the socket was created.
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the socket operation is
+complete, it invokes the callback, passing it a status object and a new
+socket object.
+
+=item *
+
+C<sockaddr> returns an object representing a socket address, generated
+from a port number (integer) and an address (string).
+
+No asynchronous version.
+
+=item *
+
+C<connect> connects a socket object to an address.
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the connect operation
+is complete, it invokes the callback, passing it a status object and the
+socket object it was called on. [If you want notification when a connect
+operation is completed, you probably want to do something with that
+connected socket object.]
+
+=item *
+
+C<recv> receives a message from a connected socket object. It returns
+the message in a string.
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the recv operation is
+complete, it invokes the callback, passing it a status object and a
+string containing the received message.
+
+=item *
+
+C<send> sends a message string to a connected socket object. 
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the send operation is
+complete, it invokes the callback, passing it a status object.
+
+=item *
+
+C<sendto> sends a message string to an address specified in an address
+object (first connecting to the address).
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the sendto operation is
+complete, it invokes the callback, passing it a status object.
+
+
+=item *
+
+C<bind> binds a socket object to the port and address specified by an
+address object (the packed result of C<sockaddr>).
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the bind operation is
+complete, it invokes the callback, passing it a status object and the
+socket object it was called on. [If you want notification when a bind
+operation is completed, you probably want to do something with that
+bound socket object.]
+
+=item *
+
+C<listen> specifies that a socket object is willing to accept incoming
+connections. The integer argument gives the maximum size of the queue
+for pending connections.
+
+There is no asynchronous version. C<listen> marks a set of attributes on
+the socket object.
+
+=item *
+
+C<accept> accepts a new connection on a given socket object, and returns
+a newly created socket object for the connection. 
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the accept operation
+receives a new connection, it invokes the callback, passing it a status
+object and a newly created socket object for the connection. [While the
+synchronous C<accept> has to be called repeatedly in a loop (once for
+each connection received), the asynchronous version is only called once,
+but continues to send new connection events until the socket is closed.]
+
+=item *
+
+C<shutdown> closes a socket object for reading, for writing, or for all
+I/O. It takes a socket object argument and an integer argument for the
+type of shutdown:
+
+  0    PIOSHUTDOWN_READ
+           Close the socket object for reading.
+  1    PIOSHUTDOWN_WRITE
+           Close the socket object for writing.
+  2    PIOSHUTDOWN
+           Close the socket object.
+
+=back
+
+
+=head1 IMPLEMENTATION
+
+The Parrot I/O subsystem uses a per-interpreter stack to provide a
+layer-based approach to I/O. Each layer implements a subset of the
+C<ParrotIOLayerAPI> vtable. To find an I/O function, the layer stack is
+searched downwards until a non-NULL function pointer is found for
+that particular slot.
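
The layer lookup described above can be sketched in Python as follows.
This is an illustration of the search rule only, not the C
implementation: the Layer class, the slot names, and the layer names
"buf" and "unix" are assumptions made for the example, with None playing
the role of a NULL function pointer.

```python
class Layer:
    """One I/O layer; slots it does not implement are simply absent (None)."""

    def __init__(self, name, **slots):
        self.name = name
        self.slots = slots

    def get(self, slot):
        return self.slots.get(slot)


def find_io_function(stack, slot):
    """Walk the stack from the top down and return the first implementation."""
    for layer in stack:
        func = layer.get(slot)
        if func is not None:
            return layer.name, func
    raise LookupError(f"no layer implements {slot!r}")


# The top of the stack comes first in the list.
stack = [
    Layer("buf", read=lambda n: f"buffered read of {n} bytes"),
    Layer(
        "unix",
        read=lambda n: f"raw read of {n} bytes",
        open=lambda path: f"opened {path}",
    ),
]

if __name__ == "__main__":
    for slot in ("read", "open"):
        name, func = find_io_function(stack, slot)
        print(slot, "is handled by layer", name)
```

A layer near the top only needs to fill the slots it wants to override;
everything else falls through to the layers beneath it.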
+
+=head2 Synchronous and Asynchronous Operations
+
+Currently, Parrot only implements synchronous I/O operations. In the
+design proposed here, asynchronous operations are essentially the same
+as the synchronous operations, but each asynchronous operation runs in
+its own thread.
+
+Note: this is a deviation from the existing plan, which had all I/O
+operations run internally as asynchronous, and the synchronous
+operations as a compatibility layer on top of the asynchronous
+operations. This conceptual simplification means that all I/O operations
+are possible without threading support (for example, in a stripped-down
+version of Parrot running on a PDA). [Asynchronous operations don't have
+to use Parrot threads, they could use some alternate threading
+implementation. But it's overkill to develop two threading
+implementations. If Parrot threads turn out to be too heavyweight, we
+may want to look into a lighter weight variation for asynchronous
+operations.]
+
+The asynchronous I/O implementation will use Parrot's I/O layer
+architecture so some platforms can take advantage of their built-in
+asynchronous operations instead of using Parrot threads.
+
+Communication between the calling code and the asynchronous operation
+thread will be handled by a shared status object. The operation thread
+will update the status object whenever the status changes, and the
+calling code can check the status object at any time. [Twisted has an
+interesting variation on this, in that it replaces the status object
+with the returned result of the asynchronous call when the call is
+complete. That is probably too confusing, but we might give the status
+object a reference to the returned result.]
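
The thread-per-operation model with a shared status object might look
roughly like the Python sketch below. It is only an illustration of the
idea: the Status class, its state names, and async_read are hypothetical,
and Python threads stand in for whatever threading implementation Parrot
ends up using.

```python
import os
import tempfile
import threading


class Status:
    """Shared status object: written by the worker, readable by the caller."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = "pending"
        self.error = None
        self.done = threading.Event()

    @property
    def state(self):
        with self._lock:
            return self._state

    def _set(self, state, error=None):
        with self._lock:
            self._state = state
            self.error = error


def async_read(path, nbytes, callback):
    """Start a read in its own thread and return a status object at once."""
    status = Status()

    def worker():
        data = None
        try:
            status._set("running")
            with open(path, "rb") as fh:
                data = fh.read(nbytes)
            status._set("complete")
        except OSError as exc:
            status._set("failed", exc)
        try:
            # Status object first, then any return values.
            callback(status, data)
        finally:
            status.done.set()

    threading.Thread(target=worker, daemon=True).start()
    return status


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"sample data\n")
    os.close(fd)
    status = async_read(path, 12, lambda st, data: print(st.state, data))
    status.done.wait(5)  # the caller could also poll status.state instead
    os.remove(path)
```

The only value returned to the caller is the status object; the data
reaches the program through the callback, matching the signatures given
in the DESCRIPTION section.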
+
+The current strategy for differentiating the synchronous calls from
+asynchronous ones relies on the presence of a callback argument in the
+asynchronous calls. If we wanted asynchronous calls that don't supply
+callbacks (perhaps if the user wants to manually check later if the
+operation succeeded) we would need another strategy to differentiate the
+two. This is probably enough of a fringe case that we don't need to
+provide opcodes for it, provided they can access the functionality via
+methods on ParrotIO objects.
+
+=head2 Error Handling
+
+Currently some of the networking opcodes (C<connect>, C<recv>, C<send>,
+C<poll>, C<bind>, and C<listen>) return an integer indicating the status
+of the call, -1 or a system error code if unsuccessful. Other I/O
+opcodes (such as C<getfd> and C<accept>) have various different
+strategies for error notification, and others have no way of marking
+errors at all. We want to unify all I/O opcodes so they use a consistent
+strategy for error notification. There are several options in how we do
+this.
+
+=head3 Integer status codes
+
+One approach is to have every I/O operation return an integer status
+code indicating success or failure. This approach has the advantage of
+being lightweight: returning a single additional integer is cheap. The
+disadvantage is that it's not very flexible: the only way to look for
+errors is to check the integer return value, possibly comparing it to a
+predefined set of error constants.
+
+=head3 Exceptions
+
+Another option is to have all I/O operations throw exceptions on errors.
+The advantage is that it keeps the error tracking information
+out-of-band, so it doesn't affect the arguments or return values of the
+calls (some opcodes that have a return value plus an integer status code
+have odd looking signatures). One disadvantage of this approach is that
+it forces all users to handle exceptions from I/O operations even if
+they aren't using exceptions otherwise. 
+
+A more significant disadvantage is that exceptions don't work well with
+asynchronous operations. Exception handlers are set for a particular
+dynamic scope, but with an asynchronous operation, by the time an
+exception is thrown execution has already left the dynamic scope where
+the exception handler was set. [Though, this partly depends on how
+exceptions are implemented.]
+
+=head3 Error callbacks
+
+A minor variation on the exceptions option is to pass an error callback
+into each I/O opcode. This solves the problem of asynchronous operations
+because the operation has its own custom error handling code rather than
+relying on an exception handler in its dynamic scope.
+
+The disadvantage is that the user has to define a custom error handler
+routine for every call. It also doesn't cope well with cases where
+multiple different kinds of errors may be returned by a single opcode.
+(The one error handler would have to cope with all possible types of
+errors.) There is an easier way.
+
+=head3 Hybrid solution
+
+Another option is to return a status object from each I/O operation. The
+status object could be used to get an integer status code, string
+status/error message, or boolean success value. It could also provide a
+method to throw an exception on error conditions. There could even be a
+global option (or an option set on a particular I/O object) that tells
+Parrot to always throw exceptions on errors in synchronous I/O
+operations, implemented by calling this method on the status object
+before returning from the I/O opcode.
+
+The advantages are that this works well with asynchronous and
+synchronous operations, and provides flexibility for multiple different
+uses.  Also, something like a status object will be needed anyway to
+allow users to check on the status of a particular asynchronous call in
+progress, so this is a nice unification.
+
+The disadvantage is that a status object involves more overhead than a
+simple integer status code.
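
A status object of the kind proposed above could be sketched in Python as
follows. The class and method names (IOStatus, StreamError,
throw_on_error, finish_op) are hypothetical, and the throw_on_error
parameter stands in for the global or per-object option mentioned in the
text.

```python
class StreamError(Exception):
    """Hypothetical exception type raised from a failed status object."""


class IOStatus:
    """One object usable as an integer code, a message, or a boolean."""

    def __init__(self, code=0, message="ok"):
        self.code = code
        self.message = message

    def __bool__(self):
        return self.code == 0  # true means success

    def __int__(self):
        return self.code

    def __str__(self):
        return self.message

    def throw_on_error(self):
        """Turn a failed status into an exception; pass success through."""
        if not self:
            raise StreamError(f"[{self.code}] {self.message}")
        return self


def finish_op(status, throw_on_error=False):
    """What a synchronous opcode might do just before returning."""
    return status.throw_on_error() if throw_on_error else status


if __name__ == "__main__":
    bad = IOStatus(2, "No such file or directory")
    print(bool(bad), int(bad), str(bad))
    try:
        finish_op(bad, throw_on_error=True)
    except StreamError as exc:
        print("raised:", exc)
```

Callers that prefer integer checks, boolean tests, or exceptions all work
from the same object, which is the unification the text argues for.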
+
+=head2 IPv6 Support
+
+The transition from IPv4 to IPv6 is in progress, though not likely to be
+complete anytime soon. Most operating systems today offer at least
+dual-stack IPv6 implementations, so they can use either IPv4 or IPv6,
+depending on what's available. Parrot also needs to support either
+protocol. For the most part, the network I/O opcodes should internally
+handle either addressing scheme, without requiring the user to specify
+which scheme is being used.
+
+The IETF recommends defaulting to IPv6 connections and falling back to
+IPv4 connections when IPv6 fails. This would give us more solid testing
+of Parrot's compatibility with IPv6, but may be too slow. Either way,
+it's a good idea to make setting the default (or selecting one
+exclusively) an option when compiling Parrot.
+
+The most important issues for Parrot to consider with IPv6 are:
+
+=over 4
+
+=item *
+
+Support 128-bit addresses. IPv6 addresses are colon-separated
+hexadecimal numbers, such as C<20a:95ff:fef5:7e5e>.
+
+=item *
+
+Any address parsing should be able to support the address separated from
+a port number or prefix/length by brackets: C<[20a:95ff:fef5:7e5e]:80>
+and C<[20a:95ff::]/64>.
+
+=item *
+
+Packed addresses, such as the result of the C<sockaddr> opcode, should
+be passed around as an object (or at least a structure) rather than as a
+string.
+
+=back
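
The bracket convention from the list above can be handled with simple
string splitting. The Python sketch below separates the address from an
optional port or prefix length; it does not validate the address itself,
and the function name parse_bracketed is hypothetical.

```python
def parse_bracketed(text):
    """Return (address, port, prefix_length); parts not present are None."""
    if not text.startswith("["):
        # A bare address with no port or prefix length attached.
        return text, None, None
    address, bracket, rest = text[1:].partition("]")
    if not bracket:
        raise ValueError("missing closing bracket")
    if rest == "":
        return address, None, None
    if rest.startswith(":"):
        return address, int(rest[1:]), None
    if rest.startswith("/"):
        return address, None, int(rest[1:])
    raise ValueError(f"unexpected text after bracket: {rest!r}")


if __name__ == "__main__":
    print(parse_bracketed("[20a:95ff:fef5:7e5e]:80"))
    print(parse_bracketed("[20a:95ff::]/64"))
```

The brackets are what make the split unambiguous: without them, a
trailing ":80" could not be told apart from another group of the address.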
+
+See the relevant IETF RFCs: "Application Aspects of IPv6 Transition"
+(http://www.ietf.org/rfc/rfc4038.txt) and "Basic Socket Interface
+Extensions for IPv6" (http://www.ietf.org/rfc/rfc3493.txt).
+
+=head2 Excerpt
+
+[Below is an excerpt from "Perl 6 and Parrot Essentials", included to
+seed discussion.]
+
+Parrot's base I/O system is fully asynchronous I/O with callbacks and
+per-request private data. Since this is massive overkill in many cases,
+we have a plain vanilla synchronous I/O layer that your programs can use
+if they don't need the extra power.
+
+Asynchronous I/O is conceptually pretty simple. Your program makes an
+I/O request. The system takes that request and returns control to your
+program, which keeps running. Meanwhile the system works on satisfying
+the I/O request. When the request is satisfied, the system notifies
+your program in some way. Since there can be multiple requests
+outstanding, and you can't be sure exactly what your program will be
+doing when a request is satisfied, programs that make use of
+asynchronous I/O can be complex.
+
+Synchronous I/O is even simpler. Your program makes a request to the
+system and then waits until that request is done. There can be only
+one request in process at a time, and you always know what you're
+doing (waiting) while the request is being processed. It makes your
+program much simpler, since you don't have to do any sort of
+coordination or synchronization.
+
+The big benefit of asynchronous I/O systems is that they generally
+have a much higher throughput than a synchronous system. They move
+data around much faster--in some cases three or four times faster.
+This is because the system can be busy moving data to or from disk
+while your program is busy processing data that it got from a previous
+request.
+
+For disk devices, having multiple outstanding requests--especially on
+a busy system--allows the system to order read and write requests to
+take better advantage of the underlying hardware. For example, many
+disk devices have built-in track buffers. No matter how small a
+request you make to the drive, it always reads a full track. With
+synchronous I/O, if your program makes two small requests to the same
+track, and they're separated by a request for some other data, the
+disk will have to read the full track twice. With asynchronous I/O, on
+the other hand, the disk may be able to read the track just once, and
+satisfy the second request from the track buffer.
+
+Parrot's I/O system revolves around a request. A request has three
+parts: a buffer for data, a completion routine, and a piece of data
+private to the request. Your program issues the request, then goes about
+its business. When the request is completed, Parrot will call the
+completion routine, passing it the request that just finished. The
+completion routine extracts out the buffer and the private data, and
+does whatever it needs to do to handle the request. If your request
+doesn't have a completion routine, then your program will have to
+explicitly check to see if the request was satisfied.
+
+Your program can choose to sleep and wait for the request to finish,
+essentially blocking. Parrot will continue to process events while
+your program is waiting, so it isn't completely unresponsive. This is
+how Parrot implements synchronous I/O--it issues the asynchronous
+request, then immediately waits for that request to complete.
+
+The reason we made Parrot's I/O system asynchronous by default was
+sheer pragmatism. Network I/O is all asynchronous, as is GUI
+programming, so we knew we had to deal with asynchrony in some form.
+It's also far easier to make an asynchronous system pretend to be
+synchronous than it is the other way around. We could have decided to
+treat GUI events, network I/O, and file I/O all separately, but there
+are plenty of systems around that demonstrate what a bad idea that is.
+
+=head1 ATTACHMENTS
+
+None.
+
+=head1 FOOTNOTES
+
+None.
+
+=head1 REFERENCES
+
+  src/io/io.c
+  src/ops/io.ops
+  include/parrot/io.h
+  runtime/parrot/library/Stream/*
+  src/io/io_unix.c
+  src/io/io_win32.c
+  Perl 5's IO::AIO
+  Perl 5's POE
+
+=cut
+
+__END__
+Local Variables:
+  fill-column:78
+End:

Added: trunk/docs/pdds/clip/pdd23_exceptions.pod
==============================================================================
--- (empty file)
+++ trunk/docs/pdds/clip/pdd23_exceptions.pod   Tue Apr 18 13:45:23 2006
@@ -0,0 +1,325 @@
+# Copyright: 2001-2006 The Perl Foundation.
+# $Id$
+
+=head1 NAME
+
+docs/pdds/clip/pdd23_exceptions.pod - Parrot Exceptions
+
+=head1 ABSTRACT
+
+This document defines the requirements and implementation strategy for
+Parrot's exception system.
+
+=head1 VERSION
+
+$Revision$
+
+=head1 DESCRIPTION
+
+An exception system gives user-developed code control over how run-time
+error conditions are handled. Exceptions are errors or unusual
+conditions that require special processing. An exception handler
+performs the necessary steps to appropriately respond to a particular
+kind of exception.
+
+=head2 Exception Opcodes
+
+These are the opcodes relevant to exceptions and exception handlers:
+
+=over
+
+=item *
+
+C<push_eh> creates an exception handler and pushes it onto the control
+stack. It takes a label (the location of the exception handler) as its
+only argument. [Is this right? Treating exception handlers as label
+jumps rather than full subroutines is error-prone.]
+
+=item *
+
+C<clear_eh> removes the most recently added exception handler from the
+control stack.
+
+=item *
+
+C<throw> throws an exception object.
+
+=item *
+
+C<rethrow> rethrows an exception object. It can only be called from
+inside an exception handler.
+
+=item *
+
+C<die> throws an exception. It takes two arguments, one for the severity
+of the exception and one for the type of exception.
+
+If the severity is C<EXCEPT_DOOMED>, it exits via a call to
+C<_exit($2)> (where C<$2> is the exception type argument), which cannot
+be caught as an exception.
+
+These are the constants defined for severity:
+
+  0    EXCEPT_NORMAL
+  1    EXCEPT_WARNING
+  2    EXCEPT_ERROR
+  3    EXCEPT_SEVERE
+  4    EXCEPT_FATAL
+  5    EXCEPT_DOOMED
+  6    EXCEPT_EXIT
+
+These are the constants defined for exception types:
+
+  0    E_Exception
+  1    E_SystemExit
+  2    E_StopIteration
+  3    E_StandardError
+  4    E_KeyboardInterrupt
+  5    E_ImportError
+  6    E_EnvironmentError
+  7    E_IOError
+  8    E_OSError
+  9    E_WindowsError
+  10   E_VMSError
+  11   E_EOFError
+  12   E_RuntimeError
+  13   E_NotImplementedError
+  14   E_LibraryNotLoadedError
+  15   E_NameError
+  16   E_UnboundLocalError
+  17   E_AttributeError
+  18   E_SyntaxError
+  19   E_IndentationError
+  20   E_TabError
+  21   E_TypeError
+  22   E_AssertionError
+  23   E_LookupError
+  24   E_IndexError
+  25   E_KeyError
+  26   E_ArithmeticError
+  27   E_OverflowError
+  28   E_ZeroDivisionError
+  29   E_FloatingPointError
+  30   E_ValueError
+  31   E_UnicodeError
+  32   E_UnicodeEncodeError
+  33   E_UnicodeDecodeError
+  34   E_UnicodeTranslateError
+  35   E_ReferenceError
+  36   E_SystemError
+  37   E_MemoryError
+  37   E_LAST_PYTHON_E
+  38   BAD_BUFFER_SIZE
+  39   MISSING_ENCODING_NAME
+  40   INVALID_STRING_REPRESENTATION
+  41   ICU_ERROR
+  42   UNIMPLEMENTED
+  43   NULL_REG_ACCESS
+  44   NO_REG_FRAMES
+  45   SUBSTR_OUT_OF_STRING
+  46   ORD_OUT_OF_STRING
+  47   MALFORMED_UTF8
+  48   MALFORMED_UTF16
+  49   MALFORMED_UTF32
+  50   INVALID_CHARACTER
+  51   INVALID_CHARTYPE
+  52   INVALID_ENCODING
+  53   INVALID_CHARCLASS
+  54   NEG_REPEAT
+  55   NEG_SUBSTR
+  56   NEG_SLEEP
+  57   NEG_CHOP
+  58   INVALID_OPERATION
+  59   ARG_OP_NOT_HANDLED
+  60   KEY_NOT_FOUND
+  61   JIT_UNAVAILABLE
+  62   EXEC_UNAVAILABLE
+  63   INTERP_ERROR
+  64   PREDEREF_LOAD_ERROR
+  65   PARROT_USAGE_ERROR
+  66   PIO_ERROR
+  67   PARROT_POINTER_ERROR
+  68   DIV_BY_ZERO
+  69   PIO_NOT_IMPLEMENTED
+  70   ALLOCATION_ERROR
+  71   INTERNAL_PANIC
+  72   OUT_OF_BOUNDS
+  73   JIT_ERROR
+  74   EXEC_ERROR
+  75   ILL_INHERIT
+  76   NO_PREV_CS
+  77   NO_CLASS
+  78   LEX_NOT_FOUND
+  79   PAD_NOT_FOUND
+  80   ATTRIB_NOT_FOUND
+  81   GLOBAL_NOT_FOUND
+  82   METH_NOT_FOUND
+  83   WRITE_TO_CONSTCLASS
+  84   NOSPAWN
+  85   INTERNAL_NOT_IMPLEMENTED
+  86   ERR_OVERFLOW
+  87   LOSSY_CONVERSION
+
+=item *
+
+C<exit> throws an exception of severity C<EXCEPT_EXIT>. It takes a
+single argument for the exception type.
+
+=item *
+
+C<pushaction> pushes a subroutine object onto the control stack. If the
+control stack is unwound due to an exception (or C<popmark>, or
+subroutine return), the subroutine is invoked with an integer argument:
+C<0> means a normal return; C<1> means an exception has been raised.
+[Seems like there's lots of room for dangerous collisions here.]
+
+=back
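+
+As a minimal sketch of C<die> in use, the fragment below passes numeric
+values taken from the tables above. It assumes that C<die> accepts
+integer constants in the order (severity, type) and that, for any
+severity other than C<EXCEPT_DOOMED>, the resulting exception reaches an
+installed handler like any other; the C<_handler> label is illustrative.
+
+    push_eh _handler            # install a handler on the control stack
+    die 2, 30                   # severity EXCEPT_ERROR, type E_ValueError
+    end
+
+  _handler:                     # assumed entry point when die throws
+    print "caught\n"
+    end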
+
+=head1 IMPLEMENTATION
+
+[I'm not convinced the control stack is the right way to handle
+exceptions. Most of Parrot is based on the continuation-passing style of
+control, so shouldn't exceptions be based on it too? See bug #38850.]
+
+=head2 Opcodes that Throw Exceptions
+
+Exceptions have been incorporated into built-in opcodes in a limited
+way, but they aren't used consistently.
+
+Divide-by-zero exceptions are thrown by C<div>, C<fdiv>, and C<cmod>.
+
+The C<ord> opcode throws an exception when it's passed an empty
+argument, or passed a string index that's outside the length of the
+string.
+
+The C<classoffset> opcode throws an exception when it's asked to
+retrieve the attribute offset for a class that isn't in the object's
+inheritance hierarchy.
+
+The C<find_charset> opcode throws an exception if the charset name it's
+looking up doesn't exist. The C<trans_charset> opcode throws an
+exception on "information loss" (presumably, this means when one charset
+doesn't have a one-to-one correspondence in the other charset). 
+
+The C<find_encoding> opcode throws an exception if the encoding name
+it's looking up doesn't exist. The C<trans_encoding> opcode throws an
+exception on "information loss" (presumably, this means when one
+encoding doesn't have a one-to-one correspondence in the other
+encoding). 
+
+Parrot's default version of the C<LexPad> PMC uses exceptions, though
+other implementations can choose to return error values instead.
+C<store_lex> throws an exception when asked to store a lexical variable
+under a name that doesn't exist. C<find_lex> throws an exception when
+asked to retrieve a lexical name that doesn't exist.
+
+Other opcodes respond to an C<errorson> setting to decide whether to
+throw an exception or return an error value. C<find_global> throws an
+exception (or returns a Null PMC) if the global name requested doesn't
+exist. C<find_name> throws an exception (or returns a Null PMC) if the
+name requested doesn't exist in a lexical, current, global, or built-in
+namespace.
+
+It's a little odd that so few opcodes throw exceptions (these are the
+ones that are documented, but a few others throw exceptions internally
+even though they aren't documented as doing so). It's worth considering
+either expanding the use of exceptions consistently throughout the
+opcode set, or eliminating exceptions from the opcode set entirely. The
+strategy for error handling should be consistent, whatever it is. [I
+like the way C<LexPad>s and the C<errorson> settings provide the option
+for exception-based or non-exception-based implementations, rather than
+forcing one or the other.]
+
+=head2 Excerpt
+
+[Excerpt from "Perl 6 and Parrot Essentials" to seed discussion.
+Out-of-date in some ways, and in others it was simply speculative.]
+
+Exceptions provide a way of calling a piece of code outside the normal
+flow of control. They are mainly used for error reporting or cleanup
+tasks, but sometimes exceptions are just a funny way to branch from
+one code location to another one. 
+
+Exceptions are objects that hold all the information needed to handle
+the exception: the error message, the severity and type of the error,
+etc. The class of an exception object indicates the kind of exception
+it is.
+
+Exception handlers are derived from continuations. They are ordinary
+subroutines that follow the Parrot calling conventions, but are never
+explicitly called from within user code. User code pushes an exception
+handler onto the control stack with the C<push_eh> opcode. The system
+calls the installed exception handler only when an exception is thrown.
+
+    push_eh _handler            # push handler on control stack
+    find_global P10, "none"     # may throw exception
+    clear_eh                    # pop the handler off the stack
+    ...
+
+  _handler:                     # on an exception, execution continues here
+    get_results '(0,0)', P0, S0  # handler is called with (exception, message)
+    ...
+
+If the global variable is found, the next statement
+(C<clear_eh>) pops the exception handler off the control stack and
+normal execution continues. If the C<find_global> call doesn't find
+C<none>, it throws an exception by passing an exception object to the
+exception handler.
+
+The topmost exception handler on the control stack (the one pushed most
+recently) sees every exception thrown. The handler has to examine the
+exception object and decide whether it can handle it (or discard it) or
+whether it should
+C<rethrow> the exception to pass it along to an exception handler
+deeper in the stack. The C<rethrow> opcode is only valid in exception
+handlers. It pushes the exception object back onto the control stack so
+Parrot knows to search for the next exception handler in the stack. The
+process continues until some exception handler deals with the exception
+and returns normally, or until there are no more exception handlers on
+the control stack. When the system finds no installed exception handlers
+it defaults to a final action, which normally means it prints an
+appropriate message and terminates the program.
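+
+The handler chain described above can be sketched with two nested
+handlers. This is an illustration only: it reuses the opcode syntax
+from the other examples in this excerpt, the C<_inner> and C<_outer>
+labels are hypothetical, and it assumes C<rethrow> takes the exception
+object as its operand.
+
+    push_eh _outer              # pushed first, so deeper in the stack
+    push_eh _inner              # pushed last, so it sees exceptions first
+    new P10, Exception          # create new Exception object
+    set P10["_message"], "I die"
+    throw P10
+    end
+
+  _inner:
+    get_results '(0,0)', P0, S0 # exception object and message
+    rethrow P0                  # decline it; pass it to the next handler
+
+  _outer:
+    get_results '(0,0)', P1, S1
+    print S1                    # handle the exception here
+    print "\n"
+    end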
+
+When the system installs an exception handler, it creates a return
+continuation with a snapshot of the current interpreter context. If
+the exception handler just returns (that is, if the exception is
+cleanly caught) the return continuation restores the control stack
+back to its state when the exception handler was called, cleaning up
+the exception handler and any other changes that were made in the
+process of handling the exception.
+
+Exceptions thrown by standard Parrot opcodes (like the one thrown by
+C<find_global> above or by the C<throw> opcode) are always resumable,
+so when the exception handler function returns normally it continues
+execution at the opcode immediately after the one that threw the
+exception. Other exceptions at the run-loop level are also generally
+resumable.
+
+  new P10, Exception            # create new Exception object
+  set P10["_message"], "I die"  # set message attribute
+  throw P10                     # throw it
+
+Exceptions are designed to work with the Parrot calling conventions.
+Since the return addresses of C<bsr> subroutine calls and exception
+handlers are both pushed onto the control stack, it's generally a bad
+idea to combine the two.
+
+=head1 ATTACHMENTS
+
+None.
+
+=head1 FOOTNOTES
+
+None.
+
+=head1 REFERENCES
+
+  src/ops/core.ops
+  src/exceptions.c
+  runtime/parrot/include/except_types.pasm
+  runtime/parrot/include/except_severity.pasm
+
+=cut
+
+__END__
+Local Variables:
+  fill-column:78
+End:

Added: trunk/docs/pdds/clip/pdd24_events.pod
==============================================================================
--- (empty file)
+++ trunk/docs/pdds/clip/pdd24_events.pod       Tue Apr 18 13:45:23 2006
@@ -0,0 +1,156 @@
+# Copyright: 2001-2006 The Perl Foundation.
+# $Id: $
+
+=head1 NAME
+
+docs/pdds/clip/pddXX_events.pod - Parrot Events
+
+=head1 ABSTRACT
+
+This document defines the requirements and implementation strategy for
+Parrot's event subsystem.
+
+=head1 VERSION
+
+$Revision: $
+
+=head1 DESCRIPTION
+
+Description of the subject.
+
+=head1 DEFINITIONS
+
+Definitions of important terms. (optional)
+
+=head1 IMPLEMENTATION
+
+[Excerpt from Perl 6 and Parrot Essentials to seed discussion.]
+
+An event is a notification that something has happened: the user has
+manipulated a GUI element, an I/O request has completed, a signal has
+been triggered, or a timer has expired.  Most systems these days have an
+event handler (often two or three, which is something of a problem),
+because handling events is so fundamental to modern GUI programming.
+Unfortunately, the event handling system is often not integrated, or
+only poorly integrated, with the I/O system, leading to nasty code and
+unpleasant workarounds to try to make a program responsive to network,
+file, and GUI events simultaneously. Parrot presents a unified event
+handling system, integrated with its I/O system, which makes it possible
+to write cross-platform programs that work well in a complex
+environment.
+
+Parrot's events are fairly simple. An event has an event type, some
+event data, an event handler, and a priority. Each thread has an event
+queue, and when an event happens it's put into the right thread's
+queue (or the default thread queue in those cases where we can't tell
+which thread an event was destined for) to wait for something to
+process it.
+
+Any operation that would potentially block drains the event queue
+while it waits, as do a number of the cleanup opcodes that Parrot uses
+to tidy up on scope exit. For performance reasons, Parrot doesn't check
+for an outstanding event at every opcode, as that check gets expensive
+quickly. Still, Parrot generally ensures timely event
+handling, and events shouldn't sit in a queue for more than a few
+milliseconds unless event handling has been explicitly disabled.
+
+When Parrot does extract an event from the event queue, it calls that
+event's event handler, if it has one. If an event doesn't have a
+handler, Parrot looks for a generic handler for the event type and
+calls it instead. If for some reason there's no handler for the
+event type, Parrot falls back to the generic event handler, which
+throws an exception when it gets an event it doesn't know how to
+handle.  You can override the generic event handler if you want Parrot
+to do something else with unhandled events, perhaps silently
+discarding them instead.
+
+Because events are handled in mainline code, they don't have the
+restrictions commonly associated with interrupt-level code. It's safe
+and acceptable for an event handler to throw an exception, allocate
+memory, or manipulate thread or global state. Event handlers
+can even acquire locks if they need to, though it's not a good idea to
+have an event handler blocking on lock acquisition.
+
+Parrot uses the priority on events for two purposes. First, the
+priority is used to order the events in the event queue. Events of a
+particular priority are handled in a FIFO manner, but higher-priority
+events are always handled before lower-priority events. Second, Parrot
+allows a user program or event handler to set a minimum event priority
+that it will handle. If an event with a priority lower than the
+current minimum arrives, it won't be handled, instead sitting in the
+queue until the minimum priority level is dropped. This allows an
+event handler that's dealing with a high-priority event to ignore
+lower-priority events.
+
+User code generally doesn't need to deal with prioritized events, so
+programmers should adjust event priorities with care. Adjusting the
+default priority of an event, or adjusting the current minimum
+priority level, is a rare occurrence.  It's almost always a mistake to
+change them, but the capability is there for those rare occasions
+where it's the correct thing to do.
+
+=head2 Signals
+
+Signals are a special form of event, based on the Unix signal mechanism.
+Parrot presents them as mildly special, as a remnant of Perl's Unix
+heritage, but under the hood they're not treated any differently from
+any other event.
+
+The Unix signaling mechanism is something of a mash, having been
+extended and worked on over the years by a small legion of undergrad
+programmers. At this point, signals can be divided into two
+categories: those that are fatal and those that aren't.
+
+Fatal signals are things like SIGKILL, which unconditionally kills a
+process, or SIGSEGV, which indicates that the process has tried to
+access memory that doesn't belong to it. There's no good way for Parrot
+to catch these signals, so they remain fatal and will kill your
+process. On some
+systems it's possible to catch some of the fatal signals, but
+Parrot code itself operates at too high a level for a user program to
+do anything with them--they must be handled with special-purpose code
+written in C or some other low-level language.  Parrot itself may
+catch them in special circumstances for its own use, but that's an
+implementation detail that isn't exposed to a user program.
+
+Non-fatal signals are things like SIGCHLD, indicating that a
+child process has died, or SIGINT, indicating that the user
+has hit C<^C> on the keyboard. Parrot turns these signals into events
+and puts them in the event queue.  Your program's event handler for the
+signal will be called as soon as Parrot gets to the event in the queue,
+and your code can do what it needs to with it.
+
+SIGALRM, the timer expiration signal, is treated specially by
+Parrot. Generated by an expiring alarm() system call, this signal is
+normally used to provide timeouts for system calls that would
+otherwise block forever, which is very useful. The big downside to
+this is that on most systems there can only be one outstanding
+alarm() request, and while you can get around this somewhat with the
+setitimer call (which allows up to three pending alarms) it's still
+quite limited.
+
+Since Parrot's IO system is fully asynchronous and never blocks--even
+what looks like a blocking request still drains the event queue--the
+alarm signal isn't needed for this. Parrot instead grabs SIGALRM for
+its own use, and provides a fully generic timer system which allows
+any number of timer events, each with its own callback function
+and private data, to be outstanding.
+
+=head1 ATTACHMENTS
+
+None.
+
+=head1 FOOTNOTES
+
+None.
+
+=head1 REFERENCES
+
+None.
+
+=cut
+
+__END__
+Local Variables:
+  fill-column:78
+End:

Added: trunk/docs/pdds/clip/pdd25_threads.pod
==============================================================================
--- (empty file)
+++ trunk/docs/pdds/clip/pdd25_threads.pod      Tue Apr 18 13:45:23 2006
@@ -0,0 +1,134 @@
+# Copyright: 2001-2006 The Perl Foundation.
+# $Id: $
+
+=head1 NAME
+
+docs/pdds/clip/pddXX_threads.pod - Parrot Threads
+
+=head1 ABSTRACT
+
+This document defines the requirements and implementation strategy for
+Parrot's threading model.
+
+=head1 VERSION
+
+$Revision: $
+
+=head1 DEFINITIONS
+
+Concurrency
+
+=head1 DESCRIPTION
+
+Description of the subject.
+
+=head1 IMPLEMENTATION
+
+[Excerpt from Perl 6 and Parrot Essentials to seed discussion.]
+
+Threads are a means of splitting a process into multiple pieces that
+execute simultaneously. They're a relatively easy way to get some
+parallelism without too much work. Threads don't solve all the
+parallelism problems your program may have. Sometimes multiple
+processes on a single system, multiple processes on a cluster, or
+processes on multiple separate systems are better. But threads do
+present a good solution for many common cases.
+
+All the resources in a threaded process are shared between threads.
+This is simultaneously the great strength and great weakness of
+threads. Easy sharing is fast sharing, making it far faster to
+exchange data between threads or access shared global data than to
+share data between processes on a single system or on multiple
+systems. Easy sharing is dangerous, though, since without some sort of
+coordination between threads it's easy to corrupt that shared data.
+And, because all the threads are contained within a single process, if
+any one of them fails for some reason the entire process, with all its
+threads, dies.
+
+With a low-level language such as C, these issues are manageable. The
+core data types (integers, floats, and pointers) are all small enough
+to be handled atomically. Composite data can be protected with
+mutexes, special structures that a thread can get exclusive access to.
+The composite data elements that need protecting can each have a mutex
+associated with them, and when a thread needs to touch the data it
+just acquires the mutex first. By default there's very little data
+that must be shared between threads, so it's relatively easy, barring
+program errors, to write thread-safe code if a little thought is given
+to the program structure.
+
+Things aren't this easy for Parrot, unfortunately. A PMC, Parrot's
+native data type, is a complex structure, so we can't count on the
+hardware to provide us atomic access. That means Parrot has to provide
+atomicity itself, which is expensive. Getting and releasing a mutex
+isn't really that expensive in itself. It has been heavily optimized by
+platform vendors because they want threaded code to run quickly. It's
+not free, though, and when you consider that Parrot, running flat out,
+does one PMC operation per 100 CPU cycles, adding even 10 cycles per
+operation can slow Parrot down by 10%.
+
+For any threading scheme, it's important that your program isn't
+hindered by the platform and libraries it uses. This is a common
+problem with writing threaded code in C, for example. Many libraries
+you might use aren't thread-safe, and if you aren't careful with them
+your program will crash. While we can't make low-level libraries any
+safer, we can make sure that Parrot itself won't be a danger. There is
+very little data shared between Parrot interpreters and threads, and
+access to all the shared data is done with coordinating mutexes. This
+is invisible to your program, and just makes sure that Parrot itself
+is thread-safe.
+
+When you think about it, there are really three different threading
+models. In the first one, multiple threads have no interaction among
+themselves. This essentially does with threads the same thing that's
+done with processes. This works very well in Parrot, with the
+isolation between interpreters helping to reduce the overhead of this
+scheme. There's no possibility of data sharing at the user level, so
+there's no need to lock anything.
+
+In the second threading model, multiple threads run and pass messages
+back and forth between each other. Parrot supports this as well, via
+the event mechanism. The event queues are thread-safe, so one thread
+can safely inject an event into another thread's event queue. This is
+similar to a multiple-process model of programming, except that
+communication between threads is much faster, and it's easier to pass
+around structured data.
+
+In the third threading model, multiple threads run and share data
+between themselves. While Parrot can't guarantee that data at the user
+level remains consistent, it can make sure that access to shared data
+is at least safe. We do this with two mechanisms.
+
+First, Parrot presents an advisory lock system to user code. Any piece
+of user code running in a thread can lock a variable. Any attempt to
+lock a variable that another thread has locked will block until the
+lock is released. Locking a variable only blocks other lock attempts.
+It does I<not> block plain access. This may seem odd, but it's the
+same scheme used by threading systems that obey the POSIX thread
+standard, and has been well tested in practice.
+
+Second, Parrot forces all shared PMCs to be marked as such, and all
+access to shared PMCs must first acquire that PMC's private lock. This
+is done by installing an alternate vtable for shared PMCs, one that
+acquires locks on all its parameters. These locks are held only for
+the duration of the vtable function, but ensure that the PMCs affected
+by the operation aren't altered by another thread while the vtable
+function is in progress.
+
+=head1 ATTACHMENTS
+
+None.
+
+=head1 FOOTNOTES
+
+None.
+
+=head1 REFERENCES
+
+None.
+
+=cut
+
+__END__
+Local Variables:
+  fill-column:78
+End:
