Author: allison Date: Tue Apr 18 13:45:23 2006 New Revision: 12354 Added: trunk/docs/pdds/clip/pdd22_io.pod trunk/docs/pdds/clip/pdd23_exceptions.pod (contents, props changed) trunk/docs/pdds/clip/pdd24_events.pod trunk/docs/pdds/clip/pdd25_threads.pod Removed: trunk/docs/pdds/clip/pddXX_events.pod trunk/docs/pdds/clip/pddXX_exceptions.pod trunk/docs/pdds/clip/pddXX_io.pod trunk/docs/pdds/clip/pddXX_threads.pod
Changes in other areas also in this revision: Modified: trunk/ (props changed) trunk/MANIFEST Log: Blessing the draft PDDs with numbers. Added: trunk/docs/pdds/clip/pdd22_io.pod ============================================================================== --- (empty file) +++ trunk/docs/pdds/clip/pdd22_io.pod Tue Apr 18 13:45:23 2006 @@ -0,0 +1,773 @@ +# Copyright: 2001-2006 The Perl Foundation. +# $Id $ + +=head1 NAME + +docs/pdds/pddXX_io.pod - Parrot I/O + +=head1 ABSTRACT + +Parrot's I/O subsystem. + +=head1 VERSION + +$Revision $ + +=head1 SYNOPSIS + + open P0, "data.txt", ">" + print P0, "sample data\n" + close P0 + + open P1, "data.txt", "<" + S0 = read P1, 12 + P2 = getstderr + print P2, S0 + close P1 + + ... + +=head1 DEFINITIONS + +A "stream" allows input or output operations on a source/destination +such as a file, keyboard, or text console. Streams are also called +"filehandles", though only some of them have anything to do with files. + +=head1 DESCRIPTION + +This is a draft document defining Parrot's I/O subsystem, for both +streams and network I/O. Parrot has both synchronous and asynchronous +I/O operations. This section describes the interface, and the +L<IMPLEMENTATION> section provides more details on general +implementation questions and error handling. + +The signatures for the asynchronous operations are nearly identical to +the synchronous operations, but the asynchronous operations take an +additional argument for a callback, and the only return value from the +asynchronous operations is a status object. The callbacks take the +status object as their first argument, and any return values as their +remaining arguments. + +The listing below says little about whether the opcodes return error +information. For now assume that they can either return a status object, +or return nothing. Error handling is discussed more thoroughly in the +implementation section. 
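The callback convention described above can be illustrated outside of Parrot. The following is a minimal sketch in Python, not Parrot code: the names Status, read_sync, and read_async are hypothetical, and the status object's fields are assumptions, since this document does not yet define them. It shows only the shape of the two signatures: the asynchronous form takes the same arguments plus a final callback, returns only a status object, and hands the callback the status object followed by the value the synchronous form would have returned.

```python
import io
import threading


class Status:
    """Hypothetical status object; the PDD does not define its fields."""

    def __init__(self):
        self.done = threading.Event()
        self.error = None


def read_sync(stream, nbytes):
    # Synchronous form: the result is the return value.
    return stream.read(nbytes)


def read_async(stream, nbytes, callback):
    # Asynchronous form: same arguments plus a final callback,
    # and the only return value is the status object.
    status = Status()

    def worker():
        data = None
        try:
            data = stream.read(nbytes)
        except Exception as exc:  # record the failure instead of raising
            status.error = exc
        try:
            # Status object first, then the synchronous return value.
            callback(status, data)
        finally:
            status.done.set()

    threading.Thread(target=worker).start()
    return status


received = []
status = read_async(io.BytesIO(b"sample data\n"), 12,
                    lambda st, data: received.append((st, data)))
status.done.wait()  # block here only so the example finishes deterministically
```

The thread is used here purely to make the call return before the read completes; the way Parrot actually schedules asynchronous operations is discussed in the IMPLEMENTATION section.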
+ +=head2 I/O Stream Opcodes + +=head3 Opening and closing streams + +=over 4 + +=item * + +C<open> opens a stream object based on a string path. It takes an +optional string argument specifying the mode of the stream (read, write, +append, read/write, etc.), and returns a stream object. Currently the +mode of the stream is set with a string argument similar to Perl 5 +syntax, but a set of defined constants may fit better with Parrot's +general architecture. + + 0 PIOMODE_READ (default) + 1 PIOMODE_WRITE + 2 PIOMODE_APPEND + 3 PIOMODE_READWRITE + 4 PIOMODE_PIPE (read) + 5 PIOMODE_PIPEWRITE + +The asynchronous version takes a PMC callback as an additional final +argument. When the open operation is complete, it invokes the callback +with two arguments: a status object and the opened stream object. + +=item * + +C<close> closes a stream object. It takes a single stream object +argument and returns a status object. + +The asynchronous version takes an additional final PMC callback +argument. When the close operation is complete, it invokes the callback, +passing it a status object. + +=back + +=head3 Retrieving existing streams + +These opcodes do not have asynchronous variants. + +=over 4 + +=item * + +C<getstdin>, C<getstdout>, and C<getstderr> return a stream object for +standard input, standard output, and standard error. + +=item * + +C<fdopen> converts an existing and already open UNIX integer file +descriptor into a stream object. It also takes a string argument to +specify the mode. + +=back + +=head3 Writing to streams + +=over 4 + +=item * + +C<print> writes an integer, float, string, or PMC value to a stream. It +writes to standard output by default, but optionally takes a PMC +argument to select another stream to write to. + +The asynchronous version takes an additional final PMC callback +argument. When the print operation is complete, it invokes the callback, +passing it a status object. 
+ +=item * + +C<printerr> writes an integer, float, string, or PMC value to standard +error. + +There is no asynchronous variant of C<printerr>. [It's just a shortcut. +If they want an asynchronous version, they can use C<print>.] + +=back + +=head3 Reading from streams + +=over 4 + +=item * + +C<read> retrieves a specified number of bytes from a stream into a +string. [Note this is bytes, not codepoints.] By default it reads from +standard input, but it also takes an alternate stream object source as +an optional argument. + +The asynchronous version takes an additional final PMC callback +argument, and only returns a status object. When the read operation is +complete, it invokes the callback, passing it a status object and a +string of bytes. + +=item * + +C<readline> retrieves a single line from a stream into a string. Calling +C<readline> flags the stream as operating in line-buffer mode (see +C<pioctl> below). + +The asynchronous version takes an additional final PMC callback +argument, and only returns a status object. When the readline operation +is complete, it invokes the callback, passing it a status object and a +string of bytes. + +=item * + +C<peek> retrieves the next byte from a stream into a string, but doesn't +remove it from the stream. By default it reads from standard input, but +it also takes a stream object argument for an alternate source. + +There is no asynchronous version of C<peek>. [Does anyone have a line +of reasoning why one might be needed? The concept of "next byte" seems +to be a synchronous one.] + +=back + +=head3 Retrieving and setting stream properties + +=over 4 + +=item * + +C<seek> sets the current file position of a stream object to an integer +byte offset from an integer starting position (0 for the start of the +file, 1 for the current position, and 2 for the end of the file). 
It +also has a 64-bit variant that sets the byte offset by two integer +arguments (one for the first 32 bits of the 64-bit offset, and one for +the second 32 bits). [The two-register emulation for 64-bit integers may +be deprecated in the future.] + +The asynchronous version takes an additional final PMC callback +argument. When the seek operation is complete, it invokes the callback, +passing it a status object and the stream object it was called on. + +=item * + +C<tell> retrieves the current file position of a stream object. It also +has a 64-bit variant that returns the byte offset as two integers (one +for the first 32 bits of the 64-bit offset, and one for the second 32 +bits). [The two-register emulation for 64-bit integers may be deprecated +in the future.] + +No asynchronous version. + +=item * + +C<getfd> retrieves the UNIX integer file descriptor of a stream object. + +No asynchronous version. + +=item * + +C<pioctl> provides low-level access to the attributes of a stream +object. It takes a stream object, an integer flag to select a command, +and a single integer argument for the command. It returns an integer +indicating the success or failure of the command. + +The following constants are defined for the commands that C<pioctl> can +execute: + + 0 PIOCTL_CMDRESERVED + No documentation available. + 1 PIOCTL_CMDSETRECSEP + Set the record separator. [This doesn't actually work at the + moment.] + 2 PIOCTL_CMDGETRECSEP + Get the record separator. + 3 PIOCTL_CMDSETBUFTYPE + Set the buffer type. + 4 PIOCTL_CMDGETBUFTYPE + Get the buffer type + 5 PIOCTL_CMDSETBUFSIZE + Set the buffer size. + 6 PIOCTL_CMDGETBUFSIZE + Get the buffer size. + +The following constants are defined as argument/return values for the +buffer-type commands: + + 0 PIOCTL_NONBUF + Unbuffered I/O. Bytes are sent as soon as possible. + 1 PIOCTL_LINEBUF + Line buffered I/O. Bytes are sent when a newline is + encountered. + 2 PIOCTL_BLKBUF + Fully buffered I/O. 
Bytes are sent when the buffer is full. + [Called "BLKBUF" because bytes are sent as a block, but line + buffering also sends them as a block, so "FULBUF" might make + more sense.] + +[This opcode may be deprecated and replaced with methods on stream +objects.] + +=item * + +C<poll> polls a stream or socket object for particular types of events +(an integer flag) at a frequency set by seconds and microseconds (the +final two integer arguments). [At least, that's what the documentation +in src/io/io.c says. In actual fact, the final two arguments seem to be +setting the timeout, exactly the same as the corresponding argument to +the system version of C<poll>.] + +See the system documentation for C<poll> to see the constants for event +types and return status. + +This opcode is inherently synchronous (poll is "synchronous I/O +multiplexing"), but it can retrieve status information from a stream or +socket object whether the object is being used synchronously or +asynchronously. + +=back + +=head3 Deprecated opcodes + +=over + +=item * + +C<write> prints to standard output but it cannot select another stream. +It only accepts a PMC value to write. This is redundant with the +C<print> opcode, so it will be deprecated. + +=back + +=head2 Filesystem Opcodes + +=over 4 + +=item * + +C<stat> retrieves information about a file on the filesystem. It takes a +string filename or an integer argument of a UNIX file descriptor [or an +already opened stream object?], and an integer flag for the type of +information requested. It returns an integer containing the requested +information. The following constants are defined for the type of +information requested (see F<runtime/parrot/include/stat.pasm>): + + 0 STAT_EXISTS + Whether the file exists. + 1 STAT_FILESIZE + The size of the file. + 2 STAT_ISDIR + Whether the file is a directory. + 3 STAT_ISDEV + Whether the file is a device such as a terminal or a disk. + 4 STAT_CREATETIME + The time the file was created. 
+ (Currently just returns -1.) + 5 STAT_ACCESSTIME + The last time the file was accessed. + 6 STAT_MODIFYTIME + The last time the file data was changed. + 7 STAT_CHANGETIME + The last time the file metadata was changed. + 8 STAT_BACKUPTIME + The last time the file was backed up. + (Currently just returns -1.) + 9 STAT_UID + The user ID of the file. + 10 STAT_GID + The group ID of the file. + +The asynchronous version takes an additional final PMC callback +argument, and only returns a status object. When the stat operation is +complete, it invokes the callback, passing it a status object and an +integer containing the status information. + +=item * + +C<unlink> deletes a file from the filesystem. It takes a single string +argument of a filename (including the path). + +The asynchronous version takes an additional final PMC callback +argument. When the unlink operation is complete, it invokes the +callback, passing it a status object. + +=item * + +C<rmdir> deletes a directory from the filesystem if that directory is +empty. It takes a single string argument of a directory name (including +the path). + +The asynchronous version takes an additional final PMC callback +argument. When the rmdir operation is complete, it invokes the callback, +passing it a status object. + +=item * + +C<opendir> opens a stream object for a directory. It takes a single +string argument of a directory name (including the path) and returns a +stream object. + +The asynchronous version takes an additional final PMC callback +argument, and only returns a status object. When the opendir operation +is complete, it invokes the callback, passing it a status object and a +newly created stream object. + +=item * + +C<readdir> reads a single item from an open directory stream object. It +takes a single stream object argument and returns a string containing +the path and filename/directory name of the current item. (i.e. the +directory stream object acts as an iterator.) 
+ +The asynchronous version takes an additional final PMC callback +argument, and only returns a status object. When the readdir operation +is complete, it invokes the callback, passing it a status object and the +string result. + +=item * + +C<telldir> returns the current position of C<readdir> operations on a +directory stream object. + +No asynchronous version. + +=item * + +C<seekdir> sets the current position of C<readdir> operations on a +directory stream object. It takes a stream object argument and an +integer for the position. [The system C<seekdir> requires that the +position argument be the result of a previous C<telldir> operation.] + +The asynchronous version takes an additional final PMC callback +argument. When the seekdir operation is complete, it invokes the +callback, passing it a status object and the directory stream object it +was called on. + +=item * + +C<rewinddir> sets the current position of C<readdir> operations on a +directory stream object back to the beginning of the directory. It takes +a stream object argument. + +No asynchronous version. + +=item * + +C<closedir> closes a directory stream object. It takes a single stream +object argument. + +The asynchronous version takes an additional final PMC callback +argument. When the closedir operation is complete, it invokes the +callback, passing it a status object. + +=back + +=head2 Network I/O Opcodes + +Most of these opcodes conform to the standard UNIX interface, but the +layer API allows alternate implementations for each. + +=over 4 + +=item * + +C<socket> returns a new socket object from a given address family, +socket type, and protocol number (all integers). The socket object's +boolean value can be tested for whether the socket was created. + +The asynchronous version takes an additional final PMC callback +argument, and only returns a status object. When the socket operation is +complete, it invokes the callback, passing it a status object and a new +socket object. 
+ +=item * + +C<sockaddr> returns an object representing a socket address, generated +from a port number (integer) and an address (string). + +No asynchronous version. + +=item * + +C<connect> connects a socket object to an address. + +The asynchronous version takes an additional final PMC callback +argument, and only returns a status object. When the socket operation is +complete, it invokes the callback, passing it a status object and the +socket object it was called on. [If you want notification when a connect +operation is completed, you probably want to do something with that +connected socket object.] + +=item * + +C<recv> receives a message from a connected socket object. It returns +the message in a string. + +The asynchronous version takes an additional final PMC callback +argument, and only returns a status object. When the recv operation is +complete, it invokes the callback, passing it a status object and a +string containing the received message. + +=item * + +C<send> sends a message string to a connected socket object. + +The asynchronous version takes an additional final PMC callback +argument, and only returns a status object. When the send operation is +complete, it invokes the callback, passing it a status object. + +=item * + +C<sendto> sends a message string to an address specified in an address +object (first connecting to the address). + +The asynchronous version takes an additional final PMC callback +argument, and only returns a status object. When the sendto operation is +complete, it invokes the callback, passing it a status object. + + +=item * + +C<bind> binds a socket object to the port and address specified by an +address object (the packed result of C<sockaddr>). + +The asynchronous version takes an additional final PMC callback +argument, and only returns a status object. When the bind operation is +complete, it invokes the callback, passing it a status object and the +socket object it was called on. 
[If you want notification when a bind +operation is completed, you probably want to do something with that +bound socket object.] + +=item * + +C<listen> specifies that a socket object is willing to accept incoming +connections. The integer argument gives the maximum size of the queue +for pending connections. + +There is no asynchronous version. C<listen> marks a set of attributes on +the socket object. + +=item * + +C<accept> accepts a new connection on a given socket object, and returns +a newly created socket object for the connection. + +The asynchronous version takes an additional final PMC callback +argument, and only returns a status object. When the accept operation +receives a new connection, it invokes the callback, passing it a status +object and a newly created socket object for the connection. [While the +synchronous C<accept> has to be called repeatedly in a loop (once for +each connection received), the asynchronous version is only called once, +but continues to send new connection events until the socket is closed.] + +=item * + +C<shutdown> closes a socket object for reading, for writing, or for all +I/O. It takes a socket object argument and an integer argument for the +type of shutdown: + + 0 PIOSHUTDOWN_READ + Close the socket object for reading. + 1 PIOSHUTDOWN_WRITE + Close the socket object for writing. + 2 PIOSHUTDOWN + Close the socket object. + +=back + + +=head1 IMPLEMENTATION + +The Parrot I/O subsystem uses a per-interpreter stack to provide a +layer-based approach to I/O. Each layer implements a subset of the +C<ParrotIOLayerAPI> vtable. To find an I/O function, the layer stack is +searched downwards until a non-NULL function pointer is found for +that particular slot. + +=head2 Synchronous and Asynchronous Operations + +Currently, Parrot only implements synchronous I/O operations. +Asynchronous operations are essentially the same as the synchronous +operations, but each asynchronous operation runs in its own thread. 
+ +Note: this is a deviation from the existing plan, which had all I/O +operations run internally as asynchronous, and the synchronous +operations as a compatibility layer on top of the asynchronous +operations. This conceptual simplification means that all I/O operations +are possible without threading support (for example, in a stripped-down +version of Parrot running on a PDA). [Asynchronous operations don't have +to use Parrot threads, they could use some alternate threading +implementation. But it's overkill to develop two threading +implementations. If Parrot threads turn out to be too heavyweight, we +may want to look into a lighter weight variation for asynchronous +operations.] + +The asynchronous I/O implementation will use Parrot's I/O layer +architecture so some platforms can take advantage of their built-in +asynchronous operations instead of using Parrot threads. + +Communication between the calling code and the asynchronous operation +thread will be handled by a shared status object. The operation thread +will update the status object whenever the status changes, and the +calling code can check the status object at any time. [Twisted has an +interesting variation on this, in that it replaces the status object +with the returned result of the asynchronous call when the call is +complete. That is probably too confusing, but we might give the status +object a reference to the returned result.] + +The current strategy for differentiating the synchronous calls from +asynchronous ones relies on the presence of a callback argument in the +asynchronous calls. If we wanted asynchronous calls that don't supply +callbacks (perhaps if the user wants to manually check later if the +operation succeeded) we would need another strategy to differentiate the +two. This is probably enough of a fringe case that we don't need to +provide opcodes for it, provided they can access the functionality via +methods on ParrotIO objects. 
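The shared status object, and the callback-free case where the caller checks on the operation later, can be sketched as follows. This is an illustration in Python under stated assumptions, not the Parrot implementation: the Status class, its state names ("pending", "running", "complete", "failed"), and start_operation are all hypothetical. The status object keeps a reference to the returned result, as suggested in the bracketed note above, rather than being replaced by it.

```python
import threading
import time


class Status:
    """Hypothetical status object shared by caller and operation thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = "pending"
        self.result = None  # reference to the returned result, once known
        self.error = None

    def update(self, state, result=None, error=None):
        # Called only by the operation thread.
        with self._lock:
            self._state = state
            self.result = result
            self.error = error

    @property
    def state(self):
        with self._lock:
            return self._state

    def is_done(self):
        return self.state in ("complete", "failed")


def start_operation(operation, *args):
    # No callback is supplied: the caller is expected to poll the status.
    status = Status()

    def worker():
        status.update("running")
        try:
            status.update("complete", result=operation(*args))
        except Exception as exc:
            status.update("failed", error=exc)

    threading.Thread(target=worker, daemon=True).start()
    return status


def wait_for(status):
    # The caller may check at any time; here it simply polls until done.
    while not status.is_done():
        time.sleep(0.001)
    return status


ok = wait_for(start_operation(lambda a, b: a + b, 2, 3))
failed = wait_for(start_operation(lambda: 1 / 0))
```

Because the status object records errors as well as results, the same object could serve both this polling style and the callback style.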
+ +=head2 Error Handling + +Currently some of the networking opcodes (C<connect>, C<recv>, C<send>, +C<poll>, C<bind>, and C<listen>) return an integer indicating the status +of the call, -1 or a system error code if unsuccessful. Other I/O +opcodes (such as C<getfd> and C<accept>) have various different +strategies for error notification, and others have no way of marking +errors at all. We want to unify all I/O opcodes so they use a consistent +strategy for error notification. There are several options in how we do +this. + +=head3 Integer status codes + +One approach is to have every I/O operation return an integer status +code indicating success or failure. This approach has the advantage of +being lightweight: returning a single additional integer is cheap. The +disadvantage is that it's not very flexible: the only way to look for +errors is to check the integer return value, possibly comparing it to a +predefined set of error constants. + +=head3 Exceptions + +Another option is to have all I/O operations throw exceptions on errors. +The advantage is that it keeps the error tracking information +out-of-band, so it doesn't affect the arguments or return values of the +calls (some opcodes that have a return value plus an integer status code +have odd-looking signatures). One disadvantage of this approach is that +it forces all users to handle exceptions from I/O operations even if +they aren't using exceptions otherwise. + +A more significant disadvantage is that exceptions don't work well with +asynchronous operations. Exception handlers are set for a particular +dynamic scope, but with an asynchronous operation, by the time an +exception is thrown execution has already left the dynamic scope where +the exception handler was set. [Though, this partly depends on how +exceptions are implemented.] + +=head3 Error callbacks + +A minor variation on the exceptions option is to pass an error callback +into each I/O opcode. 
This solves the problem of asynchronous operations +because the operation has its own custom error handling code rather than +relying on an exception handler in its dynamic scope. + +The disadvantage is that the user has to define a custom error handler +routine for every call. It also doesn't cope well with cases where +multiple different kinds of errors may be returned by a single opcode. +(The one error handler would have to cope with all possible types of +errors.) There is an easier way. + +=head3 Hybrid solution + +Another option is to return a status object from each I/O operation. The +status object could be used to get an integer status code, string +status/error message, or boolean success value. It could also provide a +method to throw an exception on error conditions. There could even be a +global option (or an option set on a particular I/O object) that tells +Parrot to always throw exceptions on errors in synchronous I/O +operations, implemented by calling this method on the status object +before returning from the I/O opcode. + +The advantages are that this works well with asynchronous and +synchronous operations, and provides flexibility for multiple different +uses. Also, something like a status object will be needed anyway to +allow users to check on the status of a particular asynchronous call in +progress, so this is a nice unification. + +The disadvantage is that a status object involves more overhead than a +simple integer status code. + +=head2 IPv6 Support + +The transition from IPv4 to IPv6 is in progress, though not likely to be +complete anytime soon. Most operating systems today offer at least +dual-stack IPv6 implementations, so they can use either IPv4 or IPv6, +depending on what's available. Parrot also needs to support either +protocol. For the most part, the network I/O opcodes should internally +handle either addressing scheme, without requiring the user to specify +which scheme is being used. 
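One way to avoid asking the user which scheme is in use is to infer the address family from the address literal itself. The sketch below is in Python and is only an illustration: address_family and make_sockaddr are hypothetical names, the tuple returned by make_sockaddr is an assumed stand-in for an address object, and only literal addresses are handled (host names would need a resolver).

```python
import ipaddress
import socket


def address_family(address):
    # Infer the family from the literal, so the caller never has to
    # say which addressing scheme is being used.
    version = ipaddress.ip_address(address).version
    return socket.AF_INET6 if version == 6 else socket.AF_INET


def make_sockaddr(port, address):
    # Hypothetical stand-in for the sockaddr opcode: takes a port
    # number and an address string, and returns a structured value
    # rather than a packed string.
    return (address_family(address), address, port)


v4 = make_sockaddr(80, "127.0.0.1")
v6 = make_sockaddr(80, "fe80::20a:95ff:fef5:7e5e")
```

An invalid literal raises ValueError from ipaddress.ip_address, which a real implementation would have to map onto whichever error strategy is chosen above.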
+ +IETF recommends defaulting to IPv6 connections and falling back to IPv4 +connections when IPv6 fails. This would give us more solid testing of +Parrot's compatibility with IPv6, but may be too slow. Either way, it's a +good idea to make setting the default (or selecting one exclusively) an +option when compiling Parrot. + +The most important issues for Parrot to consider with IPv6 are: + +=over 4 + +=item * + +Support 128 bit addresses. IPv6 addresses are colon-separated +hexadecimal numbers, such as C<20a:95ff:fef5:7e5e>. + +=item * + +Any address parsing should be able to support the address separated from +a port number or prefix/length by brackets: C<[20a:95ff:fef5:7e5e]:80> +and C<[20a:95ff::]/64>. + +=item * + +Packed addresses, such as the result of the C<sockaddr> opcode, should +be passed around as an object (or at least a structure) rather than as a +string. + +=back + +See the relevant IETF RFCs: "Application Aspects of IPv6 Transition" +(http://www.ietf.org/rfc/rfc4038.txt) and "Basic Socket Interface +Extensions for IPv6" (http://www.ietf.org/rfc/rfc3493.txt). + +=head2 Excerpt + +[Below is an excerpt from "Perl 6 and Parrot Essentials", included to +seed discussion.] + +Parrot's base I/O system is fully asynchronous I/O with callbacks and +per-request private data. Since this is massive overkill in many cases, +we have a plain vanilla synchronous I/O layer that your programs can use +if they don't need the extra power. + +Asynchronous I/O is conceptually pretty simple. Your program makes an +I/O request. The system takes that request and returns control to your +program, which keeps running. Meanwhile the system works on satisfying +the I/O request. When the request is satisfied, the system notifies +your program in some way. Since there can be multiple requests +outstanding, and you can't be sure exactly what your program will be +doing when a request is satisfied, programs that make use of +asynchronous I/O can be complex. 
+ +Synchronous I/O is even simpler. Your program makes a request to the +system and then waits until that request is done. There can be only +one request in process at a time, and you always know what you're +doing (waiting) while the request is being processed. It makes your +program much simpler, since you don't have to do any sort of +coordination or synchronization. + +The big benefit of asynchronous I/O systems is that they generally +have a much higher throughput than a synchronous system. They move +data around much faster--in some cases three or four times faster. +This is because the system can be busy moving data to or from disk +while your program is busy processing data that it got from a previous +request. + +For disk devices, having multiple outstanding requests--especially on +a busy system--allows the system to order read and write requests to +take better advantage of the underlying hardware. For example, many +disk devices have built-in track buffers. No matter how small a +request you make to the drive, it always reads a full track. With +synchronous I/O, if your program makes two small requests to the same +track, and they're separated by a request for some other data, the +disk will have to read the full track twice. With asynchronous I/O, on +the other hand, the disk may be able to read the track just once, and +satisfy the second request from the track buffer. + +Parrot's I/O system revolves around a request. A request has three +parts: a buffer for data, a completion routine, and a piece of data +private to the request. Your program issues the request, then goes about +its business. When the request is completed, Parrot will call the +completion routine, passing it the request that just finished. The +completion routine extracts out the buffer and the private data, and +does whatever it needs to do to handle the request. 
If your request +doesn't have a completion routine, then your program will have to +explicitly check to see if the request was satisfied. + +Your program can choose to sleep and wait for the request to finish, +essentially blocking. Parrot will continue to process events while +your program is waiting, so it isn't completely unresponsive. This is +how Parrot implements synchronous I/O--it issues the asynchronous +request, then immediately waits for that request to complete. + +The reason we made Parrot's I/O system asynchronous by default was +sheer pragmatism. Network I/O is all asynchronous, as is GUI +programming, so we knew we had to deal with asynchrony in some form. +It's also far easier to make an asynchronous system pretend to be +synchronous than it is the other way around. We could have decided to +treat GUI events, network I/O, and file I/O all separately, but there +are plenty of systems around that demonstrate what a bad idea that is. + +=head1 ATTACHMENTS + +None. + +=head1 FOOTNOTES + +None. + +=head1 REFERENCES + + src/io/io.c + src/ops/io.ops + include/parrot/io.h + runtime/parrot/library/Stream/* + src/io/io_unix.c + src/io/io_win32.c + Perl 5's IO::AIO + Perl 5's POE + +=cut + +__END__ +Local Variables: + fill-column:78 +End: Added: trunk/docs/pdds/clip/pdd23_exceptions.pod ============================================================================== --- (empty file) +++ trunk/docs/pdds/clip/pdd23_exceptions.pod Tue Apr 18 13:45:23 2006 @@ -0,0 +1,325 @@ +# Copyright: 2001-2006 The Perl Foundation. +# $Id$ + +=head1 NAME + +docs/pdds/clip/pddXX_exceptions.pod - Parrot Exceptions + +=head1 ABSTRACT + +This document defines the requirements and implementation strategy for +Parrot's exception system. + +=head1 VERSION + +$Revision$ + +=head1 DESCRIPTION + +An exception system gives user-developed code control over how run-time +error conditions are handled. Exceptions are errors or unusual +conditions that require special processing. 
An exception handler +performs the necessary steps to appropriately respond to a particular +kind of exception. + +=head2 Exception Opcodes + +These are the opcodes relevant to exceptions and exception handlers: + +=over + +=item * + +C<push_eh> creates an exception handler and pushes it onto the control +stack. It takes a label (the location of the exception handler) as its +only argument. [Is this right? Treating exception handlers as label +jumps rather than full subroutines is error-prone.] + +=item * + +C<clear_eh> removes the most recently added exception handler from the +control stack. + +=item * + +C<throw> throws an exception object. + +=item * + +C<rethrow> rethrows an exception object. It can only be called from +inside an exception handler. + +=item * + +C<die> throws an exception. It takes two arguments, one for the severity +of the exception and one for the type of exception. + +If the severity is C<EXCEPT_DOOMED>, it exits via a call to +C<_exit($2)>, which is not a catchable exception. 
+ +These are the constants defined for severity: + + 0 EXCEPT_NORMAL + 1 EXCEPT_WARNING + 2 EXCEPT_ERROR + 3 EXCEPT_SEVERE + 4 EXCEPT_FATAL + 5 EXCEPT_DOOMED + 6 EXCEPT_EXIT + +These are the constants defined for exception types: + + 0 E_Exception + 1 E_SystemExit + 2 E_StopIteration + 3 E_StandardError + 4 E_KeyboardInterrupt + 5 E_ImportError + 6 E_EnvironmentError + 7 E_IOError + 8 E_OSError + 9 E_WindowsError + 10 E_VMSError + 11 E_EOFError + 12 E_RuntimeError + 13 E_NotImplementedError + 14 E_LibraryNotLoadedError + 15 E_NameError + 16 E_UnboundLocalError + 17 E_AttributeError + 18 E_SyntaxError + 19 E_IndentationError + 20 E_TabError + 21 E_TypeError + 22 E_AssertionError + 23 E_LookupError + 24 E_IndexError + 25 E_KeyError + 26 E_ArithmeticError + 27 E_OverflowError + 28 E_ZeroDivisionError + 29 E_FloatingPointError + 30 E_ValueError + 31 E_UnicodeError + 32 E_UnicodeEncodeError + 33 E_UnicodeDecodeError + 34 E_UnicodeTranslateError + 35 E_ReferenceError + 36 E_SystemError + 37 E_MemoryError + 37 E_LAST_PYTHON_E + 38 BAD_BUFFER_SIZE + 39 MISSING_ENCODING_NAME + 40 INVALID_STRING_REPRESENTATION + 41 ICU_ERROR + 42 UNIMPLEMENTED + 43 NULL_REG_ACCESS + 44 NO_REG_FRAMES + 45 SUBSTR_OUT_OF_STRING + 46 ORD_OUT_OF_STRING + 47 MALFORMED_UTF8 + 48 MALFORMED_UTF16 + 49 MALFORMED_UTF32 + 50 INVALID_CHARACTER + 51 INVALID_CHARTYPE + 52 INVALID_ENCODING + 53 INVALID_CHARCLASS + 54 NEG_REPEAT + 55 NEG_SUBSTR + 56 NEG_SLEEP + 57 NEG_CHOP + 58 INVALID_OPERATION + 59 ARG_OP_NOT_HANDLED + 60 KEY_NOT_FOUND + 61 JIT_UNAVAILABLE + 62 EXEC_UNAVAILABLE + 63 INTERP_ERROR + 64 PREDEREF_LOAD_ERROR + 65 PARROT_USAGE_ERROR + 66 PIO_ERROR + 67 PARROT_POINTER_ERROR + 68 DIV_BY_ZERO + 69 PIO_NOT_IMPLEMENTED + 70 ALLOCATION_ERROR + 71 INTERNAL_PANIC + 72 OUT_OF_BOUNDS + 73 JIT_ERROR + 74 EXEC_ERROR + 75 ILL_INHERIT + 76 NO_PREV_CS + 77 NO_CLASS + 78 LEX_NOT_FOUND + 79 PAD_NOT_FOUND + 80 ATTRIB_NOT_FOUND + 81 GLOBAL_NOT_FOUND + 82 METH_NOT_FOUND + 83 WRITE_TO_CONSTCLASS + 84 NOSPAWN + 85 
INTERNAL_NOT_IMPLEMENTED + 86 ERR_OVERFLOW + 87 LOSSY_CONVERSION + +=item * + +C<exit> throws an exception of severity C<EXCEPT_EXIT>. It takes a +single argument for the exception type. + +=item * + +C<pushaction> pushes a subroutine object onto the control stack. If the +control stack is unwound due to an exception (or C<popmark>, or +subroutine return), the subroutine is invoked with an integer argument: +C<0> means a normal return; C<1> means an exception has been raised. +[Seems like there's lots of room for dangerous collisions here.] + +=back + +=head1 IMPLEMENTATION + +[I'm not convinced the control stack is the right way to handle +exceptions. Most of Parrot is based on the continuation-passing style of +control; shouldn't exceptions be based on it too? See bug #38850.] + +=head2 Opcodes that Throw Exceptions + +Exceptions have been incorporated into built-in opcodes in a limited +way, but they aren't used consistently. + +Divide-by-zero exceptions are thrown by C<div>, C<fdiv>, and C<cmod>. + +The C<ord> opcode throws an exception when it's passed an empty +argument, or passed a string index that's outside the length of the +string. + +The C<classoffset> opcode throws an exception when it's asked to +retrieve the attribute offset for a class that isn't in the object's +inheritance hierarchy. + +The C<find_charset> opcode throws an exception if the charset name it's +looking up doesn't exist. The C<trans_charset> opcode throws an +exception on "information loss" (presumably, this means when one charset +doesn't have a one-to-one correspondence in the other charset). + +The C<find_encoding> opcode throws an exception if the encoding name +it's looking up doesn't exist. The C<trans_encoding> opcode throws an +exception on "information loss" (presumably, this means when one +encoding doesn't have a one-to-one correspondence in the other +encoding). 
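The two failure modes described for the charset and encoding opcodes (an unknown name, and a conversion that would lose information) have close analogues in other runtimes. The following is a minimal Python sketch of that analogy only. It does not use Parrot's opcodes, and the helper names C<lookup_or_none> and C<transcode_strict> are hypothetical.

```python
import codecs


def lookup_or_none(name):
    """Return the codec's canonical name, or None if the name is unknown.

    Hypothetical helper: mirrors an opcode that either throws on a bad
    name or hands back an error value instead.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


def transcode_strict(text, target):
    """Encode text to target, refusing any conversion that loses information.

    Hypothetical helper: a character with no counterpart in the target
    encoding raises instead of being silently replaced.
    """
    try:
        return text.encode(target, errors="strict")
    except UnicodeEncodeError as exc:
        raise ValueError(f"information loss at index {exc.start}") from exc
```

Here an unknown encoding name is reported as an error value, while a lossy conversion raises. That is the same split between "return an error" and "throw" that this section is weighing for the opcode set.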
+ +Parrot's default version of the C<LexPad> PMC uses exceptions, though +other implementations can choose to return error values instead. +C<store_lex> throws an exception when asked to store a lexical variable +in a name that doesn't exist. C<find_lex> throws an exception when asked +to retrieve a lexical name that doesn't exist. + +Other opcodes respond to an C<errorson> setting to decide whether to +throw an exception or return an error value. C<find_global> throws an +exception (or returns a Null PMC) if the global name requested doesn't +exist. C<find_name> throws an exception (or returns a Null PMC) if the +name requested doesn't exist in a lexical, current, global, or built-in +namespace. + +It's a little odd that so few opcodes throw exceptions (these are the +ones that are documented, but a few others throw exceptions internally +even though they aren't documented as doing so). It's worth considering +either expanding the use of exceptions consistently throughout the +opcode set, or eliminating exceptions from the opcode set entirely. The +strategy for error handling should be consistent, whatever it is. [I +like the way C<LexPad>s and the C<errorson> settings provide the option +for exception-based or non-exception-based implementations, rather than +forcing one or the other.] + +=head2 Excerpt + +[Excerpt from "Perl 6 and Parrot Essentials" to seed discussion. +Out-of-date in some ways, and in others it was simply speculative.] + +Exceptions provide a way of calling a piece of code outside the normal +flow of control. They are mainly used for error reporting or cleanup +tasks, but sometimes exceptions are just a funny way to branch from +one code location to another one. + +Exceptions are objects that hold all the information needed to handle +the exception: the error message, the severity and type of the error, +etc. The class of an exception object indicates the kind of exception +it is. + +Exception handlers are derived from continuations. 
They are ordinary +subroutines that follow the Parrot calling conventions, but are never +explicitly called from within user code. User code pushes an exception +handler onto the control stack with the C<push_eh> opcode. The system +calls the installed exception handler only when an exception is thrown. + + push_eh _handler # push handler on control stack + find_global P10, "none" # may throw exception + clear_eh # pop the handler off the stack + ... + + _handler: # if the lookup throws, execution continues here + get_results '(0,0)', P0, S0 # handler is called with (exception, message) + ... + +If the global variable is found, the next statement +(C<clear_eh>) pops the exception handler off the control stack and +normal execution continues. If the C<find_global> call doesn't find +C<none>, it throws an exception by passing an exception object to the +exception handler. + +The first exception handler in the control stack sees every exception +thrown. The handler has to examine the exception object and decide +whether it can handle it (or discard it) or whether it should +C<rethrow> the exception to pass it along to an exception handler +deeper in the stack. The C<rethrow> opcode is only valid in exception +handlers. It pushes the exception object back onto the control stack so +Parrot knows to search for the next exception handler in the stack. The +process continues until some exception handler deals with the exception +and returns normally, or until there are no more exception handlers on +the control stack. When the system finds no installed exception handlers, +it defaults to a final action, which normally means it prints an +appropriate message and terminates the program. + +When the system installs an exception handler, it creates a return +continuation with a snapshot of the current interpreter context. 
If +the exception handler just returns (that is, if the exception is +cleanly caught) the return continuation restores the control stack +back to its state when the exception handler was called, cleaning up +the exception handler and any other changes that were made in the +process of handling the exception. + +Exceptions thrown by standard Parrot opcodes (like the one thrown by +C<find_global> above or by the C<throw> opcode) are always resumable, +so when the exception handler function returns normally it continues +execution at the opcode immediately after the one that threw the +exception. Other exceptions at the run-loop level are also generally +resumable. + + new P10, Exception # create new Exception object + set P10["_message"], "I die" # set message attribute + throw P10 # throw it + +Exceptions are designed to work with the Parrot calling conventions. +Since the return addresses of C<bsr> subroutine calls and exception +handlers are both pushed onto the control stack, it's generally a bad +idea to combine the two. + +=head1 ATTACHMENTS + +None. + +=head1 FOOTNOTES + +None. + +=head1 REFERENCES + + src/ops/core.ops + src/exceptions.c + runtime/parrot/include/except_types.pasm + runtime/parrot/include/except_severity.pasm + +=cut + +__END__ +Local Variables: + fill-column:78 +End: Added: trunk/docs/pdds/clip/pdd24_events.pod ============================================================================== --- (empty file) +++ trunk/docs/pdds/clip/pdd24_events.pod Tue Apr 18 13:45:23 2006 @@ -0,0 +1,156 @@ +# Copyright: 2001-2006 The Perl Foundation. +# $Id: $ + +=head1 NAME + +docs/pdds/clip/pddXX_events.pod - Parrot Events + +=head1 ABSTRACT + +This document defines the requirements and implementation strategy for +Parrot's event subsystem. + +=head1 VERSION + +$Revision: $ + +=head1 DESCRIPTION + +Description of the subject. + +=head1 DEFINITIONS + +Definitions of important terms. 
(optional) + +=head1 IMPLEMENTATION + +[Excerpt from Perl 6 and Parrot Essentials to seed discussion.] + +An event is a notification that something has happened: the user has +manipulated a GUI element, an I/O request has completed, a signal has +been triggered, or a timer has expired. Most systems these days have an +event handler (often two or three, which is something of a problem), +because handling events is so fundamental to modern GUI programming. +Unfortunately, the event handling system is often not integrated, or only poorly +integrated, with the I/O system, leading to nasty code and unpleasant +workarounds to try to make a program responsive to network, file, and +GUI events simultaneously. Parrot presents a unified event handling +system, integrated with its I/O system, which makes it possible to write +cross-platform programs that work well in a complex environment. + +Parrot's events are fairly simple. An event has an event type, some +event data, an event handler, and a priority. Each thread has an event +queue, and when an event happens it's put into the right thread's +queue (or the default thread queue in those cases where we can't tell +which thread an event was destined for) to wait for something to +process it. + +Any operation that would potentially block drains the event queue +while it waits, as do a number of the cleanup opcodes that Parrot uses +to tidy up on scope exit. Purely for performance reasons, Parrot doesn't +check for an outstanding event on every opcode, as that check gets +expensive quickly. Still, Parrot generally ensures timely event +handling, and events shouldn't sit in a queue for more than a few +milliseconds unless event handling has been explicitly disabled. + +When Parrot does extract an event from the event queue, it calls that +event's event handler, if it has one. If an event doesn't have a +handler, Parrot instead looks for a generic handler for the event type +and calls that. 
If for some reason there's no handler for the +event type, Parrot falls back to the generic event handler, which +throws an exception when it gets an event it doesn't know how to +handle. You can override the generic event handler if you want Parrot +to do something else with unhandled events, perhaps silently +discarding them instead. + +Because events are handled in mainline code, they don't have the +restrictions commonly associated with interrupt-level code. It's safe +and acceptable for an event handler to throw an exception, allocate +memory, or manipulate thread or global state. Event handlers +can even acquire locks if they need to, though it's not a good idea to +have an event handler blocking on lock acquisition. + +Parrot uses the priority on events for two purposes. First, the +priority is used to order the events in the event queue. Events of the +same priority are handled in a FIFO manner, but higher-priority +events are always handled before lower-priority events. Second, Parrot +allows a user program or event handler to set a minimum event priority +that it will handle. If an event with a priority lower than the +current minimum arrives, it won't be handled, instead sitting in the +queue until the minimum priority level is lowered. This allows an +event handler that's dealing with a high-priority event to ignore +lower-priority events. + +User code generally doesn't need to deal with prioritized events, so +programmers should adjust event priorities with care. Adjusting the +default priority of an event, or adjusting the current minimum +priority level, is a rare occurrence. It's almost always a mistake to +change them, but the capability is there for those rare occasions +where it's the correct thing to +do. + +=head2 Signals + +Signals are a special form of event, based on the Unix signal mechanism. 
+Parrot presents them as mildly special, as a remnant of Perl's Unix +heritage, but under the hood they're not treated any differently from +any other event. + +The Unix signaling mechanism is something of a mishmash, having been +extended and worked on over the years by a small legion of undergrad +programmers. At this point, signals can be divided into two +categories: those that are fatal and those that aren't. + +Fatal signals are things like +SIGKILL, which unconditionally kills a process, or SIGSEGV, which +indicates that the process has tried to access memory that isn't part +of its address space. There's no good way for Parrot to catch these +signals, so they remain fatal and will kill your process. On some +systems it's possible to catch some of the fatal signals, but +Parrot code itself operates at too high a level for a user program to +do anything with them--they must be handled with special-purpose code +written in C or some other low-level language. Parrot itself may +catch them in special circumstances for its own use, but that's an +implementation detail that isn't exposed to a user program. + +Non-fatal signals are things like SIGCHLD, indicating that a +child process has died, or SIGINT, indicating that the user +has hit C<^C> on the keyboard. Parrot turns these signals into events +and puts them in the event queue. Your program's event handler for the +signal will be called as soon as Parrot gets to the event in the queue, +and your code can do what it needs to with it. + +SIGALRM, the timer expiration signal, is treated specially by +Parrot. Generated by an expiring alarm() system call, this signal is +normally used to provide timeouts for system calls that would +otherwise block forever, which is very useful. The big downside to +this is that on most systems there can only be one outstanding +alarm() request, and while you can get around this somewhat with the +setitimer call (which allows up to three pending alarms), it's still +quite limited. 
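The one-outstanding-request limit of alarm() is easy to observe from any language that exposes the call. The following is a minimal sketch in Python rather than Parrot code, assuming a Unix system where Python's C<signal.alarm> is available; the function name C<replace_alarm> is hypothetical.

```python
import signal


def replace_alarm(first_seconds, second_seconds):
    """Schedule two alarms back to back and report what each call returned.

    signal.alarm() returns the seconds left on the alarm it replaced, or 0
    if none was pending. A nonzero value from the second call therefore
    shows that the first request was discarded rather than queued.
    """
    before_first = signal.alarm(first_seconds)
    replaced = signal.alarm(second_seconds)
    cancelled = signal.alarm(0)  # cancel, so no SIGALRM is delivered here
    if before_first:
        # Re-arm any alarm the caller already had pending.
        signal.alarm(before_first)
    return before_first, replaced, cancelled
```

Calling C<replace_alarm(60, 30)> should report a nonzero remainder for both the replaced and the cancelled alarm, since only one request is ever pending at a time. A program that needs several independent timeouts has to multiplex them over that single slot itself.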
+ +Since Parrot's I/O system is fully asynchronous and never blocks--even +what looks like a blocking request still drains the event queue--the +alarm signal isn't needed for this. Parrot instead grabs SIGALRM for +its own use, and provides a fully generic timer system which allows +any number of timer events, each with its own callback function +and private data, to be outstanding. + +=head1 ATTACHMENTS + +None. + +=head1 FOOTNOTES + +None. + +=head1 REFERENCES + +None. + +=cut + +__END__ +Local Variables: + fill-column:78 +End: Added: trunk/docs/pdds/clip/pdd25_threads.pod ============================================================================== --- (empty file) +++ trunk/docs/pdds/clip/pdd25_threads.pod Tue Apr 18 13:45:23 2006 @@ -0,0 +1,134 @@ +# Copyright: 2001-2006 The Perl Foundation. +# $Id: $ + +=head1 NAME + +docs/pdds/clip/pddXX_threads.pod - Parrot Threads + +=head1 ABSTRACT + +This document defines the requirements and implementation strategy for +Parrot's threading model. + +=head1 VERSION + +$Revision: $ + +=head1 DEFINITIONS + +Concurrency + +=head1 DESCRIPTION + +Description of the subject. + +=head1 IMPLEMENTATION + +[Excerpt from Perl 6 and Parrot Essentials to seed discussion.] + +Threads are a means of splitting a process into multiple pieces that +execute simultaneously. They're a relatively easy way to get some +parallelism without too much work. Threads don't solve all the +parallelism problems your program may have. Sometimes multiple +processes on a single system, multiple processes on a cluster, or +processes on multiple separate systems are better. But threads do +present a good solution for many common cases. + +All the resources in a threaded process are shared between threads. +This is simultaneously the great strength and great weakness of +threads. 
Easy sharing is fast sharing, making it far faster to +exchange data between threads or access shared global data than to +share data between processes on a single system or on multiple +systems. Easy sharing is dangerous, though, since without some sort of +coordination between threads it's easy to corrupt that shared data. +And, because all the threads are contained within a single process, if +any one of them fails for some reason the entire process, with all its +threads, dies. + +With a low-level language such as C, these issues are manageable. The +core data types (integers, floats, and pointers) are all small enough +to be handled atomically. Composite data can be protected with +mutexes, special structures that a thread can get exclusive access to. +The composite data elements that need protecting can each have a mutex +associated with them, and when a thread needs to touch the data it +just acquires the mutex first. By default there's very little data +that must be shared between threads, so it's relatively easy, barring +program errors, to write thread-safe code if a little thought is given +to the program structure. + +Things aren't this easy for Parrot, unfortunately. A PMC, Parrot's +native data type, is a complex structure, so we can't count on the +hardware to provide us atomic access. That means Parrot has to provide +atomicity itself, which is expensive. Getting and releasing a mutex +isn't really that expensive in itself. It has been heavily optimized by +platform vendors because they want threaded code to run quickly. It's +not free, though, and when you consider that, running flat-out, Parrot +does one PMC operation per 100 CPU cycles, even adding an additional 10 +cycles per operation can slow Parrot down by 10%. + +For any threading scheme, it's important that your program isn't +hindered by the platform and libraries it uses. This is a common +problem with writing threaded code in C, for example. 
Many libraries +you might use aren't thread-safe, and if you aren't careful with them +your program will crash. While we can't make low-level libraries any +safer, we can make sure that Parrot itself won't be a danger. There is +very little data shared between Parrot interpreters and threads, and +access to all the shared data is done with coordinating mutexes. This +is invisible to your program, and just makes sure that Parrot itself +is thread-safe. + +When you think about it, there are really three different threading +models. In the first one, multiple threads have no interaction among +themselves. This essentially does with threads the same thing that's +done with processes. This works very well in Parrot, with the +isolation between interpreters helping to reduce the overhead of this +scheme. There's no possibility of data sharing at the user level, so +there's no need to lock anything. + +In the second threading model, multiple threads run and pass messages +back and forth between each other. Parrot supports this as well, via +the event mechanism. The event queues are thread-safe, so one thread +can safely inject an event into another thread's event queue. This is +similar to a multiple-process model of programming, except that +communication between threads is much faster, and it's easier to pass +around structured data. + +In the third threading model, multiple threads run and share data +between themselves. While Parrot can't guarantee that data at the user +level remains consistent, it can make sure that access to shared data +is at least safe. We do this with two mechanisms. + +First, Parrot presents an advisory lock system to user code. Any piece +of user code running in a thread can lock a variable. Any attempt to +lock a variable that another thread has locked will block until the +lock is released. Locking a variable only blocks other lock attempts. +It does I<not> block plain access. 
This may seem odd, but it's the +same scheme used by threading systems that obey the POSIX thread +standard, and has been well tested in practice. + +Second, Parrot forces all shared PMCs to be marked as such, and every +access to a shared PMC must first acquire that PMC's private lock. This +is done by installing an alternate vtable for shared PMCs, one that +acquires locks on all its parameters. These locks are held only for +the duration of the vtable function, but ensure that the PMCs affected +by the operation aren't altered by another thread while the vtable +function is in progress. + +=head1 ATTACHMENTS + +None. + +=head1 FOOTNOTES + +None. + +=head1 REFERENCES + +None. + +=cut + +__END__ +Local Variables: + fill-column:78 +End: