Author: allison
Date: Fri Mar 17 19:59:05 2006
New Revision: 11923

Modified:
   trunk/docs/pdds/clip/pddXX_io.pod

Changes in other areas also in this revision:
Modified:
   trunk/   (props changed)

Log:
Cleanup, expand, and integrate the last few comments.

Modified: trunk/docs/pdds/clip/pddXX_io.pod
==============================================================================
--- trunk/docs/pdds/clip/pddXX_io.pod   (original)
+++ trunk/docs/pdds/clip/pddXX_io.pod   Fri Mar 17 19:59:05 2006
@@ -36,118 +36,26 @@
 =head1 DESCRIPTION
 
 This is a draft document defining Parrot's I/O subsystem, for both
-streams and network I/O.
-
-=head2 Synchronous and Asynchronous operations
-
-Currently, Parrot only implements synchronous I/O operations.
-Asynchronous operations are essentially the same as the synchronous
-operations, but each asynchronous operation runs in its own thread.
-
-Note: this is a deviation from the existing plan, which had all I/O
-operations run internally as asynchronous, and the synchronous
-operations as a compatibility layer on top of the asynchronous
-operations. This conceptual simplification means that all I/O operations
-are possible without threading support (for example, in a stripped-down
-version of Parrot running on a PDA). [Though, asynchronous operations
-don't have to use Parrot threads, they could use some alternate
-threading implementation, but it's overkill to develop two threading
-implementations.]
-
-The asynchronous I/O implementation will use Parrot's I/O layer
-architecture so some platforms can take advantage of their built-in
-asynchronous operations instead of using Parrot threads.
-
-Communication between the calling code and the asynchronous operation
-thread will be handled by a shared status object. The operation thread
-will update the status object whenever the status changes, and the
-calling code can check the status object at any time. 
+streams and network I/O. Parrot has both synchronous and asynchronous
+I/O operations. This section describes the interface, and the
+L<IMPLEMENTATION> section provides more details on general
+implementation questions and error handling. 
 
 The signatures for the asynchronous operations are nearly identical to
 the synchronous operations, but the asynchronous operations take an
 additional argument for a callback, and the only return value from the
 asynchronous operations is a status object. The callbacks take the
 status object as their first argument, and any return values as their
-second arguments. [This relies on the presence of a callback argument to
-mark which calls should be asynchronous. If we want asynchronous calls
-that don't supply callbacks (perhaps if the user wants to manually check
-later if the operation succeded) we would need another strategy to
-differentiate the two. This is probably enough of a fringe case that we
-don't need to provide opcodes for it.]
-
-=head2 Error handling
-
-Currently some of the networking opcodes (C<connect>, C<recv>, C<send>,
-C<poll>, C<bind>, and C<listen>) return an integer indicating the status
-of the call, -1 or a system error code if unsuccessful. Other I/O
-opcodes (such as C<getfd> and C<accept>) have various different
-strategies for error notification, and others have no way of marking
-errors at all. We want to unify all I/O opcodes so they use a consistent
-strategy for error notification. There are several options in how we do
-this.
-
-=head3 Integer status codes
-
-One approach is to have every I/O operation return an integer status
-code indicating success or failure. This approach has the advantage of
-being lightweight: returning a single additional integer is cheap. The
-disadvantage is that it's not very flexible: the only way to look for
-errors is to check the integer return value, possibly comparing it to a
-predefined set of error constants.
-
-=head3 Exceptions
-
-Another option is to have all I/O operations throw exceptions on errors.
-The advantage is that it keeps the error tracking information
-out-of-band, so it doesn't affect the arguments or return values of the
-calls (some opcodes that have a return value plus an integer status code
-have odd looking signatures). One disadvantage of this approach is that
-it forces all users to handle exceptions from I/O operations even if
-they aren't using exceptions otherwise. 
-
-A more significant disadvantage is that exeptions don't work well with
-asynchronous operations. Exception handlers are set for a particular
-dynamic scope, but with an asynchronous operation, by the time an
-exception is thrown execution has already left the dynamic scope where
-the exception handler was set. [Though, this partly depends on how
-exceptions are implemented.]
-
-=head3 Error callbacks
-
-A minor variation on the exceptions option is to pass an error callback
-into each I/O opcode. This solves the problem of asynchronous operations
-because the operation has its own custom error handling code rather than
-relying on an exception handler in its dynamic scope.
-
-The disadvantage is that the user has to define a custom error handler
-routine for every call. It also doesn't cope well with cases where
-multiple different kinds of errors may be returned by a single opcode.
-(The one error handler would have to cope with all possible types of
-errors.) There is an easier way.
-
-=head3 Hybrid solution
-
-Another option is to return a status object from each I/O operation. The
-status object could be used to get an integer status code, string
-status/error message, or boolean success value. It could also provide a
-method to throw an exception on error conditions. There could even be a
-global option that tells Parrot to always throw exceptions on errors in
-synchronous I/O operations, implemented by calling this method on the
-status object before returning from the I/O opcode.
-
-The advantages are that this works well with asynchronous and
-synchronous operations, and provides flexibility for multiple different
-uses.  Also, something like a status object will be needed anyway to
-allow users to check on the status of a particular asynchronous call in
-progress, so this is a nice unification.
-
-The disadvantage is that a status object involves more overhead than a
-simple integer status code.
+remaining arguments. 
 
+The listing below says little about whether the opcodes return error
+information. For now assume that they can either return a status object,
+or return nothing. Error handling is discussed more thoroughly in the
+implementation section.
 
 =head2 I/O Stream Opcodes
 
-=head3 Opening and Closing Streams
+=head3 Opening and closing streams
 
 =over 4
 
@@ -182,7 +90,7 @@
 
 =back
 
-=head3 Retrieving Existing Streams
+=head3 Retrieving existing streams
 
 These opcodes do not have asynchronous variants.
 
@@ -201,7 +109,7 @@
 
 =back
 
-=head3 Writing to Streams
+=head3 Writing to streams
 
 =over 4
 
@@ -225,7 +133,7 @@
 
 =back
 
-=head3 Reading From Streams
+=head3 Reading from streams
 
 =over 4
 
@@ -264,7 +172,7 @@
 
 =back
 
-=head3 Retrieving and Setting Stream Properties
+=head3 Retrieving and setting stream properties
 
 =over 4
 
@@ -278,7 +186,9 @@
 the second 32 bits). [The two-register emulation for 64-bit integers may
 be deprecated in the future.]
 
-No asynchronous version.
+The asynchronous version takes an additional final PMC callback
+argument. When the seek operation is complete, it invokes the callback,
+passing it a status object and the stream object it was called on.
 
 =item *
 
@@ -292,9 +202,7 @@
 
 =item *
 
-C<getfd> retrieves the UNIX integer file descriptor of a stream object,
-or 0 if it doesn't have an integer file descriptor. [Maybe -1 would be a
-better code for "undefined", since standard input is 0.]
+C<getfd> retrieves the UNIX integer file descriptor of a stream object.
 
 No asynchronous version.
 
@@ -358,8 +266,6 @@
 socket object whether the object is being used synchronously or
 asynchronously.
 
-[This may be deprecated and replaced with a method on I/O objects.]
-
 =back
 
 =head3 Deprecated opcodes
@@ -374,18 +280,18 @@
 
 =back
 
-=head2 File opcodes
+=head2 Filesystem Opcodes
 
 =over 4
 
 =item *
 
 C<stat> retrieves information about a file on the filesystem. It takes a
-string filename or an integer argument of a UNIX file descriptor, and an
-integer flag for the type of information requested. It returns an
-integer containing the requested information.  The following constants
-are defined for the type of information requested (see
-F<runtime/parrot/include/stat.pasm>):
+string filename or an integer argument of a UNIX file descriptor [or an
+already opened stream object?], and an integer flag for the type of
+information requested. It returns an integer containing the requested
+information.  The following constants are defined for the type of
+information requested (see F<runtime/parrot/include/stat.pasm>):
 
   0    STAT_EXISTS
            Whether the file exists.
@@ -412,8 +318,89 @@
   10   STAT_GID
            The group ID of the file.
 
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the stat operation is
+complete, it invokes the callback, passing it a status object and an
+integer containing the status information.
+
+=item *
+
+C<unlink> deletes a file from the filesystem. It takes a single string
+argument of a filename (including the path).
+
+The asynchronous version takes an additional final PMC callback
+argument. When the unlink operation is complete, it invokes the
+callback, passing it a status object.
+
+=item *
+
+C<rmdir> deletes a directory from the filesystem if that directory is
+empty. It takes a single string argument of a directory name (including
+the path).
+
+The asynchronous version takes an additional final PMC callback
+argument. When the rmdir operation is complete, it invokes the callback,
+passing it a status object.
+
+=item *
+
+C<opendir> opens a stream object for a directory. It takes a single
+string argument of a directory name (including the path) and returns a
+stream object.
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the opendir operation
+is complete, it invokes the callback, passing it a status object and a
+newly created stream object.
+
+=item *
+
+C<readdir> reads a single item from an open directory stream object. It
+takes a single stream object argument and returns a string containing
+the path and filename/directory name of the current item. (i.e. the
+directory stream object acts as an iterator.)
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the readdir operation
+is complete, it invokes the callback, passing it a status object and the
+string result.
+
+=item *
+
+C<telldir> returns the current position of C<readdir> operations on a
+directory stream object.
+
+No asynchronous version.
+
+=item *
+
+C<seekdir> sets the current position of C<readdir> operations on a
+directory stream object. It takes a stream object argument and an
+integer for the position. [The system C<seekdir> requires that the
+position argument be the result of a previous C<telldir> operation.]
+
+The asynchronous version takes an additional final PMC callback
+argument. When the seekdir operation is complete, it invokes the
+callback, passing it a status object and the directory stream object it
+was called on.
+
+=item *
+
+C<rewinddir> sets the current position of C<readdir> operations on a
+directory stream object back to the beginning of the directory. It takes
+a stream object argument.
+
 No asynchronous version.
 
+=item *
+
+C<closedir> closes a directory stream object. It takes a single stream
+object argument.
+
+The asynchronous version takes an additional final PMC callback
+argument. When the closedir operation is complete, it invokes the
+callback, passing it a status object.
+
 =back
 
 =head2 Network I/O Opcodes
@@ -421,7 +408,6 @@
 Most of these opcodes conform to the standard UNIX interface, but the
 layer API allows alternate implementations for each.
 
-
 =over 4
 
 =item *
@@ -437,7 +423,7 @@
 
 =item *
 
-C<sockaddr> returns a string representing a socket address, generated
+C<sockaddr> returns an object representing a socket address, generated
 from a port number (integer) and an address (string).
 
 No asynchronous version.
@@ -468,14 +454,23 @@
 C<send> sends a message string to a connected socket object. 
 
 The asynchronous version takes an additional final PMC callback
-argument, and only returns a status object. When the recv operation is
+argument, and only returns a status object. When the send operation is
+complete, it invokes the callback, passing it a status object.
+
+=item *
+
+C<sendto> sends a message string to an address specified in an address
+object (first connecting to the address).
+
+The asynchronous version takes an additional final PMC callback
+argument, and only returns a status object. When the sendto operation is
 complete, it invokes the callback, passing it a status object.
 
 
 =item *
 
-C<bind> binds a socket object to the port and address specified by a
-string address (the packed result of C<sockaddr>).
+C<bind> binds a socket object to the port and address specified by an
+address object (the packed result of C<sockaddr>).
 
 The asynchronous version takes an additional final PMC callback
 argument, and only returns a status object. When the bind operation is
@@ -506,6 +501,19 @@
 each connection received), the asynchronous version is only called once,
 but continues to send new connection events until the socket is closed.]
 
+=item *
+
+C<shutdown> closes a socket object for reading, for writing, or for all
+I/O. It takes a socket object argument and an integer argument for the
+type of shutdown:
+
+  0    PIOSHUTDOWN_READ
+           Close the socket object for reading.
+  1    PIOSHUTDOWN_WRITE
+           Close the socket object for writing.
+  2    PIOSHUTDOWN
+           Close the socket object.
+
 =back
 
 
@@ -517,9 +525,163 @@
 searched downwards until a non-NULL function pointer is found for
 that particular slot.
 
+=head2 Synchronous and Asynchronous Operations
+
+Currently, Parrot only implements synchronous I/O operations.
+Asynchronous operations are essentially the same as the synchronous
+operations, but each asynchronous operation runs in its own thread.
+
+Note: this is a deviation from the existing plan, which had all I/O
+operations run internally as asynchronous, and the synchronous
+operations as a compatibility layer on top of the asynchronous
+operations. This conceptual simplification means that all I/O operations
+are possible without threading support (for example, in a stripped-down
+version of Parrot running on a PDA). [Asynchronous operations don't have
+to use Parrot threads; they could use some alternate threading
+implementation. But it's overkill to develop two threading
+implementations. If Parrot threads turn out to be too heavyweight, we
+may want to look into a lighter weight variation for asynchronous
+operations.]
+
+The asynchronous I/O implementation will use Parrot's I/O layer
+architecture so some platforms can take advantage of their built-in
+asynchronous operations instead of using Parrot threads.
+
+Communication between the calling code and the asynchronous operation
+thread will be handled by a shared status object. The operation thread
+will update the status object whenever the status changes, and the
+calling code can check the status object at any time. [Twisted has an
+interesting variation on this, in that it replaces the status object
+with the returned result of the asynchronous call when the call is
+complete. That is probably too confusing, but we might give the status
+object a reference to the returned result.]
+
+The current strategy for differentiating the synchronous calls from
+asynchronous ones relies on the presence of a callback argument in the
+asynchronous calls. If we wanted asynchronous calls that don't supply
+callbacks (perhaps if the user wants to manually check later if the
+operation succeeded) we would need another strategy to differentiate the
+two. This is probably enough of a fringe case that we don't need to
+provide opcodes for it, provided they can access the functionality via
+methods on ParrotIO objects.
+
+=head2 Error Handling
+
+Currently some of the networking opcodes (C<connect>, C<recv>, C<send>,
+C<poll>, C<bind>, and C<listen>) return an integer indicating the status
+of the call, -1 or a system error code if unsuccessful. Other I/O
+opcodes (such as C<getfd> and C<accept>) have various different
+strategies for error notification, and others have no way of marking
+errors at all. We want to unify all I/O opcodes so they use a consistent
+strategy for error notification. There are several options in how we do
+this.
+
+=head3 Integer status codes
+
+One approach is to have every I/O operation return an integer status
+code indicating success or failure. This approach has the advantage of
+being lightweight: returning a single additional integer is cheap. The
+disadvantage is that it's not very flexible: the only way to look for
+errors is to check the integer return value, possibly comparing it to a
+predefined set of error constants.
+
+=head3 Exceptions
+
+Another option is to have all I/O operations throw exceptions on errors.
+The advantage is that it keeps the error tracking information
+out-of-band, so it doesn't affect the arguments or return values of the
+calls (some opcodes that have a return value plus an integer status code
+have odd looking signatures). One disadvantage of this approach is that
+it forces all users to handle exceptions from I/O operations even if
+they aren't using exceptions otherwise. 
+
+A more significant disadvantage is that exceptions don't work well with
+asynchronous operations. Exception handlers are set for a particular
+dynamic scope, but with an asynchronous operation, by the time an
+exception is thrown execution has already left the dynamic scope where
+the exception handler was set. [Though, this partly depends on how
+exceptions are implemented.]
+
+=head3 Error callbacks
+
+A minor variation on the exceptions option is to pass an error callback
+into each I/O opcode. This solves the problem of asynchronous operations
+because the operation has its own custom error handling code rather than
+relying on an exception handler in its dynamic scope.
+
+The disadvantage is that the user has to define a custom error handler
+routine for every call. It also doesn't cope well with cases where
+multiple different kinds of errors may be returned by a single opcode.
+(The one error handler would have to cope with all possible types of
+errors.) There is an easier way.
+
+=head3 Hybrid solution
+
+Another option is to return a status object from each I/O operation. The
+status object could be used to get an integer status code, string
+status/error message, or boolean success value. It could also provide a
+method to throw an exception on error conditions. There could even be a
+global option (or an option set on a particular I/O object) that tells
+Parrot to always throw exceptions on errors in synchronous I/O
+operations, implemented by calling this method on the status object
+before returning from the I/O opcode.
+
+The advantages are that this works well with asynchronous and
+synchronous operations, and provides flexibility for multiple different
+uses.  Also, something like a status object will be needed anyway to
+allow users to check on the status of a particular asynchronous call in
+progress, so this is a nice unification.
+
+The disadvantage is that a status object involves more overhead than a
+simple integer status code.
+
+=head2 IPv6 Support
+
+The transition from IPv4 to IPv6 is in progress, though not likely to be
+complete anytime soon. Most operating systems today offer at least
+dual-stack IPv6 implementations, so they can use either IPv4 or IPv6,
+depending on what's available. Parrot also needs to support either
+protocol. For the most part, the network I/O opcodes should internally
+handle either addressing scheme, without requiring the user to specify
+which scheme is being used.
+
+IETF recommends defaulting to IPv6 connections and falling back to IPv4
+connections when IPv6 fails. This would give us more solid testing of
+Parrot's compatibility with IPv6, but may be too slow. Either way, it's a
+good idea to make setting the default (or selecting one exclusively) an
+option when compiling Parrot.
+
+The most important issues for Parrot to consider with IPv6 are:
+
+=over 4
+
+=item *
+
+Support 128 bit addresses. IPv6 addresses are colon-separated
+hexadecimal numbers, such as C<20a:95ff:fef5:7e5e>.
+
+=item *
+
+Any address parsing should be able to support the address separated from
+a port number or prefix/length by brackets: C<[20a:95ff:fef5:7e5e]:80>
+and C<[20a:95ff::]/64>.
+
+=item *
+
+Packed addresses, such as the result of the C<sockaddr> opcode, should
+be passed around as an object (or at least a structure) rather than as a
+string.
+
+=back
+
+See the relevant IETF RFCs: "Application Aspects of IPv6 Transition"
+(http://www.ietf.org/rfc/rfc4038.txt) and "Basic Socket Interface
+Extensions for IPv6" (http://www.ietf.org/rfc/rfc3493.txt).
+
+=head2 Excerpt
+
 [Below is an excerpt from "Perl 6 and Parrot Essentials", included to
-seed discussion. Note that while Parrot was originally specified as
-having asynchronous I/O, all current opcodes are synchronous I/O.]
+seed discussion.]
 
 Parrot's base I/O system is fully asynchronous I/O with callbacks and
 per-request private data. Since this is massive overkill in many cases,
@@ -600,6 +762,8 @@
   runtime/parrot/library/Stream/*
   src/io/io_unix.c
   src/io/io_win32.c
+  Perl 5's IO::AIO
+  Perl 5's POE
 
 =cut
 
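Note on the "Hybrid solution" text in the diff above: the status object it describes (integer code, string message, boolean success value, plus a method that throws on error) could look roughly like the sketch below. This is only an illustration in Python, not Parrot code. The names Status, IOStatusError, read_file and throw_on_error are all hypothetical, and the throw_on_error flag stands in for the "global option (or an option set on a particular I/O object)" mentioned in the text.

```python
class IOStatusError(Exception):
    """Hypothetical exception raised when a failed status is asked to throw."""


class Status:
    """Hypothetical status object: integer code, message, and optional result."""

    def __init__(self, code=0, message="ok", result=None):
        self.code = code        # 0 means success, anything else is an error code
        self.message = message  # human-readable status/error message
        self.result = result    # reference to the operation's return value

    def __bool__(self):
        # Boolean success value: true only when the operation succeeded.
        return self.code == 0

    def throw(self):
        # Convert an error status into an exception; a no-op on success.
        if self.code != 0:
            raise IOStatusError(f"{self.message} (code {self.code})")
        return self


def read_file(path, throw_on_error=False):
    """Synchronous read that always builds a status object.

    With throw_on_error set, the status's throw() method is called before
    returning, which is how the text suggests implementing the option.
    """
    try:
        with open(path, "rb") as handle:
            status = Status(result=handle.read())
    except OSError as err:
        status = Status(code=err.errno or -1, message=err.strerror or str(err))
    if throw_on_error:
        status.throw()
    return status
```

A caller that prefers status checks writes "status = read_file(path)" and tests "if not status:", while a caller that prefers exceptions passes throw_on_error=True. Both paths share the same object, which is the unification the text argues for.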

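Note on the "IPv6 Support" list in the diff above: the bracketed forms it asks address parsing to handle ("[address]:port" and "[address]/length") can be split apart as in the sketch below. It is a Python illustration only, using the standard ipaddress module for validation. The function name parse_bracketed and its return shape are hypothetical, and the sample addresses are taken from the 2001:db8::/32 documentation range rather than from the diff.

```python
import ipaddress


def parse_bracketed(text):
    """Split '[addr]', '[addr]:port' or '[addr]/length' into its parts.

    Returns a tuple (address, kind, number) where kind is "port",
    "prefix", or None when nothing follows the closing bracket.
    Raises ValueError on malformed input.
    """
    if not text.startswith("["):
        raise ValueError("expected a leading '['")
    end = text.find("]")
    if end == -1:
        raise ValueError("missing closing ']'")

    # IPv6Address raises a ValueError subclass for invalid addresses.
    address = ipaddress.IPv6Address(text[1:end])
    rest = text[end + 1:]

    if rest == "":
        return address, None, None
    if rest[0] == ":":
        port = int(rest[1:])  # int() raises ValueError on empty or non-numeric text
        if not 0 <= port <= 65535:
            raise ValueError("port out of range")
        return address, "port", port
    if rest[0] == "/":
        length = int(rest[1:])
        if not 0 <= length <= 128:
            raise ValueError("prefix length out of range")
        return address, "prefix", length
    raise ValueError("unexpected text after ']'")
```

Returning an address object rather than a string matches the list's last point, that packed addresses should be passed around as objects.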