Please find attached a patch for PDD 16 that addresses many of the
concerns I am aware of regarding NCI.

Notable changes include:
* a move to PCC-like signatures
* support for a wider range of types (unsigned, pass by reference,
large types, etc.)
* '2', '3', and '4' PMC integer pass by reference types have been
replaced with a more general pass by reference system
* explicit flagging and handling of memory ownership of string buffers
* callbacks split into 2 categories: "thunk" and "closure"
* thunk callbacks no longer modify the user-supplied userdata parameter
* memory management of callback state explicitly delegated to user (we
currently leak memory to avoid this)
* brief documentation of the shortcomings of the default (static)
frame builder, mention of the possibility of others
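
As an illustration of the pass by reference item above: C commonly uses
pointer "out" parameters to emulate multiple return values. The sketch below
is plain C and only shows the kind of function the generalized system is
meant to cover; divmod_ref is a hypothetical example name, not part of Parrot
or of this patch.

```c
#include <stddef.h>

/* Hypothetical example function (not part of Parrot or the patch):
 * returns the quotient and writes the remainder through a pointer,
 * i.e. an "out" parameter passed by reference. */
int divmod_ref(int num, int den, int *rem)
{
    if (rem != NULL)
        *rem = num % den;   /* second "return value" */
    return num / den;       /* ordinary C return value */
}
```

Under the proposed rules, the value left in *rem after the call would be
handed back to Parrot as an extra return value, placed after the C return
value.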

Some concerns *not* addressed by this (but mentioned in NCITasklist) are:
* wide characters
* padded strings/buffers
* zero-copy buffers

It is my opinion that these are the realm of the string system and/or
custom PMCs (even though they may be very useful for interacting
*with* NCI).

Note that some of the proposed NCI parameter types may be of dubious
value for NCI alone (especially "Iv" and "In"). However, it is my
intention to make the use of NCI and UnManagedStruct consistent.
Therefore these signature types are also designed to be passed to a
set_string_native vtable on UnManagedStruct to specify the shape of
the structure easily. Some additional types may need to be added to
support this (eg: explicit padding, smaller integer types for bit
packed fields).
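
To make the explicit padding point concrete, here is a minimal C sketch of
why a structure description may need to account for padding. The struct and
function names are hypothetical and exist only for this illustration; the
amount of padding depends on the compiler and ABI.

```c
#include <stddef.h>

/* Hypothetical struct used only for illustration. */
struct example_rec {
    char tag;     /* one byte */
    int  value;   /* usually aligned, so padding may precede it */
};

/* Number of padding bytes the compiler placed between tag and value. */
size_t example_rec_padding(void)
{
    return offsetof(struct example_rec, value) - sizeof(char);
}
```

A signature that lists only a char followed by an int would describe the
wrong layout on any platform where example_rec_padding() is non-zero, unless
the padding is either implied or spelled out.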
Index: docs/pdds/draft/pdd16_native_call.pod
===================================================================
--- docs/pdds/draft/pdd16_native_call.pod	(revision 44977)
+++ docs/pdds/draft/pdd16_native_call.pod	(working copy)
@@ -32,97 +32,165 @@
 signature) the signature must be passed in when the linkage between the C
 function and parrot is made.
 
-=head2 Implementation
-
 =head3 Function signatures
 
-The following list are the valid symbols in the function signatures for
-Parrot's NCI. Note that only letters and numbers are valid, and each symbol
-represents a single parameter passed into the NCI. Note that the symbols are
-case-sensitive, and must be within the base 7-bit ASCII character set.
+NCI function signatures in Parrot are similar to PCC function signatures.
+Signatures are of the form C<< ${args}->${ret} >>, and individual signature
+elements are composed of upper-case letters to indicate general type followed
+by non-upper-case modifier flags. C<I>, C<N>, C<P>, and C<S> correspond to
+values similar to registers of the same name. An additional argument type C<U>
+is used for callbacks, and C<J> is used to pass the current Parrot interpreter
+as an argument.
 
-At some point punctuation may be used as modifiers on the function
-parameters, in which case each parameter may be represented by multiple
-symbols.
+It may not be possible to implement all signature type modifiers. For example,
+explicitly sized modifiers are limited by the underlying hardware. In general,
+if your C compiler can support it, it is most likely supported by Parrot's NCI.
 
+While the NCI signature of a function resembles a PCC signature, the two are not
+the same. Pass by reference values, for example, can cause these to be
+significantly different.
+
 In I<no> case should the signature symbols be separated by whitespace. This
 restriction may be lifted in the future, but for now remains as an avenue
 for adding additional functionality.
 
+=head4 Supported 
+
+=head4 Integer Elements (C<I>)
+
+By default, the C<I> NCI type corresponds to a signed integer of type
+C<INTVAL>.
+
 =over 4
 
-=item v
+=item u
 
-Void. As a return type it indicates that there I<is> no return type.
+Unsigned value.
 
-Not valid as a parameter type.
+=item +
 
+Signed value. All integer types (except sometimes C<char>) default to this.
+
+=item r
+
+Pass by reference. The value after the function call will be part of the return
+values in Parrot. Cannot be used as a return type.
+
+=item 3..7
+
+Values sized in bits as powers of 2 (e.g. C<4> means 2**4 = 16 bits).
+
 =item c
 
-Char. This is an integer type, taken from (or put into) an I register. NOTE:
-it might be signed or unsigned because that is how an unadorned C 'char'
-works.
+C<char>. Signedness defaults to whatever your C compiler would do.
 
-=item s
+=item s, i, l, ll
 
-short. An integer type, taken from or put into an I register. It is always
-signed, not unsigned.
+C<short>, C<int>, C<long>, C<long long> respectively. C<ll> must not be
+separated by other modifiers or it will be parsed as two C<l> size modifiers.
 
-=item i
+=item p, f
 
-int. An integer type. It is always signed, not unsigned.
+C<void *>, and C<void (*)(void)> respectively.
 
-=item l
+=item v, n
 
-long. An integer type. You know the drill. It is always signed, not unsigned.
+VAX and network order (little- and big-endian) respectively. Defaults to what
+your C compiler would do. Not valid on C<Ip> and C<If> types.
 
-=item f
+=back
 
-float. F register denizen.
+=head4 Float Elements (C<N>)
 
-=item d
+By default, the C<N> NCI type corresponds to a float of type C<FLOATVAL>.
 
-double. F register, double-precision floating point type
+=over 4
 
-=item P
+=item r
 
-A PMC register.
+Pass by reference. The value after the function call will be part of the return
+values in Parrot. Cannot be used as a return type.
 
+=item 5..7
+
+Values sized in bits as powers of 2 (e.g. C<6> means 2**6 = 64 bits).
+
+=item f, d, ld
+
+C<float>, C<double>, and C<long double> respectively. C<ld> must not be
+separated by other modifiers.
+
+=back
+
+=head4 Pointer-like Elements (C<P>)
+
+By default, the C<P> NCI type corresponds to a C<PMC *> with no pre- or
+post-processing.
+
+=over 4
+
+=item i
+
+Use the current method invocant PMC.
+
+=item z
+
+Always call with a C<NULL> pointer in this argument position. Does not show up
+in Parrot's arguments. Cannot be used as a return type.
+
 =item p
 
-PMC thingie. A generic pointer, taken from a PMC by using its
-get_pointer vtable function, or NULL for a PMCNULL.
-If this is a return type and the value is NULL, PMCNULL is returned,
-otherwise parrot will create a new UnManagedStruct PMC type, which
-is just a generic "pointer to something" PMC type which Parrot does
-I<no> management of.
+A generic pointer (treated as a C<void *>), taken from a PMC by using its
+C<get_pointer> vtable function, or C<NULL> for C<PMCNULL>. As a return
+value, it creates a new C<UnManagedStruct> PMC to represent a generic
+"pointer to something", unless the returned pointer is C<NULL>, in which case
+the return value is C<PMCNULL>.
 
-=item 2
+=back
 
-A pointer to a short, taken from an P register of an int-like PMC.
+=head4 String-like Elements (C<S>)
 
-=item 3
+By default, the C<S> NCI type corresponds to a C<STRING *> with no
+pre- or post-processing.
 
-A pointer to an int, taken from an P register of an int-like PMC.
+=item r
 
-=item 4
+Pass by reference. The value after the function call will be part of the return
+values in Parrot. Cannot be used as a return type.
 
-A pointer to a long, taken from an P register of an int-like PMC.
+=item c
 
-=item t
+C<char *>, a null-terminated (C) string.
 
-string pointer. Taken from, or stuck into, a string register. (Converted to a
-null-terminated C string before passing in)
+=item p
 
-=item U
+C<struct { intXX len; char str[] } *>, a counted (Pascal) string. The size of the
+C<len> element is determined by size modifiers. The default size is C<sizeof(int)>.
 
-This parameter is used for passing user data to a callback creation. More
-explanation in the L<callbacks> section.
+=item 4..6
 
+Size of the C<len> element of a counted string in bits as a power of two (e.g.
+C<6> means 2**6 = 64 bits).
+
+=item u, m
+
+Correspond to unmanaged and managed respectively. Used for denoting the memory
+ownership of C<Sc> and C<Sp> type elements. Parrot will deallocate the buffer
+appropriately for managed elements, but will not for unmanaged ones.
+
+Appropriate deallocation is context-specific. For arguments,
+C<Parrot_str_free_cstring> will be used. For return values, C<free> will be
+used.
+
+The default for arguments is managed. The default for return values is unmanaged.
+
+=item b
+
+A fixed-size buffer. When passed by reference, doesn't show up in Parrot return
+values. Cannot be a return type.
+
 =back
 
-Note that not all types are valid as return types.
-
 =head3 Example NCI call
 
 This section describes the simplest example for NCI possible. To every NCI
@@ -155,8 +223,8 @@
      lib = loadlib "hello" # no extension, .so or .dll is assumed
 
      # get a reference to the function from the library just
-     # loaded, called "foo", and signature "void" (and no arguments)
-     func = dlfunc lib, "foo", "v"
+     # loaded, called "foo", and signature "void (*)(void)"
+     func = dlfunc lib, "foo", "->"
 
      # invoke
      func()
@@ -164,6 +232,34 @@
   .end
 
 
+=head3 Pass By Reference
+
+Several NCI categories can be marked pass by reference. Parrot has no concept
+of pass by reference for value types, but it does have multiple return values,
+which pass by reference is often used to emulate in C. The value of the
+argument after the function call is therefore used as an additional return
+value.
+
+Pass by reference return values are placed after the C return value and occur
+in the same order as they are encountered in the arguments signature.
+
+A pass by reference flag in the return position is not permitted.
+
+Pass by reference types as parameters to callbacks work in a similar fashion.
+Their final value is expected as a return value after the C return value.
+
+=head3 Oversized Values
+
+It is possible that a system is capable of supporting values larger than those
+used to define C<INTVAL> and C<FLOATVAL>. It is not possible to represent the
+full range of these values using Parrot's native types.
+
+Parrot will throw an exception if truncation occurs while translating values
+from C to Parrot.
+
+In the future, wrapping modifiers may be added to the NCI interface to use
+C<BigInt>/C<BigNum> PMCs instead of Parrot-native values.
+
 =head3 Callbacks
 
 Some libraries, particularly ones implementing more complex functionality such
@@ -172,63 +268,69 @@
 functions must be C functions, and generally are passed parameters to indicate
 what should be done.
 
-Unfortunately there's no good way to generically describe all possible
-callback parameter sets, so in some cases hand-written C will be necessary.
-However, many callback functions share a common signature, and parrot provides
-some ready-made functions for this purpose that should serve for most of the
-callback uses.
+Unfortunately, much in the same way as call-ins, there's no good way to
+generically describe all possible callback parameter sets, so in some cases
+hand-written C will be necessary.
 
-There are two callback functions, Parrot_callback_C and Parrot_callback_D,
-which differ if the passed in C<user_data> is second or first respectively:
+There are two possibilities for creating callbacks for C: thunks and closures.
 
-   void (function *)(void *library_data, void *user_data);
+Thunks are static and do not know about your Parrot interpreter and various
+other information required to call into parrot. Therefore it is necessary for
+thunks to take a C<U> (userdata) parameter which will be used by Parrot to
+remember its state. This value will also contain an arbitrary user data PMC
+to pass along to the wrapped sub. The structure of this value should be treated
+as opaque.
 
-   void (function *)(void *user_data, void *library_data);
+Closures encapsulate (close over) some state, in this case the information
+necessary to call back into Parrot. This means that they do not have to take a
+C<U> parameter. It is not generally possible to create closures in C. However,
+some frame builder libraries may support these using black magic.
 
-The information C<library_data> is normally coming from C code and can be
-any C type that Parrot supports as NCI value.
+Thunk callbacks are created using the C<new_callback_p_p_p_p_s> op which takes 3
+arguments: the callback function, an arbitrary user data PMC (part of the user
+data in C<U> arguments), and the signature.
 
-The position of the C<user_data> is specified with the C<U> function
-signature, when creating the callback PMC:
+  cb_PMC, cb_UD = new_callback cb_Sub, user_data, "ScU->"
 
-  cb_PMC = new_callback cb_Sub, user_data, "tU"
-
 Given a Parrot function C<cb_Sub>, and a C<user_data> PMC, this creates a
-callback PMC C<cb_PMC>, which expects the user data as the second argument.
-The information returned by the callback (C<library_data>) is a C string.
+callback PMC C<cb_PMC> and callback userdata C<cb_UD>. The callback expects a C
+string as the first argument and the C user data as the second argument.
 
-Since parrot needs more than just a pointer to a generic function to figure
-out what to do, it stuffs all the extra information into the C<user_data>
-pointer, which contains a custom PMC holding all the information that Parrot
-needs. This also implies that the C function that installs the callback,
-must not make any assumptions on the C<user_data> argument. This argument
-must be handled transparently by the C code.
+Closure callbacks are created using the C<new_callback_p_p_s> op which takes 2
+arguments: the callback function, and the signature.
 
-The callback function takes care of wrapping the external data pointer into
-an UnManagedStruct PMC, the same as if it were a p return type of a normal
-NCI function.
+  cb_PMC = new_callback cb_Sub, "Sc->"
 
-The signature of the I<parrot> subroutine which is called by the callback
-should be:
+=head3 Memory Management
 
-   void parrotsub(PMC user_data, <type> external_data)
+In general, Parrot is unable to track whether native code has released all
+references to a value. Therefore it is up to the user code to manage the
+lifecycle of the C<cb_UD> PMC in the case of thunk callbacks and C<cb_PMC> in
+the case of closure callbacks.
 
-The sequence for this is:
+The lifetime of these objects can be handled either by keeping a reference to
+the PMC around in Parrot, or by calling C<Parrot_pmc_gc_register>, which can be
+obtained using dlfunc. Calling C<Parrot_pmc_gc_register> and then forgetting
+about a value is a memory leak; caveat emptor.
 
+=head3 Usage Steps
+
+The sequence for creating and using thunk callbacks is:
+
 =over 4
 
 =item Step 1
 
 Create a callback function.
 
-  new_callback CB_PMC, CB_SUB, USER_DATA, "signature"
+  new_callback CB_PMC, CB_UD, CB_SUB, PMC_UD, "signature"
 
 =item Step 2
 
 Register the callback
 
   dlfunc C_FUNCTION, "function_name", "signature"
-  C_FUNCTION(CP_PMC, USER_DATA)
+  C_FUNCTION(CB_PMC, CB_UD)
 
 =back
 
@@ -267,16 +369,16 @@
     userdata = 42
 
     .local pmc callback_sub
-    callback_sub = new_callback sub, userdata, "vtU"
+    callback_sub, callback_userdata = new_callback sub, userdata, "ScU->"
 
     # set up NCI
 
     .local pmc lib, fun
     lib = loadlib "hello"
-    fun = dlfunc lib, "sayhello", "vpP"
+    fun = dlfunc lib, "sayhello", "PpP->"
 
     # do the NCI call, foo_callback is invoked from C
-    fun()
+    fun(callback_sub, callback_userdata)
 
   .end
 
@@ -315,49 +417,41 @@
 The file containing this C code should be compiled as a shared library
 (specifying the C<include> directory so C<<parrot/parrot.h>> can be found.)
 
-=head2 References
+=head2 Default Implementation
 
-L<pdd06_pasm.pod>
+Parrot can be configured at build time to use differing NCI implementations. The
+default implementation relies on thunks compiled by a C compiler ahead of time.
+This has a number of limitations:
 
-=head2 See Also
+=head3 Limited Signatures
 
-L<t/pmc/nci.t>, L<src/nci_test.c>
+New thunks cannot be created at runtime. Dynamic extension libraries of
+additional thunks can be created using the C<nci_thunks_gen> tool. If a thunk
+is not available, an exception will be raised.
 
-=head2 Version
+=head3 Limited Callback Signatures
 
-=head3 Current
+For similar reasons, only 2 forms of callback thunks are available and
+callback closures are not supported at all.
 
-    Maintainer: Dan Sugalski
-    Class: Internals
-    PDD Number: 16
-    Version: 1.3
-    Status: Developing
-    Last Modified: Feb 26, 2007
-    PDD Format: 1
-    Language: English
+   void (*function)(void *library_data, void *user_data);
 
-=head3 History
+   void (*function)(void *user_data, void *library_data);
 
-=over 4
+The type of C<user_data> is C<U>, but that of C<library_data> can be any pointer
+type or any type that is passed equivalently to a pointer by the system's ABI.
 
-=item version 1.3
+If an attempt is made to create a closure callback, or to use a thunk signature
+that is not supported, an exception will be raised.
 
-Updated with example for callbacks
+=head2 References
 
-=item version 1.2
+L<pdd06_pasm.pod>
 
-Updated with basic example for NCI.
+=head2 See Also
 
-=item version 1.1
+L<t/pmc/nci.t>, L<src/nci_test.c>, L<tools/dev/nci_thunks_gen.pir>
 
-Changed callback section to reflect current status.
-
-=item version 1
-
-None. First version
-
-=back
-
 =cut
 
 __END__