OK, here's my straw-man proposal for a language interoperability framework; my apologies for sitting on it so long. It's still pretty messy, but I'm sure it will benefit more from other viewpoints at this stage than from polishing.
-- Bob Rogers http://rgrjr.dyndns.org/
# Copyright (C) 2008, The Perl Foundation.
# $Id: $

=head1 NAME

docs/pdds/pddxx_language_interop.pod - Inter-language calling

=head1 VERSION

$Revision: 28231 $

=head1 ABSTRACT

This PDD describes Parrot's conventions and support for communication between high-level languages (HLLs). It is focused mostly on what implementors should do in order to provide this capability to their users.

=head1 DESCRIPTION

The ability to mix different high-level languages at runtime has always been an important design goal of Parrot. Another important goal, that of supporting all dynamic languages, makes language interoperability especially interesting -- where "interesting" means the same as it does in the Chinese curse, "May you live in interesting times." It is expected that language implementers, package authors, and package users will have to be aware of language boundaries when writing their code. It is hoped that this will not become too burdensome.

None of what follows is binding on language implementors, who may do whatever they please. Nevertheless, we hope they will at least follow the spirit of this document so that the code they produce can be used by the rest of the Parrot community, and save the fancy footwork for intra-language calling.

However, this PDD B<is> binding on Parrot implementors, who must provide a stable platform for language interoperability to the language implementors.

=head2 Ground rules

In order to avoid N**2 complexity and the resulting coordination headaches, each language compiler provides an interface as a target for other languages that should be designed to require a minimum of translation.
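As a rough illustration of the N**2 concern, the sketch below (Python, used here only as neutral notation) counts the translators needed if every language had to target every other language directly, against the glue needed if each language only targets a common layer. The function names, and the count of at most one calling stub plus one called wrapper per language, are assumptions made for this example rather than figures from this PDD.

```python
def direct_translators(n: int) -> int:
    """Ordered (caller, callee) pairs when each language targets each other one."""
    return n * (n - 1)


def glue_layers(n: int) -> int:
    """Assumed upper bound: one calling stub and one called wrapper per language."""
    return 2 * n


for n in (2, 5, 10):
    print(f"{n} languages: {direct_translators(n)} direct, {glue_layers(n)} glue")
```

For ten languages this gives 90 pairwise translators against at most 20 glue layers, and the glue count shrinks further wherever a language can dispense with glue.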
In the general case, some translation may be required by both the calling language and the called language:

        |
        |   Calling sub
        |
        |                       Language X
        |
        V   Calling stub
        +================
        |
        |   "plain Parrot"
        |
        +================
        |
        V   Called wrapper
        |
        |                       Language Y
        |
        V   Called sub
        |

Where necessary, a language may need to provide a "wrapper" sub to interface external calls to the language's internal calling and data representation requirements. Such wrappers are free to do whatever translation is required. Similarly, the caller may need to emit a stub that converts an internal call into something more generic.

{{ Of course, "stub" is really too close to "sub", so we should find a better word. Doesn't the C community call these "bounce routines"? Or something? -- rgr, 31-Jul-08. }}

{{ I am discovering that there are five different viewpoints here, corresponding to the five layers (including "plain Parrot") of the diagram above. I need to make these viewpoints clearer, and describe the responsibilities of each of these parties to each other. -- rgr, 31-Jul-08. }}

Languages are free to implement the stub and wrapper layers (collectively called "glue") as they see fit. In particular, they may be inlined in the caller, or integral to the callee.

Ideally, of course, the "plain Parrot" layer will be close enough to the semantics of both languages that glue code is unnecessary, and the call can be made directly. Language implementors are encouraged to dispense with glue whenever possible, even if glue is sometimes required for the general case.

In summary:

=over 4

=item *

Each HLL gets its own namespace subtree, within which C<get_hll_global> and C<set_hll_global> operate. In order to make external calls, the HLL must provide a means of identifying the language, the function, and enough information about the arguments and return values for the calling language to generate the call correctly. This is necessarily language-dependent, and is beyond the scope of this document.

=item *

When calling across languages, both the caller and the callee should try to use "plain Parrot semantics" to the extent possible. This is explained in more detail below, but essentially means to use the simplest calling conventions and PMC classes possible. Ideally, if an API uses only PMCs that are provided by a "bare Parrot" (i.e. one without any HLL runtime code), then it should be possible to use this API from any other language.

=item *

It is acceptable for languages to define subs for internal calling that are not suitable for external calling. Such subs should be marked as such, and other languages should respect those distinctions. (Or, if they choose to call intra-language subs, they should be very sure they understand that language's calling conventions.)

=back

=head1 HALF-BAKED IDEAS

{{ Every draft PDD should have one of these. ;-} -- rgr, 28-Jul-08. }}

=head2 Common syntax for declaring exported functions?

I assume we will need some additional namespace support. It is not yet clear whether it's better to mark the ones that are OK for external calling, or the ones that are not. (As you can guess, I don't have a strong suggestion for what to call these functions yet. Do we call them "external"? Would that get confused with intra-language public interfaces?)

Beyond that, we probably need additional metainformation on the external subs so that calling compilers will know what code to emit. Putting it on the subs means that the calling compiler just needs to load the PBC in order to access the module API (though it may need additional hints). Of course, that also requires a PIR API for accessing this metainformation . . .

Crazy idea: This is more or less the same information (typing) required for multimethods. If we encourage the export of multisubs, then the exporting language could provide multiple interfaces, and the calling compiler could query the set of methods for the one most suitable.

=head2 More namespace complexity?

It might be good to have some way for HLLs to define a separate external definition for a given sub (i.e. one that provides the wrapper) that can be done without too much namespace hair. I.e.

        .sub foo :extern

defines the version that is used by interlanguage calling, and

        .sub foo

defines the version that is seen by other code written in that language (i.e. via C<get_hll_global>). If there is no plain C<foo>, the C<:extern> version is used for internal calls. That way, the compiler can emit both wrapper code and internal code without having to do anything special (much), even if different calling conventions and/or data conversions are required.

{{ Of course, this wouldn't be necessary if all external subs were multisubs. -- rgr, 31-Jul-08. }}

=head2 Multiple type hierarchies?

Different languages will have to "dress up" the Parrot type/class hierarchy differently. For example, Common Lisp specifies that C<STRING> is a subtype of C<VECTOR>, which in turn is a subtype of C<ARRAY>. This is not likely to be acceptable to other languages, so Lisp needs its own view of type relationships, which must affect multimethod dispatch for Lisp generic functions, i.e. a method defined for C<VECTOR> must be considered when passed a string as a parameter.

The language that owns the multisub gets to define the type hierarchy and dispatch rules used when it gets called. In order to handle objects from foreign languages, the "owning" language must decide where to graft the foreign class inheritance graph into its own graph.

{{ It would be nice if some Parrot class, e.g. C<Object>, could be defined as the conventional place to root language-specific object class hierarchies; that way, a language would only have to include C<Object> in order to incorporate objects from all other conforming languages. -- rgr, 26-Aug-08.
}}

Note that common Parrot classes will in general appear in different places in different languages' dispatch hierarchies, so it is important to bear in mind which language "owns" the dispatch.

=head1 DEFINITIONS

{{ Collect definitions of new jargon words here, once we figure out what they should be. -- rgr, 29-Jul-08. }}

=head1 IMPLEMENTATION

=head2 Plain Parrot Semantics

Fortunately, "plain Parrot" is pretty powerful, so the "common denominator" is not in fact the lowest possible. For example, not all Parrot languages support named, optional, or repeated arguments. For the called language, this is never a problem; the calling module can only use the subset API anyway. Implementers of subset calling languages are encouraged to provide their users with an extended API for the interlanguage call; typically, this is only required for named arguments.

{{ This needs more? -- rgr, 28-Jul-08. }}

=head2 Strings

{{ I am probably not competent to write this section. At the very least, it requires discussion of languages that expect strings to be mutable versus . . . Java. -- rgr, 28-Jul-08. }}

=head2 Other scalar data types

All Parrot language implementations should stick to native Parrot PMC types for scalar data, except in case of dire need. To see why this is so, take the particular case of integer division, which differs significantly between languages.

In Tcl, "the integer three divided by the integer five" produces the integer value 0.

In Perl 5 and Lua, this division produces the floating-point value 0.6. (This happens to be Parrot's native behavior as well.)

In Common Lisp, this division produces "3/5", a number of type C<RATIO> with numerator 3 and denominator 5 that represents the mathematically-exact result.

Furthermore, no Perl 5 code, when given two integers to divide, will expect a Common Lisp ratio as a result. Any Perl 5 implementation that returns one has a bug, even if both those integers happen to come from Common Lisp.
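The three division behaviors described above can be mimicked in Python, used here purely as a stand-in for the languages involved. The function names are hypothetical, and `fractions.Fraction` plays the role of Common Lisp's RATIO; this is a sketch of the semantic difference, not of how any Parrot compiler implements division.

```python
from fractions import Fraction


def tcl_style(a: int, b: int) -> int:
    """Integer in, integer out: 3 / 5 yields the integer 0."""
    return a // b


def perl5_style(a: int, b: int) -> float:
    """Integer in, floating point out: 3 / 5 yields 0.6."""
    return a / b


def lisp_style(a: int, b: int) -> Fraction:
    """Integer in, exact ratio out: 3 / 5 yields the ratio 3/5."""
    return Fraction(a, b)


# The operands are the same two integers in every call; only the
# operator's definition differs from language to language.
print(tcl_style(3, 5), perl5_style(3, 5), lisp_style(3, 5))
```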
Likewise, Common Lisp code that returns a floating-point result when dividing two integers has a bug, even if those integers happen to come from Perl or Lua (or both!).

Even though these languages all use "/" to represent division, they do not all mean the same thing by it, and similarly for most (if not all) other built-in arithmetic operators. However, they pretty clearly B<do> mean the same thing by (e.g.) "the integer with value five," so there is no need to represent the inputs to these operations differently; they can all be represented by the same C<Integer> PMC class.

{{ Must also discuss morphing: If some languages do it and others do not, then care must be taken at the boundaries. -- rgr, 31-Jul-08. }}

=head3 Defining new scalar data types

There will be cases where existing Parrot PMC classes cannot represent a primitive HLL scalar type, and so a new PMC class is required. In this case, interoperability cannot be guaranteed, since it may not be possible to define behavior for such objects in other languages. But the choice of a new PMC is forced, so we must make the best of it.

A good case in point is that of complex rational numbers in Common Lisp. The C<Complex> type provided by Parrot assumes that its components are floating-point numbers. This is a suitable representation type for C<(COMPLEX REAL)>, but CL partitions "COMPLEX" into C<(COMPLEX REAL)> and C<(COMPLEX RATIONAL)>, with the latter being further divided into C<(COMPLEX RATIO)>, C<(COMPLEX INTEGER)>, etc. The straightforward way to provide this functionality is to define a C<ComplexRational> PMC that is built on C<Complex> and has real and imaginary PMC components that are constrained to be Integer, Bigint, or Ratio PMCs.

So how do we make C<(COMPLEX RATIONAL)> arithmetic work as broadly as possible? The first aspect is defining how the new type actually works within its own language.
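To make the representation concrete, here is a minimal Python sketch of a complex number whose parts are constrained to exact rationals. It is only an analogy: the class is a hypothetical stand-in for the proposed C<ComplexRational> PMC, Python's `int` and `Fraction` stand in for the Integer, Bigint, and Ratio PMCs, and unlike the proposal it is not built on an existing complex type.

```python
from fractions import Fraction


class ComplexRational:
    """Complex number with exact rational parts (hypothetical stand-in)."""

    def __init__(self, real, imag):
        for part in (real, imag):
            # Floats are rejected: only exact components are allowed.
            if not isinstance(part, (int, Fraction)):
                raise TypeError(
                    f"exact rational required, got {type(part).__name__}")
        self.real = Fraction(real)
        self.imag = Fraction(imag)

    def __add__(self, other):
        return ComplexRational(self.real + other.real,
                               self.imag + other.imag)

    def __mul__(self, other):
        return ComplexRational(
            self.real * other.real - self.imag * other.imag,
            self.real * other.imag + self.imag * other.real,
        )

    def __eq__(self, other):
        return (isinstance(other, ComplexRational)
                and (self.real, self.imag) == (other.real, other.imag))

    def __repr__(self):
        return f"ComplexRational({self.real}, {self.imag})"


z = ComplexRational(Fraction(1, 2), 1)
w = ComplexRational(Fraction(1, 2), -1)
product = z * w  # exact arithmetic: real part 5/4, imaginary part 0
```

Because every component stays exact, no rounding creeps in, which is the property a floating-point C<Complex> cannot offer.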
The Lisp arithmetic operators will usually return a ComplexRational if given one, but need to return a RATIONAL subtype if the imaginary part is zero, and that may not be suitable for other languages, so Lisp needs its own set of basic arithmetic operators. We must therefore define methods on these multis that specialize ComplexRational (and probably the generic arithmetic to redispatch on the type of the real and imaginary parts; you know the drill). But, in case we are also passed another operand that is another language's exotic type, we should take care to use the most general possible class to specialize the other operands, in the hope that other exotics are subclasses of these.

The other aspect is extending other languages' arithmetic to do something reasonable with our exotic types. If we're lucky, Parrot will provide a basic multisub that takes care of most cases, and we just need to add method(s) to that. If not, we will have to add specialized methods on the other language's multisub, trying to redispatch to the other language's arithmetic ops passing the (hopefully more generic) component PMCs. Doing so is still the responsibility of the language that defines the exotic class, since it is in charge of its internal representation.

{{ We can define multimethods on another language without loading it, can't we? If not, then making this work may require negotiation between language implementors, if it is feasible at all. -- rgr, 31-Jul-08. }}

This brings us to a number of guidelines for defining language-specific arithmetic so as to maximize interoperability:

=over 4

=item 1.

Define language-specific operations using multimethods (to avoid conflict with other languages).

=item 2.

Define them on the highest (most general) possible PMC classes (in order that they continue to work if passed a subclass by a call from a different language).

=item 3.

Don't define a language-specific PMC class unless there is clear need for a different internal representation.
(And even then, you might consider donating it to become part of the Parrot core.)

=back

The rest of this section details exceptions and caveats in dealing with scalar data types.

=head3 "Fuzzy" scalars

Some languages are willing to coerce strings to numbers and vice versa without any special action on the part of the programmer, and others are not. The problem arises when such "fuzzy" scalars are passed (or returned) to languages that do not support "fuzzy" coercion . . .

{{ This section is meant to answer Geoffrey's "What does Lisp do with a Perl 5 Scalar?" question. I gotta think about this more. -- rgr, 29-Jul-08. }}

=head3 C<Complex> numbers

Not all languages support complex numbers, so if an exported function requires a complex argument, it should either throw a suitable error, or coerce an acceptable numeric argument. In the latter case, be sure to advertise this in the documentation, so that callers without complex numbers can tell their compiler which numeric type is acceptable.

=head3 C<Ratio> numbers

Not all languages support ratios (rather few, actually), so if an exported function requires a ratio as an argument, it should either throw a suitable error, or convert an acceptable numeric value. However, since ratios are rare (and it is rather eccentric for a program to insist on a ratio as a parameter), it is strongly advised to accept a floating-point or integer value, and convert it in the wrapper.

{{ Parrot does not support these yet, so this is not a current issue. -- rgr, 28-Jul-08. }}

=head2 Aggregate data types

{{ I probably haven't done these issues justice; I don't know enough Java or Tcl to grok this part of the list discussion. -- rgr, 28-Jul-08. }}

Aggregates (hashes, arrays, and struct-like thingies) can either be passed directly, or mapped by wrapper or caller code into something different. The problem with mapping, besides being slow, is that if I<either> the caller or the callee does this, the aggregate is effectively read-only.
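The read-only effect of mapping can be seen in a small Python sketch. All names here are hypothetical; the "mapping" is modeled as copying the caller's aggregate into a fresh list, which is an assumption about what a typical conversion would do.

```python
def called_sub(items: list) -> None:
    """Callee that communicates its result by mutating its argument."""
    items.append("added by callee")


def call_with_mapping(foreign_items: list) -> None:
    """Glue that maps the aggregate into a new native one before calling."""
    native = list(foreign_items)  # the mapping step builds a copy
    called_sub(native)            # the callee mutates the copy only


def call_directly(items: list) -> None:
    """No mapping: the callee sees the caller's own aggregate."""
    called_sub(items)


original = ["a", "b"]
call_with_mapping(original)
after_mapping = list(original)  # still ["a", "b"]: the change was lost

call_directly(original)
after_direct = list(original)   # ["a", "b", "added by callee"]
```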
(It is possible for the wrapper to stuff the changes back in the original structure by side effect, but this has its own set of problems.) In other words, if the mapping is not straightforward, it may not be possible. If the mapping I<is> straightforward, it may not be necessary -- and an unnecessary mapping may limit use of the called module's API.

Struct-like objects are problematic. They are normally considered as low-level and language-specific, and handled by emitting special code for slot accessor/setter functions, which other language compilers won't necessarily know how to do. The choices are therefore to (a) treat them like black boxes in the other language, or (b) provide a separate functional or OO API (or both) for calling from other languages.

Several questions arise for languages with multiple representations for aggregate types. Typically, this is because these types are more restricted in some fashion. [finish. -- rgr, 29-Jul-08.]

=head2 Functional data types

In a sense, functional types (i.e. callable objects) are the easiest things to pass across languages, since they require no mapping at all. On the other hand, if a language doesn't support functional arguments, then there is no hope of using an API written in another language that requires them.

=head2 Datum vs. object

Some languages present everything to the programmer as an object; in such languages, code only exists in methods. A few languages have no methods, only functions (and/or subroutines) and "passive" data. The remainder have both, and pose no problem calling into the others. But how does an obligate OO language call a non-OO language, or vice versa?

An extreme case would be Ruby (which has only objects) and Scheme (which (as far as Ruby is concerned) has none). What good is a Ruby object as a datum to a Scheme program if Scheme can't access any of the methods? Similarly, what could Ruby do with a Scheme list when it can't even get to the Scheme C<car> function?
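One way to picture the gap from the function-only side is that each method has to be re-exposed as a plain function taking the object as its first argument. The Python sketch below shows that shape only; the class, the helper `as_function`, and its behavior are hypothetical, and nothing like it exists in Parrot today.

```python
class Counter:
    """Stands in for an object from a language where all code lives in methods."""

    def __init__(self, start=0):
        self.value = start

    def increment(self, by=1):
        self.value += by
        return self.value


def as_function(method_name):
    """Return a plain function f(receiver, *args) that calls the named method."""
    def call(receiver, *args, **kwargs):
        return getattr(receiver, method_name)(*args, **kwargs)
    return call


# A function-only caller can now treat the method as an ordinary function.
increment = as_function("increment")
counter = Counter(10)
result = increment(counter, 5)
```

The reverse direction, where a methods-only caller needs a function such as C<car> to look like a method on the datum, would need the mirror-image wrapper.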

{{ Methinks the right thing would be to define a common introspection API (a good thing in its own right). Scheme and Ruby should each define their own implementation of the same in "plain Parrot semantics" terms, independently. The caller can then use his/her language's binding of the introspection API to poke around in the other module, and find the necessary tools to call the other. For Scheme, this would mean functions for finding Ruby classes and providing functional wrappers around methods. For Ruby, I admit this would probably be even weirder. In any case, it is important that the calling user not need anything out of the ordinary, from either language or the called module author. -- rgr, 29-Jul-08. }}

=head3 Defining methods across language boundaries

{{ Is the term "unimethod" acceptable here? -- rgr, 29-Jul-08. }}

There will be cases where a module user wants to extend that module by defining a new method on an externally-defined class, or adding a multimethod to an externally-defined multisub.

Since a class with unimethod dispatch belongs wholly to the external language, the calling language (i.e. the one adding the method) must use the semantics of the external language. If the external language uses a significantly different metamodel, simply adding the C<:method> pragma may not cut it.

There are two cases: (1) The calling language is adding a new method, which cannot therefore interfere with existing usage in the called language; and (2) the calling language is attempting to extend an existing interface provided by the called language. In the first case, the calling compiler has the option of treating the new method as part of the calling language, and dispensing with the glue altogether. In the second case, the compiler must treat the new method as part of the foreign language, and provide B<both> glue layers (as necessary) around it.
It is therefore not expected that all compilers will provide a way to define methods on all foreign classes for all language pairs.

Multimethods are easier; although the multisub does belong conceptually to one language (from whose namespace the caller must find the multisub), multis are more loosely coupled to their original language.

The cases for multimethods are similar, though: (1) If the calling language method is specialized to classes that appear only in the calling module, then other uses of the multisub will never call the new method, and the calling language can choose to treat it as internal. (2) If the calling method is specialized only on Parrot or called-language classes, then the compiler should take care to make it generally usable.

=head3 Subclassing across language boundaries

{{ This is an important feature, but requires compatible metamodels. -- rgr, 29-Jul-08. }}

=head3 Method vs. multimethod

{{ This is the issue where some languages (e.g. Common Lisp) use only multimethods, where others (e.g. Ruby) use only unimethods. (S04 says something about MMD "falling back" to unimethods, but so far this is not described in Parrot.)

Calling is easy; multimethods look like functions, so the MM language just has to create a function (or MM) wrapper for the UM language, and a UM language can similarly treat a MM call as a normal function call. (Which will require the normal "make the function look like a method" hack for obligate OO languages like Ruby.)

Defining methods across the boundary is harder, and may not be worth the trouble. -- rgr, 29-Jul-08. }}

=cut

__END__
Local Variables:
  fill-column:78
End: