Re: [perl #57190] HLL Interoperation

Bob Rogers Tue, 26 Aug 2008 19:40:11 -0700

   OK, here's my straw-man proposal for a language interoperability
framework; my apologies for sitting on it so long.  It's still pretty
messy, but I'm sure it will benefit more from other viewpoints at this
stage than from polishing.


                                        -- Bob Rogers
                                           http://rgrjr.dyndns.org/

# Copyright (C) 2008, The Perl Foundation.
# $Id: $

=head1 NAME

docs/pdds/pddxx_language_interop.pod - Inter-language calling

=head1 VERSION

$Revision: 28231 $

=head1 ABSTRACT

This PDD describes Parrot's conventions and support for communication between
high-level languages (HLLs).  It is focused mostly on what implementors should
do in order to provide this capability to their users.

=head1 DESCRIPTION

The ability to mix different high-level languages at runtime has always been
an important design goal of Parrot.  Another important goal, that of
supporting all dynamic languages, makes language interoperability especially
interesting -- where "interesting" means the same as it does in the Chinese
curse, "May you live in interesting times."  It is expected that language
implementers, package authors, and package users will have to be aware of
language boundaries when writing their code.  It is hoped that this will not
become too burdensome.

None of what follows is binding on language implementors, who may do whatever
they please.  Nevertheless, we hope they will at least follow the spirit of
this document so that the code they produce can be used by the rest of the
Parrot community, and save the fancy footwork for intra-language calling.
However, this PDD B<is> binding on Parrot implementors, who must provide a
stable platform for language interoperability to the language implementors.

=head2 Ground rules

In order to avoid N**2 complexity and the resulting coordination headaches,
each language compiler provides an interface as a target for other languages
that should be designed to require a minimum of translation.  In the general
case, some translation may be required by both the calling language and the
called language:

        |
        |
        |                        Calling sub
        |                             |
        |   Language X                |
        |                             V
        |                        Calling stub
        +================             |
                                      |
          "plain Parrot"              |
                                      |
        +================             |
        |                             V
        |                       Called wrapper
        |                             |
        |                             |
        |   Language Y                V
        |                         Called sub
        |                

Where necessary, a language may need to provide a "wrapper" sub to interface
external calls to the language's internal calling and data representation
requirements.  Such wrappers are free to do whatever translation is required.

Similarly, the caller may need to emit a stub that converts an internal call
into something more generic.

{{ Of course, "stub" is really too close to "sub", so we should find a better
word.  Doesn't the C community call these "bounce routines"?  Or something?
-- rgr, 31-Jul-08. }}

{{ I am discovering that there are five different viewpoints here,
corresponding to the five layers (including "plain Parrot") of the diagram
above.  I need to make these viewpoints clearer, and describe the
responsibilities of each of these parties to each other.  -- rgr,
31-Jul-08. }}

Languages are free to implement the stub and wrapper layers (collectively
called "glue") as they see fit.  In particular, they may be inlined in the
caller, or integral to the callee.

Ideally, of course, the "plain Parrot" layer will be close enough to the
semantics of both languages that glue code is unnecessary, and the call can be
made directly.  Language implementors are encouraged to dispense with glue
whenever possible, even if glue is sometimes required for the general case.
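
As a rough illustration of the layering in the diagram above, here is a sketch
in Python rather than PIR.  Every name in it is hypothetical; it only assumes
that language X boxes its strings and that language Y represents strings as
lists of characters, so that both a calling stub and a called wrapper are
needed.

```python
# Hypothetical sketch: none of these names are Parrot APIs.

class XString:
    """Language X's internal string representation (assumed for illustration)."""
    def __init__(self, text):
        self.text = text

def y_shout(chars):
    """Called sub in language Y; Y represents strings as lists of characters."""
    return [c.upper() for c in chars]

def y_shout_wrapper(plain):
    """Called wrapper: plain string in, plain string out."""
    return "".join(y_shout(list(plain)))

def x_call_shout(x_value):
    """Calling stub: unwrap X's value, make the plain call, rewrap the result."""
    return XString(y_shout_wrapper(x_value.text))

result = x_call_shout(XString("hello"))
```

Only plain strings cross the boundary between C<x_call_shout> and
C<y_shout_wrapper>, so neither language needs to know the other's internal
representation.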

In summary:

=over 4

=item *

Each HLL gets its own namespace subtree, within which C<get_hll_global> and
C<set_hll_global> operate.  In order to make external calls, the HLL must
provide a means of identifying the language, the function, and enough
information about the arguments and return values for the calling language to
generate the call correctly.  This is necessarily language-dependent, and is
beyond the scope of this document.

=item *

When calling across languages, both the caller and the callee should try to
use "plain Parrot semantics" to the extent possible.  This is explained in
more detail below, but essentially means to use the simplest calling
conventions and PMC classes possible.  Ideally, if an API uses only PMCs that
are provided by a "bare Parrot" (i.e. one without any HLL runtime code), then
it should be possible to use this API from any other language.

=item *

It is acceptable for languages to define subs for internal calling that are
not suitable for external calling.  Such subs should be marked as such, and
other languages should respect those distinctions.  (Or, if they choose to
call intra-language subs, they should be very sure they understand that
language's calling conventions.)

=back

=head1 HALF-BAKED IDEAS

{{ Every draft PDD should have one of these.  ;-}  -- rgr, 28-Jul-08.  }}

=head2 Common syntax for declaring exported functions?

I assume we will need some additional namespace support.  Not clear yet
whether it's better to mark the ones that are OK for external calling, or the
ones that are not.

(As you can guess, I don't have a strong suggestion for what to call these
functions yet.  Do we call them "external"?  Would that get confused with
intra-language public interfaces?)

Beyond that, we probably need additional metainformation on the external subs
so that calling compilers will know what code to emit.  Putting them on the
subs means that the calling compiler just needs to load the PBC in order to
access the module API (though it may need additional hints).  Of course, that
also requires a PIR API for accessing this metainformation . . .

Crazy idea:  This is more or less the same information (typing) required for
multimethods.  If we encourage the export of multisubs, then the exporting
language could provide multiple interfaces, and the calling compiler could
query the set of methods for the one most suitable.

=head2 More namespace complexity?

It might be good to have some way for HLLs to define a separate external
definition for a given sub (i.e. one that provides the wrapper) that can be
done without too much namespace hair.  I.e.

        .sub foo :extern

defines the version that is used by interlanguage calling, and

        .sub foo

defines the version that is seen by other code written in that language
(i.e. via C<get_hll_global>).  If there is no plain C<foo>, the C<:extern>
version is used for internal calls.  That way, the compiler can emit both
wrapper code and internal code without having to do anything special (much),
even if different calling conventions and/or data conversions are required.
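
The lookup rule proposed here might behave something like the following
sketch, written in Python for brevity.  The C<HLLNamespace> class and its
method names are invented for illustration and do not correspond to any
existing Parrot interface.

```python
# Hypothetical sketch of the proposed lookup rule; not an existing Parrot API.

class HLLNamespace:
    def __init__(self):
        self.internal = {}   # subs declared with plain ".sub foo"
        self.extern = {}     # subs declared with ".sub foo :extern"

    def lookup_internal(self, name):
        """What same-language code would see: plain version, else :extern."""
        if name in self.internal:
            return self.internal[name]
        return self.extern[name]

    def lookup_external(self, name):
        """What other languages would see: only the :extern version."""
        return self.extern[name]

ns = HLLNamespace()
ns.extern["foo"] = lambda: "foo via wrapper"
ns.extern["bar"] = lambda: "bar via wrapper"
ns.internal["foo"] = lambda: "foo internal"
```

Here C<foo> has both versions, so internal callers bypass the wrapper, while
C<bar> has only the C<:extern> version, which therefore serves both kinds of
caller.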

{{ Of course, this wouldn't be necessary if all external subs were multisubs.
-- rgr, 31-Jul-08. }}

=head2 Multiple type hierarchies?

Different languages will have to "dress up" the Parrot type/class hierarchy
differently.  For example, Common Lisp specifies that C<STRING> is a subtype
of C<VECTOR>, which in turn is a subtype of C<ARRAY>.  This is not likely to
be acceptable to other languages, so Lisp needs its own view of type
relationships, which must affect multimethod dispatch for Lisp generic
functions, i.e. a method defined for C<VECTOR> must be considered when passed
a string as a parameter.

The language that owns the multisub gets to define the type hierarchy and
dispatch rules used when it gets called.  In order to handle objects from
foreign languages, the "owning" language must decide where to graft the
foreign class inheritance graph into its own graph.  {{ It would be nice if
some Parrot class, e.g. C<Object>, could be defined as the conventional place
to root language-specific object class hierarchies; that way, a language would
only have to include C<Object> in order to incorporate objects from all other
conforming languages.  -- rgr, 26-Aug-08. }}

Note that common Parrot classes will in general appear in different places in
different languages' dispatch hierarchies, so it is important to bear in mind
which language "owns" the dispatch.
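
To make the "owning language" idea concrete, here is a small Python sketch in
which each language supplies its own parent table for the same shared class
names.  The tables, the single-inheritance simplification, and the function
names are all assumptions made for the example.

```python
# Hypothetical sketch: each language keeps its own "is-a" view of shared classes.

LISP_PARENTS = {"String": "VECTOR", "VECTOR": "ARRAY", "ARRAY": None}
OTHER_PARENTS = {"String": None}  # a language where strings are not vectors

def ancestry(type_name, parents):
    """Yield type_name and its ancestors, most specific first."""
    while type_name is not None:
        yield type_name
        type_name = parents.get(type_name)

def dispatch(methods, parents, arg_type):
    """Pick the most specific method using the owning language's hierarchy."""
    for t in ancestry(arg_type, parents):
        if t in methods:
            return methods[t]
    raise TypeError(f"no applicable method for {arg_type}")

# A Lisp generic function with a single method, specialized on VECTOR.
lisp_length = {"VECTOR": lambda: "vector method"}
```

With C<LISP_PARENTS>, a C<String> argument finds the C<VECTOR> method; with
C<OTHER_PARENTS>, the same argument finds no applicable method, which is why
it matters whose hierarchy is consulted.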

=head1 DEFINITIONS

{{ Collect definitions of new jargon words here, once we figure out what they
should be.  -- rgr, 29-Jul-08. }}

=head1 IMPLEMENTATION

=head2 Plain Parrot Semantics

Fortunately, "plain Parrot" is pretty powerful, so the "common denominator" is
not in fact the lowest possible.  For example, not all Parrot languages
support named, optional, or repeated arguments.  For the called language, this
is never a problem; the calling module can only use the subset API anyway.
Implementers of subset calling languages are encouraged to provide their users
with an extended API for the interlanguage call; typically, this is only
required for named arguments.

{{ This needs more?  -- rgr, 28-Jul-08. }}

=head2 Strings

    {{ I am probably not competent to write this section.  At the very least,
    it requires discussion of languages that expect strings to be mutable
    versus . . . Java.  -- rgr, 28-Jul-08. }}

=head2 Other scalar data types

All Parrot language implementations should stick to native Parrot PMC types
for scalar data, except in case of dire need.  To see why this is so, take
the particular case of integer division, which differs significantly between
languages.

In Tcl, "the integer three divided by the integer five" produces the integer
value 0.

In Perl 5 and Lua, this division produces the floating-point value 0.6.  (This
happens to be Parrot's native behavior as well.)

In Common Lisp, this division produces "3/5", a number of type C<RATIO> with
numerator 3 and denominator 5 that represents the mathematically-exact result.

Furthermore, no Perl 5 code, when given two integers to divide, will expect a
Common Lisp ratio as a result.  Any Perl 5 implementation that does this has a
bug, even if both those integers happen to come from Common Lisp.  Ditto for a
floating-point result from Common Lisp code that happens to get two integers
from Perl or Lua (or both!).

Even though these languages all use "/" to represent division, they do not all
mean the same thing by it, and similarly for most (if not all) other built-in
arithmetic operators.  However, they pretty clearly B<do> mean the same thing
by (e.g.) "the integer with value five," so there is no need to represent the
inputs to these operations differently; they can all be represented by the
same C<Integer> PMC class.
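
The point can be sketched in Python, using ordinary Python integers to stand
in for a shared C<Integer> class and one function per language to stand in for
that language's own division operator.  The function names are hypothetical,
and C<lisp_divide> always returns a C<Fraction>, whereas Common Lisp would
return an integer when the division is exact.

```python
from fractions import Fraction

# Hypothetical per-language division operators over the same plain integers.

def tcl_divide(a, b):
    """Tcl-style: integer operands give an integer result."""
    return a // b

def perl5_divide(a, b):
    """Perl 5-style: the result is floating point."""
    return a / b

def lisp_divide(a, b):
    """Common Lisp-style: the result is an exact ratio."""
    return Fraction(a, b)

three, five = 3, 5   # one representation, shared by all three languages
results = (tcl_divide(three, five),
           perl5_divide(three, five),
           lisp_divide(three, five))
```

The inputs are identical in all three calls; only the operator, which belongs
to the language doing the dividing, differs.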

{{ Must also discuss morphing:  If some languages do it and others do not,
then care must be taken at the boundaries.  -- rgr, 31-Jul-08. }}

=head3 Defining new scalar data types

There will be cases where existing Parrot PMC classes cannot represent a
primitive HLL scalar type, and so a new PMC class is required.  In this case,
interoperability cannot be guaranteed, since it may not be possible to define
behavior for such objects in other languages.  But the choice of a new PMC is
forced, so we must make the best of it.

A good case in point is that of complex rational numbers in Common Lisp.  The
C<Complex> type provided by Parrot assumes that its components are
floating-point numbers.  This is a suitable representation type for C<(COMPLEX
REAL)>, but CL partitions "COMPLEX" into C<(COMPLEX REAL)> and C<(COMPLEX
RATIONAL)>, with the latter being further divided into C<(COMPLEX RATIO)>,
C<(COMPLEX INTEGER)>, etc.  The straightforward way to provide this
functionality is to define a C<ComplexRational> PMC that is built on
C<Complex> and has real and imaginary PMC components that are constrained to
be Integer, Bigint, or Ratio PMCs.

So how do we make C<(COMPLEX RATIONAL)> arithmetic work as broadly as
possible?

The first aspect is defining how the new type actually works within its own
language.  The Lisp arithmetic operators will usually return a ComplexRational
if given one, but need to return a RATIONAL subtype if the imaginary part is
zero, and that may not be suitable for other languages, so Lisp needs its own
set of basic arithmetic operators.  We must therefore define methods on these
multis that specialize ComplexRational (and probably the generic arithmetic to
redispatch on the type of the real and imaginary parts; you know the drill).
But, in case we are also passed another operand that is another language's
exotic type, we should take care to use the most general possible class to
specialize the other operands, in the hope that other exotics are subclasses
of these.
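
As a sketch of the first aspect, the following Python stands in for a
Lisp-specific addition operator.  The C<ComplexRational> class here is a toy
model of the proposed PMC, with Python's C<Fraction> playing the role of the
Integer/Bigint/Ratio components; none of it is actual Parrot code.

```python
from fractions import Fraction

class ComplexRational:
    """Hypothetical stand-in for the ComplexRational PMC described above."""
    def __init__(self, real, imag):
        self.real = Fraction(real)
        self.imag = Fraction(imag)

def parts(x):
    """Split any supported number into exact (real, imaginary) components."""
    if isinstance(x, ComplexRational):
        return x.real, x.imag
    return Fraction(x), Fraction(0)   # accept the most general numeric input

def lisp_add(a, b):
    """Lisp-style addition: collapse to a rational if the imaginary part is 0."""
    (ar, ai), (br, bi) = parts(a), parts(b)
    real, imag = ar + br, ai + bi
    if imag == 0:
        return real
    return ComplexRational(real, imag)

collapsed = lisp_add(ComplexRational(1, Fraction(1, 2)),
                     ComplexRational(2, Fraction(-1, 2)))
mixed = lisp_add(ComplexRational(1, 1), 2)
```

Note that C<parts> accepts anything C<Fraction> can convert instead of
insisting on a Lisp-specific type, in the spirit of specializing on the most
general class available.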

The other aspect is extending other languages' arithmetic to do something
reasonable with our exotic types.  If we're lucky, Parrot will provide a basic
multisub that takes care of most cases, and we just need to add method(s) to
that.  If not, we will have to add specialized methods on the other language's
multisub, trying to redispatch to the other language's arithmetic ops passing
the (hopefully more generic) component PMCs.  Doing so is still the
responsibility of the language that defines the exotic class, since it is in
charge of its internal representation.

{{ We can define multimethods on another language without loading it, can't
we?  If not, then making this work may require negotiation between language
implementors, if it is feasible at all.  -- rgr, 31-Jul-08. }}

This brings us to a number of guidelines for defining language-specific
arithmetic so as to maximize interoperability:

=over 4

=item 1.

Define language-specific operations using multimethods (to avoid conflict with
other languages).

=item 2.

Define them on the highest (most general) possible PMC classes (in order that
they continue to work if passed a subclass by a call from a different
language).

=item 3.

Don't define a language-specific PMC class unless there is clear need for a
different internal representation.  (And even then, you might consider
donating it to become part of the Parrot core.)

=back

The rest of this section details exceptions and caveats in dealing with scalar
data types.

=head3 "Fuzzy" scalars

Some languages are willing to coerce strings to numbers and vice versa without
any special action on the part of the programmer and others are not.  The
problem arises when such "fuzzy" scalars are passed (or returned) to languages
that do not support "fuzzy" coercion . . .

{{ This section is meant to answer Geoffrey's "What does Lisp do with a Perl 5
Scalar?" question.  I gotta think about this more.  -- rgr, 29-Jul-08.  }}

=head3 C<Complex> numbers

Not all languages support complex numbers, so if an exported function requires
a complex argument, it should either throw a suitable error, or coerce an
acceptable numeric argument.  In the latter case, be sure to advertise this in
the documentation, so that callers without complex numbers can tell their
compiler which numeric type is acceptable.

=head3 C<Ratio> numbers

Not all languages support ratios (rather few, actually), so if an exported
function requires a ratio as an argument, it should either throw a suitable
error, or convert an acceptable numeric value.

However, since ratios are rare (and it is rather eccentric for a program to
insist on a ratio as a parameter), it is strongly advised to accept a floating
point or integer value, and convert it in the wrapper.

    {{ Parrot does not support these yet, so this is not a current issue.  --
    rgr, 28-Jul-08. }}
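
A wrapper of the kind advised above could look roughly like this Python
sketch, where C<Fraction> stands in for a ratio type and both function names
are made up for the example.

```python
from fractions import Fraction

def scale_by_ratio(ratio):
    """Called sub (hypothetical): internally requires an exact ratio."""
    assert isinstance(ratio, Fraction)
    return ratio * 10

def scale_by_ratio_wrapper(number):
    """Called wrapper: accept an integer or float and convert it to a ratio."""
    if isinstance(number, bool) or not isinstance(number, (int, float, Fraction)):
        raise TypeError("expected a number")
    return scale_by_ratio(Fraction(number))

from_int = scale_by_ratio_wrapper(3)
from_float = scale_by_ratio_wrapper(0.5)
```

One caveat of this particular conversion is that a float is converted to the
exact value it stores, so an input such as 0.1 does not become 1/10; a real
wrapper would need to document how it rounds.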

=head2 Aggregate data types

{{ I probably haven't done these issues justice; I don't know enough Java or
Tcl to grok this part of the list discussion.  -- rgr, 28-Jul-08. }}

Aggregates (hashes, arrays, and struct-like thingies) can either be passed
directly, or mapped by wrapper or caller code into something different.  The
problem with mapping, besides being slow, is that if I<either> the caller or
the callee does this, the aggregate is effectively read-only.  (It is possible
for the wrapper to stuff the changes back in the original structure by side
effect, but this has its own set of problems.)

In other words, if the mapping is not straightforward, it may not be possible.
If the mapping I<is> straightforward, it may not be necessary -- and an
unnecessary mapping may limit use of the called module's API.
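
The read-only effect of mapping can be seen in a few lines of Python.  The
callee and wrapper below are hypothetical; the only assumption is that the
mapping step builds a new aggregate rather than adapting the original in
place.

```python
def callee_append(items):
    """Called sub (hypothetical): mutates the aggregate it is given."""
    items.append("added by callee")

def mapping_wrapper(foreign_items):
    """Wrapper that maps the caller's aggregate into a new list before calling."""
    mapped = [str(x) for x in foreign_items]   # conversion creates a copy
    callee_append(mapped)                      # mutation lands on the copy

original = [1, 2, 3]
mapping_wrapper(original)      # the caller's aggregate is left unchanged

direct = [1, 2, 3]
callee_append(direct)          # passing directly lets the change through
```

After these calls, C<original> still has three elements while C<direct> has
four, so an API that communicates by modifying its argument stops working as
soon as a mapping layer is introduced.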

Struct-like objects are problematic.  They are normally considered as
low-level and language-specific, and handled by emitting special code for slot
accessor/setter functions, which other language compilers won't necessarily
know how to do.  The choices are therefore to (a) treat them like black boxes
in the other language, or (b) provide a separate functional or OO API (or
both) for calling from other languages.

Several questions arise for languages with multiple representations for
aggregate types.  Typically, this is because these types are more restricted
in some fashion.  [finish.  -- rgr, 29-Jul-08.]

=head2 Functional data types

In a sense, functional types (i.e. callable objects) are the easiest things to
pass across languages, since they require no mapping at all.  On the other
hand, if a language doesn't support functional arguments, then there is no
hope of using an API written in another language that requires them.

=head2 Datum vs. object

Some languages present everything to the programmer as an object; in such
languages, code only exists in methods.  A few languages have no methods, only
functions (and/or subroutines) and "passive" data.  The remainder have both,
and pose no problem calling into the others.

But how does an obligate OO language call a non-OO language, or vice versa?
An extreme case would be Ruby (which has only objects) and Scheme (which (as
far as Ruby is concerned) has none).  What good is a Ruby object as a datum to
a Scheme program if Scheme can't access any of the methods?  Similarly, what
could Ruby do with a Scheme list when it can't even get to the Scheme C<car>
function?

{{ Methinks the right thing would be to define a common introspection API (a
good thing in its own right).  Scheme and Ruby should each define their own
implementation of the same in "plain Parrot semantics" terms, independently.
The caller can then use his/her language's binding of the introspection API to
poke around in the other module, and find the necessary tools to call the
other.  For Scheme, this would mean functions for finding Ruby classes and
providing functional wrappers around methods.  For Ruby, I admit this would
probably be even weirder.  In any case, it is important that the calling user
not need anything out of the ordinary, from either language or the called
module author.  -- rgr, 29-Jul-08. }}
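
One possible shape for the "functional wrappers around methods" mentioned in
the note above, sketched in Python with invented names: given only a method
name, build a plain function that a language without methods could call.

```python
class RubyLikeCounter:
    """Stand-in for an object from an all-objects language (hypothetical)."""
    def __init__(self):
        self.count = 0

    def increment(self, step):
        self.count += step
        return self.count

def method_as_function(method_name):
    """Return a plain function that calls the named method on its first argument."""
    def call(obj, *args):
        return getattr(obj, method_name)(*args)
    return call

increment = method_as_function("increment")
counter = RubyLikeCounter()
total = increment(counter, 5)
```

The method lookup is deferred until call time, so the wrapper works for any
object that happens to provide the named method; whether a real introspection
API should behave that way is an open question.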

=head3 Defining methods across language boundaries

{{ Is the term "unimethod" acceptable here?  -- rgr, 29-Jul-08. }}

There will be cases where a module user wants to extend that module by
defining a new method on an externally-defined class, or add a multimethod to
an externally-defined multisub.  Since a class with unimethod dispatch belongs
wholly to the external language, the calling language (i.e. the one adding the
method) must use the semantics of the external language.  If the external
language uses a significantly different metamodel, simply adding the
C<:method> pragma may not cut it.

There are two cases:  (1) The calling language is adding a new method, which
cannot therefore interfere with existing usage in the called language; and (2)
the calling language is attempting to extend an existing interface provided by
the called language.  In the first case, the calling compiler has the option
of treating the new method as part of the calling language, and dispensing
with the glue altogether.  In the second case, the compiler must treat the new
method as part of the foreign language, and provide B<both> glue layers (as
necessary) around it.  It is therefore not expected that all compilers will
provide a way to define methods on all foreign classes for all language pairs.

Multimethods are easier; although the multisub does belong conceptually to one
language (from whose namespace the caller must find the multisub), multis are
more loosely coupled to their original language.

The cases for multimethods are similar, though:  (1) If the calling language
method is specialized to classes that appear only in the calling module, then
other uses of the multisub will never call the new method, and the calling
language can choose to treat it as internal.  (2) If the calling method is
specialized only on Parrot or called-language classes, then the compiler
should take care to make it generally usable.

=head3 Subclassing across language boundaries

{{ This is an important feature, but requires compatible metamodels.  -- rgr,
29-Jul-08. }}

=head3 Method vs. multimethod

{{ This is the issue where some languages (e.g. Common Lisp) use only
multimethods, where others (e.g. Ruby) use only unimethods.  (S04 says
something about MMD "falling back" to unimethods, but so far this is not
described in Parrot.)  Calling is easy; multimethods look like functions, so
the MM language just has to create a function (or MM) wrapper for the UM
language, and a UM language can similarly treat a MM call as a normal function
call.  (Which will require the normal "make the function look like a method"
hack for obligate OO languages like Ruby.)  Defining methods across the
boundary is harder, and may not be worth the trouble.  -- rgr, 29-Jul-08. }}

=cut

__END__
Local Variables:
  fill-column:78
End:
