Attached is a write-up of the status and design ideas in the new
embed_api branch that we've been working on. I would like to talk
about this branch work at PDS, and would like to both solicit feedback
and get new developers interested in helping with the effort.
The new Embedding API that we have written is a subset of what it
really should be. We've basically written only enough so that the
Parrot executable and some of the other utility programs (pbc_merge,
pbc_disassemble, pbc_to_exe) can use it exclusively. There is more to
write, which can be done up-front or on a per-request basis from new
embedding applications.
The branch is currently failing a few tests, especially those
involving the formatting of error messages. We are going to fix all of these
test failures in the embed_api branch before merging so that we can avoid a
deprecation boundary, and many of those tests really become moot anyway.
I look forward to seeing everybody at PDS, and talking about this work.
--Andrew Whitworth
On Sunday at PDS I would like to take some time to talk about the new Embedding
API work I (and bluescreen) have been doing in the embed_api and embed_api2
branches. This writeup is meant to act as a primer for the conversation so
everybody can come to the meeting knowing exactly what I am trying to do, and
everybody can take the time to look over the code and generate some feedback.
The new Embedding API is being created as a new layer over existing functions.
The current functions in src/embed.c and elsewhere will still be exported, and
I am not adjusting the behavior. If you have an embedding application that uses
the old API, that application should continue to work in the presence of the
new API. I think everybody will want to upgrade when they see the new
capabilities, but that's just a hunch. Once the embed_api2 branch is approved
and merged, we can talk about deprecating the older interface (which I strongly
recommend), among other things. First I'll talk about the general design
of the new API, and then I will talk about some of the specific changes made to
the Parrot internals.
The new embedding API is located entirely in the src/embed/ directory. The new
header file is "include/parrot/api.h". api.h is to be used ONLY by the
embedding applications, and should be the only header file used. In the new
system, the embedding app should NEVER #include "parrot/parrot.h". Likewise,
internal development should never use api.h, because it is only intended for
embedding applications. All API functions are named "Parrot_api_*", and all of
them are decorated with the "PARROT_API" macro (currently defined identically
to PARROT_EXPORT, but it may change).
All API functions, without exception, have the following form:
    PARROT_API
    Parrot_Int
    Parrot_api_some_func(Parrot_PMC interp_pmc, <args>, <&returns>)
    {
        EMBED_API_CALLIN(interp_pmc, interp)
        /* ... logic goes here ... */
        EMBED_API_CALLOUT(interp_pmc, interp)
    }
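To make the control flow of those two macros concrete, here is a minimal, self-contained sketch of how a CALLIN/CALLOUT pair could be built on C's setjmp/longjmp. Everything prefixed with demo_ or DEMO_ is hypothetical and is not the code in the branch. In particular, mapping a jump-out to a return value of !exit_code is an assumption chosen to match the status convention described next.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for the interpreter (NOT Parrot's real struct). */
typedef struct demo_interp {
    jmp_buf *api_jmp_buf; /* jump point set on entry to an API call */
    int      exit_code;   /* status recorded just before jumping out */
} demo_interp;

/* Record a status and unwind straight back to the active API call. */
static void demo_jump_out(demo_interp *interp, int exit_code)
{
    assert(interp->api_jmp_buf != NULL);
    interp->exit_code = exit_code;
    longjmp(*interp->api_jmp_buf, 1);
}

/* Set the jump point. If we land here again via longjmp, clear it and
 * report 1 for "exit 0" and 0 for any other exit code (an assumption). */
#define DEMO_API_CALLIN(interp)              \
    jmp_buf demo_env_;                       \
    (interp)->api_jmp_buf = &demo_env_;      \
    if (setjmp(demo_env_)) {                 \
        (interp)->api_jmp_buf = NULL;        \
        return !(interp)->exit_code;         \
    } else {

/* Normal completion: clear the jump point and report success. */
#define DEMO_API_CALLOUT(interp)             \
    }                                        \
    (interp)->api_jmp_buf = NULL;            \
    return 1;

/* Hypothetical API function whose body may "exit" part-way through. */
static int demo_api_some_func(demo_interp *interp, int do_exit, int code)
{
    DEMO_API_CALLIN(interp)
    if (do_exit)
        demo_jump_out(interp, code);
    DEMO_API_CALLOUT(interp)
}
```

In this sketch the two macros open and close a brace pair between them, so they only work when used together, in that order, at the top and bottom of a function body.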
There are several things to notice here. First, every API function without
exception returns a Parrot_Int (INTVAL, for you internals-junkies). This is a
status flag that indicates normal success or some other kind of situation. For
instance, normal behavior is to "exit 0" from a PIR application, or a
".return()" from a Parrot Sub executed directly. An "exit 1" indicates a
non-normal return, but it's not necessarily an error. Any other kind of
unhandled exception is an error.
When an API function returns 0, we can use the Parrot_api_get_result function
to find out what happened. That function has the following signature:
    PARROT_API
    Parrot_Int
    Parrot_api_get_result(Parrot_PMC interp, Parrot_Int *is_error,
            Parrot_PMC *exception, Parrot_Int *exit_code, Parrot_String *errmsg);
It returns a flag saying whether we have an error condition (typically an
unhandled exception that isn't an EXCEPT_exit), and it also returns the
Exception PMC itself. As a convenience, it currently also extracts the
exit_code value and the string message from that PMC.
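A rough sketch of how a result query of that shape could classify the outcome of the last call. The demo_exception struct is a hypothetical, much simplified stand-in for the Exception PMC, and the rule that an exit-type exception is never reported as an error is inferred from the description above rather than taken from the branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified exception record (NOT Parrot's Exception PMC). */
typedef struct demo_exception {
    int         is_exit;   /* 1 if raised by "exit" (an EXCEPT_exit) */
    int         exit_code; /* the code passed to "exit" */
    const char *message;   /* the exception's message string */
} demo_exception;

/* Hypothetical interpreter: remembers what ended the last API call. */
typedef struct demo_interp {
    const demo_exception *last; /* NULL after a plain ".return()" */
} demo_interp;

/* Same shape as Parrot_api_get_result: returns 1 on success and writes
 * the details through out-pointers. Only a non-exit exception counts as
 * an error; "exit 1" is reported through exit_code instead. */
static int demo_get_result(const demo_interp *interp, int *is_error,
                           const demo_exception **exception,
                           int *exit_code, const char **errmsg)
{
    if (interp == NULL)
        return 0; /* nothing to query: the catastrophic case */
    assert(is_error && exception && exit_code && errmsg);

    *exception = interp->last;
    if (interp->last == NULL) {
        *is_error  = 0;
        *exit_code = 0;
        *errmsg    = NULL;
    } else {
        *is_error  = !interp->last->is_exit;
        *exit_code = interp->last->exit_code;
        *errmsg    = interp->last->message;
    }
    return 1;
}
```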
Because all API functions return a boolean success, we can chain them together
in embedding applications, and share error-handling routines:
    if (Parrot_api_do_one_thing(...) && Parrot_api_do_another_thing(...)) {
        ...
    } else if (Parrot_api_get_result(...)) {
        /* ... handle error ... */
    }
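A self-contained version of that pattern, with hypothetical stub functions (demo_api_load, demo_api_run, demo_api_get_result) standing in for real Parrot_api_* calls. Because && short-circuits, the chain stops at the first failing call, and one error path serves all of them.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stubs for Parrot_api_* calls: 1 on success, 0 on failure. */
static const char *demo_last_error = NULL;

static int demo_api_load(int ok)
{
    if (!ok)
        demo_last_error = "load failed";
    return ok;
}

static int demo_api_run(int ok)
{
    if (!ok)
        demo_last_error = "run failed";
    return ok;
}

static int demo_api_get_result(const char **errmsg)
{
    assert(errmsg != NULL);
    *errmsg = demo_last_error;
    return 1;
}

/* One error path serves the whole chain; && stops at the first failure. */
static int demo_embed_main(int load_ok, int run_ok)
{
    const char *errmsg = NULL;

    demo_last_error = NULL;
    if (demo_api_load(load_ok) && demo_api_run(run_ok)) {
        return 0;
    } else if (demo_api_get_result(&errmsg)) {
        fprintf(stderr, "error: %s\n", errmsg ? errmsg : "(unknown)");
        return 1;
    }
    fprintf(stderr, "catastrophic: could not retrieve the error\n");
    return 2;
}
```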
Notice also that the Parrot_api_get_result function returns a boolean. If that
fails, we're in a pretty catastrophic situation and can't even get information
about the error that caused us to crash.
As a design decision, I have been trying to use the 4 native Parrot types
exclusively in API function signatures: Parrot_PMC, Parrot_String, Parrot_Int,
and Parrot_Float. There are a handful of places where this doesn't make sense,
especially during interpreter initialization and when taking C strings from the
embedding application. Where the input string is likely available as a
constant, for instance, or where it is only used once, it didn't make sense to
me to force the user to wrap it in a Parrot_String. We can talk about
those kinds of details if people are interested.
Another thing to point out is that I never use a raw Parrot_Interp pointer
directly. Interpreters are always passed as ParrotInterpreter PMCs. Since the
interpreter structure is considered opaque, and since the ParrotInterpreter PMC
has a number of useful methods that the embedding app may want to use, this
seemed to me to be the most natural choice. It also opens the possibility that
the embedding application could substitute in a *subclass* of ParrotInterpreter
to get some custom behavior. I don't think we support ParrotInterpreter
subclasses in the API yet, but if people are interested in the feature it
shouldn't be too hard to add.
I've used the new API in most of the executables in the embed_api branch. If
you look at src/main.c, src/pbc_disassemble.c, src/merge.c, or the fakecutables
generated by pbc_to_exe, you'll see the new embedding API in action. There are
a handful of places where it needs work, of course, but we have plenty of time
to sort out any issues.
As for the internal design of the system, several things have changed to enable
this new API. I'll list them for convenience.
1) The interpreter configuration hash is now set as a PMC, instead of as a raw
stream of bytes. Also, the config hash can be set at any time after the
interpreter is created (instead of having to be set before the first
interpreter is created), and we can assign a new config hash to each new
interpreter created. Where a config hash is not provided, sane default values
are used instead. Internally the config hash is mostly used to set up search
paths.
Since the config hash can be set as a PMC by the embedding application, there's
no real reason why it would have to be a Hash at all. The embedding application
has complete freedom to set anything it wants here (including an HLL-friendly
subclass).
As a caveat, once the config hash is set on the interpreter, there is currently
no good clean way to change it. That is, you can change the PMC itself, but
there is no good way to undo the changes made to the library search paths
array. If this is a feature that people want we can work on it, but for now it
seems like a non-issue to me.
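A sketch of the per-interpreter config idea, assuming a simple key/value array in place of the config hash PMC. The "libdir" key, the default path, and all demo_ names are hypothetical. The point is only that each interpreter gets its own config, set after creation, and that a missing config falls back to a default when deriving a search path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value pairs standing in for the config hash PMC. */
typedef struct demo_config_entry {
    const char *key;
    const char *value;
} demo_config_entry;

/* Hypothetical interpreter: each one owns its own config. */
typedef struct demo_interp {
    const demo_config_entry *config;   /* NULL means "use defaults" */
    size_t                   n_config;
    const char              *lib_path; /* search path derived from config */
} demo_interp;

static const char *demo_config_get(const demo_interp *interp,
                                   const char *key, const char *fallback)
{
    for (size_t i = 0; i < interp->n_config; i++)
        if (strcmp(interp->config[i].key, key) == 0)
            return interp->config[i].value;
    return fallback;
}

/* Attach a config to an already-created interpreter and derive its
 * library search path, falling back to a default when the key (or the
 * whole config) is missing. */
static void demo_set_config(demo_interp *interp,
                            const demo_config_entry *config, size_t n)
{
    assert(interp != NULL);
    interp->config   = config;
    interp->n_config = config != NULL ? n : 0;
    interp->lib_path = demo_config_get(interp, "libdir",
                                       "runtime/parrot/library");
}
```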
2) Similarly to the config hash, the command-line args (available as
IGLOBALS_ARGV_ARRAY in the interpreter) are no longer static. The arguments to
:main can be any arbitrary PMC, and are passed as an argument to the new
Parrot_api_run_bytecode function. You can set this value fresh on every call to
Parrot_api_run_bytecode so individual interpreters in your program can all take
different argument PMCs. There's no real reason why you would have to pass your
:main function an array of strings either, if you don't want to. It can be any PMC
type, including a Hash or an HLL-friendly subclass.
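A sketch of per-call arguments, assuming a toy PMC type and a C function playing the role of :main. None of the demo_ names exist in Parrot. The sketch only shows that the argument object is replaced on every run call and that :main can receive any PMC type, not just an array of strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy "PMC": just a type name and a payload pointer. */
typedef struct demo_pmc {
    const char *type;
    void       *data;
} demo_pmc;

/* Hypothetical interpreter: argv plays the role of IGLOBALS_ARGV_ARRAY. */
typedef struct demo_interp {
    const demo_pmc *argv;
} demo_interp;

/* A C function standing in for a program's :main sub. */
typedef int (*demo_main_fn)(demo_interp *interp);

/* In the spirit of Parrot_api_run_bytecode: the arguments are supplied
 * on every call instead of being fixed when the interpreter starts.
 * In this toy, the return value is simply whatever main_sub returns. */
static int demo_api_run_bytecode(demo_interp *interp, demo_main_fn main_sub,
                                 const demo_pmc *args)
{
    assert(interp != NULL && main_sub != NULL);
    interp->argv = args;
    return main_sub(interp);
}

/* A sample :main that expects a Hash rather than an array of strings. */
static int demo_main_wants_hash(demo_interp *interp)
{
    return interp->argv != NULL && strcmp(interp->argv->type, "Hash") == 0;
}
```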
3) Parrot_exit no longer calls "exit()". The new API sets a jump point (the
EMBED_API_CALLIN and EMBED_API_CALLOUT macros handle all this). When you "exit"
the interpreter's program, you jump immediately back to the API call, which
returns the status information to the embedding application. In fact, Parrot_exit
(recently renamed to Parrot_x_exit) should no longer be called in most
situations. Parrot_x_exit runs several exit handlers, some of which may be
destructive (like finalizing GC). Parrot_x_exit should now only be called in
conjunction with Parrot_destroy when we are actually cleaning up the
interpreter and not planning to execute anything else with it.
There is now a function "Parrot_x_jump_out" in src/exit.c that should be used
most times when you want to "exit" your program. This returns control directly
to the embedding application and communicates the current status.
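To illustrate the split between returning control and actually tearing down, here is a hypothetical sketch in which exit handlers run only at teardown. demo_jump_out plays the role of Parrot_x_jump_out and demo_x_exit the role of Parrot_x_exit; the fixed-size handler list, its newest-first order, and the demo_finalize_gc handler are assumptions for illustration only.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

#define DEMO_MAX_HANDLERS 8

typedef struct demo_interp demo_interp;
typedef void (*demo_exit_handler)(demo_interp *interp);

/* Hypothetical interpreter state (NOT Parrot's real struct). */
struct demo_interp {
    jmp_buf          *api_jmp_buf;
    int               exit_code;
    demo_exit_handler handlers[DEMO_MAX_HANDLERS];
    size_t            n_handlers;
    int               gc_finalized;
};

static int demo_register_exit_handler(demo_interp *interp, demo_exit_handler h)
{
    if (interp->n_handlers == DEMO_MAX_HANDLERS)
        return 0;
    interp->handlers[interp->n_handlers++] = h;
    return 1;
}

/* Plays the role of Parrot_x_jump_out: report a status and return control
 * to the embedder. Nothing is torn down, so the interpreter stays usable. */
static void demo_jump_out(demo_interp *interp, int exit_code)
{
    assert(interp->api_jmp_buf != NULL);
    interp->exit_code = exit_code;
    longjmp(*interp->api_jmp_buf, 1);
}

/* Plays the role of Parrot_x_exit: run the (possibly destructive) exit
 * handlers, newest first. Only call this as part of final teardown. */
static void demo_x_exit(demo_interp *interp)
{
    while (interp->n_handlers > 0)
        interp->handlers[--interp->n_handlers](interp);
}

/* A destructive handler, like finalizing the GC. */
static void demo_finalize_gc(demo_interp *interp)
{
    interp->gc_finalized = 1;
}

/* Minimal API-style wrapper: run a program body, catch a jump-out. */
static int demo_api_call(demo_interp *interp, void (*body)(demo_interp *))
{
    jmp_buf env;
    interp->api_jmp_buf = &env;
    if (setjmp(env)) {
        interp->api_jmp_buf = NULL;
        return !interp->exit_code;
    }
    body(interp);
    interp->api_jmp_buf = NULL;
    return 1;
}

/* A "program" that exits with status 3. */
static void demo_program_exit3(demo_interp *interp)
{
    demo_jump_out(interp, 3);
}
```

Because the jump-out leaves the handlers alone, the same interpreter can run several programs in a row, and the destructive cleanup happens exactly once, at the end.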
4) die_from_exception, the fallback when we throw an exception but cannot find
any handlers for it, no longer prints error or backtrace information to stderr.
All the necessary information is packaged up in the Exception and passed back
to the embedding application through the Parrot_api_get_result function. The
embedding application can dissect that PMC and handle all the necessary output
operations. Besides debugging situations, it's my goal that libparrot should
NEVER use fprintf to communicate error information directly to the user.
libparrot should communicate with the embedding application, and that
application is in charge of interfacing with the user.
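A sketch of the "package it up, don't print it" approach, assuming a simplified exception record with fixed-size message and backtrace buffers (the real data lives in an Exception PMC). The demo_ functions and the backtrace line format are hypothetical. The library side only fills in the record, and the embedding application decides how to present it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical exception record carrying everything the old fallback
 * used to print to stderr itself. */
typedef struct demo_exception {
    char message[128];
    char backtrace[256];
} demo_exception;

/* Library side: no fprintf to the user, just fill in the record.
 * snprintf truncates safely if the buffers fill up. */
static void demo_die_from_exception(demo_exception *ex, const char *msg,
                                    const char *const *frames, size_t n_frames)
{
    assert(ex != NULL && msg != NULL);
    snprintf(ex->message, sizeof ex->message, "%s", msg);
    ex->backtrace[0] = '\0';
    for (size_t i = 0; i < n_frames; i++) {
        size_t used = strlen(ex->backtrace);
        snprintf(ex->backtrace + used, sizeof ex->backtrace - used,
                 "  called from %s\n", frames[i]);
    }
}

/* Embedder side: this application chooses how (and whether) to show it. */
static void demo_report(FILE *out, const demo_exception *ex)
{
    fprintf(out, "error: %s\n%s", ex->message, ex->backtrace);
}
```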
5) The longopt family of functions in src/longopt.c is not linked in with
libparrot anymore. We do still compile the object file, and embedding
applications (like parrot.exe) may link with it if they want it. Some of the
code changes here were a little bit ugly. We are still trying to tease out some
of the argument processing code from IMCC for instance.
6) We still have a ways to go with this, but IMCC no longer executes the :main
program directly. Instead, it returns a PBC PMC (currently an UnManagedStruct
with a pointer to a PackFile structure). The user can use the
Parrot_api_run_bytecode routine to run the PBC PMC. There is also a
Parrot_api_load_bytecode_file function that can be used to load a
pre-compiled bytecode file to get the PBC PMC, and then run it from there.
This means that any front-end which can produce a PBC PMC of some form can be
used in place of IMCC.
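A sketch of how decoupling the front-end from execution can look from the embedder's side, assuming a hypothetical demo_pbc handle and front-ends modeled as plain C function pointers. These stubs compile nothing; they only show that the runner never needs to know which front-end produced the bytecode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bytecode handle; in the branch this is currently an
 * UnManagedStruct wrapping a PackFile pointer. */
typedef struct demo_pbc {
    const char *origin; /* which front-end produced it */
} demo_pbc;

/* Any front-end: source text in, bytecode handle out (NULL on failure). */
typedef const demo_pbc *(*demo_frontend)(const char *source);

static const demo_pbc demo_imcc_output  = { "imcc" };
static const demo_pbc demo_other_output = { "some-other-compiler" };

/* Stub front-ends: they compile nothing and only tag their output. */
static const demo_pbc *demo_imcc_compile(const char *source)
{
    return source != NULL ? &demo_imcc_output : NULL;
}

static const demo_pbc *demo_other_compile(const char *source)
{
    return source != NULL ? &demo_other_output : NULL;
}

/* Stand-in runner: never asks where the bytecode came from. It reports
 * which handle it executed so the behavior is observable. */
static int demo_api_run_bytecode(const demo_pbc *pbc, const demo_pbc **ran)
{
    assert(ran != NULL);
    if (pbc == NULL)
        return 0;
    *ran = pbc;
    return 1;
}

/* Embedder glue: pick any front-end, then run what it produced. */
static int demo_compile_and_run(demo_frontend fe, const char *source,
                                const demo_pbc **ran)
{
    return demo_api_run_bytecode(fe(source), ran);
}
```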
The goal of the embedding API work is to start approaching a new vision for
what Parrot could be. We should really be thinking about Parrot as two parts:
libparrot (a language-agnostic bytecode interpreter and runtime) and the Parrot
executable (the IMCC PIR/PASM front-end for libparrot). The Parrot executable
now embeds libparrot using the new embedding API. And if the Parrot executable
can do it, anybody else can too. The idea is that any application can embed
libparrot and use it to execute any bytecode without making assumptions about
the language that the program was written in, and without including all sorts
of infrastructure that the application does not need. We don't assume that code
came from IMCC. We don't assume we are using conventions from PIR/PASM. We give
control to the embedding application to set the environment and all inputs,
which makes for a much more flexible and powerful tool.
At the time of writing this, the embed_api2 branch has seen a few major changes
and is failing some tests. I'm going to get it fixed up again tonight so people
can start playing with it if they want. I would very much like to get some
feedback this week and during PDS so I can focus my efforts to make this work
acceptable to the community and hopefully get it merged before 3.0.
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev