embed_api discussion at PDS

Andrew Whitworth Fri, 03 Dec 2010 08:15:45 -0800

Attached is a write-up of the status and design ideas in the new
embed_api branch that we've been working on. I would like to talk
about this branch work at PDS, and would like to both solicit feedback
and get new developers interested in helping with the effort.


The new Embedding API that we have written is a subset of what it
really should be. We've basically written only enough so that the
Parrot executable and some of the other utility programs (pbc_merge,
pbc_disassemble, pbc_to_exe) can use it exclusively. There is more to
write, which can be done up-front or on a per-request basis from new
embedding applications.

The branch is currently failing a few tests, especially those
involving the formatting of error messages. The embed_api branch is
going to fix all these test failures before merging, so that we can
avoid a deprecation boundary, so many of those tests really become
moot anyway.

I look forward to seeing everybody at PDS, and talking about this work.

--Andrew Whitworth

On Sunday at PDS I would like to take some time to talk about the new Embedding 
API work I (and bluescreen) have been doing in the embed_api and embed_api2 
branches. This writeup is going to act like a primer for the conversation so 
everybody can come to the meeting knowing exactly what I am trying to do, and 
everybody can take the time to look over the code and generate some feedback.

The new Embedding API is being created as a new layer over existing functions. 
The current functions in src/embed.c and elsewhere will still be exported, and 
I am not adjusting the behavior. If you have an embedding application that uses 
the old API, that application should continue to work in the presence of the 
new API. I think everybody will want to upgrade when they see the new 
capabilities, but that's just a hunch. Once the embed_api2 branch is approved 
and merged, we can talk about deprecating the older interface (which I 
strongly recommend), among other things. First I'll talk about the general design 
of the new API, and then I will talk about some of the specific changes made to 
the Parrot internals.

The new embedding API is located entirely in the src/embed/ directory. The new 
header file is "include/parrot/api.h". api.h is to be used ONLY by the 
embedding applications, and should be the only header file used. In the new 
system, the embedding app should NEVER #include "parrot/parrot.h". Likewise, 
internal development should never use api.h, because it is only intended for 
embedding applications. All API functions are named "Parrot_api_*", and all of 
them are decorated with the "PARROT_API" macro (currently defined identically 
to PARROT_EXPORT, but it may change).

All API functions, without exception, have the following form:

PARROT_API
Parrot_Int
Parrot_api_some_func(Parrot_PMC interp_pmc, <args>, <&returns>)
{
    EMBED_API_CALLIN(interp_pmc, interp)
    ... Logic goes here ...
    EMBED_API_CALLOUT(interp_pmc, interp)
}

There are several things to notice here. First, every API function without 
exception returns a Parrot_Int (INTVAL, for you internals-junkies). This is a 
status flag that indicates normal success or some other kind of situation. For 
instance, normal behavior is to "exit 0" from a PIR application, or a 
".return()" from a Parrot Sub executed directly. An "exit 1" indicates a 
non-normal return, but it's not necessarily an error. Any other kind of 
unhandled exception is an error.

Once an API function returns 0 (failure), we can use the Parrot_api_get_result 
function to find out what happened. That function has the following signature:

PARROT_API
Parrot_Int
Parrot_api_get_result(Parrot_PMC interp, Parrot_Int *is_error,
        Parrot_PMC *exception, Parrot_Int *exit_code, Parrot_String *errmsg);

It sets a flag to say whether we have an error condition (typically an 
unhandled exception that isn't an EXCEPT_exit). It also returns the Exception 
PMC. As a convenience, it currently returns the exit_code value from the PMC 
and also its string message.

Because all API functions return a boolean success, we can chain them together 
in embedding applications, and share error-handling routines:

if (Parrot_api_do_one_thing(...) && Parrot_api_do_another_thing(...)) {
    ...
} else if (Parrot_api_get_result(...)) {
    ... Handle Error ...
}

Notice also that the Parrot_api_get_result function returns a boolean. If that 
fails, we're in a pretty catastrophic situation and can't even get information 
about the error that caused us to crash.
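
To make the chaining pattern concrete, here is a minimal sketch in plain C. It 
does not link against libparrot: every demo_* type and function below is a 
hypothetical stand-in made up for illustration, and only the shape of 
demo_api_get_result is modeled on the signature quoted above. The real 
functions may well behave differently in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the opaque Parrot types; NOT the real API. */
typedef long        Demo_Int;
typedef const char *Demo_String;

typedef struct demo_exception {
    Demo_Int    is_exit;     /* nonzero for an "exit N" style exception */
    Demo_Int    exit_code;
    Demo_String message;
} demo_exception;

typedef struct demo_interp {
    demo_exception *pending; /* last unhandled exception, or NULL */
} demo_interp;

/* Each call returns 1 on success and 0 on failure. */
Demo_Int
demo_api_step_ok(demo_interp *interp)
{
    interp->pending = NULL;
    return 1;
}

Demo_Int
demo_api_step_fails(demo_interp *interp)
{
    static demo_exception ex = { 0, 1, "something went wrong" };
    interp->pending = &ex;
    return 0;
}

/* Modeled on the Parrot_api_get_result signature; the body is invented. */
Demo_Int
demo_api_get_result(demo_interp *interp, Demo_Int *is_error,
        demo_exception **exception, Demo_Int *exit_code, Demo_String *errmsg)
{
    assert(is_error && exception && exit_code && errmsg);
    if (interp == NULL)
        return 0;                        /* cannot even report the error */
    *exception = interp->pending;
    *is_error  = interp->pending != NULL && !interp->pending->is_exit;
    *exit_code = interp->pending ? interp->pending->exit_code : 0;
    *errmsg    = interp->pending ? interp->pending->message : "";
    return 1;
}

/* Embedding-side driver: chain the calls, share one error handler. */
Demo_Int
demo_run(demo_interp *interp, Demo_String *errmsg_out)
{
    Demo_Int        is_error = 0, exit_code = 0;
    demo_exception *ex = NULL;

    *errmsg_out = "";
    if (demo_api_step_ok(interp) && demo_api_step_fails(interp))
        return 0;                        /* everything succeeded */
    if (demo_api_get_result(interp, &is_error, &ex, &exit_code, errmsg_out)) {
        if (!is_error)
            *errmsg_out = "";            /* a plain "exit N" is not an error */
        return exit_code;
    }
    return -1;                           /* catastrophic: no result available */
}
```

The point of the sketch is that the embedding application needs only one 
error-handling path, no matter which call in the chain failed.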

As a design decision, I have been trying to use the 4 native Parrot types 
exclusively in API function signatures: Parrot_PMC, Parrot_String, Parrot_Int, 
and Parrot_Float. There are a handful of places where this doesn't make sense, 
especially during interpreter initialization and when taking C strings from the 
embedding application. In places where the input string is likely available as 
a constant, for instance, or where it is only used once, it didn't make sense 
to me to force the user to wrap it up into a Parrot_String. We can talk about 
those kinds of details if people are interested.

Another thing to point out is that I never use a raw Parrot_Interp pointer 
directly. Interpreters are always passed as ParrotInterpreter PMCs. Since the 
interpreter structure is considered opaque, and since the ParrotInterpreter PMC 
has a number of useful methods that the embedding app may want to use, this 
seemed to me to be the most natural choice. It also opens the possibility that 
the embedding application could substitute in a *subclass* of ParrotInterpreter 
to get some custom behavior. I don't think we support ParrotInterpreter 
subclasses in the API yet, but if people are interested in the feature it 
shouldn't be too hard to add.

I've used the new API in most of the executables in the embed_api branch. If 
you look at src/main.c, src/pbc_disassemble.c, src/merge.c, or the fakecutables 
generated by pbc_to_exe, you'll see the new embedding API in action. There are 
a handful of places where it needs work, of course, but we have plenty of time 
to sort out any issues.

As for the internal design of the system, several things have changed to enable 
this new API. I'll list them for convenience.

1) The interpreter configuration hash is now set as a PMC, instead of as a raw 
stream of bytes. Also, the config hash can be set at any time after the 
interpreter is created (instead of having to be set before the first 
interpreter is created), and we can assign a new config hash to each new 
interpreter created. Where a config hash is not provided, some sane default 
values are provided. Internally the config hash is mostly used to set up search 
paths. 

Since the config hash can be set as a PMC by the embedding application, there's 
no real reason why it would have to be a Hash at all. The embedding application 
has complete freedom to set anything they want here (including an HLL-friendly 
subclass). 

As a caveat, once the config hash is set on the interpreter, there is currently 
no good clean way to change it.  That is, you can change the PMC itself, but 
there is no good way to undo the changes made to the library search paths 
array. If this is a feature that people want we can work on it, but for now it 
seems like a non-issue to me.
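
As a rough illustration of "one config per interpreter, with defaults when 
none is given", here is a small sketch in plain C. It is not how libparrot 
stores its configuration (the real config is a PMC); the demo_* names, the 
array-of-pairs layout, and the "libdir" and "prefix" keys are all assumptions 
made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value config; the real config is a PMC, not a C array. */
typedef struct demo_config_entry {
    const char *key;
    const char *value;
} demo_config_entry;

typedef struct demo_interp {
    const demo_config_entry *config;     /* NULL until the app sets one */
    size_t                   config_len;
} demo_interp;

/* Fallback values used when the embedding app never supplies a config. */
static const demo_config_entry demo_defaults[] = {
    { "libdir", "/usr/local/lib" },
    { "prefix", "/usr/local" }
};

/* May be called at any time after the interpreter exists. */
void
demo_set_config(demo_interp *interp, const demo_config_entry *cfg, size_t len)
{
    assert(interp != NULL);
    interp->config     = cfg;
    interp->config_len = len;
}

/* Look a key up in this interpreter's config, or in the defaults if the
 * interpreter has none. Returns NULL when the key is absent. */
const char *
demo_config_get(const demo_interp *interp, const char *key)
{
    const demo_config_entry *table = interp->config;
    size_t len = interp->config_len;
    size_t i;

    if (table == NULL) {
        table = demo_defaults;
        len   = sizeof demo_defaults / sizeof demo_defaults[0];
    }
    for (i = 0; i < len; i++) {
        if (strcmp(table[i].key, key) == 0)
            return table[i].value;
    }
    return NULL;
}
```

Two interpreters in the same process can then hold different configs, and an 
interpreter that was never configured still gets usable values.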

2) Similarly to the config hash, the command-line args (available as 
IGLOBALS_ARGV_ARRAY in the interpreter) are no longer static. The arguments to 
:main can be any arbitrary PMC, and are passed as an argument to the new 
Parrot_api_run_bytecode function. You can set this value fresh on every call to 
Parrot_api_run_bytecode so individual interpreters in your program can all take 
different argument PMCs. There's no real reason why you would have to pass your 
:main function an array of strings either, if you don't want. It can be any PMC 
type, including a Hash or an HLL-friendly subclass.

3) Parrot_exit no longer calls "exit()". The new API sets a jump point (the 
EMBED_API_CALLIN and EMBED_API_CALLOUT macros handle all this). When you "exit" 
the interpreter's program you jump immediately back to the API call and return 
the status information back to the embedding application. In fact, Parrot_exit 
(recently renamed to Parrot_x_exit) should no longer be called in most 
situations. Parrot_x_exit runs several exit handlers, some of which may be 
destructive (like finalizing GC). Parrot_x_exit should now only be called in 
conjunction with Parrot_destroy when we are actually cleaning up the 
interpreter and not planning to execute anything else with it. 

There is now a function "Parrot_x_jump_out" in src/exit.c that should be used 
most times when you want to "exit" your program. This returns control directly 
to the embedding application and communicates the current status.
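
The jump-point idea can be sketched with the standard setjmp/longjmp pair. 
This is only an illustration of the technique under my own assumptions: the 
demo_* names are hypothetical, and the real EMBED_API_CALLIN and 
EMBED_API_CALLOUT macros and Parrot_x_jump_out may be implemented quite 
differently.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical interpreter record; the real structure is opaque. */
typedef struct demo_interp {
    jmp_buf *api_jump;    /* where to land when the program "exits" */
    int      exit_code;   /* status recorded by demo_jump_out */
} demo_interp;

/* Plays the role described for Parrot_x_jump_out: record the status and
 * unwind straight back to the API entry point instead of calling exit(). */
void
demo_jump_out(demo_interp *interp, int status)
{
    assert(interp != NULL && interp->api_jump != NULL);
    interp->exit_code = status;
    longjmp(*interp->api_jump, 1);
}

/* Stand-in for user code that decides to exit somewhere deep inside. */
void
demo_user_program(demo_interp *interp, int wants_exit)
{
    if (wants_exit)
        demo_jump_out(interp, 3);
}

/* API-style entry point: returns 1 on a normal finish, 0 otherwise. */
int
demo_api_run(demo_interp *interp, int wants_exit)
{
    jmp_buf env;
    int     ok;

    interp->exit_code = 0;
    interp->api_jump  = &env;           /* roughly what a CALLIN would do */
    if (setjmp(env) == 0) {
        demo_user_program(interp, wants_exit);
        ok = 1;                          /* fell off the end normally */
    }
    else {
        ok = 0;                          /* arrived here via longjmp */
    }
    interp->api_jump = NULL;            /* roughly what a CALLOUT would do */
    return ok;
}
```

Because the process is never terminated, the embedding application can 
inspect exit_code after the call and keep using the same interpreter.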

4) die_from_exception, the fallback when we throw an exception but cannot find 
any handlers for it, no longer prints error or backtrace information to stderr. 
All the necessary information is packaged up in the Exception and passed back 
to the embedding application through the Parrot_api_get_result function. The 
embedding application can dissect that PMC and handle all the necessary output 
operations. Besides debugging situations, it's my goal that libparrot should 
NEVER use fprintf to communicate error information directly to the user. 
libparrot should communicate with the embedding application, and that 
application is in charge of interfacing with the user.

5) The longopt family of functions in src/longopt.c is not linked in with 
libparrot anymore. We do still compile the object file, and embedding 
applications (like parrot.exe) may link with it if they want it. Some of the 
code changes here were a little bit ugly. We are still trying to tease out some 
of the argument processing code from IMCC for instance.

6) We still have a ways to go with this, but IMCC no longer executes the :main 
program directly. Instead, it returns a PBC PMC (currently an UnManagedStruct 
with a pointer to a PackFile structure). The user can use the 
Parrot_api_run_bytecode routine to run the PBC PMC. There is also a 
Parrot_api_load_bytecode_file function that can be used to load in a 
pre-compiled bytecode file to get the PBC PMC, 
and then run it from there. This creates the opportunity that any front-end 
which can produce a PBC PMC of some form can be used in place of IMCC.
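
The separation between "produce bytecode" and "run bytecode" might look 
something like the following sketch. Nothing here is the real API: the demo_* 
names, the toy bytecode format (a list of integers that the runner adds up), 
and the integer argument are assumptions chosen only to keep the example 
self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bytecode handle: here just a list of integers to add up. */
typedef struct demo_bytecode {
    const int *ops;
    size_t     n_ops;
} demo_bytecode;

/* A front-end is anything that can fill in a bytecode handle. */
typedef int (*demo_frontend)(const char *input, demo_bytecode *out);

/* Stand-in for a compiler front-end such as IMCC. */
int
demo_compile_source(const char *source, demo_bytecode *out)
{
    static const int ops[] = { 1, 2, 3 };
    if (source == NULL)
        return 0;
    out->ops   = ops;
    out->n_ops = sizeof ops / sizeof ops[0];
    return 1;
}

/* Stand-in for loading a pre-compiled bytecode file. */
int
demo_load_file(const char *path, demo_bytecode *out)
{
    static const int ops[] = { 10, 20 };
    if (path == NULL)
        return 0;
    out->ops   = ops;
    out->n_ops = sizeof ops / sizeof ops[0];
    return 1;
}

/* The runner only sees the handle and the per-call argument. */
int
demo_run_bytecode(const demo_bytecode *bc, int arg, int *result)
{
    size_t i;

    assert(bc != NULL && result != NULL);
    *result = arg;
    for (i = 0; i < bc->n_ops; i++)
        *result += bc->ops[i];
    return 1;
}

/* Embedding-side glue: pick a front-end, then run what it produced. */
int
demo_embed(demo_frontend frontend, const char *input, int arg, int *result)
{
    demo_bytecode bc = { NULL, 0 };
    return frontend(input, &bc) && demo_run_bytecode(&bc, arg, result);
}
```

The runner never asks where the bytecode came from, which is what lets a 
different front-end be swapped in without touching the execution side.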

The goal of the embedding API work is to start approaching a new vision for 
what Parrot could be. We should really be thinking about Parrot as two parts: 
libparrot (a language-agnostic bytecode interpreter and runtime) and the Parrot 
executable (the IMCC PIR/PASM front-end for libparrot). The Parrot executable 
now embeds libparrot using the new embedding API. And if the Parrot executable 
can do it, anybody else can too. The idea is that any application can embed 
libparrot and use it to execute any bytecode without making assumptions about 
the language that the program was written in, and without including all sorts 
of infrastructure that the application does not need. We don't assume that code 
came from IMCC. We don't assume we are using conventions from PIR/PASM. We give 
control to the embedding application to set the environment and all inputs, 
which makes for a much more flexible and powerful tool.

At the time of writing this, the embed_api2 branch has seen a few major changes 
and is failing some tests. I'm going to get it fixed up again tonight so people 
can start playing with it if they want. I would very much like to get some 
feedback this week and during PDS so I can focus my efforts to make this work 
acceptable to the community and hopefully get it merged before 3.0.

_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev
