On Tue, Aug 10, 2004 at 10:45:43AM -0700, Brent 'Dax' Royal-Gordon wrote:

: I would assume (hope) that these tables would not be allowed to change
: once Parrot started using them. It seems like an extremely dangerous
: thing to have two calls to read() be performed by different functions,
: after all.


Why is that any more dangerous than allowing a function like read() to read from sockets, strings, memory, file handles, special devices, or anything else the user can dream up that can support the read() interface?
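To make the comparison concrete, here is a minimal sketch of a read() interface where each source supplies its own implementation through a function pointer. The names (`source`, `read_fn`, `string_read`, `generic_read`) are hypothetical and are not Parrot's actual I/O API. Code that calls `generic_read()` neither knows nor cares that the bytes come from a string rather than a file descriptor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical interface: any "source" that can fill a buffer
 * supplies its own read function. Not Parrot's real API. */
typedef struct source source;
typedef size_t (*read_fn)(source *src, char *buf, size_t len);

struct source {
    read_fn read;      /* behaviour chosen by whoever created the source */
    const char *data;  /* used by the string-backed source below */
    size_t pos, size;
};

/* String-backed implementation of the read() interface:
 * copies up to len bytes and advances the position. */
static size_t string_read(source *src, char *buf, size_t len) {
    size_t avail = src->size - src->pos;
    size_t n = len < avail ? len : avail;
    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return n;
}

static source string_source(const char *s) {
    source src = { string_read, s, 0, strlen(s) };
    return src;
}

/* Callers only ever see the interface, never the backend. */
static size_t generic_read(source *src, char *buf, size_t len) {
    return src->read(src, buf, len);
}
```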

Furthermore, look at how easy it becomes to transparently install debugging hooks, logging, and security handlers (among many other types) in legacy code that is retargeted to the Parrot VM.

I actually got involved in this thread because of some language in Dan's 8/4/2004 11:23 AM email, "Re: Unicode Support - ICU Optional", in which the statement "3) We make Parrot's string system use the loadable encoding and charset system" got me thinking about the loadability of low-level functions.


As a question for my own understanding, how low-level is the encoding API supposed to be? Is it at the same level as read(), malloc(), and friends? In one of Dan's more recent emails, he hit the nail on the head when he said: "The issue here is whether we let the interpreter swap out what's the equivalent of functions in the C runtime." Are the charset and encoding APIs at the level of the C runtime? I was under the impression that they were, since the C runtime spends a lot of time dealing with strings and with string encoding.

If that's so, and if, according to the first quote, the encoding and charset system needs to be loadable, then I was merely asking whether we were planning to build a general framework (i.e. macros, or preprocessor magic) to support storing multiple versions of low-level functions implemented in C.
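Something along these lines is what I mean by "preprocessor magic". It is only a sketch under my own assumptions: the slot names (`to_upper`, `char_width`), the `LOWLEVEL_FUNCS` X-macro, and the `LL_CALL` macro are all hypothetical. A single list generates both the table type and its default contents, and a loadable encoding would replace a slot when it is loaded.

```c
#include <stddef.h>
#include <ctype.h>

/* Hypothetical list of low-level entry points; each X() line gives
 * the slot name, return type and parameter list. */
#define LOWLEVEL_FUNCS(X)                      \
    X(to_upper,   int,    (int c))             \
    X(char_width, size_t, (const char *s))

/* Default C implementations. */
static int default_to_upper(int c) { return toupper(c); }
static size_t default_char_width(const char *s) { (void)s; return 1; }

/* Generate the table type: one function pointer per entry point. */
#define DECLARE_SLOT(name, ret, params) ret (*name) params;
typedef struct { LOWLEVEL_FUNCS(DECLARE_SLOT) } lowlevel_table;

/* Generate the default table from the same list. */
#define DEFAULT_SLOT(name, ret, params) default_##name,
static lowlevel_table current_table = { LOWLEVEL_FUNCS(DEFAULT_SLOT) };

/* Every call site goes through the table. */
#define LL_CALL(name, ...) (current_table.name(__VA_ARGS__))

/* A replacement a loadable encoding might supply: byte length of the
 * UTF-8 sequence starting at s (no validation, for brevity). */
static size_t utf8_char_width(const char *s) {
    unsigned char c = (unsigned char)*s;
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    return 4;
}
```

Adding a new entry point then means adding one `X(...)` line and a default implementation; the table type and default table follow automatically.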

There were two motivations for this question. One issue was "how are we actually going to do the loading and the function resolution" which, I feel, still needs to be addressed. (If I just missed the explanation, since I joined the list relatively recently, could someone send me a link?)

The second was: if we've got a table slot holding a single function pointer, why not make it easy to chain functions together, so that we get the benefits of AOP-like functionality not only for operations fixed at C compile time but for operations at any time?

Here's what I was thinking: given that we're using a table of function pointers, it's easy enough when compiling Parrot to substitute a chain of functions for any of the low-level calls in the table. However, since this is done at compile time, it's very difficult to add and remove options, and in a multi-vendor environment it might be very, very hard to make code that you don't have the source for behave properly with this kind of solution. (You run into a similar kind of problem in C++ with multiple inheritance, which policy classes and template metaprogramming give you a nice way to solve. But anyway:)

So, under ideal circumstances, it's easy to make special Parrot builds that do what I'm talking about, provided we use these tables of function pointers for low-level things, as the demand for loadable charset/encoding abilities implies. We don't even need explicit chains of functions: we can just write our own chain inside a single function, compile it, and stuff its address into the table slot.
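For example, here is a minimal sketch of that compile-time approach, assuming a hypothetical one-slot table (`read_slot`) and a stand-in `base_read()` in place of the real low-level call. The whole "chain" (counting calls, a trivial sanity check, then the original behaviour) is written by hand as one function whose address goes into the slot.

```c
#include <stddef.h>
#include <string.h>

typedef size_t (*read_fn)(char *buf, size_t len);

/* Stand-in for the real low-level read (hypothetical). */
static size_t base_read(char *buf, size_t len) {
    const char *msg = "data";
    size_t n = strlen(msg) < len ? strlen(msg) : len;
    memcpy(buf, msg, n);
    return n;
}

/* One table slot for this low-level operation. */
static read_fn read_slot = base_read;

static unsigned long read_calls = 0;

/* The "chain", written by hand as a single function:
 * record the call, check the arguments, then call the original. */
static size_t logged_read(char *buf, size_t len) {
    read_calls++;               /* logging/statistics step */
    if (len == 0) return 0;     /* trivial sanity check */
    return base_read(buf, len); /* original behaviour */
}

static void install_logging(void) { read_slot = logged_read; }
```

The catch is that adding a second hook means editing `logged_read()` and rebuilding, which is exactly what breaks down when several vendors each want their own.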

However, Parrot is about being able to do lots of cool stuff at runtime without sacrificing speed, right? Furthermore, it wants to do this in a really portable way. It seemed to me that there might well be many situations where the ability to chain handlers for these functions would be great but recompiling Parrot to support it is not possible, or, as in the case where multiple vendors want to install hooks, where things get downright nasty at compile time but are fine at runtime.

So the second question I'm asking is: "If we're going to have function tables for all of this low-level stuff, can we at least chain things at runtime AS WELL AS at compile time?" Compile-time chaining is made possible simply by the fact that we're using function tables, but runtime chaining would (it seems to me) solve a number of other things-that-people-will-surely-want-to-do. From an implementation perspective, this question is orthogonal to the question of which parts of the "C runtime", whatever that includes, will be accessed through function tables. My point was that if we're going to use function tables for anything this low-level, let's build it the right way and get this really cool feature out of it.
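Here's a rough sketch of what runtime chaining could look like, again with hypothetical names (`read_chain`, `chain_push`, `chain_remove`, `read_next`) and a fixed-size handler array chosen only for brevity. Each handler receives the chain plus its own position and decides whether, and with what arguments, to call the handler below it, so independent vendors can install and remove hooks without knowing about each other.

```c
#include <stddef.h>
#include <string.h>

#define MAX_HOOKS 8

typedef struct read_chain read_chain;
/* Each handler gets the chain and its own index so it can call onward. */
typedef size_t (*read_handler)(const read_chain *ch, size_t self,
                               char *buf, size_t len);

struct read_chain {
    read_handler h[MAX_HOOKS]; /* h[0] is the original implementation */
    size_t n;                  /* number of installed handlers */
};

/* Invoke the handler installed just below 'self'. */
static size_t read_next(const read_chain *ch, size_t self,
                        char *buf, size_t len) {
    return ch->h[self - 1](ch, self - 1, buf, len);
}

/* Entry point used by the rest of the interpreter: start at the top. */
static size_t chain_read(const read_chain *ch, char *buf, size_t len) {
    return ch->h[ch->n - 1](ch, ch->n - 1, buf, len);
}

/* Install a hook on top of the chain at runtime. */
static int chain_push(read_chain *ch, read_handler h) {
    if (ch->n == MAX_HOOKS) return -1;
    ch->h[ch->n++] = h;
    return 0;
}

/* Remove a hook wherever it sits, keeping the others in order;
 * the original at index 0 is never removed. */
static int chain_remove(read_chain *ch, read_handler h) {
    for (size_t i = 1; i < ch->n; i++) {
        if (ch->h[i] == h) {
            memmove(&ch->h[i], &ch->h[i + 1],
                    (ch->n - i - 1) * sizeof ch->h[0]);
            ch->n--;
            return 0;
        }
    }
    return -1;
}

/* Stand-in original implementation: yields up to three bytes. */
static size_t original_read(const read_chain *ch, size_t self,
                            char *buf, size_t len) {
    (void)ch; (void)self;
    size_t n = len < 3 ? len : 3;
    memcpy(buf, "abc", n);
    return n;
}

/* Vendor A: counts calls (e.g. a profiler) and passes through. */
static int vendor_a_calls = 0;
static size_t vendor_a_hook(const read_chain *ch, size_t self,
                            char *buf, size_t len) {
    vendor_a_calls++;
    return read_next(ch, self, buf, len);
}

/* Vendor B: a policy hook that caps every read at two bytes. */
static size_t vendor_b_hook(const read_chain *ch, size_t self,
                            char *buf, size_t len) {
    return read_next(ch, self, buf, len > 2 ? 2 : len);
}
```

Because the current position is passed as an argument rather than stored in the shared chain, nested or recursive calls don't interfere with each other; making push/remove safe against concurrent readers would still need locking or something similar.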

Also, to reiterate a point: I'm afraid I'm not terribly familiar with Perl's .wrap/.unwrap functions, but as Dan suggests, the real issue is "which C APIs are we going to access through function tables?"

Hope this clarifies the issues a bit, as I see them.

Also, it looks like there's lots of room here to compromise. It just seems to me that this should be something the C compiler or the preprocessor can optimize into nothingness if it's not wanted, while writing it so that it's possible has lots of long-term benefits. Enforcing uniformity on the way all or most functions at that level are accessed may clarify some of the charset/encoding implementation issues, saving us some grief in that area and letting us concentrate more wholeheartedly on the major Unicode issues we face. At the same time, it would make embedders happy by giving them the opportunity to hook into anything they might need to.
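As a sketch of the "optimize into nothingness" part, assuming a hypothetical build switch `PARROT_HOOKABLE` and a hypothetical `LOWLEVEL()` macro (neither exists in Parrot as far as I know): with the switch off, `LOWLEVEL(to_upper)(c)` expands to a plain call to `default_to_upper`, which the compiler is free to inline; with it on, the same source line goes through a patchable pointer.

```c
#include <ctype.h>

/* Hypothetical build switch: uncomment to route low-level calls
 * through patchable pointers instead of direct calls. */
/* #define PARROT_HOOKABLE */

static int default_to_upper(int c) { return toupper(c); }

#ifdef PARROT_HOOKABLE
/* Hookable build: one pointer per low-level function. */
static int (*to_upper_slot)(int) = default_to_upper;
#  define LOWLEVEL(name) (*name##_slot)
#else
/* Plain build: the macro names the function directly. */
#  define LOWLEVEL(name) default_##name
#endif
```

Call sites are written once as `LOWLEVEL(to_upper)(c)`, so switching between the two builds touches no code outside this header.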


Thoughts, criticism, and improvements are most welcome.

Michael
