At 03:00 PM 1/22/2003 -0500, you wrote:
Okay, since this has all come up, here's the scoop from a design perspective.

First, the branch opcodes (branch, bsr, and the conditionals) are all meant for movement within a segment of bytecode. They are *not* supposed to leave a segment. Doing so was arguably a bad idea; now it's officially an error. If you need to leave the segment, branch to an op that can transfer across boundaries.

Design Edict #1: Branches, meaning any transfer of control that takes an offset, may *not* escape the current bytecode segment.
Seems reasonable. Especially when the bytecode loader may not guarantee the relative placement of segments (think mmap()). Although,
all this would seem to suggest that we'd need/want a special-purpose allocator for bytecode segments, since every sub has to fit within precisely
one segment (and I know _I'd_ like to keep bytecode segments on their own memory pages, to e.g. maximize sharing on fork()).
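
To make Edict #1 concrete, here is a minimal sketch of a load-time check for it. Everything in it is an assumption for illustration: the Segment type, the (opname, operand) encoding, offsets counted in ops rather than words, and the conditional op names "if" and "unless". It is not how Parrot represents bytecode.

```python
from dataclasses import dataclass

# Assumed op names: "branch" and "bsr" come from the discussion above,
# "if" and "unless" stand in for "the conditionals".
BRANCH_OPS = {"branch", "bsr", "if", "unless"}


@dataclass(frozen=True)
class Segment:
    """Hypothetical bytecode segment: a tuple of (opname, operand) pairs."""
    ops: tuple


def escaping_branches(segment):
    """Return the positions of branch ops whose target leaves the segment.

    The operand of a branch op is treated as a relative offset, counted
    in ops, from the branch itself (an assumption of this sketch).
    """
    bad = []
    for pc, (name, operand) in enumerate(segment.ops):
        if name in BRANCH_OPS:
            target = pc + operand
            if not 0 <= target < len(segment.ops):
                bad.append(pc)
    return bad
```

A loader could refuse any segment for which escaping_branches() returns a non-empty list, which makes the check independent of where the segment ends up in memory.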

Next, jumps. Jumps take absolute addresses, so either need fixup at load time (blech), are only valid in dynamically generated code (okay, but limiting), or can only jump to values in registers (that's fine). Jumps aren't a problem in general.
Fixups aren't so bad if we make the jump opcode itself take an index into a table of fixups (thus letting the bytecode stream stay read-only). Register jumps
are dangerous, since Parrot can't control what the user code loads into the register (while we can theoretically protect the fixup table from anything short of
native code).
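
A rough sketch of the fixup-table idea, to show why the bytecode stream can stay read-only: the jump operand is an index, and only the table is filled in at load time. The LoadedSegment class, the "jump" encoding, and the resolve callback are all hypothetical names for this illustration.

```python
class LoadedSegment:
    """Hypothetical loaded segment with a per-segment fixup table.

    The code is stored as an immutable tuple and is never patched.
    Only the fixup table is computed at load time.
    """

    def __init__(self, code, fixup_symbols, resolve):
        # code: sequence of (opname, operand) pairs, kept read-only.
        self.code = tuple(code)
        # resolve: maps a symbol name to an absolute address at load time.
        self.fixups = tuple(resolve(symbol) for symbol in fixup_symbols)

    def jump_target(self, pc):
        """Return the absolute address the jump op at position pc goes to."""
        name, index = self.code[pc]
        if name != "jump":
            raise ValueError(f"op at {pc} is {name!r}, not a jump")
        # Explicit bounds check so a bad index cannot wrap around.
        if not 0 <= index < len(self.fixups):
            raise IndexError(f"fixup index {index} out of range")
        return self.fixups[index]
```

Because the operand never holds a raw address, user code has no way to aim a jump at an arbitrary location; the worst it can do is name another entry in the table.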

Design Edict #2: Jumps may go anywhere.

Destinations. These are a pain, since if we can go anywhere then the JIT has to do all sorts of nasty and unpleasant things to compensate, and to make every op a valid destination. Yuck.

Design Edict #3: All destinations *must* be marked as such in the bytecode metadata segment. (I am officially nervous about this, as I can see a number of ways to subvert this for evil)
Marked destinations are very important; as for evil subversion, how about just saying "untrusted code only gets pure interpretation, and the untrusting interpreter bounds-checks everything"?
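
Something like the following is what I have in mind for the untrusting interpreter: every control transfer is checked against the segment bounds and against the set of marked destinations before it is taken. The names (TransferError, checked_target) and the representation of the metadata as a set of op positions are assumptions of this sketch.

```python
class TransferError(Exception):
    """Raised when untrusted code attempts an invalid control transfer."""


def checked_target(target, segment_length, marked_destinations):
    """Validate a control-transfer target for untrusted code.

    target: position of the op being transferred to.
    segment_length: number of ops in the current segment.
    marked_destinations: positions the metadata segment marks as valid
    (a set in this sketch).
    """
    if not 0 <= target < segment_length:
        raise TransferError(f"target {target} is outside the segment")
    if target not in marked_destinations:
        raise TransferError(f"target {target} is not a marked destination")
    return target
```

Trusted code could skip these checks and go through the JIT, so the cost is paid only where the metadata cannot be believed.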

[snip]
Calling actual routines--subs, methods, functions, whatever--at the high level isn't done with branches or jumps. It is, instead, done with the call series of ops (call, callmeth, callcc, tailcall, tailcallmeth, tailcallcc (though that one makes my head hurt), invoke). These are specifically for calling code that's potentially in other segments, and for calling into it at fixed points. I think these need to be hashed out a bit to make them more JIT-friendly, but they're the primary transfer destination point.

Design Edict #6: The first op in a sub is always a valid jump/branch/control transfer destination
Wouldn't make much sense if you had a sub but couldn't call it, now would it? :-D

Now. Eval. The compile opcode going in is phenomenally cool (thanks, Leo!) but has pointed out some holes in the semantics. I got handwavey and, well, it shows. No cookie for me.

The compreg op should compile the passed code in the language that is indicated and should load that bytecode into the current interpreter. That means that if there are any symbols that get installed because someone's defined a sub then, well, they should get installed into the interpreter's symbol tables.

Compiled code is an interesting thing. In some cases it should return a sub PMC, in some cases it should execute and return a value, and in some cases it should install a bunch of stuff in a symbol table and then return a value. These correspond to:


eval "print 12";

$foo = eval "sub bar{return 1;}";

require foo.pm;

respectively. It's sort of a mixed bag, and unfortunately we can't count on the code doing the compilation to properly handle the semantics of the language being compiled. So...

Design Edict #7: the compreg opcode will execute the compiled code, calling in with Parrot's calling conventions. If it should return something, then it had darned well better build it and return it.
How does this play with

eval 'sub bar { change_foo(); } BEGIN { bar(); } (...stuff that depends on foo...)';

? The semantics of BEGIN{} would seem to require that bar be installed into the symbol table immediately... but then how do we reproduce that if we're e.g. loading
precompiled bytecode?

Oh, and:

Design Edict #8: compreg is prototyped. It takes a single string and must return a single PMC. The compiler may cheat as need be. (No need to check and see if it returned a string, or an int)

Yes, this does mean that for plain assembly that we want to compile and return a sub ref for we need to do extra in the assembly we pass in. Tough, we can deal. If it was dead-simple it wouldn't be assembly. :)
That makes sense.
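
As a loose analogy only, here is how I read Edicts #7 and #8 taken together, written with Python's own compile() and exec() standing in for a registered compiler. The function name compreg_like and the "_result" convention are invented for this sketch; nothing here is Parrot's API.

```python
def compreg_like(source, symbol_table):
    """Compile and run source, then hand back exactly one value.

    source: a single string of code (Python here, as a stand-in).
    symbol_table: dict playing the role of the interpreter's symbol
    table. Anything the code defines is installed into it as a side
    effect of running the code.

    The compiled code is responsible for building its own return value.
    By the convention of this sketch it binds the name "_result".
    """
    code = compile(source, "<compreg>", "exec")
    exec(code, symbol_table)
    # One value out, always. None stands in for a null result.
    return symbol_table.pop("_result", None)
```

For example, passing "def bar():\n    return 1\n_result = bar" leaves bar installed in the table and returns the function object itself, which mirrors the "do extra in the assembly to get a sub ref back" point above.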

-- BKS
