Dan Sugalski <[EMAIL PROTECTED]> writes:

> Okay, since this has all come up, here's the scoop from a design perspective.
> 
> First, the branch opcodes (branch, bsr, and the conditionals) are all
> meant for movement within a segment of bytecode. They are *not*
> supposed to leave a segment. To do so was arguably a bad idea, now
> it's officially an error. If you need to do so, branch to an op that
> can transfer across boundaries.
> 
> 
> Design Edict #1: Branches, which is any transfer of control that takes
> an offset, may *not* escape the current bytecode segment.

Okay with that.

> Next, jumps. Jumps take absolute addresses, so either need fixup at
> load time (blech), are only valid in dynamically generated code (okay,
> but limiting), or can only jump to values in registers (that's
> fine). Jumps aren't a problem in general.
> 
> 
> Design Edict #2: Jumps may go anywhere.

In the sense that every possible target (marked via #3) can be reached with a
jump, but bad things may happen if the target isn't valid.

> Destinations. These are a pain, since if we can go anywhere then the
> JIT has to do all sorts of nasty and unpleasant things to compensate,
> and to make every op a valid destination. Yuck.
> 
> 
> Design Edict #3: All destinations *must* be marked as such in the
> bytecode metadata segment. (I am officially nervous about this, as I
> can see a number of ways to subvert this for evil)

This is no more or less evil than
branch -1
The destinations can be range-checked at load time, the assembler will
hopefully emit these offsets correctly, and they will be read-only after
compilation.
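
A load-time check along those lines is cheap. A rough sketch in C of what
it could look like, with a made-up metadata layout (the struct and names
here are only for illustration, not the real ones):

#include <stddef.h>

/* Hypothetical layout: a bytecode segment plus the list of opcode
 * offsets that the metadata segment marks as valid destinations. */
typedef struct {
    size_t  code_len;     /* number of opcodes in the segment       */
    size_t  n_dests;      /* number of marked destinations          */
    size_t *dest_offsets; /* offsets into the segment, one per dest */
} CodeSegment;

/* Range-check every marked destination once, when the segment is
 * loaded; afterwards the offsets are read-only and can be trusted
 * by the run loop and the JIT. */
int
check_destinations(const CodeSegment *seg)
{
    size_t i;
    for (i = 0; i < seg->n_dests; i++) {
        if (seg->dest_offsets[i] >= seg->code_len)
            return 0;   /* bad metadata: refuse to load the segment */
    }
    return 1;
}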

> I'm only keeping jumps (and their corresponding jsr) around for
> nostalgic reasons, and with the vague hope they may be useful. I'm not
> sure about this.
> 
> 
> Design Edict #4: Dan is officially iffy on jumps, but can see them as
> useful for lower-level statically bound languages such as forth,
> Scheme, or C.
> 
> 
> That leads us to
> 
> Design Edict #5: Dan will accommodate semantics for languages outside
> the core set (perl, python, ruby) only if they don't compromise
> performance for the core set.
> 
> 
> Calling actual routines--subs, methods, functions, whatever--at the
> high level isn't done with branches or jumps. It is, instead, done
> with the call series of ops. (call, callmeth, callcc, tailcall,
> tailcallmeth, tailcallcc (though that one makes my head hurt), invoke)
> These are specifically for calling code that's potentially in other
> segments, and to call into them at fixed points. I think these need to
> be hashed out a bit to make them more JIT-friendly, but they're the
> primary transfer destination point.

These calls are always jumps or jsr in disguise. In the end they
always do a goto ADDRESS(something). This means that every
sub/method/continuation must be marked per #3.
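
Roughly, whatever the surface op is, it boils down to something like this
(only a sketch in C with invented names, not the actual op bodies):

typedef long opcode_t;

/* Hypothetical: what a sub/method/continuation PMC boils down to once
 * the call op has resolved it. */
typedef struct {
    opcode_t *entry;   /* absolute address of its first op */
    /* ... calling-convention state, lexicals, etc. ... */
} SubLikeThing;

/* Every call-family op ends up doing this: compute an absolute address
 * and hand it back to the run loop, i.e. a goto in disguise.  Which is
 * why the entry point has to be one of the destinations marked per #3. */
static opcode_t *
transfer_to(const SubLikeThing *sub)
{
    return sub->entry;   /* the run loop resumes execution here */
}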

> Design Edict #6: The first op in a sub is always a valid
> jump/branch/control transfer destination

This is essentially #3.

> Now. Eval. The compile opcode going in is phenomenally cool (thanks,
> Leo!) but has pointed out some holes in the semantics. I got handwavey
> and, well, it shows. No cookie for me.
> 
> 
> The compreg op should compile the passed code in the language that is
> indicated and should load that bytecode into the current
> interpreter. That means that if there are any symbols that get
> installed because someone's defined a sub then, well, they should get
> installed into the interpreter's symbol tables.

It's not the compile that would install the symbols in the interpreter's
symbol table; it would store them somewhere in the bytecode metadata. The
eval should then install them in the interpreter's symbol table.

The problem really starts if BEGIN {...} blocks are used, because they
will be evaluated after their block is compiled but before the whole
compile is finished.
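
A sketch of that split in C, with made-up structures, just to show where
the installation would happen (none of these names are real):

#include <stddef.h>

typedef struct { const char *name; size_t offset; } SymbolEntry;

/* Hypothetical: what compile leaves behind.  The symbols it found live
 * in the segment's own metadata, not in the interpreter. */
typedef struct {
    SymbolEntry *symbols;
    size_t       n_symbols;
    /* ... the code itself, constants, destination marks, ... */
} ByteCodeSegment;

typedef struct Interp Interp;
void interp_install_symbol(Interp *, const char *name,
                           const ByteCodeSegment *, size_t offset);

/* The eval/load step walks that metadata and installs the symbols into
 * the interpreter's tables; compile itself never touches them. */
void
install_segment_symbols(Interp *interp, const ByteCodeSegment *seg)
{
    size_t i;
    for (i = 0; i < seg->n_symbols; i++)
        interp_install_symbol(interp, seg->symbols[i].name,
                              seg, seg->symbols[i].offset);
}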

> Compiled code is an interesting thing. In some cases it should return
> a sub PMC, in some cases it should execute and return a value, and in
> some cases  it should install a bunch of stuff in a symbol table and
> then return a value. These correspond to:
> 
> 
> 
>     eval "print 12";
> 
>     $foo = eval "sub bar{return 1;}";
> 
>     require foo.pm;
> 
> respectively. It's sort of a mixed bag, and unfortunately we can't
> count on the code doing the compilation to properly handle the
> semantics of the language being compiled. So...
> 
> 
> Design Edict #7: the compreg opcode will execute the compiled code,
> calling in with parrot's calling conventions. If it should return
> something, then it had darned well better build it and return it.

I find it better to keep compile and eval separate.
The compile opcode should simply return a bytecode PMC which can then
be invoked some time later.
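
Eval would then just be compile followed by invoking the result. A rough
sketch in C of the two-step split (again, invented names, not the real
API):

typedef struct Interp Interp;
typedef struct PMC    PMC;    /* here: a bytecode/sub-like PMC */

/* compile: source string in, bytecode PMC out.  No execution, no symbol
 * installation, no other side effects on the interpreter. */
PMC *do_compile(Interp *interp, const char *lang, const char *source);

/* invoke: run a previously compiled segment whenever the caller wants,
 * with the normal calling conventions. */
PMC *do_invoke(Interp *interp, PMC *bytecode);

/* eval is then just the composition of the two. */
static PMC *
do_eval(Interp *interp, const char *lang, const char *source)
{
    PMC *bc = do_compile(interp, lang, source);
    return do_invoke(interp, bc);   /* may return a value, a sub, ... */
}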

> Oh, and:
> 
> Design Edict #8: compreg is prototyped. It takes a single string and
> must return a single PMC. The compiler may cheat as need be. (No need
> to check and see if it returned a string, or an int)

It should return a bytecode segment.
 
> Yes, this does mean that for plain assembly that we want to compile
> and return a sub ref for we need to do extra in the assembly we pass
> in. Tough, we can deal. If it was dead-simple it wouldn't be
> assembly. :)

The assembler in assembly might be very simple:

open P0, "infile", "r"        # open the source file
read S0, P0, filesize         # slurp it in (filesize is a placeholder)
close P0
compile P0, S0                # compile the source into a bytecode PMC
open P1, "outfile", "w"
puts P1, P0                   # write the compiled result out
close P1
end

The hard part is all hidden in the compile opcode.

bye
boe.
-- 
Juergen Boemmels                        [EMAIL PROTECTED]
Fachbereich Physik                      Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern             Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F  23 F6 C7 2F 85 93 DD 47
