Re: Transferring control between code segments, eval, and suchlike things
On Thu, Jan 23, 2003 at 12:11:20AM -0500, Dan Sugalski wrote:
> Every sub doesn't have to fit in a single segment, though. There may
> well be a half-zillion subs in any one segment. (Though one segment
> per sub does give us some interesting possibilities for GCing unused
> code)

For an interpreter that is allowing eval (or a namespace that isn't
locked against eval) I think that you could only GC the old definition
of redefined subroutines, and any anonymous subroutines that become
unreferenced. Anything else is the potential lucky destination of a
random future eval.

Nicholas Clark
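The rule Nicholas describes can be sketched as a small set computation. This is only an illustration in Python, not Parrot's collector: the block identifiers, the `symbol_table` mapping, and the `live_refs` set are all hypothetical stand-ins.

```python
def collectable(code_blocks, symbol_table, live_refs):
    """Return the code blocks that no future eval could reach.

    A block stays alive if a name in the symbol table still points at it
    (an eval could call it by name) or if some value still references it.
    """
    named = set(symbol_table.values())
    return {b for b in code_blocks if b not in named and b not in live_refs}


# Hypothetical state: foo was redefined once, bar is defined but never called,
# anon#1 is still held in a variable, anon#2 has been dropped.
CODE_BLOCKS = {"foo#1", "foo#2", "bar#1", "anon#1", "anon#2"}
SYMBOL_TABLE = {"foo": "foo#2", "bar": "bar#1"}
LIVE_REFS = {"anon#1"}

garbage = collectable(CODE_BLOCKS, SYMBOL_TABLE, LIVE_REFS)
```

Here only the old definition of foo and the dropped anonymous sub come back as collectable; bar survives even though nothing calls it, because a later eval could.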
Re: Transferring control between code segments, eval, and suchlike things
Dan Sugalski <[EMAIL PROTECTED]> writes:
> Okay, since this has all come up, here's the scoop from a design perspective.
>
> First, the branch opcodes (branch, bsr, and the conditionals) are all
> meant for movement within a segment of bytecode. They are *not*
> supposed to leave a segment. To do so was arguably a bad idea, now
> it's officially an error. If you need to do so, branch to an op that
> can transfer across boundaries.
>
> Design Edict #1: Branches, which is any transfer of control that takes
> an offset, may *not* escape the current bytecode segment.

Okay with that.

> Next, jumps. Jumps take absolute addresses, so either need fixup at
> load time (blech), are only valid in dynamically generated code (okay,
> but limiting), or can only jump to values in registers (that's
> fine). Jumps aren't a problem in general.
>
> Design Edict #2: Jumps may go anywhere.

In the sense that every possible target (via #3) can be reached with a
jump, but bad things may happen if the target isn't valid.

> Destinations. These are a pain, since if we can go anywhere then the
> JIT has to do all sorts of nasty and unpleasant things to compensate,
> and to make every op a valid destination. Yuck.
>
> Design Edict #3: All destinations *must* be marked as such in the
> bytecode metadata segment. (I am officially nervous about this, as I
> can see a number of ways to subvert this for evil)

This is no more or less evil than branch -1. The destinations can be
range-checked at load time, the assembler will hopefully emit these
offsets correctly, and they will be read-only after compilation.

> I'm only keeping jumps (and their corresponding jsr) around for
> nostalgic reasons, and with the vague hope they may be useful. I'm not
> sure about this.
>
> Design Edict #4: Dan is officially iffy on jumps, but can see them as
> useful for lower-level statically bound languages such as forth,
> Scheme, or C.
>
> That leads us to
>
> Design Edict #5: Dan will accommodate semantics for languages outside
> the core set (perl, python, ruby) only if they don't compromise
> performance for the core set.
>
> Calling actual routines--subs, methods, functions, whatever--at the
> high level isn't done with branches or jumps. It is, instead, done
> with the call series of ops. (call, callmeth, callcc, tailcall,
> tailcallmeth, tailcallcc (though that one makes my head hurt), invoke)
> These are specifically for calling code that's potentially in other
> segments, and to call into them at fixed points. I think these need to
> be hashed out a bit to make them more JIT-friendly, but they're the
> primary transfer destination point

These calls are always jumps or jsrs in disguise. In the end they
always do a goto ADDRESS(something). This means that every
sub/method/continuation must be marked per #3.

> Design Edict #6: The first op in a sub is always a valid
> jump/branch/control transfer destination

This is essentially #3.

> Now. Eval. The compile opcode going in is phenomenally cool (thanks,
> Leo!) but has pointed out some holes in the semantics. I got handwavey
> and, well, it shows. No cookie for me.
>
> The compreg op should compile the passed code in the language that is
> indicated and should load that bytecode into the current
> interpreter. That means that if there are any symbols that get
> installed because someone's defined a sub then, well, they should get
> installed into the interpreter's symbol tables.

The compile should not install the symbols in the interpreter's symbol
table; it should store them somewhere in the bytecode metadata. The
eval should then install them in the interpreter's symbol table. The
problem really starts if BEGIN {...} blocks are used, because they will
be evaluated after the block is compiled but before the whole compile
is finished.

> Compiled code is an interesting thing. In some cases it should return
> a sub PMC, in some cases it should execute and return a value, and in
> some cases it should install a bunch of stuff in a symbol table and
> then return a value. These correspond to:
>
>   eval "print 12";
>
>   $foo = eval "sub bar{return 1;}";
>
>   require foo.pm;
>
> respectively. It's sort of a mixed bag, and unfortunately we can't
> count on the code doing the compilation to properly handle the
> semantics of the language being compiled. So...
>
> Design Edict #7: the compreg opcode will execute the compiled code,
> calling in with parrot's calling conventions. If it should return
> something, then it had darned well better build it and return it.

I find it better to keep compile and eval separate. The compile opcode
should simply return a bytecode PMC, which can then be invoked some
time later.

> Oh, and:
>
> Design Edict #8: compreg is prototyped. It takes a single string and
> must return a single PMC. The compiler may cheat as need be. (No need
> to check and see if it returned a string, or an int)

It should return a bytecode segment.

> Yes, this
Re: Transferring control between code segments, eval, and suchlike things
On Wed, Jan 22, 2003 at 03:00:37PM -0500, Dan Sugalski wrote:
> Destinations. These are a pain, since if we can go anywhere then the
> JIT has to do all sorts of nasty and unpleasant things to compensate,
> and to make every op a valid destination. Yuck.

Arbitrary jumps are not that difficult to deal with in the JIT. The JIT
compiler can handle jumps to arbitrary addresses by falling back into
the interpreter if the destination does not coincide with a previously
known entry point, reentering the JIT code later at a safe point.
pbc2c generated code does this. This way the JIT does not have to
support making every instruction a safe branch destination.

--
Jason
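The fallback Jason describes can be sketched roughly as below. This is a toy model in Python, not Parrot or pbc2c code: the tuple-based ops, `KNOWN_ENTRIES`, `run_compiled`, and `execute` are hypothetical names, and the "compiled" path is only simulated so that the trace shows which engine handled each op.

```python
# Toy model: ops are tuples; "compiled" code is simulated, not real JIT output.
PROGRAM = [
    ("add", 1),    # 0: known entry point
    ("jump", 4),   # 1: jumps to an op that is NOT a known entry point
    ("add", 100),  # 2: never reached
    ("halt",),     # 3
    ("add", 10),   # 4: only reachable through the interpreter
    ("add", 20),   # 5: known entry point, a safe place to re-enter
    ("halt",),     # 6
]
KNOWN_ENTRIES = {0, 5}  # hypothetical: ops the JIT emitted entry points for


def step(program, pc, acc):
    """Execute one op and return (next_pc, acc); next_pc is None on halt."""
    op = program[pc]
    if op[0] == "add":
        return pc + 1, acc + op[1]
    if op[0] == "jump":
        return op[1], acc
    if op[0] == "halt":
        return None, acc
    raise ValueError(f"unknown op {op!r}")


def run_compiled(program, pc, acc, trace):
    """Simulated JIT block: runs from an entry point through the next transfer."""
    while True:
        trace.append(("jit", pc))
        is_transfer = program[pc][0] in ("jump", "halt")
        pc, acc = step(program, pc, acc)
        if is_transfer:
            return pc, acc


def execute(program, known_entries, start=0):
    """Use compiled code at known entries; otherwise interpret op by op."""
    pc, acc, trace = start, 0, []
    while pc is not None:
        if pc in known_entries:
            pc, acc = run_compiled(program, pc, acc, trace)
        else:
            trace.append(("interp", pc))
            pc, acc = step(program, pc, acc)
    return acc, trace


result, trace = execute(PROGRAM, KNOWN_ENTRIES)
```

In this run the jump at op 1 lands on op 4, which has no entry point, so op 4 is interpreted; execution re-enters the simulated JIT code at op 5, the next known entry.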
Re: Transferring control between code segments, eval, and suchlike things
Benjamin Stuhl wrote:
> At 03:00 PM 1/22/2003 -0500, you wrote:

...

> Although, all this would seem to suggest that we'd need/want a
> special-purpose allocator for bytecode segments, since every sub has
> to fit within precisely one segment (and I know _I'd_ like to keep
> bytecode segments on their own memory pages, to e.g. maximize sharing
> on fork()).

IMHO this is a big waste of memory - and running this page-aligned
code JITted doesn't buy anything.

>> Design Edict #7: the compreg opcode will execute the compiled code,
>> calling in with parrot's calling conventions. If it should return
>> something, then it had darned well better build it and return it.
>
> How does this play with
>
>   eval 'sub bar { change_foo(); }
>         BEGIN { bar(); }
>         (...stuff that depends on foo...)';
>
> ? The semantics of BEGIN{} would seem to require that bar be
> installed into the symbol table immediately... but then how do we
> reproduce that if we're e.g. loading precompiled bytecode?

Precompiled PBC and eval is a PITA. This issue seems to imply some
extra parsing during load time and setting up symbols. I dunno yet how
to handle this.

leo
Re: Transferring control between code segments, eval, and suchlike things
Dan Sugalski wrote:
> Okay, since this has all come up, here's the scoop from a design
> perspective.

Hard stuff did meet my printer at midnight; reading it onscreen twice
didn't help ;-)

First:

Definition #0: A bytecode segment is a sequence of code, which is
loaded into memory with no execution of such code interspersed.

So all subs, modules, whatever loaded from zig files may be one code
segment, *if* the runloop wasn't entered. Or: as soon as the code is
running, loading additional bytecode puts this code into a different
bytecode segment.

> Design Edict #1: Branches, which is any transfer of control that takes
> an offset, may *not* escape the current bytecode segment.
>
> Design Edict #2: Jumps may go anywhere.
>
> Design Edict #3: All destinations *must* be marked as such in the
> bytecode metadata segment. (I am officially nervous about this, as I
> can see a number of ways to subvert this for evil)

I would define: Jumps may go to any location acquired per set_addr
call or to branch tables. Jumping somewhere else may kill your dog.

Jumping to a set_addr label is recognized already; jump tables will
probably need some marker around them, so that the jump targets won't
get killed by dead code elimination.

> I'm only keeping jumps (and their corresponding jsr) around for
> nostalgic reasons, and with the vague hope they may be useful. I'm not
> sure about this.

They would be useful for a computed goto.

s/compreg/compile/g for($below);

> The compreg op should compile the passed code ...
>
> Design Edict #7: the compreg opcode will execute the compiled code,
> calling in with parrot's calling conventions. If it should return
> something, then it had darned well better build it and return it.

If the compile opcode has to execute the code, I would call it "eval".
But: When compile and eval are separate stages, the HL might be able to
pull the compile stage out of e.g. loops. So I think keeping compiling
and evaling separate makes sense.

Thanks for putting this together,
leo
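Leo's point about pulling the compile stage out of loops can be illustrated by analogy with Python's built-in `compile()` and `eval()`, which are separate stages in the same way. This is only an analogy, not Parrot's compile op; the expression and the filename label `"<eval-string>"` are made up for the example.

```python
SOURCE = "x * x + 1"

# Compile stage: done once, outside the loop. The result is a reusable
# code object, loosely comparable to the bytecode PMC discussed above.
code = compile(SOURCE, "<eval-string>", "eval")

# Eval stage: the same code object is run repeatedly with different bindings,
# so the cost of parsing and compiling is not paid on every iteration.
results = [eval(code, {"x": x}) for x in range(5)]
```

If compiling always implied executing, the loop would have to recompile the string on every pass.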
Re: Transferring control between code segments, eval, and suchlike things
At 6:24 PM -0500 1/22/03, Benjamin Stuhl wrote:
> At 03:00 PM 1/22/2003 -0500, you wrote:
>> Okay, since this has all come up, here's the scoop from a design
>> perspective.
>>
>> First, the branch opcodes (branch, bsr, and the conditionals) are all
>> meant for movement within a segment of bytecode. They are *not*
>> supposed to leave a segment. To do so was arguably a bad idea, now
>> it's officially an error. If you need to do so, branch to an op that
>> can transfer across boundaries.
>>
>> Design Edict #1: Branches, which is any transfer of control that takes
>> an offset, may *not* escape the current bytecode segment.
>
> Seems reasonable. Especially when the bytecode loader may not
> guarantee the relative placement of segments (think mmap()).
> Although, all this would seem to suggest that we'd need/want a
> special-purpose allocator for bytecode segments, since every sub has
> to fit within precisely one segment (and I know _I'd_ like to keep
> bytecode segments on their own memory pages, to e.g. maximize sharing
> on fork()).

Every sub doesn't have to fit in a single segment, though. There may
well be a half-zillion subs in any one segment. (Though one segment
per sub does give us some interesting possibilities for GCing unused
code)

>> Next, jumps. Jumps take absolute addresses, so either need fixup at
>> load time (blech), are only valid in dynamically generated code (okay,
>> but limiting), or can only jump to values in registers (that's
>> fine). Jumps aren't a problem in general.
>
> Fixups aren't so bad if we make the jump opcode itself take an index
> into a table of fixups (thus letting the bytecode stream stay
> read-only). Register jumps are dangerous, since parrot can't control
> what the user code loads into the register (while we can
> theoretically protect the fixup table from anything short of native
> code).

Indirection. Ick. :) Though, on the other hand, a jump with an integer
constant destination is pretty pointless, so we could consider using
that to index into a jump table. OTOH, it'd be the only thing using
the jump table, so I'm not sure it's worth it. Might speed things up
some. I'll think on that for a bit.

>> Design Edict #2: Jumps may go anywhere.
>>
>> Destinations. These are a pain, since if we can go anywhere then the
>> JIT has to do all sorts of nasty and unpleasant things to compensate,
>> and to make every op a valid destination. Yuck.
>>
>> Design Edict #3: All destinations *must* be marked as such in the
>> bytecode metadata segment. (I am officially nervous about this, as I
>> can see a number of ways to subvert this for evil)
>
> Marked destinations are very important; as for evil subversion, how
> about just saying "untrusted code only gets pure interpretation, and
> the untrusting interpreter bounds-checks everything"?

True, and we'll not be JITting safe-mode code, or likely not at least
because of the resource constraint checking.

> [snip]
>
>> Calling actual routines--subs, methods, functions, whatever--at the
>> high level isn't done with branches or jumps. It is, instead, done
>> with the call series of ops. (call, callmeth, callcc, tailcall,
>> tailcallmeth, tailcallcc (though that one makes my head hurt), invoke)
>> These are specifically for calling code that's potentially in other
>> segments, and to call into them at fixed points. I think these need to
>> be hashed out a bit to make them more JIT-friendly, but they're the
>> primary transfer destination point
>>
>> Design Edict #6: The first op in a sub is always a valid
>> jump/branch/control transfer destination
>
> Wouldn't make much sense if you had a sub but couldn't call it, now
> would it? :-D

Don't tempt the JAPHers!

>> Now. Eval. The compile opcode going in is phenomenally cool (thanks,
>> Leo!) but has pointed out some holes in the semantics. I got handwavey
>> and, well, it shows. No cookie for me.
>>
>> The compreg op should compile the passed code in the language that is
>> indicated and should load that bytecode into the current
>> interpreter. That means that if there are any symbols that get
>> installed because someone's defined a sub then, well, they should get
>> installed into the interpreter's symbol tables.
>>
>> Compiled code is an interesting thing. In some cases it should return
>> a sub PMC, in some cases it should execute and return a value, and in
>> some cases it should install a bunch of stuff in a symbol table and
>> then return a value. These correspond to:
>>
>>   eval "print 12";
>>
>>   $foo = eval "sub bar{return 1;}";
>>
>>   require foo.pm;
>>
>> respectively. It's sort of a mixed bag, and unfortunately we can't
>> count on the code doing the compilation to properly handle the
>> semantics of the language being compiled. So...
>>
>> Design Edict #7: the compreg opcode will execute the compiled code,
>> calling in with parrot's calling conventions. If it should return
>> something, then it had darned well better build it and return it.
>
> How does this play with
>
>   eval 'sub bar { change_foo(); }
>         BEGIN { bar(); }
>         (...stuff that depends on foo...)';
>
> ? The semantics of BEGIN{} would seem to require that bar be in
Re: Transferring control between code segments, eval, and suchlike things
At 03:00 PM 1/22/2003 -0500, you wrote:
> Okay, since this has all come up, here's the scoop from a design
> perspective.
>
> First, the branch opcodes (branch, bsr, and the conditionals) are all
> meant for movement within a segment of bytecode. They are *not*
> supposed to leave a segment. To do so was arguably a bad idea, now
> it's officially an error. If you need to do so, branch to an op that
> can transfer across boundaries.
>
> Design Edict #1: Branches, which is any transfer of control that takes
> an offset, may *not* escape the current bytecode segment.

Seems reasonable. Especially when the bytecode loader may not
guarantee the relative placement of segments (think mmap()). Although,
all this would seem to suggest that we'd need/want a special-purpose
allocator for bytecode segments, since every sub has to fit within
precisely one segment (and I know _I'd_ like to keep bytecode segments
on their own memory pages, to e.g. maximize sharing on fork()).

> Next, jumps. Jumps take absolute addresses, so either need fixup at
> load time (blech), are only valid in dynamically generated code (okay,
> but limiting), or can only jump to values in registers (that's
> fine). Jumps aren't a problem in general.

Fixups aren't so bad if we make the jump opcode itself take an index
into a table of fixups (thus letting the bytecode stream stay
read-only). Register jumps are dangerous, since parrot can't control
what the user code loads into the register (while we can theoretically
protect the fixup table from anything short of native code).

> Design Edict #2: Jumps may go anywhere.
>
> Destinations. These are a pain, since if we can go anywhere then the
> JIT has to do all sorts of nasty and unpleasant things to compensate,
> and to make every op a valid destination. Yuck.
>
> Design Edict #3: All destinations *must* be marked as such in the
> bytecode metadata segment. (I am officially nervous about this, as I
> can see a number of ways to subvert this for evil)

Marked destinations are very important; as for evil subversion, how
about just saying "untrusted code only gets pure interpretation, and
the untrusting interpreter bounds-checks everything"?

[snip]

> Calling actual routines--subs, methods, functions, whatever--at the
> high level isn't done with branches or jumps. It is, instead, done
> with the call series of ops. (call, callmeth, callcc, tailcall,
> tailcallmeth, tailcallcc (though that one makes my head hurt), invoke)
> These are specifically for calling code that's potentially in other
> segments, and to call into them at fixed points. I think these need to
> be hashed out a bit to make them more JIT-friendly, but they're the
> primary transfer destination point
>
> Design Edict #6: The first op in a sub is always a valid
> jump/branch/control transfer destination

Wouldn't make much sense if you had a sub but couldn't call it, now
would it? :-D

> Now. Eval. The compile opcode going in is phenomenally cool (thanks,
> Leo!) but has pointed out some holes in the semantics. I got handwavey
> and, well, it shows. No cookie for me.
>
> The compreg op should compile the passed code in the language that is
> indicated and should load that bytecode into the current
> interpreter. That means that if there are any symbols that get
> installed because someone's defined a sub then, well, they should get
> installed into the interpreter's symbol tables.
>
> Compiled code is an interesting thing. In some cases it should return
> a sub PMC, in some cases it should execute and return a value, and in
> some cases it should install a bunch of stuff in a symbol table and
> then return a value. These correspond to:
>
>   eval "print 12";
>
>   $foo = eval "sub bar{return 1;}";
>
>   require foo.pm;
>
> respectively. It's sort of a mixed bag, and unfortunately we can't
> count on the code doing the compilation to properly handle the
> semantics of the language being compiled. So...
>
> Design Edict #7: the compreg opcode will execute the compiled code,
> calling in with parrot's calling conventions. If it should return
> something, then it had darned well better build it and return it.

How does this play with

  eval 'sub bar { change_foo(); }
        BEGIN { bar(); }
        (...stuff that depends on foo...)';

? The semantics of BEGIN{} would seem to require that bar be installed
into the symbol table immediately... but then how do we reproduce that
if we're e.g. loading precompiled bytecode?

> Oh, and:
>
> Design Edict #8: compreg is prototyped. It takes a single string and
> must return a single PMC. The compiler may cheat as need be. (No need
> to check and see if it returned a string, or an int)
>
> Yes, this does mean that for plain assembly that we want to compile and
> return a sub ref for we need to do extra in the assembly we pass
> in. Tough, we can deal. If it was dead-simple it wouldn't be
> assembly. :)

That makes sense.

-- BKS
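The fixup-table idea from this message (the jump op carries an index, and only the table is filled in at load time) could look something like the sketch below. It is written in Python purely for illustration; `build_fixup_table`, `resolve_jump`, the op tuples, and the load addresses are hypothetical and do not reflect Parrot's actual bytecode layout.

```python
def build_fixup_table(load_base, segment_offsets):
    """At load time, turn segment-relative offsets into absolute addresses."""
    return tuple(load_base + off for off in segment_offsets)


def resolve_jump(op, fixups):
    """Return the absolute destination of a ('jump', index) op.

    The index is bounds-checked, so a jump can only reach addresses that
    the loader itself placed in the table.
    """
    name, index = op
    if name != "jump":
        raise ValueError("not a jump op")
    if not 0 <= index < len(fixups):
        raise IndexError(f"fixup index {index} out of range")
    return fixups[index]


# The bytecode is an immutable tuple and is never patched after assembly.
BYTECODE = (("jump", 1), ("jump", 0))
OFFSETS = (0, 24)  # hypothetical segment-relative jump targets

# The same bytecode loaded at two different base addresses.
fixups_a = build_fixup_table(0x1000, OFFSETS)
fixups_b = build_fixup_table(0x8000, OFFSETS)

dest_a = resolve_jump(BYTECODE[0], fixups_a)
dest_b = resolve_jump(BYTECODE[0], fixups_b)
```

The bytecode stream is identical in both loads; only the small per-load table differs, which is what lets the stream stay read-only and shareable.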
Transferring control between code segments, eval, and suchlike things
Okay, since this has all come up, here's the scoop from a design
perspective.

First, the branch opcodes (branch, bsr, and the conditionals) are all
meant for movement within a segment of bytecode. They are *not*
supposed to leave a segment. To do so was arguably a bad idea, now
it's officially an error. If you need to do so, branch to an op that
can transfer across boundaries.

Design Edict #1: Branches, which is any transfer of control that takes
an offset, may *not* escape the current bytecode segment.

Next, jumps. Jumps take absolute addresses, so either need fixup at
load time (blech), are only valid in dynamically generated code (okay,
but limiting), or can only jump to values in registers (that's
fine). Jumps aren't a problem in general.

Design Edict #2: Jumps may go anywhere.

Destinations. These are a pain, since if we can go anywhere then the
JIT has to do all sorts of nasty and unpleasant things to compensate,
and to make every op a valid destination. Yuck.

Design Edict #3: All destinations *must* be marked as such in the
bytecode metadata segment. (I am officially nervous about this, as I
can see a number of ways to subvert this for evil)

I'm only keeping jumps (and their corresponding jsr) around for
nostalgic reasons, and with the vague hope they may be useful. I'm not
sure about this.

Design Edict #4: Dan is officially iffy on jumps, but can see them as
useful for lower-level statically bound languages such as forth,
Scheme, or C.

That leads us to

Design Edict #5: Dan will accommodate semantics for languages outside
the core set (perl, python, ruby) only if they don't compromise
performance for the core set.

Calling actual routines--subs, methods, functions, whatever--at the
high level isn't done with branches or jumps. It is, instead, done
with the call series of ops. (call, callmeth, callcc, tailcall,
tailcallmeth, tailcallcc (though that one makes my head hurt), invoke)
These are specifically for calling code that's potentially in other
segments, and to call into them at fixed points. I think these need to
be hashed out a bit to make them more JIT-friendly, but they're the
primary transfer destination point

Design Edict #6: The first op in a sub is always a valid
jump/branch/control transfer destination

Now. Eval. The compile opcode going in is phenomenally cool (thanks,
Leo!) but has pointed out some holes in the semantics. I got handwavey
and, well, it shows. No cookie for me.

The compreg op should compile the passed code in the language that is
indicated and should load that bytecode into the current
interpreter. That means that if there are any symbols that get
installed because someone's defined a sub then, well, they should get
installed into the interpreter's symbol tables.

Compiled code is an interesting thing. In some cases it should return
a sub PMC, in some cases it should execute and return a value, and in
some cases it should install a bunch of stuff in a symbol table and
then return a value. These correspond to:

  eval "print 12";

  $foo = eval "sub bar{return 1;}";

  require foo.pm;

respectively. It's sort of a mixed bag, and unfortunately we can't
count on the code doing the compilation to properly handle the
semantics of the language being compiled. So...

Design Edict #7: the compreg opcode will execute the compiled code,
calling in with parrot's calling conventions. If it should return
something, then it had darned well better build it and return it.

Oh, and:

Design Edict #8: compreg is prototyped. It takes a single string and
must return a single PMC. The compiler may cheat as need be. (No need
to check and see if it returned a string, or an int)

Yes, this does mean that for plain assembly that we want to compile and
return a sub ref for we need to do extra in the assembly we pass
in. Tough, we can deal. If it was dead-simple it wouldn't be
assembly. :)

I think that's it. Let's have at it and see where the edicts need
fixing.

--
                                      Dan

--"it's like this"---
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
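Edicts #1 and #3 lend themselves to a load-time check. The sketch below is one possible reading in Python, not Parrot's loader: it assumes branch offsets are relative to the branching op, that branch targets count as destinations needing a mark, and that `verify_segment` and the op tuples are hypothetical.

```python
def verify_segment(ops, destinations):
    """Check one bytecode segment against Edicts #1 and #3.

    ops          -- list of op tuples; ("branch", offset) is a relative branch
    destinations -- set of op indices marked as valid targets in the metadata
    Returns a list of error strings; an empty list means the segment passes.
    """
    errors = []
    size = len(ops)
    # Edict #3: every marked destination must itself lie inside the segment.
    for dest in sorted(destinations):
        if not 0 <= dest < size:
            errors.append(f"destination {dest} outside segment")
    for pc, op in enumerate(ops):
        if op[0] != "branch":
            continue
        target = pc + op[1]
        if not 0 <= target < size:
            # Edict #1: a branch may not escape the current segment.
            errors.append(f"branch at {pc} escapes segment (target {target})")
        elif target not in destinations:
            # Edict #3: the target must be a marked destination.
            errors.append(f"branch at {pc} targets unmarked op {target}")
    return errors


GOOD = [("noop",), ("branch", 2), ("noop",), ("noop",), ("branch", -1)]
good_errors = verify_segment(GOOD, {3})

# A branch of -1 from the first op leaves the segment.
bad_errors = verify_segment([("branch", -1)], set())
```

Because the check runs once at load time and the metadata is read-only afterwards, the run loop or JIT would not need to repeat it on every branch.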