Re: Threaded Perl bytecode (was: Re: stackless python)
> "KF" == Ken Fox <[EMAIL PROTECTED]> writes: KF> Adam Turoff wrote: >> when dealing with threaded bytecode is that the threading specifically >> eliminates the indirection in the name of speed. KF> Yes. Chaim was saying that for the functions that need indirection, KF> they could use stubs. You don't need to guess in advance which ones KF> need indirection because at run-time you can just copy the old code KF> to a new location and *write over* the old location with a "fetch pointer KF> and tail call" stub. All bytecode pointers stay the same -- they just KF> point to the stub now. The only restriction on this technique is that KF> the no sub body can be smaller than the indirection stub. (We could KF> easily make a single bytecode op that does a symbol table lookup KF> and tail call so I don't see any practical restrictions at all.) We may not even need to copy the body. If the header of the function is target location, the header could any one of nop, nest another inner loop lookup current symbol fixup caller or jump to new target. (Hmm, with Q::S, it could be all of them in constant time.) -- Chaim FrenkelNonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183
Re: [not quite an RFC] shared bytecode/optree
> "BS" == Benjamin Stuhl <[EMAIL PROTECTED]> writes: BS> My primary goal (it may not have come accross strongly BS> enough) in this proposal was sharing bytecode between BS> threads even with an ithreadsish model (variables are BS> thread-private, except when explicitly shared). This BS> requires that the bytecode not contain direct pointers to BS> variables, but rather references with at least one level of BS> indirection. Avoiding fixups/relocations and allowing BS> bytecode to be mmap()ed are additional potential benefits. BS> But my first goal was to not have one copy of each BS> subroutine in File::Spec::Functions for each thread I run. If you look back over several of the discussions in -internals, you'll notice Dan in particular, pointing out that the optree (or it's replacement) would be 'inviolate'. If for no other reason than to avoid having to grab mutexes. The actual disk version of the bytecode, is still way out. (As a strange aside, This last discussion reminded me of someone's claim that the IBM 360 executables were actually stored with self-modifying io instructions. As the pieces were pulled off the disk, the next io instruction ended up in the right place. Myth?) -- Chaim FrenkelNonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183
Re: Threaded Perl bytecode (was: Re: stackless python)
> > Joshua N Pritikin writes:
> > : http://www.oreillynet.com/pub/a/python/2000/10/04/stackless-intro.html
> >
> > Perl 5 is already stackless in that sense, though we never implemented
> > continuations. The main impetus for going stackless was to make it
> > possible to implement a Forth-style threaded code interpreter, though
> > we never put one of those into production either.

There's a large school of thought in the Lisp world that holds that full
continuations are a bad idea. See for example:

    http://www.deja.com/threadmsg_ct.xp?AN=635369657

Executive summary of this article:

* Continuations are hard to implement and harder to implement
  efficiently. Languages with continuations tend to be slower because of
  the extreme generality constraints imposed by the presence of
  continuations.

* Typical uses of continuations are for things like exception handling.
  Nobody really uses continuations because they are too difficult to
  understand. Exception handling is adequately served by simpler and
  more efficient catch-throw mechanisms which everyone already
  understands.

Anyone seriously interested in putting continuations into Perl 6 would
probably do well to read the entire thread headed by the article I cited
above.
Re: [not quite an RFC] shared bytecode/optree
On Wed, Oct 25, 2000 at 06:23:20PM +0100, Tom Hughes wrote:
> In message <[EMAIL PROTECTED]>
>         Nicholas Clark <[EMAIL PROTECTED]> wrote:
>
> > Specific example where you can't:
> > on ARM, the branch instructions (B and BL) are PC relative, but only have
> > a 24 bit offset field. The address space is (now) 32 bit, so there's parts
> > you can't reach without either calculating addresses (in another register)
> > and MOVing them to the PC, or loading the PC from a branch table in memory.
>
> That is actually a word offset of course, so it can actually reach
> up to 26 bits away in bytes. Still not the full 32 though.

Good point.

> Of course that only becomes a problem if your program is big enough
> to exceed 26 bits of address space, which is pretty unlikely. That
> or if the program occupies seriously disjoint areas of address space.

Which is likely:

nick@Bagpuss [test]$ uname -a
Linux Bagpuss.unfortu.net 2.2.17-rmk1 #5 Mon Sep 18 19:03:46 BST 2000 armv4l unknown
nick@Bagpuss [test]$ cat mmap.c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

int main () {
  int motd = open ("/etc/motd", O_RDONLY);
  void *mapped, *malloced, *big;

  if (motd < 0) {
    perror ("Failed to open /etc/motd");
    return 1;
  }
  mapped = mmap(NULL, 1024, PROT_EXEC | PROT_READ | PROT_WRITE,
                MAP_PRIVATE, motd, 0);
  malloced = malloc (1024);
  big = malloc (1024*1024*32);
  printf ("mapped = %p malloced = %p big = %p main = %p\n",
          mapped, malloced, big, &main);
  return 0;
}
nick@Bagpuss [test]$ ./mmap
mapped = 0x40015000 malloced = 0x2008670 big = 0x40105008 main = 0x200040c

likewise x86:

[nick@babyhippo nick]$ ./mmap
mapped = 0x40013000 malloced = 0x80498d0 big = 0x40109008 main = 0x80484a0
[nick@babyhippo nick]$ uname -a
Linux babyhippo.com 2.2.12-20 #1 Mon Sep 27 10:40:35 EDT 1999 i686 unknown

mmap gives you memory from somewhere disjoint. And some malloc()
implementations (glibc 2.1 here, but I've compiled Doug Lea's malloc on
Solaris and HP-UX) will call mmap for a large request.

(And at least one out of Solaris and HP-UX also gives you pointers
greater than 0x80000000 from mmap().)

Particularly likely if we're considering mmap()ing bytecode in.

Nicholas Clark
Re: [not quite an RFC] shared bytecode/optree
In message <[EMAIL PROTECTED]>
        Nicholas Clark <[EMAIL PROTECTED]> wrote:

> Specific example where you can't:
> on ARM, the branch instructions (B and BL) are PC relative, but only have
> a 24 bit offset field. The address space is (now) 32 bit, so there's parts
> you can't reach without either calculating addresses (in another register)
> and MOVing them to the PC, or loading the PC from a branch table in memory.

That is actually a word offset of course, so it can actually reach
up to 26 bits away in bytes. Still not the full 32 though.

Of course that only becomes a problem if your program is big enough
to exceed 26 bits of address space, which is pretty unlikely. That
or if the program occupies seriously disjoint areas of address space.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
   ...Don't believe in astrology. We Scorpios aren't taken in by such things.
Re: Threaded Perl bytecode (was: Re: stackless python)
Adam Turoff wrote:
> when dealing with threaded bytecode is that the threading specifically
> eliminates the indirection in the name of speed.

Yes. Chaim was saying that for the functions that need indirection,
they could use stubs. You don't need to guess in advance which ones
need indirection because at run-time you can just copy the old code
to a new location and *write over* the old location with a "fetch pointer
and tail call" stub. All bytecode pointers stay the same -- they just
point to the stub now. The only restriction on this technique is that
no sub body can be smaller than the indirection stub. (We could
easily make a single bytecode op that does a symbol table lookup
and tail call so I don't see any practical restrictions at all.)

- Ken
Re: [not quite an RFC] shared bytecode/optree
--- Chaim Frenkel <[EMAIL PROTECTED]> wrote:
> > "BS" == Benjamin Stuhl <[EMAIL PROTECTED]> writes:
>
> BS> 1. Bytecode can just be mmap'ed or read in, no playing
> BS> around with relocations on loading or games with RVAs
> BS> (which can't be used anyway, since variable RVAs vary based
> BS> on what's been allocated or freed earlier).
>
> (What is an RVA?)

relative virtual address

> And how does the actual runtime use a relocatable pointer? If it is
> an offset, then any access becomes an add. And depending upon the
> source of the pointer, it would either be a real address or an offset.
>
> Or if everything is a handle, then each access requires two fetches.
> And I don't see where you avoided the relocation. The handle table
> that would come in with the bytecode would need to be adjusted to
> reflect the real address.
>
> I vaguely can see a TIL that uses machine code linkage (real machine code
> jumps) that perhaps could use relative addressing as not needing
> relocation. But I'm not sure that all architectures support long enough
> relative jumps/calls.
>
> Doing the actual relocation should be quite fast. I believe that all
> current executables have to be relocated upon loading. Not to mention
> the calls to shared modules/dlls.

My primary goal (it may not have come across strongly
enough) in this proposal was sharing bytecode between
threads even with an ithreadsish model (variables are
thread-private, except when explicitly shared). This
requires that the bytecode not contain direct pointers to
variables, but rather references with at least one level of
indirection. Avoiding fixups/relocations and allowing
bytecode to be mmap()ed are additional potential benefits.
But my first goal was to not have one copy of each
subroutine in File::Spec::Functions for each thread I run.

-- BKS
Re: [not quite an RFC] shared bytecode/optree
On Wed, Oct 25, 2000 at 09:45:55AM -0700, Steve Fink wrote:
> Hey, it's finally a use for the 'use less space/use less time' pragma!
> 'use less space' means share the bytecode and either do computed jumps
> or unshared lookup tables; 'use less time' means fixup unshared bytecode
> at load time (or page fault time, or whatever). :-)

I thought so far we'd only had "use more". I like "use less" and what it
offers us:

    use English;

can be replaced with

    use less "line noise"; :-)

Nicholas Clark
Re: Special syntax for numeric constants [Was: A tentative list of vtable functions]
David Mitchell wrote:
> Well, I was assuming that there would be *a* numeric class in scope
> - as defined by the innermost lexical 'use foo'.

And that numeric class would remove int and num from the scope?

> I assumed that Perl wouldn't be clever enough to know about all available
> numeric types and automatically choose the best representation; rather
> that it was the programmer's responsibility via 'use' or some other syntax.

Well, "some other syntax" leaves it pretty wide open, doesn't it. ;)
IMHO we should shoot for "clever enough" (aka DWIM) and fall back to
putting the burden on the programmer if it gets too hard for Perl.

> I'm not familiar with Scheme, I'm afraid.

Scheme directly implements the sets of numbers (the numeric tower):

    integer -> real -> complex

It's complicated by having multiple representations for the numbers:
small fixed point, fixed point, and "big", but the main idea is that
Scheme figures out when to shift both type and representation
automatically. Unfortunately different implementations usually choose
different portions of the numeric tower to implement.

- Ken
Re: Special syntax for numeric constants [Was: A tentative list of vtable functions]
At 12:48 PM 10/25/00 -0400, Ken Fox wrote:
>If Larry does what I'm hoping, we'll be able to extend the lexer to
>recognize new number formats and not have to kludge things together with
>strings. Am I reading too much into the Atlanta talk or is that your
>take on it too?

I think you're likely right. The big question will be how easy it is to
do the extensions to the lexer and parser. That's not trivial work.

It might also be easier to just deal with constants as strings anyway.
They only need to be converted once, and the string->variable conversion
code will be needed anyway.

				Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Re: [not quite an RFC] shared bytecode/optree
Hey, it's finally a use for the 'use less space/use less time' pragma! 'use less space' means share the bytecode and either do computed jumps or unshared lookup tables; 'use less time' means fixup unshared bytecode at load time (or page fault time, or whatever). :-)
Re: Special syntax for numeric constants [Was: A tentative list of vtable functions]
Dan Sugalski wrote:
> Numeric constants will probably fall into two classes--those perl's parser
> knows about and can convert to, and those it doesn't and just treats as
> strings.

I'm really excited to see what magic Larry is going to cook up for
extending the lexer and parser. His talk made it pretty clear that he
wants to make small languages easy to build from the Perl core.

If Larry does what I'm hoping, we'll be able to extend the lexer to
recognize new number formats and not have to kludge things together with
strings. Am I reading too much into the Atlanta talk or is that your
take on it too?

- Ken
Re: [not quite an RFC] shared bytecode/optree
At 05:21 PM 10/25/00 +0100, Nicholas Clark wrote:
>On Wed, Oct 25, 2000 at 12:05:22PM -0400, Dan Sugalski wrote:
> > At 05:02 PM 10/25/00 +0100, Nicholas Clark wrote:
> > >On Wed, Oct 25, 2000 at 11:45:54AM -0400, Chaim Frenkel wrote:
> > > > I vaguely can see a TIL that uses machine code linkage (real machine code
> > > > jumps) that perhaps could use relative addressing as not needing
> > > > relocation. But I'm not sure that all architectures support long enough
> > > > relative jumps/calls.
> > >
> > >Specific example where you can't:
> > >on ARM, the branch instructions (B and BL) are PC relative, but only have
> > >a 24 bit offset field. The address space is (now) 32 bit, so there's parts
> > >you can't reach without either calculating addresses (in another register)
> > >and MOVing them to the PC, or loading the PC from a branch table in memory.
> >
> > I think the Alphas can be the same, though I vaguely remember the offset
> > being something like 32 bits. I'm not sure we'd trip over either, but the
> > possibility does exist.
> >
> > No matter what we do we're going to have fixup sections of some sort in the
> > shared code that gets loaded in. There's no real way around that.
>
>"fixup sections" sound horribly like something I've read in association
>with a.out or ELF shared libraries. (I forget which)

Both, though they may call it something else. As far as I know, *everyone*
who does shared libraries has some sort of runtime fixup that needs to be
done. I don't think you can get away from it.

				Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Re: [not quite an RFC] shared bytecode/optree
On Wed, Oct 25, 2000 at 12:28:55PM -0400, Dan Sugalski wrote:
> At 05:21 PM 10/25/00 +0100, Nicholas Clark wrote:
> >"fixup sections" sound horribly like something I've read in association
> >with a.out or ELF shared libraries. (I forget which)
>
> Both, though they may call it something else. As far as I know, *everyone*
> who does shared libraries has some sort of runtime fixup that needs to be
> done. I don't think you can get away from it.

If I understand it correctly the intent (at least on unix and the like)
is to minimise and localise the bits that need fixing per shared file,
so that the maximum amount of file can be mmap()ed in as read only
(makes life simple for virtual memory systems), which also maximises the
amount of physical RAM that's common to multiple processes (which is
what we'd love - every perl program use()ing POSIX or CGI to share the
op tree for that module with every other perl program on that machine).

It just feels like we might be re-inventing the OS to achieve this.

Nicholas Clark
Re: [not quite an RFC] shared bytecode/optree
On Wed, Oct 25, 2000 at 12:05:22PM -0400, Dan Sugalski wrote:
> At 05:02 PM 10/25/00 +0100, Nicholas Clark wrote:
> >On Wed, Oct 25, 2000 at 11:45:54AM -0400, Chaim Frenkel wrote:
> > > I vaguely can see a TIL that uses machine code linkage (real machine code
> > > jumps) that perhaps could use relative addressing as not needing
> > > relocation. But I'm not sure that all architectures support long enough
> > > relative jumps/calls.
> >
> >Specific example where you can't:
> >on ARM, the branch instructions (B and BL) are PC relative, but only have
> >a 24 bit offset field. The address space is (now) 32 bit, so there's parts
> >you can't reach without either calculating addresses (in another register)
> >and MOVing them to the PC, or loading the PC from a branch table in memory.
>
> I think the Alphas can be the same, though I vaguely remember the offset
> being something like 32 bits. I'm not sure we'd trip over either, but the
> possibility does exist.
>
> No matter what we do we're going to have fixup sections of some sort in the
> shared code that gets loaded in. There's no real way around that.

"fixup sections" sound horribly like something I've read in association
with a.out or ELF shared libraries. (I forget which) Aren't we in danger
of re-inventing the wheel here?

[with a.out, ELF and XFree86 having done it
http://www.xfree86.org/4.0.1/DESIGN17.html#65 ]

Nicholas Clark
Re: [not quite an RFC] shared bytecode/optree
At 05:02 PM 10/25/00 +0100, Nicholas Clark wrote:
>On Wed, Oct 25, 2000 at 11:45:54AM -0400, Chaim Frenkel wrote:
> > I vaguely can see a TIL that uses machine code linkage (real machine code
> > jumps) that perhaps could use relative addressing as not needing
> > relocation. But I'm not sure that all architectures support long enough
> > relative jumps/calls.
>
>Specific example where you can't:
>on ARM, the branch instructions (B and BL) are PC relative, but only have
>a 24 bit offset field. The address space is (now) 32 bit, so there's parts
>you can't reach without either calculating addresses (in another register)
>and MOVing them to the PC, or loading the PC from a branch table in memory.

I think the Alphas can be the same, though I vaguely remember the offset
being something like 32 bits. I'm not sure we'd trip over either, but the
possibility does exist.

No matter what we do we're going to have fixup sections of some sort in
the shared code that gets loaded in. There's no real way around that.

				Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Re: [not quite an RFC] shared bytecode/optree
On Wed, Oct 25, 2000 at 11:45:54AM -0400, Chaim Frenkel wrote:
> I vaguely can see a TIL that uses machine code linkage (real machine code
> jumps) that perhaps could use relative addressing as not needing
> relocation. But I'm not sure that all architectures support long enough
> relative jumps/calls.

Specific example where you can't:
on ARM, the branch instructions (B and BL) are PC relative, but only have
a 24 bit offset field. The address space is (now) 32 bit, so there's parts
you can't reach without either calculating addresses (in another register)
and MOVing them to the PC, or loading the PC from a branch table in memory.

Nicholas Clark
Re: [not quite an RFC] shared bytecode/optree
> "BS" == Benjamin Stuhl <[EMAIL PROTECTED]> writes: BS> 1. Bytecode can just be mmap'ed or read in, no playing BS> around with relocations on loading or games with RVAs BS> (which can't be used anyway, since variable RVAs vary based BS> on what's been allocated or freed earlier). (What is an RVA?) And how does the actual runtime use a relocatable pointer? If it is an offset, then any access becomes an add. And depending upon the source of the pointer, it would either be a real address or an offset. Or if everything is a handle, then each access requires two fetches. And I don't see where you avoided the relocation. The handle table that would come in with the bytecode would need to be adjusted to reflect the real address. I vaguly can see a TIL that uses machine code linkage (real machine code jumps) that perhaps could use relative addressing as not needing relocation. But I'm not sure that all architectures support long enough relative jumps/calls. Doing the actual relocation should be quite fast. I believe that all current executables have to be relocated upon loading. Not to mention the calls to shared modules/dlls. -- Chaim FrenkelNonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183
Re: Threaded Perl bytecode (was: Re: stackless python)
> "AT" == Adam Turoff <[EMAIL PROTECTED]> writes: AT> On Tue, Oct 24, 2000 at 10:55:29AM -0400, Chaim Frenkel wrote: >> I don't see it. >> >> I would find it extremely akward to allow >> >> thread 1:*foo = \&one_foo; >> thread 2:*foo = \&other_foo; >> [...] >> >> copy the &foo body to a new location. >> replace the old &foo body with an indirection >> >> (I believe this is atomic.) AT> Actually, that shouldn't be awkward, if both threads have their own AT> private symbol tables. As you pointed out below, we lose the use of bytecode threading. As all lookups need to go through the symbol table. Actually, we probably lose any pre-compilation wins, since all function lookups need to go through the symbol table. AT> In any case, that's a different kind of threading. IPC threading AT> has nothing to do with bytecode threading (for the most part). AT> What you describe here is IPC threading. Larry was talking about AT> bytecode threading. :-) No, I was pointing out the interaction between the two. If you want the two execution threads to be able to have seperate meanings for *foo, then we need to have seperate symbol tables. If we want them shared then we need mutexes and dynamic lookups. If we want to have shared optrees (or threaded bytecode) we need to prevent this or find a usable workaround. AT> Bytecode threading is a concept pioneered in Forth ~30 years ago. Forth AT> compiles incredibly easily into an intermediate representation. That AT> intermediate representation is executed by a very tight interpreter AT> (on the order of a few dozen instructions) that eliminates the need for AT> standard sub calls (push the registers on the stack, push the params AT> onto the stack and JSR). The interpreter is optional. There does not need to be an interpreter. The pointers can be machine level JSR. Which removes the interpreter loop and runs a machine speed. 
AT> Forth works by passing all parameters on the data stack, so there's no AT> explicit need to do the JSR or save registers. "Function" calls are done AT> by adding bytecode that simply says "continue here" where "here" is the AT> current definition of the sub being called. As a result, the current AT> definition of a function is hard-coded into be the callee when the caller AT> is compiled. [*] Err, from my reading there is no come from. The PC is saved on the execution stack and restored when the inner loop exits the current nesting level. Actually, being able to use registers would do wonders to speed up the calls. I vaguely recall that the sparc has some sort of register windows and that most of the parameters can be passed in a register. (At this point we are at a major porting effort. But the inner loop TIL would be the easiest to port.) AT> The problem with AT> *main::localtime = \&foo; AT> *foo = \&bar; AT> when dealing with threaded bytecode is that the threading specifically AT> eliminates the indirection in the name of speed. Because Perl expects AT> this kind of re-assignment to be done dynamically, threaded bytecodes AT> aren't a good fit without accepting a huge bunch of modifications to Perl AT> behavior as we know it. (Using threaded bytecodes as an intermediate AT> interpretation also confound optimization, since so little of the AT> context is saved. Intentionally.) Actually from my reading one doesn't have to lose it entirely. All TIL functions have some sort of header. The pointer to a TIL function points to the first entry/pointer in the function. In the case of real machine level code, the pointer is to a routine that actually invokes the function. In the case of a higher level TIL function, the pointer is to a function that nests the inner loop. In the event of redirecting a function. This pointer can be redirected to an appropriate routine that either fixes up the original code or simply redirects to the new version. 
(And the old code can be reactivated easily) AT> *: Forth is an interactive development environment of sorts and AT> the "current definition of a sub" may change over time, but the AT> previously compiled calling functions won't be updated after the AT> sub is redefined, unless they're recompiled to use the new definition. The current defintion of a sub doesn't change. Only a new entry in the dictonary (symbol table) now points at a new body. If the definition is deleted, the old value reappears. This is no different than what happens in postscript. One makes the decision to either do /foo { ... } def and take the lookup hits, or /foo { ... } bind def and locks in the current meanings. -- Chaim FrenkelNonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183