Re: Threaded Perl bytecode (was: Re: stackless python)

2000-10-25 Thread Chaim Frenkel

> "KF" == Ken Fox <[EMAIL PROTECTED]> writes:

KF> Adam Turoff wrote:
>> when dealing with threaded bytecode is that the threading specifically
>> eliminates the indirection in the name of speed.

KF> Yes. Chaim was saying that for the functions that need indirection,
KF> they could use stubs. You don't need to guess in advance which ones
KF> need indirection because at run-time you can just copy the old code
KF> to a new location and *write over* the old location with a "fetch pointer
KF> and tail call" stub. All bytecode pointers stay the same -- they just
KF> point to the stub now. The only restriction on this technique is that
KF> no sub body can be smaller than the indirection stub. (We could
KF> easily make a single bytecode op that does a symbol table lookup
KF> and tail call so I don't see any practical restrictions at all.)

We may not even need to copy the body. If the header of the function
is the target location, the header could be any one of:
nop,
nest another inner loop,
look up the current symbol,
fix up the caller,
or jump to a new target.

(Hmm, with Q::S, it could be all of them in constant time.)


-- 
Chaim Frenkel                                        Nonlinear Knowledge, Inc.
[EMAIL PROTECTED]   +1-718-236-0183



Re: [not quite an RFC] shared bytecode/optree

2000-10-25 Thread Chaim Frenkel

> "BS" == Benjamin Stuhl <[EMAIL PROTECTED]> writes:

BS> My primary goal (it may not have come accross strongly
BS> enough) in this proposal was sharing bytecode between
BS> threads even with an ithreadsish model (variables are
BS> thread-private, except when explicitly shared). This
BS> requires that the bytecode not contain direct pointers to
BS> variables, but rather references with at least one level of
BS> indirection. Avoiding fixups/relocations and allowing
BS> bytecode to be mmap()ed are additional potential benefits.
BS> But my first goal was to not have one copy of each
BS> subroutine in File::Spec::Functions for each thread I run.

If you look back over several of the discussions in -internals, you'll
notice Dan in particular pointing out that the optree (or its
replacement) would be 'inviolate'. If for no other reason than to
avoid having to grab mutexes.

The actual disk version of the bytecode is still a long way out.

(As a strange aside, this last discussion reminded me of someone's claim
that the IBM 360 executables were actually stored with self-modifying
I/O instructions. As the pieces were pulled off the disk, the next I/O
instruction ended up in the right place. Myth?)


-- 
Chaim Frenkel                                        Nonlinear Knowledge, Inc.
[EMAIL PROTECTED]   +1-718-236-0183



Re: Threaded Perl bytecode (was: Re: stackless python)

2000-10-25 Thread Mark-Jason Dominus


> > Joshua N Pritikin writes:
> > : http://www.oreillynet.com/pub/a/python/2000/10/04/stackless-intro.html
> > 
> > Perl 5 is already stackless in that sense, though we never implemented
> > continuations.  The main impetus for going stackless was to make it
> > possible to implement a Forth-style threaded code interpreter, though
> > we never put one of those into production either.

There's a large school of thought in the Lisp world that holds that
full continuations are a bad idea.  See for example:

http://www.deja.com/threadmsg_ct.xp?AN=635369657

Executive summary of this article:  

* Continuations are hard to implement and harder to implement
  efficiently.   Languages with continuations tend to be slower
  because of the extreme generality constraints imposed by the
  presence of continuations.

* Typical uses of continuations are for things like exception
  handling.  Nobody really uses continuations because they are too
  difficult to understand.  Exception handling is adequately served by
  simpler and more efficient catch-throw mechanisms which everyone
  already understands.


Anyone seriously interested in putting continuations into Perl 6 would
probably do well to read the entire thread headed by the article I
cited above.




Re: [not quite an RFC] shared bytecode/optree

2000-10-25 Thread Nicholas Clark

On Wed, Oct 25, 2000 at 06:23:20PM +0100, Tom Hughes wrote:
> In message <[EMAIL PROTECTED]>
>   Nicholas Clark <[EMAIL PROTECTED]> wrote:
> 
> > Specific example where you can't:
> > on ARM, the branch instructions (B and BL) are PC relative, but only have
> > a 24 bit offset field. The address space is (now) 32 bit, so there's parts
> > you can't reach without either calculating addresses (in another register)
> > and MOVing them to the PC, or loading the PC from a branch table in memory.
> 
> That is actually a word offset of course, so it can actually reach
> up to 26 bits away in bytes. Still not the full 32 though.

Good point
 
> Of course that only becomes a problem if your program is big enough
> to exceed 26 bits of address space, which is pretty unlikely. That
> or if the program occupies seriously disjoint areas of address space.

Which is likely:

nick@Bagpuss [test]$ uname -a
Linux Bagpuss.unfortu.net 2.2.17-rmk1 #5 Mon Sep 18 19:03:46 BST 2000 armv4l unknown
nick@Bagpuss [test]$ cat mmap.c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>

int main () {
  int motd = open ("/etc/motd", O_RDONLY);
  void *mapped, *malloced, *big;

  if (motd < 0) {
perror ("Failed to open /etc/motd");
return 1;
  }
  mapped = mmap(NULL, 1024, PROT_EXEC | PROT_READ | PROT_WRITE , MAP_PRIVATE, motd, 0);
  malloced = malloc (1024);
  big = malloc (1024*1024*32);
  printf ("mapped = %p malloced = %p big = %p main = %p\n", mapped, malloced, big, 
&main);
  return 0;
}
nick@Bagpuss [test]$ ./mmap 
mapped = 0x40015000 malloced = 0x2008670 big = 0x40105008 main = 0x200040c

likewise x86

[nick@babyhippo nick]$ ./mmap
mapped = 0x40013000 malloced = 0x80498d0 big = 0x40109008 main = 0x80484a0
[nick@babyhippo nick]$ uname -a
Linux babyhippo.com 2.2.12-20 #1 Mon Sep 27 10:40:35 EDT 1999 i686 unknown

mmap gives you memory from somewhere disjoint. And some malloc()
implementations (glibc2.1 here, but I've compiled Doug Lea's malloc on
Solaris and HP-UX) will call mmap() for a large request.
(And at least one of Solaris and HP-UX also gives you pointers greater
than 0x8000 from mmap().)

Particularly likely if we're considering mmap()ing bytecode in.

Nicholas Clark



Re: [not quite an RFC] shared bytecode/optree

2000-10-25 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Nicholas Clark <[EMAIL PROTECTED]> wrote:

> Specific example where you can't:
> on ARM, the branch instructions (B and BL) are PC relative, but only have
> a 24 bit offset field. The address space is (now) 32 bit, so there's parts
> you can't reach without either calculating addresses (in another register)
> and MOVing them to the PC, or loading the PC from a branch table in memory.

That is actually a word offset of course, so it can actually reach
up to 26 bits away in bytes. Still not the full 32 though.

Of course that only becomes a problem if your program is big enough
to exceed 26 bits of address space, which is pretty unlikely. That
or if the program occupies seriously disjoint areas of address space.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
...Don't believe in astrology. We Scorpios aren't taken in by such things.




Re: Threaded Perl bytecode (was: Re: stackless python)

2000-10-25 Thread Ken Fox

Adam Turoff wrote:
> when dealing with threaded bytecode is that the threading specifically
> eliminates the indirection in the name of speed.

Yes. Chaim was saying that for the functions that need indirection,
they could use stubs. You don't need to guess in advance which ones
need indirection because at run-time you can just copy the old code
to a new location and *write over* the old location with a "fetch pointer
and tail call" stub. All bytecode pointers stay the same -- they just
point to the stub now. The only restriction on this technique is that
no sub body can be smaller than the indirection stub. (We could
easily make a single bytecode op that does a symbol table lookup
and tail call so I don't see any practical restrictions at all.)

- Ken



Re: [not quite an RFC] shared bytecode/optree

2000-10-25 Thread Benjamin Stuhl

--- Chaim Frenkel <[EMAIL PROTECTED]> wrote:
> > "BS" == Benjamin Stuhl <[EMAIL PROTECTED]> writes:
> 
> BS> 1. Bytecode can just be mmap'ed or read in, no playing
> BS> around with relocations on loading or games with RVAs
> BS> (which can't be used anyway, since variable RVAs vary based
> BS> on what's been allocated or freed earlier).
> 
> (What is an RVA?)

relative virtual address
 
> And how does the actual runtime use a relocatable pointer?  If it is
> an offset, then any access becomes an add. And depending upon the
> source of the pointer, it would either be a real address or an offset.
> 
> Or if everything is a handle, then each access requires two fetches.
> And I don't see where you avoided the relocation. The handle table
> that would come in with the bytecode would need to be adjusted to
> reflect the real address.
> 
> I vaguely can see a TIL that uses machine code linkage (real machine code
> jumps) that perhaps could use relative addressing as not needing
> relocation. But I'm not sure that all architectures support long enough
> relative jumps/calls.
> 
> Doing the actual relocation should be quite fast. I believe that all
> current executables have to be relocated upon loading. Not to mention
> the calls to shared modules/dlls.
> 
> 
> -- 
> Chaim Frenkel  Nonlinear Knowledge, Inc.
> [EMAIL PROTECTED] +1-718-236-0183

My primary goal (it may not have come across strongly
enough) in this proposal was sharing bytecode between
threads even with an ithreadsish model (variables are
thread-private, except when explicitly shared). This
requires that the bytecode not contain direct pointers to
variables, but rather references with at least one level of
indirection. Avoiding fixups/relocations and allowing
bytecode to be mmap()ed are additional potential benefits.
But my first goal was to not have one copy of each
subroutine in File::Spec::Functions for each thread I run.

-- BKS





Re: [not quite an RFC] shared bytecode/optree

2000-10-25 Thread Nicholas Clark

On Wed, Oct 25, 2000 at 09:45:55AM -0700, Steve Fink wrote:
> Hey, it's finally a use for the 'use less space/use less time' pragma!
> 'use less space' means share the bytecode and either do computed jumps
> or unshared lookup tables; 'use less time' means fixup unshared bytecode
> at load time (or page fault time, or whatever). :-)

I thought so far we'd only had "use more".
I like "use less" and what it offers us:

use English; can be replaced with use less "line noise";  :-)

Nicholas Clark



Re: Special syntax for numeric constants [Was: A tentative list of vtable functions]

2000-10-25 Thread Ken Fox

David Mitchell wrote:
> Well, I was assuming that there would be *a* numeric class in scope
> - as defined be the innermost lexical 'use foo'.

And that numeric class would remove int and num from the scope?

> I assumed that Perl wouldn't be clever enough to know about all available
> numeric types and automatically choose the best representation; rather
> that it was the programmer's responsibility via 'use' or some other syntax.

Well "some other syntax" leaves it pretty wide open doesn't it. ;) IMHO we
should shoot for "clever enough" (aka DWIM) and fall back to putting the
burden on the programmer if it gets too hard for Perl.

> I'm not familiar with Scheme, I'm afraid.

Scheme directly implements the sets of numbers (the numeric tower):

  integer -> real -> complex

It's complicated by having multiple representations for the numbers:
small fixed point, fixed point, and "big", but the main idea is
that Scheme figures out when to shift both type and representation
automatically. Unfortunately different implementations usually choose
different portions of the numeric tower to implement.

- Ken



Re: Special syntax for numeric constants [Was: A tentativelist of vtable functions]

2000-10-25 Thread Dan Sugalski

At 12:48 PM 10/25/00 -0400, Ken Fox wrote:
>If Larry does what I'm hoping, we'll be able to extend the lexer to
>recognize new number formats and not have to kludge things together with
>strings. Am I reading too much into the Atlanta talk or is that your
>take on it too?

I think you're likely right. The big question will be how easy it is to do 
the extensions to the lexer and parser. That's not trivial work. It might 
also be easier to just deal with constants as strings anyway. They only 
need to be converted once, and the string->variable conversion code will be 
needed anyway.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: [not quite an RFC] shared bytecode/optree

2000-10-25 Thread Steve Fink

Hey, it's finally a use for the 'use less space/use less time' pragma!
'use less space' means share the bytecode and either do computed jumps
or unshared lookup tables; 'use less time' means fixup unshared bytecode
at load time (or page fault time, or whatever). :-)



Re: Special syntax for numeric constants [Was: A tentativelist of vtable functions]

2000-10-25 Thread Ken Fox

Dan Sugalski wrote:
> Numeric constants will probably fall into two classes--those perl's parser
> knows about and can convert to, and those it doesn't and just treats as
> strings.

I'm really excited to see what magic Larry is going to cook up for
extending the lexer and parser. His talk made it pretty clear that he
wants to make small languages easy to build from the Perl core.

If Larry does what I'm hoping, we'll be able to extend the lexer to
recognize new number formats and not have to kludge things together with
strings. Am I reading too much into the Atlanta talk or is that your
take on it too?

- Ken



Re: [not quite an RFC] shared bytecode/optree

2000-10-25 Thread Dan Sugalski

At 05:21 PM 10/25/00 +0100, Nicholas Clark wrote:
>On Wed, Oct 25, 2000 at 12:05:22PM -0400, Dan Sugalski wrote:
> > At 05:02 PM 10/25/00 +0100, Nicholas Clark wrote:
> > >On Wed, Oct 25, 2000 at 11:45:54AM -0400, Chaim Frenkel wrote:
> > > > I vaguely can see a TIL that uses machine code linkage (real machine code
> > > > jumps) that perhaps could use relative addressing as not needing
> > > > relocation. But I'm not sure that all architectures support long enough
> > > > relative jumps/calls.
> > >
> > >Specific example where you can't:
> > >on ARM, the branch instructions (B and BL) are PC relative, but only have
> > >a 24 bit offset field. The address space is (now) 32 bit, so there's parts
> > >you can't reach without either calculating addresses (in another register)
> > >and MOVing them to the PC, or loading the PC from a branch table in memory.
> >
> > I think the Alphas can be the same, though I vaguely remember the offset
> > being something like 32 bits. I'm not sure we'd trip over either, but the
> > possibility does exist.
> >
> > No matter what we do we're going to have fixup sections of some sort in the
> > shared code that gets loaded in. There's no real way around that.
>
>"fixup sections" sound horribly like something I've read in association
>with a.out or ELF shared libraries. (I forget which)

Both, though they may call it something else. As far as I know, *everyone* 
who does shared libraries has some sort of runtime fixup that needs to be 
done. I don't think you can get away from it.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: [not quite an RFC] shared bytecode/optree

2000-10-25 Thread Nicholas Clark

On Wed, Oct 25, 2000 at 12:28:55PM -0400, Dan Sugalski wrote:
> At 05:21 PM 10/25/00 +0100, Nicholas Clark wrote:
> >"fixup sections" sound horribly like something I've read in association
> >with a.out or ELF shared libraries. (I forget which)
> 
> Both, though they may call it something else. As far as I know, *everyone* 
> who does shared libraries has some sort of runtime fixup that needs to be 
> done. I don't think you can get away from it.

If I understand it correctly the intent (at least on unix and the like)
is to minimise and localise the bits that need fixing per shared file,
so that the maximum amount of file can be mmap()ed in as read only
(makes life simple for virtual memory systems) which also maximises the
amount of physical RAM that's common to multiple processes.

(which is what we'd love - every perl program use()ing POSIX or CGI to
share the op tree for that module with every other perl program on that
machine). It just feels like we might be re-inventing the OS to achieve
this.

Nicholas Clark



Re: [not quite an RFC] shared bytecode/optree

2000-10-25 Thread Nicholas Clark

On Wed, Oct 25, 2000 at 12:05:22PM -0400, Dan Sugalski wrote:
> At 05:02 PM 10/25/00 +0100, Nicholas Clark wrote:
> >On Wed, Oct 25, 2000 at 11:45:54AM -0400, Chaim Frenkel wrote:
> > > I vaguely can see a TIL that uses machine code linkage (real machine code
> > > jumps) that perhaps could use relative addressing as not needing
> > > relocation. But I'm not sure that all architectures support long enough
> > > relative jumps/calls.
> >
> >Specific example where you can't:
> >on ARM, the branch instructions (B and BL) are PC relative, but only have
> >a 24 bit offset field. The address space is (now) 32 bit, so there's parts
> >you can't reach without either calculating addresses (in another register)
> >and MOVing them to the PC, or loading the PC from a branch table in memory.
> 
> I think the Alphas can be the same, though I vaguely remember the offset 
> being something like 32 bits. I'm not sure we'd trip over either, but the 
> possibility does exist.
> 
> No matter what we do we're going to have fixup sections of some sort in the 
> shared code that gets loaded in. There's no real way around that.

"fixup sections" sound horribly like something I've read in association
with a.out or ELF shared libraries. (I forget which)
Aren't we in danger of re-inventing the wheel here?
[with a.out, ELF and XFree86 having done it
http://www.xfree86.org/4.0.1/DESIGN17.html#65
]

Nicholas Clark



Re: [not quite an RFC] shared bytecode/optree

2000-10-25 Thread Dan Sugalski

At 05:02 PM 10/25/00 +0100, Nicholas Clark wrote:
>On Wed, Oct 25, 2000 at 11:45:54AM -0400, Chaim Frenkel wrote:
> > I vaguely can see a TIL that uses machine code linkage (real machine code
> > jumps) that perhaps could use relative addressing as not needing
> > relocation. But I'm not sure that all architectures support long enough
> > relative jumps/calls.
>
>Specific example where you can't:
>on ARM, the branch instructions (B and BL) are PC relative, but only have
>a 24 bit offset field. The address space is (now) 32 bit, so there's parts
>you can't reach without either calculating addresses (in another register)
>and MOVing them to the PC, or loading the PC from a branch table in memory.

I think the Alphas can be the same, though I vaguely remember the offset 
being something like 32 bits. I'm not sure we'd trip over either, but the 
possibility does exist.

No matter what we do we're going to have fixup sections of some sort in the 
shared code that gets loaded in. There's no real way around that.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: [not quite an RFC] shared bytecode/optree

2000-10-25 Thread Nicholas Clark

On Wed, Oct 25, 2000 at 11:45:54AM -0400, Chaim Frenkel wrote:
> I vaguely can see a TIL that uses machine code linkage (real machine code
> jumps) that perhaps could use relative addressing as not needing
> relocation. But I'm not sure that all architectures support long enough
> relative jumps/calls.

Specific example where you can't:
on ARM, the branch instructions (B and BL) are PC relative, but only have
a 24 bit offset field. The address space is (now) 32 bit, so there's parts
you can't reach without either calculating addresses (in another register)
and MOVing them to the PC, or loading the PC from a branch table in memory.

Nicholas Clark



Re: [not quite an RFC] shared bytecode/optree

2000-10-25 Thread Chaim Frenkel

> "BS" == Benjamin Stuhl <[EMAIL PROTECTED]> writes:

BS> 1. Bytecode can just be mmap'ed or read in, no playing
BS> around with relocations on loading or games with RVAs
BS> (which can't be used anyway, since variable RVAs vary based
BS> on what's been allocated or freed earlier).

(What is an RVA?)

And how does the actual runtime use a relocatable pointer?  If it is
an offset, then any access becomes an add. And depending upon the
source of the pointer, it would either be a real address or an offset.

Or if everything is a handle, then each access requires two fetches.
And I don't see where you avoided the relocation. The handle table
that would come in with the bytecode would need to be adjusted to
reflect the real address.

I vaguely can see a TIL that uses machine code linkage (real machine code
jumps) that perhaps could use relative addressing as not needing
relocation. But I'm not sure that all architectures support long enough
relative jumps/calls.

Doing the actual relocation should be quite fast. I believe that all
current executables have to be relocated upon loading. Not to mention
the calls to shared modules/dlls.


-- 
Chaim Frenkel                                        Nonlinear Knowledge, Inc.
[EMAIL PROTECTED]   +1-718-236-0183



Re: Threaded Perl bytecode (was: Re: stackless python)

2000-10-25 Thread Chaim Frenkel

> "AT" == Adam Turoff <[EMAIL PROTECTED]> writes:

AT> On Tue, Oct 24, 2000 at 10:55:29AM -0400, Chaim Frenkel wrote:
>> I don't see it.
>> 
>> I would find it extremely awkward to allow 
>> 
>> thread 1:*foo = \&one_foo;
>> thread 2:*foo = \&other_foo;
>> [...]
>> 
>> copy the &foo body to a new location.
>> replace the old &foo body with an indirection
>> 
>> (I believe this is atomic.)

AT> Actually, that shouldn't be awkward, if both threads have their own
AT> private symbol tables.

As you pointed out below, we lose the use of bytecode threading, since
all lookups need to go through the symbol table. Actually, we probably
lose any pre-compilation wins, since all function lookups need to go
through the symbol table.

AT> In any case, that's a different kind of threading.   IPC threading
AT> has nothing to do with bytecode threading (for the most part).
AT> What you describe here is IPC threading.  Larry was talking about 
AT> bytecode threading.  :-)

No, I was pointing out the interaction between the two. If you want
the two execution threads to be able to have separate meanings for *foo,
then we need to have separate symbol tables. If we want them shared then
we need mutexes and dynamic lookups. If we want to have shared optrees
(or threaded bytecode) we need to prevent this or find a usable workaround.

AT> Bytecode threading is a concept pioneered in Forth ~30 years ago.  Forth
AT> compiles incredibly easily into an intermediate representation.  That 
AT> intermediate representation is executed by a very tight interpreter
AT> (on the order of a few dozen instructions) that eliminates the need for 
AT> standard sub calls (push the registers on the stack, push the params
AT> onto the stack and JSR).

The interpreter is optional. There does not need to be an interpreter.
The pointers can be machine level JSRs, which removes the interpreter loop
and runs at machine speed.

AT> Forth works by passing all parameters on the data stack, so there's no
AT> explicit need to do the JSR or save registers.  "Function" calls are done
AT> by adding bytecode that simply says "continue here" where "here" is the
AT> current definition of the sub being called.  As a result, the current 
AT> definition of a function is hard-coded into be the callee when the caller
AT> is compiled. [*]

Err, from my reading there is no "come from". The PC is saved on the
execution stack and restored when the inner loop exits the current
nesting level.

Actually, being able to use registers would do wonders to speed up the
calls. I vaguely recall that the SPARC has some sort of register windows
and that most of the parameters can be passed in registers.

(At this point we are at a major porting effort. But the inner loop TIL
would be the easiest to port.)

AT> The problem with 
AT> *main::localtime = \&foo;
AT> *foo = \&bar;
AT> when dealing with threaded bytecode is that the threading specifically 
AT> eliminates the indirection in the name of speed.  Because Perl expects
AT> this kind of re-assignment to be done dynamically, threaded bytecodes
AT> aren't a good fit without accepting a huge bunch of modifications to Perl 
AT> behavior as we know it.  (Using threaded bytecodes as an intermediate
AT> interpretation also confound optimization, since so little of the
AT> context is saved.  Intentionally.)

Actually from my reading one doesn't have to lose it entirely.
All TIL functions have some sort of header. The pointer to a TIL function
points to the first entry/pointer in the function. In the case of
real machine level code, the pointer is to a routine that actually
invokes the function. In the case of a higher level TIL function, the
pointer is to a function that nests the inner loop.

In the event of redirecting a function, this pointer can be redirected
to an appropriate routine that either fixes up the original code or
simply redirects to the new version. (And the old code can be reactivated
easily.)

AT> *: Forth is an interactive development environment of sorts and
AT> the "current definition of a sub" may change over time, but the
AT> previously compiled calling functions won't be updated after the
AT> sub is redefined, unless they're recompiled to use the new definition.

The current definition of a sub doesn't change. Only a new entry in the
dictionary (symbol table) now points at a new body. If the definition is
deleted, the old value reappears.

This is no different than what happens in postscript. One makes the
decision to either do 
/foo { ... } def 
and take the lookup hits, or
/foo { ... } bind def
and locks in the current meanings.


-- 
Chaim Frenkel                                        Nonlinear Knowledge, Inc.
[EMAIL PROTECTED]   +1-718-236-0183