Re: Lexicals, continuations, and register allocation

2004-11-24 Thread Ken Fox
Leopold Toetsch wrote:
   +-----+-------------------+
   | ctp | interpreter state |
   +-----+-------------------+
      |
      |
      |
   +------+-----+----------+-----------+
   | prev | ctx | lexicals | volatiles |
   +------+-----+----------+-----------+
   |  p a r r o t   r e g i s t e r s  |
A very strong architecture for sure.
> + no lexical fetch overhead
That alone is worth the price of admission. No register allocator
is needed because the HLL only needs volatiles for anonymous
temporaries, which are easily allocated during expression parsing.
I would make a couple changes:
   +-----+-------------------+
   | ctp | interpreter state |
   +-----+-------------------+
      |
   +--------------+------+-----+----------+-----------+
   | caller's out | prev | ctx | lexicals | volatiles |
   +--------------+------+-----+----------+-----------+
Merge with the variable sized stack frame proposal and expose
prev+ctx as registers. As long as an architecture change is on the
table, might as well make it a doozy. (Caller's out may need to be
padded if the caller did not pass the right number of args.)
Sorry for the de-lurk. I'll now return to hacking on my storage to
storage VM which looks amazingly similar to Leo's VM. ;)
- Ken


Re: [netlabs #801] [PATCH] PerlArray in scalar context

2002-09-05 Thread Ken Fox

Brent Dax wrote:
>  Keep in mind how much it could inflate the bytecode if
> we render a ton of generic, N-dimensional hyper-operator logic into
> bytecode.

The main problem I see is that you could spend minutes executing
inside that hyper op. Doesn't that screw with the plan for putting
the event handler in the dispatch loop? It's not very friendly for
debugging either -- how can the user set a breakpoint on one of
the hyped calls?

The bytecode generated for hyper can't be any larger than the
equivalent loops in Perl 5. If there isn't any problem with byte
code size now, why would there be a problem in Perl 6?

- Ken




Re: On writing JITs

2002-08-03 Thread Ken Fox

Jason Gloudon wrote:
> http://www.ddj.com/ftp/2001/2001_07/aa0701.txt
> 
> I believe the LOOKUP method was the fastest for me on SPARC, if I recall
> correctly.

Did they really spend 64K to create a lookup table just to find
the most significant bit? Calculating log2 for a power of two is
simpler -- all you need to find is the one set bit. If you really
want to use lookup tables, just use an 8 byte table and shift
by 4 bits until a non-zero is found. (Change constants to taste.)
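
Here's a minimal sketch of that shift-and-lookup idea, using a
16-entry nibble table and 4-bit shifts (pick your own constants, as
above); it assumes its argument is a nonzero power of two:

#include <assert.h>

/* maps a power-of-two nibble (1, 2, 4, 8) to its bit position */
static const int nibble_log2[16] = {
    -1, 0, 1, -1, 2, -1, -1, -1, 3, -1, -1, -1, -1, -1, -1, -1
};

int table_log2(unsigned int foo) {
    int log = 0;

    assert(foo != 0 && (foo & (foo - 1)) == 0);

    while ((foo & 0xF) == 0) {  /* skip zero nibbles */
        foo >>= 4;
        log += 4;
    }

    return log + nibble_log2[foo & 0xF];
}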

- Ken




Re: abs_i_i but no abs_i

2002-08-03 Thread Ken Fox

Nicholas Clark wrote:
> But there do seem already to be arguably duplicate 2 operand versions of
> many ops. Hence I was surprised at the lack of consistency.

Right. I suspect that as people get more experience with the new
Perl 6 compiler the 2 operand ops will go away (or vice versa).

At the very least, ops will be ordered so that frequent ops are
near each other. If an infrequent op doesn't save more CPU time
than blowing the instruction cache wastes, it's a net loss.

- Ken




Re: On writing JITs

2002-08-03 Thread Ken Fox

Nicholas Clark wrote:
> It seems that foo & (foo - 1) is zero only for a power of 2 (or foo == 0) 
> but is there a fast way once you know that foo is a power of 2, to find out
> log2 foo?

You're right about (foo & (foo - 1)).

gcc uses a repeated test and shift. That works very nicely if foo
is small -- and I bet it usually is. If you want to protect yourself
from the cases where foo is large, you can unroll the loop 2 or 3 times
and then use a binary search.

I've attached two simple implementations I wrote. Neither log2()
implementation is from gcc, so don't worry about GPL contamination.

- Ken

#include <stdio.h>
#include <assert.h>

int simple_log2(int foo) {
    int log = 0;

    assert((foo & (foo - 1)) == 0);

    while (foo) {
        ++log;
        foo >>= 1;
    }

    --log;

    return log;
}

int unrolled_log2(int foo) {
    int log = 0;
    int mid = 4 * sizeof(foo);

    assert(foo != 0);
    assert((foo & (foo - 1)) == 0);

    foo >>= 1;
    if (foo == 0) goto done;
    foo >>= 1; ++log;
    if (foo == 0) goto done;
    foo >>= 1; ++log;
    if (foo == 0) goto done;

    do {
        if (foo >= (1 << mid)) {
            foo >>= mid;
            log += mid;
        }
        mid >>= 1;
    }
    while (mid);

    ++log;

done:

    return log;
}

int main(int argc, char *argv[]) {

#if 1

    int i = 1;
    while (i < (1 << 30)) {
        printf("log2(%d) == %d\n", i, unrolled_log2(i));
        i <<= 1;
    }

#else

    /* simple timing loop */

    int i = 1000;
    volatile int j;
    while (i > 0) {
        j = unrolled_log2(1 << 29);
        j = unrolled_log2(2);
        j = simple_log2(1 << 29);
        j = simple_log2(2);
        --i;
    }

#endif

    return 0;
}




Re: abs_i_i but no abs_i

2002-08-03 Thread Ken Fox

Nicholas Clark wrote:
> I can write more a efficient implementation of abs_i ... things will
> go slightly faster

The law of diminishing returns is broken for a VM. Eventually you
reach a point where adding more ops actually decreases total
performance. Instead of the change in performance tending towards
zero, it actually becomes negative. No surprise there -- it's
the heart of the RISC v CISC debate.

Parrot has a RISC-like 3 operand design. IMHO if all ops are
consistent with that design, it will be easier/faster to generate
code. You can always add abs_i later if abs_i_i turns out to be a
performance problem.

- Ken




Re: sizeof(INTVAL), sizeof(void*), sizeof(opcode_t)

2001-11-20 Thread Ken Fox

James Mastros wrote:
> In byteswapping the bytecode ...
> 
> I propose that we make INTVAL and opcode_t the same size, and gaurrenteed
> to be able to hold a void*.

It sounds like you want portable byte code. Is that a goal? It seems like
we can have either mmap'able byte code or portable byte code, but not both.
Personally, I'd rather have portable byte code because memory is cheap
and self-modifying byte code opens up a lot of optimization potential. I
know others disagree.

Are we looking at two different byte code formats? Dan?

- Ken



Re: What happened at little languages?

2001-11-19 Thread Ken Fox

Simon Cozens wrote:
> Us: We're working on this, that and the other.
> Them: Pshaw. We solved those problems thirty years ago.

Were Perl and Python both grouped into the same category
of re-inventing the wheel? Or is this just the academic
distaste for Perl syntax showing through? I had hoped that
the Perl 6 effort, especially Damian's influence, might
gain respect for Perl in the academic community. Doesn't
sound like it though.

What new and interesting things did the "Them" crowd
talk about?

- Ken



What happened at little languages?

2001-11-19 Thread Ken Fox

Can't find any articles or notes on what happened
at the conference. What happened? I'm really curious
about the "Worse is Better" panel and the talk that
Dan and Simon gave.

- Ken



Re: Proper Tail Recursion

2001-11-15 Thread Ken Fox

Shlomi Fish wrote:
> Proper Tail Recursion is harder to debug, but consumes less memory and is
> faster to execute ...

It definitely consumes less memory, but performance is the same (until
the memory issue starts dominating...) I don't know what you mean by
debugging -- user code or parrot internals? Users may be confused when
caller() doesn't have the complete list of callers, but that doesn't
make debugging harder.

> For instance, we can have a "ret-with-call" opcode. However, isn't it
> exactly the same as a jump instruction ?

No. The current scope must be destroyed, lexicals destroyed, localized
globals restored, etc. It's basically the clean-up of a return, but the
control flow logic of a sub call.

> What is the current stance on implementing proper tail recursion in perl6?

You'd have to ask perl6-language. IMHO Parrot should have tail-call
ops even if perl6 doesn't specify tail recursion.

- Ken



better parrotguts drawing

2001-11-14 Thread Ken Fox

I made a couple changes to the drawing. Stacks
and register structures are now a bit more
conceptual -- it is much easier to see how they
work.

See 
for the latest. Keep in mind that the light blue
"frame" stuff at the bottom is experimental.

Anybody interested in a version useful for hand
simulation? I'm pretty sure that gluing a picture
to foam core board would be a pretty good tool
for simulating Parrot. Gotta love low tech push
pins and string... ;)

- Ken



Re: Lexical implementation work

2001-11-13 Thread Ken Fox

Dan Sugalski wrote:
> Nope, not stone tablet at all. More a sketch than anything else,
> since I'm not sure yet of all the things Larry's got in store.

Ok. I've made some more progress. There's a crude picture of
some of the internals at 
The lexical stuff is at the bottom of the picture.

I've patched the assembler to accept syntax like:

.begin
  .reserve 1
  fetch_lex I0, 0, 0
.end

This is "nice" syntax for use in hand-coded assembler. I assume
that a compiler will take control of the scope definitions and
emit the enter/exit scope ops itself. (I'm throwing in the more
complicated sync_scope op too so a compiler can handle weird
jumps between scopes -- IMHO it's simpler to do this than provide
an introspective API that lets the compiler manually adjust
the scope stack.)

There's pseudo-code for the ops attached below. I'm probably
a couple evenings away from having things working.

Unless people really hate the idea, I'm going to put in "v"
operand types in ops2c.pl. They'll look like "c" types (e.g. "nc"),
but reference the current lexical scope instead of the constant
section.

QUESTIONS!

Who owns the bytecode format? How do I propose changes? I need
a "scope definition" section. Each scope is assigned a per-module
id. I'm not sure what info is needed yet, but certainly a size
and a code ref (opcode address) for the DESTROY sub.

The control stack isn't used for much and it would simplify my
code a lot if we combine the frame stack with the control stack.
The only down-side I think will be with hand-coded assembler.
There may be a redundant "restore scope" pointer on the stack
when calling a sub in the same scope. (This is impossible with
Perl code BTW.) Well, there's also a "purist" downside too --
mixing opcode addresses with lexical storage. This is the thing
that makes capturing a closure easier, so I see it as an
advantage.

Anybody care if I subsume the control stack?

Lastly, does anybody care if I change how assembler directives
are registered? I think there are going to be a whole bunch of
symbol table and debugging directives going in. The current "if"
tests are kind of clunky.

Here's the op definitions (in pseudo-code form) taken from the
copy of core.ops I'm working on:

--- parrot/core.ops Wed Nov  7 22:34:10 2001
+++ my-parrot/core.ops  Tue Nov 13 21:13:53 2001
@@ -1930,6 +1930,159 @@
 
 ###
 
+=head2 Lexical Variables
+
+These opcodes implement lexical scopes and PMC lexical variables.
+This functionality is only concerned with scoping and storage,
+not with the symbol tables used for translating variable names to
+storage. Compilers often use the same symbol table for managing
+both lexical and dynamic variables.
+
+First, the terminology:
+
+Lexical Scope: A contiguous region of text (code) bounded by
+.begin/.end scope directives. A lexical scope may contain any
+number of non-overlapping interior (child) scopes. The flow
+control of the code does not affect these scoping rules.
+
+Lexical Variable: A variable that is visible only by code in the
+same lexical scope, or an interior scope, that the variable is
+defined in.
+
+Frame: A group of lexical variables defined in a lexical scope.
+The frame does not include the lexical variables defined in
+interior scopes -- each interior scope requires its own frame.
+
+Frame ID: The position of a frame relative to either the current
+scope (non-positive frame IDs), or the outer-most scope (positive
+IDs).
+
+Variable ID: The non-negative position of a variable relative to
+the start of a frame.
+
+Scope ID: A unique positive number assigned by the assembler or
+compiler to each lexical scope. Information about the scope, such
+as how much space is required for a frame, is retrieved using the
+scope ID.
+
+=over 4
+
+=item B<fetch_lex>(out p variable, i|ic frame_id, i|ic variable_id)
+
+=item B<store_lex>(i|ic frame_id, i|ic variable_id, p variable)
+
+Note: While the PMC code is being developed, lexicals hold integers
+instead of PMCs. This changes the usage of lexicals because PMC
+lexicals will not need to be stored back to the frame.
+
+=item B<enter_scope>(ic scope_id)
+
+=item B<exit_scope>()
+
+=item B<sync_scope>(ic scope_id)
+
+=item B<.begin> [ID]
+
+B<.begin> is a pseudo-op that does two things: it begins a lexical
+scope at the current position in the code and it emits an
+B<enter_scope> op for the current scope. If ID is not provided, the
+assembler will generate one automatically.
+
+=item B<.reserve> variable_count
+
+=item B<.end>
+
+B<.end> is a pseudo-op that does two things: it ends the current
+lexical scope (returning to the enclosing lexical scope) and it
+emits an B<exit_scope> op.
+
+=back
+
+=cut
+
+AUTO_OP fetch_lex(i, i|ic, i|ic) {
+  /*
+     int frame_id = $2;
+     int variable_id = $3;
+
+     if (frame_id <= 0) {
+        frame_id += interpreter->frame_display->used;
+     }
+
+     $1 = interpreter->frame_display->frame[frame_id][variable_id];
+  */
+}

Re: memory assumptions

2001-11-13 Thread Ken Fox

Michael L Maraist wrote:
> Are we allowing _any_ dynamic memory to be non-GC-managed?

Parrot must allow third party libraries to use the standard system
malloc/free. Playing linker games to hide malloc/free gets *really*
ugly.

> Can we assume that a "buffer object" is ONLY accessible by a single
> reverse-path-to-PMC?

IMHO whatever we want to call private shouldn't be accessible to
extensions. If singleton buffers make GC easier then do it. PMC
internals are definitely private -- vtables require it.

> I believe Dan said that we _could_ assume that PMC's were unmovable

The term PMC is used in two ways: a PMC is an opaque handle to
a Perl variable, and a PMC is a struct that holds (meta)data for
a Perl variable.

The handles are movable, but the structs aren't. This isn't a big
deal since the structs are all the same size -- it's the array of
handles concept often used to simplify GC. I don't think anybody
has decided whether PMCs nested inside another PMC (arrays/hashes)
follow the normal PMC rules. (It would be a pain to use them if
they didn't, but there are lots of optimizations that need
contiguous+movable internals.)

> Are _all_ handles (PMC's, Strings, or whatever) that serve as
> root-set elements dynamically allocated?

Nope. Part of the root set includes auto C PMC handles used by
extensions. You also have to think about hardware registers too
if GC is running in another thread.

IMHO the easiest way to code this is to define an API that
extension writers use to declare every PMC handle. This code
looks ugly and encourages bugs though so a lot of people don't
like it. Many systems treat the C stack as conservative roots.
Those systems flush the hardware registers to the C stack
before running the collector. The benefit is that handles
don't need to be declared. The cost is the non-portable code
used to scan the C stack and flush the registers.
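
As a strawman, here's a tiny runnable sketch of that "declare every
handle" style of API. All of the names (PMC, register_root,
mark_roots) are invented for illustration -- this is not Parrot's
API:

#include <assert.h>
#include <stdio.h>

typedef struct PMC { int live; } PMC;

#define MAX_ROOTS 64
static PMC **root_set[MAX_ROOTS];   /* addresses of declared handles */
static int n_roots = 0;

static void register_root(PMC **handle) {
    assert(n_roots < MAX_ROOTS);
    root_set[n_roots++] = handle;
}

static void unregister_root(PMC **handle) {
    int i;
    for (i = 0; i < n_roots; i++)
        if (root_set[i] == handle) { root_set[i] = root_set[--n_roots]; return; }
}

/* a trivial "mark" pass: anything reachable from a declared handle lives */
static void mark_roots(void) {
    int i;
    for (i = 0; i < n_roots; i++)
        if (*root_set[i]) (*root_set[i])->live = 1;
}

int main(void) {
    PMC a = { 0 };
    PMC *handle = &a;

    register_root(&handle);     /* the extension declares its handle */
    mark_roots();
    printf("a.live = %d\n", a.live);
    unregister_root(&handle);   /* ...and undeclares it before the slot dies */
    return 0;
}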

Another simple solution is to suspend the collector while in
extension code (or untrusted extension code). This works for
lots of cases, but falls apart badly in callback/event-loop
extensions that never return.

One nice thing about writing Perl extensions is that the "SV *"
can be stuck in odd places. For example, I put them in X Toolkit
and Motif widget "user data" slots. Once the "SV *" are stuck
in the slots I have no idea what happens to them. This works fine
because the ref count will keep the SV alive until the widget
dies. A hybrid ref count GC with pinned PMCs will work exactly
the same. If we get fancier though and either use movable
PMCs or tracing collectors, then my "user data" slots become
roots. That will crash and burn because I don't know how to
register them. I'd have to use a separate array for holding the
PMCs and then stick the array indexes into the widgets.

It's probably not worth making my job as an extension
writer easier if it slows/complicates the rest of the GC. Most
extension writers are pretty good at this sort of thing and
a special utility library for pseudo-pinned PMCs wouldn't be
any big deal.

Hmm. I guess that argues against pinned PMCs. Maybe the best
thing is to develop a public API to PMCs that doesn't assume
pinning. That way you can use pinned PMCs as an implementation
optimization while leaving the door open for changing the
internals in the future.

Other extension writers may have a different view. ;) Asking
for comments on comp.lang.perl.modules would be a good idea.
Maybe the perl-xs mailing list too.

- Ken



Re: Mixing lightweight refcount and full GC

2001-11-12 Thread Ken Fox

Michael L Maraist wrote:
> No barriers for us?

Generational collectors require a write barrier so that pointers
from old objects to younger ones get recorded -- otherwise a minor
collection would miss them. ('Course Dan said he's starting with a
simple copying collector, so we don't need a barrier. Hmm. I guess
Dan's not *reject*ing a barrier, just rejecting a barrier *now*. ;)

Your point about XS code is a good one. I'm not sure the
write barrier is a problem though because all XS code
currently goes through the inc/dec/store API. The main
problem IMHO is that XS code caches PMC internals -- if
a GC moves the internals the XS code will crash. All
copying collectors have this problem.

Good article. I haven't gotten through the vmem article
yet and now here's another one. ;)

- Ken



Re: polymorphic inline caches

2001-11-12 Thread Ken Fox

Simon Cozens wrote:
> You save one level of indirection, at a large complexity
> cost.

A lot less complexity than a JIT though. 100% portable
code too.

It's also something that can be bolted on later, so there's
no reason to reject it now. I'm just throwing it out to the
list because I know other people are experimenting with
dispatchers.

- Ken



polymorphic inline caches

2001-11-12 Thread Ken Fox

A few days ago I suggested inlining some PMC methods
would be good for performance. It turns out that this
question has been heavily studied by the OO community
for at least 10 years. A common solution is to
dynamically replace a method call with the body of the
method wrapped in an if statement.

Code like:

  object->foo()

becomes

  if (typeof object == Bar) {
     ... inlined body of Bar::foo
  }
  else {
     object->foo()
  }
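
Here's the same pattern as a tiny runnable C program, just to make
the fast path concrete. The Bar/Baz types and the foo method are
invented for illustration:

#include <stdio.h>

typedef enum { BAR, BAZ } Type;
typedef struct Object { Type type; void (*foo)(struct Object *); } Object;

static void bar_foo(Object *o) { puts("Bar::foo (generic call)"); }
static void baz_foo(Object *o) { puts("Baz::foo (generic call)"); }

/* the call site after the inline cache has been installed */
static void call_foo(Object *o) {
    if (o->type == BAR) {
        puts("Bar::foo (inlined body, no indirect call)");
    }
    else {
        o->foo(o);      /* fall back to normal dispatch */
    }
}

int main(void) {
    Object bar = { BAR, bar_foo };
    Object baz = { BAZ, baz_foo };

    call_foo(&bar);     /* hits the inlined fast path */
    call_foo(&baz);     /* misses and takes the indirect call */
    return 0;
}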

There are lots of tweaks to the idea. SmallEiffel
uses this concept to completely eliminate vtable
method lookups -- all of its methods are called using
inlined binary search trees. This is totally useless
for Parrot, but somebody might find it as interesting
as I did. ;)

Where this fits into Parrot's interpreter is that
languages could pre-generate ops corresponding to
dynamically generated inlined caches. All we need is a
way to replace the simple method call op with the
inlined one. Yep. You heard it -- dynamically modified
byte code.

- Ken



Re: preferences for data structure diagrams?

2001-11-12 Thread Ken Fox

Robert Spier wrote:
> On Sun, Nov 11, 2001 at 07:38:28PM -0500, Ken Fox wrote:
> | ... Powerpoint would be a better choice since everybody
> | has to deal with that format anyway.
> 
> Please, no!  Powerpoint is one of the few formats which
> cannot be easily read on a non Windows or Mac system.  Any
> raster graphics format, or PDF.  Or all of the above.  

Brent Dax wrote:
> Ken Fox:
> # Ideally we should have something with the quality of
> # PerlGuts Illustrated and the simplicity of ASCII.
> 
> I'd suggest PDF or PostScript for most, and a GIF or something for
> those who can't read the main format.

Sorry I wasn't clear. Definitely PDF or HTML+PNG are the
choices for viewing.

I was wondering about *editing* them. IMHO all the data
structures in Parrot must be documented as beautifully as
PerlGuts Illustrated. Parrot is evolving quickly though
and needs documentation that is easy to update.

"dia" is probably a worse choice than Powerpoint -- just
count the number of people who have an editor able to make
changes to those formats. I've heard of several
non-Microsoft solutions for editing Powerpoint too. I'm
not a fan of Powerpoint, but it does seem to be a de facto
structured graphics format.

- Ken



Lexical implementation work

2001-11-08 Thread Ken Fox

Simon just chastised me for talking instead of patching.
Most of the stuff I'm interested in is heavily related to
the implementation of lexicals, so that's where I'm going
to jump in.

There are a number of decisions to make about lexicals
and the current PDD is pretty slim. So, IMHO, the place
to start is with the PDD. Anybody have any more thoughts
on the interface? Dan? Is this stone tablet stuff yet?

The big five questions on my list are:

1. The find_lex/fetch_lex ops push a lot of work into the
   run-time when the compiler can figure out what frame a
   lexical appears in.

   Should there be an alternative, faster interface to
   lexicals that eliminates string lookups? The symbolic
   interface is required to support %MY and debugging, so
   it stays regardless of whether other interfaces exist.

2. Perl 5 doesn't support nested subs, but I haven't read
   anything about Perl 6 and I don't know if other
   targets for Parrot support them either.

   Do we want to support nested subs? Efficiently?

3. We've adopted a register machine architecture to
   reduce push/pop stack traffic. Register save/load
   traffic is similar, but not nearly as bad.

   Do we want to further reduce memory moves by allowing
   ops to work directly on lexicals?

4. There has been discussion about how useful non-PMC
   registers will be to the Perl compiler. If there are
   questions there, surely non-PMC lexicals would be even
   less useful to Perl.

   Do we want non-PMC lexical support?

5. Perl 5 uses pads and scratchpads for holding lexicals.

   Do we want to consider other implementations such as
   traditional stack frames with a display or static
   link?

I'm leaning towards yes on all questions.

If there aren't any obvious answers, I propose that we
implement a couple different approaches and objectively
compare them. Lexical implementation is as critical to
Perl's performance as the dispatcher is, so we should
take some time to get it right.

- Ken



Re: Yet another switch/goto implementation

2001-11-08 Thread Ken Fox

Dan Sugalski wrote:
> Gack. Looks like a mis-placed optimization in perl 5. The list of a foreach 
> is *supposed* to flatten at loop start and be static. Apparently not. :)

Is anybody keeping a list of things that are *supposed* to be static? Is
the list changing much with Perl 6?

> Care to file the perl 5 bug report, or shall I?

I thought it was a feature. ;)

- Ken



Re: JIT compilation

2001-11-08 Thread Ken Fox

Dan Sugalski wrote:
> [native code regexps] There's a hugely good case for JITting.

Yes, for JITing the regexp engine. That looks like a much easier
problem to solve than JITing all of Parrot.

> If you think about it, the interpreter loop is essentially:
> 
>while (code) {
>  code = (func_table[*code])();
>}

That's *an* interpreter loop. ;)

> you pay a lot of overhead there in looping (yes, even with the computed 
> goto, that computation's loop overhead), plus all the pipeline misses from 
> the indirect branching. (And the contention for D cache with your real data)

The gcc goto-label dispatcher cuts the overhead to a single indirect
jump through a register. No table lookups. No extra branches. This is
even on the register starved x86. Yeah, there's a stall potential,
but it looks easy enough to fill with other stuff.
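
For anyone who hasn't seen it, here's a stripped-down runnable
example of that kind of dispatcher. It's gcc-only (labels as values)
and the three toy ops are invented for illustration:

#include <stdio.h>

int main(void) {
    /* after a pre-pass, each bytecode cell holds its op's address */
    void *program[] = { &&op_inc, &&op_inc, &&op_print, &&op_halt };
    void **pc = program;
    int acc = 0;

#define NEXT() goto *(*pc)      /* one indirect jump per op */

    NEXT();
op_inc:   ++acc; ++pc; NEXT();
op_print: printf("acc = %d\n", acc); ++pc; NEXT();
op_halt:  return 0;
}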

If Perl code *requires* that everything goes through vtable PMC ops,
then the cost of the vtable fetch, the method offset, the address
fetch and the function call will completely dominate the dispatcher.

> Dynamic languages have potential overheads that can't be generally factored 
> out the way you can with static languages--nature of the beast, and one of 
> the things that makes dynamic languages nice.

I know just enough to be dangerous. Lots of people have done work in
this area -- and on languages like Self and Smalltalk which are as
hard to optimize as Perl. Are we aiming high enough with our
performance goals?

I'll be happy with a clean re-design of Perl. I'd be *happier* with
an implementation that only charges me for cool features when I use
them. (Oops. That's what Bjarne said and look where that took him... ;)

- Ken



Instruction timings for Intel/AMD chips?

2001-11-01 Thread Ken Fox

After downloading a dozen PDF files I've given up.
All I need is the approximate cycle counts for
instructions and address modes.

The particular problem I've got now is deciding
which of these three is the fastest:

   movl (%edi,%eax,4),%eax
   movl (%edi,%eax),%eax
   movl (%edi),%eax

Same with:

   movl $1,(%eax,%edx,4)
   movl $1,(%eax,%edi)

An old 486 book I have claims that the complex
addressing modes don't carry any cycle penalty
over the forms that leave out the scale or the offset.
That seems hard to believe for the RISC-like
P3s and Athlons.

What about other processors? Is it common to
have address modes like:

   base+offset*scale

Most RISC instruction sets only provide base +
constant offset don't they?

Yeah, yeah, I know. Premature optimization is the
root of all evil. Except this isn't premature.
Getting to 50 mops was pretty easy. Getting to 100
mops is a *lot* harder!

- Ken



Re: vmem memory manager

2001-11-01 Thread Ken Fox

Michael L Maraist wrote:
[an incredible amount of detailed information that will
 take me weeks to digest...]

This looks like a malloc/free style allocator. Since the whole
GC system for Parrot is on the table, you don't have to constrain
yourself to malloc/free. IMHO free is not needed at all -- we
should be scavenging entire arenas all at once. I assume you
want to use malloc to grab an arena, but carving up the arena is
the GC's job.

- Ken



Re: Improved storage-to-storage architecture performance

2001-10-30 Thread Ken Fox

Michael L Maraist wrote:
> The only "memory" storage for scalars that I currently am conceiving of is
> in name-space stashes (globals). Thus our most likely implementation of S2S
> would be to have 'add "g_x", "g_y", "g_z"' which performs two stash 
> lookups, an add, then one stash write.

Kakapo currently uses:

   .global G:$x
   .global G:$y
   .global G:$z

   add G:$x, G:$y, G:$z

The assembler translates this to:

   add [1:0], [1:1], [1:2]

The operands are in [frame:offset] notation. Global variables always
appear in frame #1. Symbol table lookup is done at compile time.
Run-time uses the same addressing mode to fetch globals as it does
to fetch locals.

There isn't any aliasing mechanism -- I'm not sure if Perl 6 bindings
are going to require VM aliasing support or not. Even without aliasing,
somebody can still do by-name lookups: (hand-waving here)

   lookup L0, "$x"

That doesn't alias L0 to G:$x -- it copies the value of G:$x into
L0. (Note that PMCs act like pointers anyways, so if G:$x is a PMC,
then L0 is effectively an alias. That wouldn't be true if G:$x
was a machine int.)

> How does the GC sweep across a huge VM heap (as opposed to walking
> the register / stash maps).

It simplifies things because the root set is just the frame display
and inactive regions of the display that have been pushed onto the
stack. Since there aren't any registers or register stacks, all
traversals are done on a single data structure -- a frame.

> I spoke exclusively on scalars (PMCs) because I had an interesting time 
> considering static datatypes.  It's obvious what happens when we declare:
> for my int $idx ( 0 .. 100 ) {
> }
> We get an integer register variable which maps directly to add / extraction 
> calls.

Maybe. You might get a strongly-typed PMC depending on how the body
of the code uses $idx. (Dan has said that Perl 6 functions will take
a list of PMCs as arguments. If $idx gets passed to a sub, you'll
need to create a scratch PMC out of that integer register.)

Capturing a closure inside the loop might also affect how $idx is
allocated.

> But what about:
> 
> our int $g_idx;

You're asking how Parrot will do this? I don't think Dan's told us yet.

In Kakapo's case, the compiler just assigns a spot in the
scope definition. When the scope is entered, a frame is created
large enough to hold all the variables. A single frame holds
many types -- the only difference between "our int $g_idx" and
"our $g_idx" is how the space in the frame is initialized.

- Ken



Re: Improved storage-to-storage architecture performance

2001-10-30 Thread Ken Fox

Dan Sugalski wrote:
> Hmmm. I'd like to see the two run with the same style dispatcher to get a 
> closer comparison of run speeds. When you say threaded, you're doing more 
> or less what the switch/computed-goto dispatcher does, right?

If you take a look at the op_add code I posted, you'll see that it
uses gcc's "goto *address" feature. What happens is the bytecode is
pre-processed and the addresses of the ops are stashed directly
into the byte code stream. There's nothing to compute -- the system
just loads the address and jumps.

I'd really like to try different dispatchers out with Parrot too. That's
why I asked if anybody did a threaded dispatcher yet. (If nobody has,
then I'll do one...)

> >Parrot and Kakapo should have very similar mops when using the
> >same dispatcher.
> 
> In which case I'm not sure I see the win, though I'm definitely curious as 
> to how it stacks up. I think the end result will be essentially the same as 
> far as performance goes, but I'm not sure yet.

The win is in simplicity -- both the VM and the compiler. Register
VMs require the compiler to load things into registers, right? If the
register allocation is good it will be fast. Poor register allocation
will waste time with redundant loads, excessive register spills, etc.

> Ken Fox wrote:
> > What happens when you "goto middle" depends on where you started.
> 
> Personally I'm all for throwing a fatal error, but that's just me.

:)

> If you're copying things around that means you have to do a bunch of 
> pointer fixups too, otherwise you'll have code pointing to the wrong place.

Nope. The byte code holds pointers to scope definitions (think of them
like templates) and op addresses. Everything else is a [frame:offset]
pair. Entire frames can be moved around without disrupting any code. The
callee can even move around its caller's frames and nothing breaks. (This
is how taking continuations works.)

> If you're not storing pointers to things in frames, then I don't see the 
> advantage to this scheme, since you're indirect anyway, which is what we 
> are now.

Indirection is absolutely required in both our schemes. The difference
is that a register architecture copies data into a register whereas
a storage-to-storage architecture works on the data in place.

You can think of Kakapo as having a single address mode for
everything. Constants, globals, locals, temporaries, etc. are all
identified by [frame:offset]. (In-lined constants are an exception
to this rule, but that's just a performance hack.)

> Smart compilers aren't that tough to build any more--there's a lot of 
> literature for them these days, so it's not much more work to build a
> smart one than it is to build a dumb one.

Sure, agree. Smart compilers can take some time to run though. For
example, IMHO a compiler has to hoist register loads out of loops to
get decent performance. To do that it's going to have to analyze when
the local variable will change -- a pretty expensive optimization.

Maybe it won't matter because PMCs all introduce an extra layer of
indirection anyway. I have no idea whether most things will be
PMCs or if compilers will try to use strongly typed registers for
speed.

- Ken



Re: java vs. parrot mops

2001-10-30 Thread Ken Fox

Kevin Huber wrote:
> This is a comparison of mops running on Parrot (-O6 on an Athlon 700)
> versus Java JDK 1.4.0 beta 2 jitted and interpreted.  You can see that
> Parrot performance is very comparable to Java in interpreted mode.

I have an Athlon 700 too. With these compiler flags:

PERL-CFLAGS = -O3 -fomit-frame-pointer -pipe -s -march=pentium -ffast-math \
-fexpensive-optimizations -fno-strict-aliasing \
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64

I'm seeing 24 mops which puts Parrot even closer to Java. Do those
flags improve your times?

When you run mops.pbc through pbc2c what kind of performance do
you get?

I have a feeling that Parrot will end up with higher code density
than a JVM so performance in "real world" cases will be better. Of
course most of the effort in Java is going into JIT, so shooting for
JVM interpreter speeds is likely to be shooting way too low.

- Ken



Re: Improved storage-to-storage architecture performance

2001-10-29 Thread Ken Fox

Uri Guttman wrote:
> that is good. i wasn't disagreeing with your alternative architecture.
> i was just making sure that the priority was execution over compilation
> speed.

I use a snazzy quintuple-pass object-oriented assembler written
in equal parts spit and string (with a little RecDescent thrown in for
good measure). A real speed demon it is... ;)

The real motivation of my work is to see if a storage-to-storage
machine ends up using cache better and with less compiler effort
than a register machine. When I read about CRISP, the first thing
that came to mind was the "top-of-stack-register-file" could be
simulated exactly with high-speed cache in a software VM. Dropping
the stack-machine instructions in favor of Parrot's 3 operand ones
made it sound even better.

> ... then be mmap'ed in and run with hopefully impressive speed.

I'm impressed with the possibilities of the pbc->C translator. The
core modules on my system probably won't be mmap'ed byte code -- they'll
be mmap'ed executable. Reducing memory foot-print this way might take
some of the pressure off the need to share byte code. Lots of really
nice optimizations require frobbing the byte code, which definitely
hurts sharing.

- Ken



Re: Improved storage-to-storage architecture performance

2001-10-29 Thread Ken Fox

Uri Guttman wrote:
> so my point is the the speed of the VM is a separate issue from the ease
> of code generation. an S2S VM would be easier to code generate for but
> may be slower to run. the speed difference is still an open point as dan
> has said. but since his goal is execution speed, that will determine the
> best parrot design, not ease of code generation.

Absolutely. Which is why I posted my numbers and my code.

The other thing to consider is that Perl is still a compile-on-the-fly
system. I hope Perl 6 keeps this aspect of Perl instead of moving to a
more Javaesque development environment.

This means time spent doing dataflow analysis to help with register
allocation is going to eat into perceived execution speed. Parrot might
fly, but if the Perl 6 compiler is slow, the user experience will be
poor. Hopefully the compiler will run quickly and generate decent code.
A separate super-optimizing compiler can be used for important things
like CGI.pm. ;)

- Ken



Re: Improved storage-to-storage architecture performance

2001-10-29 Thread Ken Fox

Uri Guttman wrote:
> and please don't bring in hardware comparisons again. a VM design
> cannot be compared in any way to a hardware design.

I have absolutely no idea what you are talking about. I didn't
say a single thing about hardware. My entire post was simply about
an alternative VM architecture. It's not a theory. You can go get
the code right now.

I'm just messing around on a storage-to-storage VM system I've
named Kakapo. It's a dead-end. A fat, flightless, endangered kind
of parrot. It's fun to experiment with ideas and I hope that good
ideas might make it into Parrot.

- Ken



Re: Improved storage-to-storage architecture performance

2001-10-29 Thread Ken Fox

Dan Sugalski wrote:
> What sort of dispatch was your version using, and what sort was
> parrot using in your test?

Parrot used the standard function call dispatcher without bounds
checking.

Kakapo used a threaded dispatcher. There's a pre-processing phase
that does byte code verification because threading makes for some
outrageously unsafe code.

Parrot and Kakapo should have very similar mops when using the
same dispatcher. You all know what a Parrot "add" op looks like.
Here's the Kakapo add op:

op_add:
    STORE(kvm_int32, pc[1]) = FETCH(kvm_int32, pc[2]) +
                              FETCH(kvm_int32, pc[3]);
    pc += 4;
    NEXT_OP;

Ok, ok. You want to know what those macros do... ;)

op_add:
    *(kvm_int32 *)(frame[pc[1].word.hi] + pc[1].word.lo) =
        *(const kvm_int32 *)(frame[pc[2].word.hi] + pc[2].word.lo) +
        *(const kvm_int32 *)(frame[pc[3].word.hi] + pc[3].word.lo);
    pc += 4;
    goto *(pc->i_addr);

I haven't counted derefs, but Parrot and Kakapo should be close.
On architectures with very slow word instructions, some code bloat
to store hi/lo offsets in native ints might be worth faster
address calculations.

> Ken Fox wrote:
> > One thing I learned is that it's not necessary (or
> > desirable) to do enter/exit scope ops.
> 
> Don't forget that you'll need those for higher-level constructs. For
> example, this code:
> 
>{
>   my Dog $spot is color('brindle'):breed('welsh corgi');
>}
> 
> will need to call Dog's constructor and attribute setting code every time
> you enter that scope.

Definitely. I didn't say Kakapo doesn't have enter/exit scope
semantics -- it does. There's no byte code "enter scope" op though.
What happens is more declarative. There's a sync_scope guard op
that means "the VM must be in lexical scope X to properly run the
following code." If the VM is already in scope X, then it's a nop.
If the VM is in the parent of X, then it's an enter scope. If the
VM is in a child of X, then it's an exit scope.

This makes it *very* easy for a compiler to generate flow control
instructions. For example:

{
   my Dog $spot ...

   {
      my Cat $fluffy ...

middle:  $spot->chases($fluffy);

   }
}

What happens when you "goto middle" depends on where you started.
sync_scope might have to create both Dog and Cat scopes when code
jumps to the middle. Or, code might already be in a sub-scope of
Cat, so sync_scope would just pop scopes until it gets back to Cat.

This is where sync_scope is very useful. It allows the compiler
to say "this is the environment I want here" and delegates the job
to the VM on how it happens.
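
Here's a runnable toy of that behavior, using the Dog/Cat scopes
from the example above. The Scope struct and function names are
invented -- this isn't Kakapo's actual implementation:

#include <stdio.h>

typedef struct Scope { int id; struct Scope *parent; } Scope;

static Scope *current = NULL;       /* NULL == no scope entered yet */

static int is_ancestor(Scope *a, Scope *b) {
    for (; b; b = b->parent)
        if (b == a) return 1;
    return a == NULL;               /* "no scope" encloses everything */
}

static void enter_down_to(Scope *target) {
    if (target == current || target == NULL) return;
    enter_down_to(target->parent);
    printf("enter scope %d\n", target->id);
    current = target;
}

static void sync_scope(Scope *target) {
    while (!is_ancestor(current, target)) {     /* pop scopes we left */
        printf("exit scope %d\n", current->id);
        current = current->parent;
    }
    enter_down_to(target);                      /* push any missing ones */
}

int main(void) {
    Scope dog = { 1, NULL };        /* { my Dog $spot ... }   */
    Scope cat = { 2, &dog };        /* { my Cat $fluffy ... } */

    sync_scope(&cat);   /* jump straight to "middle": enters Dog, then Cat */
    sync_scope(&dog);   /* fall off the inner block: exits Cat only */
    return 0;
}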

> You also potentially need to allocate a new scope object every time you
> enter a scope so you can remember it properly if any closures are created.

Closures in Kakapo are simple. All it needs to do is:

1. copy any current stack frames to the heap
2. copy the display (array of frame pointers) to the heap
3. save the pc

Step #1 can be optimized because the assembler will have a pretty
good idea which frames escape -- the run-time can scribble a note
on the scope definition if it finds one the assembler missed.
Escaping frames will just be allocated on the heap to begin with.

This means that taking a closure is almost as cheap as calling
a subroutine. Calling a closure is also almost as cheap as
calling a subroutine because we just swap in an entirely new
frame display.

> How does this handle nested copies of a single scope? That's the spot a SS
> architecture needs to switch to indirect access from direct, otherwise you
> can only have a single instance of a particular scope active at any one
> time, and that won't work.

Calling a subroutine basically does this:

1. pushes previous return state on the stack
2. sets the return state registers
3. finds the deepest shared scope between caller and callee's parent
4. pushes the non-shared frames onto the stack
5. transfers control to the callee
6. sync_scope at the callee creates any frames it needs

> I'm curious as to whether the current bytecode could be translated on load
> to something a SS interpreter could handle.

Never thought of that -- I figured the advantage of an SS machine
is that brain-dead compilers can still generate fast code. Taking
a really smart compiler generating register-based code and then
translating it to an SS machine seems like a losing scenario.

I think this is why storage-to-storage architectures have lost
favor -- today's compilers are just too smart. Possibly with a
software VM the memory pressure argument favoring registers isn't
strong enough to offset the disadvantage of requiring smart
compilers.

I just put up the 0.2 version of Kakapo at
<http://www.msen.com/~fox/Kakapo-0.2.tar.gz>

This version ha

Improved storage-to-storage architecture performance

2001-10-29 Thread Ken Fox

A little while back I posted some code that
implemented a storage-to-storage architecture.
It was slow, but I tossed that off as an
implementation detail. Really. It was. :)

Well, I've tuned things up a bit. It's now
hitting 56 mops with the mops.pasm example. Parrot
turns in 24 mops on the same machine with the same
compiler options. This is not a fair comparison
because the Parrot dispatcher isn't optimal, but it
shows I'm not hand waving about the architecture
any more... ;)

Dan was right. It's a lot faster to emit explicit
scope change instructions than to include a scope
tag everywhere. Memory usage is about the same, but
the explicit instructions permit code threading
which is a *huge* win on some architectures. The
assembler does 99% of the optimizations, and it
still uses scope tagged instructions, so nothing is
really lost by ripping out the scope tags.

One thing I learned is that it's not necessary (or
desirable) to do enter/exit scope ops. I implemented
"sync_scope" which takes a scope id as an operand
and switches the VM into that scope, adjusting
the current lexical environment as necessary. This
works really well. The reason why sync_scope works
better than explicit enter/exit ops is because
sync_scope doesn't force any execution order on the
code. Compilers just worry about flow control and
the VM figures out how to adjust the environment
automatically. For example, Algol-style non-local
goto is very fast -- faster and cleaner than
exceptions for escaping from deep recursion.

One other thing I tested was subroutine calling.
This is an area where a storage-to-storage arch
really shines. I called a naive factorial(5) in a
loop 10 million times. Subroutine call performance
obviously dominates. Here's the code and the times:

Parrot: 237,000 fact(5)/sec

fact:   clonei
        eq      I0, 1, done
        set     I1, I0
        dec     I0, 1
        bsr     fact
        mul     I0, I0, I1
done:   save    I0
        popi
        restore I0
        ret

Kakapo: 467,000 fact(5)/sec

.begin
fact: arg   L0, 0
      cmp   L1, L0, 1
      brne  L1, else
      ret.i 1
else: sub   L2, L0, 1
      jsr   L3, fact, L2
      mul   L4, L0, L3
      ret.i L4
.end

I think the main thing that makes the storage-
to-storage architecture faster is that the callee
won't step on the caller's registers. The caller's
arguments can be fetched directly by the callee.
There's no argument stack or save/restore needed.

Here's the calling conventions for Kakapo.

On a sub call, the pc is saved in the ret_pc
register. Any frames not shared (lexically) between
the caller and callee are dumped to the stack (just
the frame pointers; the frames themselves are never
copied).

A sync_scope instruction at the start of a sub
takes care of building the callee's lexical
environment.

The caller passes arguments by reference. The "arg"
instruction uses the operands in the jsr instruction
as an argument list. (The jsr instruction is easy to
access because the ret_pc register points to it.)
"arg" works exactly like "set" except that it uses
the caller's lexical environment to fetch the source
value. Yes, this makes jsr a variable-size instruction,
but so what? There's no penalty on a software VM.

- Ken



Anybody write a threaded dispatcher yet?

2001-10-29 Thread Ken Fox

Anybody do a gcc-specific "goto *pc" dispatcher
for Parrot yet? On some architectures it really
cooks.

- Ken



Interesting experiment with lexical scoped VM

2001-10-21 Thread Ken Fox

A while back I wondered if a higher-level VM might be
useful for implementing higher-level languages. I
proposed a lexically scoped machine without registers
or stacks. The response wasn't encouraging.

A quick tour through the library turned up a few
machine designs that sounded very similar to what I
was thinking. There's even a name for them: storage
to storage architectures. The one I found the most
literature on was CRISP. AT&T based the Hobbit chips
off the CRISP design.

CRISP is a RISC architecture that does not have
named registers. It maps the top of the stack to a
large register file and provides a few addressing modes
to access the stack. This hugely simplifies a compiler
because it doesn't have to worry about register
conventions -- especially it doesn't have to worry
about saving/restoring register sets in function calls.
The effect is sort of like custom-fit register windows.

One thing that makes these particularly well suited
for a software VM is that they never copy data.
A software register architecture needs to be *very*
smart about register usage, otherwise all it's doing
is loading two copies of everything into L2.

I became interested enough that I implemented a
throw-away VM based on the concept. It's a Perl 5
module that implements a lexically scoped, storage
to storage VM. Parts of the system ended up being more
of a framework -- there's a parser, assembler, byte-code
generator and VM all integrated together. It's very
easy to hack on.

I'm calling the module Kakapo. The Kakapo is the
heaviest parrot in the world. It's critically endangered
because the *flightless* bird uses a "stand and freeze"
defense to evade predators.

As you can see I have high hopes for this code. ;)

I realize I'm too slow to have any direct impact on
Parrot, but maybe some of the ideas will find a home.

Here are a couple factorial examples in Kakapo:

; code a stupid compiler might generate

.begin
fact1: arg   L0, 0
       cmp   L1, L0, 1
       brne  L1, else
       ret   1
else:  sub   L2, L0, 1
       jsr   L3, fact1, L2
       mul   L4, L0, L3
       ret   L4
.end

; code a smart compiler might generate

.begin
fact2: arg   L0, 0
       sub   L1, L0, 1
       bre   L1, done
       jsr   L1, fact2, L1
       mul   L0, L0, L1
done:  ret   L0
.end

The ".begin" and ".end" commands are assembly
directives that introduce a lexical scope. Every
time a scope is entered (via a subroutine call
or branch) the VM compares the current lexical
scope state to the one the instruction requires.
The VM will automatically sync up. This works
almost exactly like make-your-own register
windows. Here's an example:

  .begin
     set L0, 1
     .begin
        set L0, L0@1
     .end
  .end

The inner scope has its own register set -- it
only uses "L0" so the set only has one register
in it. The "L0@1" syntax asks for the "L0"
register in the parent scope. (This isn't the
calling scope like the way register windows work
on the Sparc -- this is lexical scoping. The
"arg" op is used to reference the caller's scope.
It treats the argument list of the "jsr" op as
an export statement. The "arg" op imports from
that list.)

Using scopes allows nested subroutines to be
written very easily:

.begin
tens: arg   L0, 0
      set   L1, 10
      set   L2, 0

      jmp   test
next: jsr   L2, add, L2
      sub   L0, L0, 1
test: cmp   L3, L0, 0
      brg   L3, next
      ret   L2

  .begin
add:  arg   L0, 0
      add   L0, L0, L1@1
      ret   L0
  .end
.end

Here "tens" is the address of a subroutine that
takes one input argument and return 10x the value.
It calls the "add" subroutine in a loop. add
takes one argument and adds "L1@1" -- or the L1
register in the parent lexical scope. If you
could write nested subroutines in Perl, the code
would look about like this:

 sub tens {
     my $n = $_[0];
     my $i = 10;
     my $r = 0;

     sub add {
         return $_[0] + $i
     }

     while ($n > 0) {
         $r = add($r);
         $n--
     }

     return $r
 }

Another cool thing about scopes is that they can
be marked "static". (This is how global and module
global scopes work.) When a scope is static -- even
if it's in the middle of a bunch of dynamic scopes --
the VM will re-use a single instance of the scope.
Here's a simple example:

  .begin
     set L0, 1
     .begin static
        add L0, L0, L0@1
     .end
  .end

The inner scope is static so "L0" is actually an
accumulator that will keep incrementing by the value
of the parent's "L0". This is pretty handy for
implementing class and function static variables.

Performance is slower than Parrot on some things, but
faster on others. For example, Kakapo is about 50%
slower on tight loops like:

  ; Kakapo
  loop1:  add   L0, L0, 1
          cmp   L1, L0, 1
          brl   L1, loop1

 # Parrot
  REDO: 

Re: Thoughts on a higher-level VM

2001-09-22 Thread Ken Fox

David M. Lloyd wrote:
> Take it from me (the one with several abortive attempts at getting an
> extra compare stuck in Perl 5's dispatch loop):  You don't want to stick
> another compare in there.  It *kills* performance.

Kills? I thought the event flag test dropped performance by a few
percent. We can piggy-back off the event flag test, so I'm not
adding another test, just re-using one that's (already?) there.

Keep in mind that if dispatcher performance drops by a few percent,
we might be able to make it up simply due to reduced instruction
counts. Anybody know what the enter/exit scope percentage is?

IMHO the goal for a software CPU is low instruction counts, not
high MIPS. High MIPS means you're chewing up cache with an opcode
stream instead of useful data.

> You'd want to use scope enter and leave opcodes instead.  Much faster.

I'm not sure. It's a different architecture with many different
optimization possibilities. It would certainly make the compiler
a lot easier to build. No more weird restrictions on gotos.  A
"call ADDRESS" just stuffs the return address in a register and
jumps. Nested subs fall out for free. Tail-calls are just "goto
ADDRESS" because there isn't any prologue. Seems like a lot of
stuff gets a lot easier if you can treat frames declaratively.

One last thing to consider. If the run-time dynamic lexical
insertion stuff comes about, then we'll have to insert scope
enter/exit opcodes in every scope. There are going to be a
lot of empty scopes around. If the run-time is able to
dynamically change the scope tag on an instruction, then we
don't have to generate those empty scopes. The debugger could
actually be built on dynamic scopes -- when you want to set
a break point, you just change the scope tag on the instruction
you want to break and insert a lexical with a DESTROY sub
that halts the VM. Full-speed execution with breaks enabled.

If the VM supports it, dynamic scoping games could be a lot
of fun. I'm starting to come around to Damian's point of view...

- Ken



Thoughts on a higher-level VM

2001-09-22 Thread Ken Fox

I've been thinking about the possibility of building a higher-level
VM. The current VM is very close to a traditional CPU. What if we
did something non-traditional that made implementing higher-level
lexically scoped languages easier?

What if the VM (and assembler) had lexical scoping built-in at
the instruction level? Each instruction could be tagged with a
scope and if the scope changes, the VM could automatically fixup
the environment.

The current dispatch loop is something like:

  while (code)
 code = dispatch(code)

I'm proposing:

  while (code)
 if (code.scope != current.scope)
fixup_environment(code)
 code = dispatch(code)

If we include event checking in the dispatch loop, the scope change
check is free because we can combine the two into a single test --
scope 0 would be reserved for "event pending".

The fixup_environment() routine is very simple for the normal
enter/exit one scope case and the complexity of multi-level scope
changes is handled in one place, not spread through the branching
ops or compiler. Branching into a scope, calling nested functions,
and taking a closure would all be much easier to write.
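
To make that concrete, here's a runnable toy of the proposed loop.
The Op layout, opcodes and fixup_environment body are all invented
for illustration:

#include <stdio.h>

typedef struct Op { int scope; int code; } Op;   /* scope tag + opcode */

static int current_scope = 0;

static void fixup_environment(const Op *op) {
    /* a real VM would push/pop lexical frames here; the toy just
       records the transition */
    printf("scope change: %d -> %d\n", current_scope, op->scope);
    current_scope = op->scope;
}

int main(void) {
    Op program[] = { {1, 10}, {1, 11}, {2, 12}, {1, 13}, {1, 0} };
    Op *pc = program;

    while (pc->code) {
        if (pc->scope != current_scope)
            fixup_environment(pc);
        printf("  dispatch op %d (scope %d)\n", pc->code, current_scope);
        ++pc;
    }
    return 0;
}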

Lexically scoped instructions would also permit another possibility:
a stack-less *and* register-less architecture. The VM could use
frame offsets for everything -- the only operands needed in the
assembler are local and global variables.

Scoped assembler would look something like:

  frame(size) {
ops ...
frame(size) {
  ops ...
}
  }

Local variables would use "L[frame][offset]" syntax. I see major
benefits in requiring the frame and offset values to be constants,
but they don't need to be. We'd lose the ability to statically check
code, but the dynamic lexical insertion feature would be easier to
implement. The frame size would be very easy to change too since
it's external to the instruction stream.

Global variables would use "G[name]".

The frames can be easily allocated in a small (L2 cache sized)
arena that would be just as fast as Parrot registers. If the GC
detects that a local variable has escaped, the frame just spills
into another arena, i.e. the first allocation is generation-0 and
the copy is generation-1. The compiler could even record scopes
that have escaping variables and allocate them in generation-1 to
begin with.

In summary, here's the advantages I see in a scoped VM:

  1. easier to generate code because the VM automatically
 handles entering/exiting scopes.

  2. easier to optimize scope allocations (or avoid
 allocations completely) because the instruction
 count does not change -- only the scope tags and
 frame offsets of local variables change.

  3. easier to implement higher-level features such as
 taking closures.

  4. lower instruction counts because the compiler
 doesn't need to generate enter/exit scope
 operations.

  5. performance might improve due to reducing the
 instruction count, reducing special cases in the
 branch opcodes and improving cache usage (copying
 a value into a register causes unnecessary cache
 pressure).

  6. the VM is simpler to specify -- no registers, no
 register stacks and no caller/callee saves.

The problems I see are:

  1. some languages might not need Perl 5 lexical scoping
 behavior (C for example does not allocate a frame
 for every scope -- scopes share a single frame).

  2. the test in the dispatch loop slows things down if
 event testing isn't needed. (But even though MIPS
 may be lower, "real" code may not be because the
 instruction count is lower.)

  3. a higher-level VM does not have a nice simple mapping
 to real hardware, so native code JIT will be more
 difficult.

  4. data sizes must be standardized so that offsets in
 bytecode files are portable -- otherwise we'll need
 another level of indirection which will be slower
 than Parrot registers. (If we don't care about
 mmap()ing the bytecode, then this isn't an issue.)

  5. bytecode files will probably be larger because the
 instruction count won't decrease enough to offset
 the extra tag on each instruction.

My idea might be too radical, but I want to see what everybody
thinks before we get too much code written.

- Ken



Thoughts on a higher-level VM

2001-09-22 Thread Ken Fox

[Sorry if this is a duplicate. I sent the original from work.
 Is there a spam filter removing messages from non-subscribers?]

I've been thinking about the possibility of building a higher-level
VM. The current VM is very close to a traditional CPU. What if we
did something non-traditional that made implementing higher-level
lexically scoped languages easier?

What if the VM (and assembler) had lexical scoping built-in at
the instruction level? Each instruction could be tagged with a
scope and if the scope changes, the VM could automatically fixup
the environment.

The current dispatch loop is something like:

  while (code)
 code = dispatch(code)

I'm proposing:

  while (code)
 if (code.scope != current.scope)
fixup_environment(code)
 code = dispatch(code)

If we include event checking in the dispatch loop, the scope change
check is free because we can combine the two into a single test --
scope 0 would be reserved for "event pending".

The fixup_environment() routine is very simple for the normal
enter/exit one scope case and the complexity of multi-level scope
changes is handled in one place, not spread through the branching
ops or compiler. Branching into a scope, calling nested functions,
and taking a closure would all be much easier to write.

Lexically scoped instructions would also permit another possibility:
a stack-less *and* register-less architecture. The VM could use
frame offsets for everything -- the only operands needed in the
assembler are local and global variables.

Scoped assembler would look something like:

  frame(size) {
ops ...
frame(size) {
  ops ...
}
  }

Local variables would use "L[frame][offset]" syntax. I see major
benefits in requiring the frame and offset values to be constants,
but they don't need to be. We'd lose the ability to statically check
code, but the dynamic lexical insertion feature would be easier to
implement. The frame size would be very easy to change too since
it's external to the instruction stream.

Global variables would use "G[name]".

The frames can be easily allocated in a small (L2 cache sized)
arena that would be just as fast as Parrot registers. If the GC
detects that a local variable has escaped, the frame just spills
into another arena, i.e. the first allocation is generation-0 and
the copy is generation-1. The compiler could even record scopes
that have escaping variables and allocate them in generation-1 to
begin with.
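
A toy version of that generation-0 arena is just a bump allocator; the
collector would copy escaping frames into the generation-1 arena and
forward the pointer. Sizes and names below are invented:

  #include <stddef.h>

  #define GEN0_SIZE (256 * 1024)            /* roughly L2-cache sized */

  struct arena {
      char *next;                           /* first free byte  */
      char *limit;                          /* one past the end */
      char  base[GEN0_SIZE];
  };

  static void arena_init(struct arena *a)
  {
      a->next  = a->base;
      a->limit = a->base + GEN0_SIZE;
  }

  /* NULL means the arena is full; the caller then allocates the
     frame straight into the generation-1 heap instead */
  static void *frame_alloc(struct arena *a, size_t size)
  {
      if ((size_t)(a->limit - a->next) < size)
          return NULL;
      void *frame = a->next;
      a->next += size;
      return frame;
  }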

In summary, here's the advantages I see in a scoped VM:

  1. easier to generate code because the VM automatically
 handles entering/exiting scopes.

  2. easier to optimize scope allocations (or avoid
 allocations completely) because the instruction
 count does not change -- only the scope tags and
 frame offsets of local variables change.

  3. easier to implement higher-level features such as
 taking closures.

  4. lower instruction counts because the compiler
 doesn't need to generate enter/exit scope
 operations.

  5. performance might improve due to reducing the
 instruction count, reducing special cases in the
 branch opcodes and improving cache usage (copying
 a value into a register causes unnecessary cache
 pressure).

  6. the VM is simpler to specify -- no registers, no
 register stacks and no caller/callee saves.

The problems I see are:

  1. some languages might not need Perl 5 lexical scoping
 behavior (C for example does not allocate a frame
 for every scope -- scopes share a single frame).

  2. the test in the dispatch loop slows things down if
 event testing isn't needed. (But even though raw MIPS
 may be lower, "real" code may not be slower because the
 instruction count is lower.)

  3. a higher-level VM does not have a nice simple mapping
 to real hardware, so native code JIT will be more
 difficult.

  4. data sizes must be standardized so that offsets in
 bytecode files are portable -- otherwise we'll need
 another level of indirection which will be slower
 than Parrot registers. (If we don't care about
 mmap()ing the bytecode, then this isn't an issue.)

  5. bytecode files will probably be larger because the
 instruction count won't decrease enough to offset
 the extra tag on each instruction.

My idea might be too radical, but I want to see what everybody
thinks before we get too much code written.

- Ken



Re: String API

2001-09-11 Thread Ken Fox

Dan Sugalski wrote:
> If you're speaking of multiple buffers for a string or something like that,
> you're looking at too low a level. That's something that should go in the
> variables, not in the string bits. (We do *not* want all string ops slow to
> support flexibility of this sort. Only the bits that need it)

I think of buffers as more primitive than strings. They are segments of
bits. They can be copied, extended, garbage collected, etc. They don't
have character encodings. A string is just a buffer with a few more
semantics. PMCs might implement split strings or some higher abstraction
using several parrot_strings.

Perhaps you intend to make parrot_string.bufstart hold a buffer structure.
That'd be good, except that now parrot_string.buflen is not really owned
by the string anymore -- the buffer structure is probably going to need to
mess around with the string's internals. That increases the coupling between
buffers and strings. Coupling is bad. ;)
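
Something like the following layering is what I mean -- the field names
are made up, but the point is that buflen belongs to the buffer and the
string never touches it:

  #include <stddef.h>

  struct buffer {                  /* owns the memory */
      void   *data;
      size_t  buflen;              /* bytes allocated */
      size_t  bufused;             /* bytes in use    */
  };

  struct encoding;                 /* vtable of character ops */

  struct string {                  /* adds character semantics */
      struct buffer   *buf;
      struct encoding *enc;
      size_t           strlen;     /* length in characters */
  };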

> >Others are the use of
> >stacks in the interpreter and the dispatch of opcodes in
> >runops().
> 
> So what's wrong with this?

Lots of duplicated code in register.h. We really need a stack template.

The interpreter knows the internals of the stack structure and is
responsible for managing it. To change the stack implementation, we'll
have to carefully examine all the interpreter code. The semantics of
the stack aren't real clear either. For example, when changing the
stack code what invariants must be preserved? If the stack had an API
the interpreter used I think this would be much clearer.

runops() is hard-wired with assumptions on the ops. What if we want
to generate a switch-based runops()? How much code is going to make
the same assumption that runops() does now? runops() should be generated
similarly to the opcode table. There should be an API for calling ops
so we don't have to jump through the same hoops required for perl 5.

> Compiler type checking isn't going to work for us at this level--we'd need
> runtime checking for it to work properly.

I'm talking about the interface between Parrot and user code -- the
XS layer in Perl 5. Sometimes compile-time checking is possible. We
should try to take as much advantage of those situations as possible.
For example, in Win32 we *know* that certain things are 16 bit Unicode.
If we try to build a utf8 parrot_string with them, the compiler should
barf.

> String encodings will be dynamically loadable. I'm going to add
> documentation for those ops in soon. (I was hoping today or tomorrow, but
> things are kinda screwy here at the moment)

Simon just said the opposite. Does this mean that the C and Unicode
encodings will be built-in and everything else dynamically loadable?

- Ken



Re: PDD 6: Parrot Assembly Language

2001-09-11 Thread Ken Fox

"Bryan C. Warnock" wrote:
> On Monday 10 September 2001 09:30 pm, Dan Sugalski wrote:
> > gotos into scopes might not be allowed.
> 
> That's how it currently is for most scopes, and it certainly saves a
> whole lot of trouble and inconsistencies.

I'm not sure I understand what you mean. Perl 5 allows us to enter
and leave scopes with goto:

  sub foo {
    my($x, @y) = @_;

    if ($x) {
      goto START;
    }

    while (@y) {
      $x = shift @y;
      START:
      print "$x\n";
    }
  }

And of course the keywords last, next and redo are just restricted
gotos. Those are able to leave one or more scopes.

It isn't clear to me that explicit ops are the way to go
for scope management -- that seems to shift the burden of scope
management onto the compiler. A declarative interface between
the compiler and interpreter might be best. The compiler will
probably need that for communicating debugging info anyways.

- Ken



Re: String API

2001-09-11 Thread Ken Fox

Simon Cozens wrote:
> On Mon, Sep 10, 2001 at 08:38:43PM -0400, Ken Fox wrote:
> > Have you guys seen Topaz?
> 
> I may have heard of it, yes.

That's it? You're rejecting all of that work without
learning anything from it? Building strings on buffers
looked like a really good idea.

In general I think Parrot is missing critical abstractions.
The string/buffer relation is one. Others are the use of
stacks in the interpreter and the dispatch of opcodes in
runops(). This code is going to be very difficult to work
with because assumptions of the data structures are made
in lots of different places.

IMHO this is the main reason why Perl 5 is difficult to
understand -- and Parrot is repeating it.

> > The other major suggestion I have is to avoid "void *"
> > interfaces.
> 
> I'm using void * to avoid char *. :)
> ...
> Look at the code for string_make. If you're already
> passing *in* a UTF32 buffer, why do we need any special
> processing for UTF32?

My point is that a "void *" API turns off all compiler
type checking -- it's unsafe as a public API. I have looked
at string_make() and it doesn't do any special processing
based on encoding. It *requires* that the raw bits are
compatible with the encoding.

The only reason I'm bringing this up is that the whole
intent behind handling platform encodings is to integrate
better with user environments. A really nice interface
would type check the arguments when making strings so
that a user sees a type mismatch error when compiling.
For example, on Win32, Parrot might have string_make_utf16
take only LPWSTR.

I don't know much about encodings, but I do know that
type casts make perl really difficult to understand. Casts
suck. They're the goto for data structures.

> No, that's a really bad way to do it. We can special-case
> things like UTF16 to UTF32.

Ok. Are you planning on compiling in all the encodings
or can a module dynamically load one? You might want a
slow-but-standard encoding conversion algorithm anyways.

- Ken



Re: PDD 6: Parrot Assembly Language

2001-09-10 Thread Ken Fox

Dan Sugalski wrote:
>    jump FOO
>
> doesn't change scope.
>
>    newscope scope_template_in_fixup_section
>
> does. And
>
>    exitscope
>
> leaves one. :)

Ok. That clears it up a little. The current scope is part of
the VM internal state and compilers need to generate state
change instructions if they use lexicals. (Actually, I hope
they only need state change instructions if they need the
scope's symbol table...)

Just to check my understanding, Perl 6 will compile

  { bar { foo } blah }

into

   newscope
   bar
   newscope
   foo
   exitscope
   blah
   exitscope

What happens with:

   goto FOO; { bar { FOO: foo } blah }

Is goto responsible for figuring out it has entered bar's
scope and setting the VM state so that the exitscopes are
properly balanced?

- Ken



Re: Parrot 0.0.1 is released.

2001-09-10 Thread Ken Fox

Jeffrey Coleman Carlyle wrote:
> Am I missing something (well, clearly I am), but are test.pasm and
> test2.pasm missing from the CVS repository?

Look in ./t

- Ken



Re: PDD 6: Parrot Assembly Language

2001-09-10 Thread Ken Fox

Dan Sugalski wrote:
> At 05:41 PM 9/10/2001 -0400, Ken Fox wrote:
> > You're expecting the current lexical scope to be carried implicitly
> > via the PC?
> 
> No, it'll be in the interpreter struct.

But how does the interpreter know where a lexical scope begins
and ends in the bytecode? For example, a "jump FOO" might change
scopes. How is the scope discovered?

> >This only lets us use multi-methods where arg0 is an object. Is
> >this sufficient for implementing Perl 6?
> 
> Ummm... how on earth do you plan on calling a method of something that's
> not an object? Methods sorta require them... :)

sub add(int x, int y) : multi { ... }
sub add(int x, num y) : multi { ... }
sub add(num x, num y) : multi { ... }

That makes sense to me. My syntax is probably wrong, but the intention
is clear. I'd be surprised if Damian doesn't want this.

- Ken



Re: Patch to assembler/disassembler + parrot asm inconsistancies

2001-09-10 Thread Ken Fox

"Bryan C. Warnock" wrote:
> On Monday 10 September 2001 06:23 pm, Dan Sugalski wrote:
> > When we run out, we repeat the innermost type.
> 
> Why are you doing right-to-left instead of left-to-right?

Because it would be harder to repeat the innermost type then? ;)
Most binary ops will take identical inner args. This is just
a readability optimization to avoid things like "_i_i".

Avoiding type extensions completely seems best if the compiler
and disassembler are smart enough. I'm not sure we'll be able
to do that though -- more complex addressing modes may lose
type info. (But that assumes someday we get more complex
addressing modes... ;)

- Ken



Re: String API

2001-09-10 Thread Ken Fox

Simon Cozens wrote:
> =head1 The Parrot String API

Have you guys seen Topaz? One of many things I think Chip
did right was to build strings from a low-level buffer
concept. This moves memory management (and possibly raw-io)
out of the string class and into the buffer class.

The other major suggestion I have is to avoid "void *"
interfaces. Do we really need a string_make() that takes
the encoding enum? Aren't encodings different enough that
we'd want string_make_ascii(), string_make_utf32(), ...
Maybe I've been doing C++ too long, but taking advantage
of compiler type checks as much as possible seems like an
excellent goal -- especially since the various string_*
functions are going to be visible in user code.

The use of an encoding enum seems a little weird, but once
you explain why it will probably make sense. Right now the
only thing it seems good for is the transcoding system --
everything else is slower and vtables are more cumbersome
because you have to manage a vtable[num_encodings] array.

I'd make a string's encoding a direct pointer to its
vtable struct. The transcoding algorithm seems like it
could be implemented using a string iterator on the source
with a character-by-character append to the destination.
We would need an abstract character object, but that
seems necessary anyway. (Perl doesn't have characters,
but does Perl 6 or Python?) The destination can be
pre-extended to the length of the source to avoid
fragmenting memory. How many encoding specific
optimizations are there? Is it worth having a table of
custom transcoders instead of using a generic algorithm?
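
The generic algorithm I'm picturing is only a few lines. Every function
name below is hypothetical (it's not the proposed string API), but it
shows the iterator-plus-append shape:

  #include <stddef.h>

  struct string;
  struct encoding;
  struct string_iter { const struct string *src; size_t pos; };

  extern struct string *string_new(struct encoding *, size_t reserve);
  extern size_t   string_length(const struct string *);
  extern void     string_iter_init(struct string_iter *, const struct string *);
  extern int      string_iter_more(const struct string_iter *);
  extern unsigned string_iter_next(struct string_iter *);
  extern void     string_append_char(struct string *, unsigned codepoint);

  /* iterate the source character by character, appending into a
     destination pre-sized to the source's character length */
  struct string *transcode(const struct string *src, struct encoding *to)
  {
      struct string *dst = string_new(to, string_length(src));
      struct string_iter it;

      string_iter_init(&it, src);
      while (string_iter_more(&it))
          string_append_char(dst, string_iter_next(&it));
      return dst;
  }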

- Ken



PMC vtable vs other overloading techniques

2001-09-10 Thread Ken Fox

Simon Cozens wrote:
> FWIW, it's just dawned on me that if we want all of these things to be
> overloadable by PMCs, they need to have vtable entries. The PMC vtable
> is going to be considerably bigger than we anticipated.

Surely you don't expect the PMC vtable to be the only mechanism
for overloading?!

Perl 6 has the option of:

* emiting Perl 6 specific opcodes
* over-riding built-in opcodes in a lexical scope
* generating method calls instead of ops
* replacing the dispatcher in a lexical scope

We've talked about all of those options. Have you eliminated some
of them?

One of the things that confuses me is how compiler writers will
decide to use one code generation technique over another. For example,
if the Perl back-end generates a method call for sin() and Python
generates a custom op, the two won't interoperate. IMHO this needs to
be nailed down soon or we'll end up with a messy architecture.

Just because Perl is TMTOWTDI doesn't mean Parrot should be.

- Ken



Re: PDD 6: Parrot Assembly Language

2001-09-10 Thread Ken Fox

Dan Sugalski wrote:
> =item if tx, X, Y

What's the purpose of providing Y? Does it make anything easier
allowing Y != 0?

> =item jump tx

I expected a "call" op too. Not a named sub call, but just "call
tx" where tx has the same semantics as in "jump".

A "return" op is needed too.

> =item iton Nx, Iy
> =item ntoi Ix, Ny
> =item tostring Sx, ty, Iz

Are these good names? They aren't very regular IMHO. Why is
integer abbreviated "i" and string left spelled out? Why does
tostring have three operands and the other two?

> =item inc tx, nn *
> =item dec tx, nn *

> Decrement register x by nn. nn is an integer constant. If nn is
> omitted, decrement by 1.

Variable length ops don't sound so cool to me. Why not use
add/sub to inc/dec by something other than 1?

Also, it would be nice if inc/dec *always* kept Parrot semantics
instead of automagically transmogrifying to Perl inc/dec. Is
this the plan?

> =head2 Register and stack ops

I'm a little confused why we can only push/pop entire register
frames. Won't this make function value returns via registers
impossible? IMHO, instead of push/pop a rotate command
would be nice. That would allow SPARC-like register usage which
has always seemed elegant to me.

> =item warp [string]
> 
> Reset the current register stacks to the state they were in when the
> warp was set. Resets only the frame pointers, doesn't guarantee the
> contents of the registers. Be I<very> careful modifying the frame
> pointers by, for example, pushing register frames.

I don't understand this explanation. It sounds like you are setting
up co-routines or maybe resuming a continuation. Shouldn't the ops
be a little safer? IMHO we don't want a co-routine construction set,
we just want co-routines.

> =item find_lex Px, sy
> 
> Find the lexical of name sy and store the PMC pointer in register Px.

You're expecting the current lexical scope to be carried implicitly
via the PC? Seems like find_lex should really be implemented as a
vtable method on a compiler's scope object.

> =item find_method Px, Py, tz
> =item call_method Px, ty

Multi-methods are notably absent. I'm assuming that Parrot does not
understand multi-methods and requires the object (or the compiler
generating the object) to register a multi-method dispatcher as the
single-dispatch method known to Parrot.

This only lets us use multi-methods where arg0 is an object. Is
this sufficient for implementing Perl 6?

- Ken



Re: pads and lexicals

2001-09-06 Thread Ken Fox

Dan Sugalski wrote:
> > Dan Sugalski <[EMAIL PROTECTED]> wrote:
> > > Where do they come from? Leave a plate of milk and cookies on your back
> > > porch and the Temp PMC Gnomes will bring them. :)

> Bad Dan! No cookie for me.

You aren't fooling anybody anymore... You might just as well stop the
charade and write "Dan "The Temp PMC Gnome" Sugalski" in your sig. ;)

At least we know where temps *really* come from now...

- Ken



Re: An overview of the Parrot interpreter

2001-09-06 Thread Ken Fox

Paolo Molaro wrote:
> If anyone has any
> evidence that coding a stack-based virtual machine or a register one
> provides for better instructions scheduling in the dispatch code,
> please step forward.

I think we're going to have some evidence in a few weeks. I'm not
sure which side the evidence is going to support though... ;)

Eric Raymond posted on python-dev that he's doubtful, but has
a "wait and see" approach. Seems sensible. I think even Dan would
say he's just hopeful, not commited.

> I believe that a stack-based machine will have roughly the same
> performance when interpreted as a register-based machine, but
> it easily allows to take a step further and JIT compile the bytecode
> to machine code.

I don't really see much difference. You don't have to map the VM
registers to hardware registers to pick up a lot of speed. If the
JIT wanted to, it could have an optional peep-hole optimization
pass. That actually sounds easier to do with a register-based VM
than a stack-based one. (Of course the run-time environment is a
big wild-card here. We need a fast interface between the dispatcher
and the run-time. That's going to want registers too.)

> With the difference that the registers are malloc()ed while the eval
> stack in a stack machine is in the actual cpu stack.

Is there really a difference in memory access between the heap
and the stack? I've always thought a page is a page and it doesn't
matter where in memory the page is. I'm not a hardware guy though...

Allocating register pools on the heap (making sure that malloc()
is used sensibly), might be faster if you want your VM to handle
continuations and co-routines. Check out stackless Python for
a good example. I'm not sure if Appel was the first, but he has
written quite a bit about the advantages of allocating activation
records on the heap. (He also points out that a garbage collector
can make heap allocation as fast as stack allocation.)

- Ken



Re: An overview of the Parrot interpreter

2001-09-06 Thread Ken Fox

Simon Cozens wrote:
> I want to get on with writing all the other documents like this one, but
> I don't want the questions raised in this thread to go undocumented and
> unanswered. I would *love* it if someone could volunteer to send me a patch
> to the original document tightening it up in the light of this thread.

Sure. I can do that while *waiting patiently* for Parrot to be
released. ;)

- Ken



Re: pads and lexicals

2001-09-06 Thread Ken Fox

Dave Mitchell wrote:
> So how does that all work then? What does the parrot assembler for
> 
>   foo($x+1, $x+2, ..., $x+65)

The arg list will be on the stack. Parrot just allocates new PMCs and
pushes the PMC on the stack.

I assume it will look something like

  new_pmc pmc_register[0]
  add pmc_register[0], $x, 1
  push pmc_register[0]

  new_pmc pmc_register[0]
  add pmc_register[0], $x, 2
  push pmc_register[0]

  ...

  call foo, 65

It would be nice if we knew the lifetime of those temps so that
we could optimize the allocation. In Perl 5, closures don't capture
@_ -- I hope Perl 6 won't capture them either. So the only thing
we need to worry about is code taking a reference to @_. That
should be something the compiler can catch.

Hmm. It didn't occur to me that raw values might go on the call
stack. Is the call stack going to store PMCs only? That would
simplify things a lot.

- Ken



Re: pads and lexicals

2001-09-06 Thread Ken Fox

Dave Mitchell wrote:
> The Perl equivalent $a = $a + $a*$b requires a
> temporary PMC to store the intermediate result ($a*$b). I'm asking
> where this tmp PMC comes from.

The PMC will stashed in a register. The PMC's value will be
stored either on the heap or in a special memory pool reserved
for temps. (I'm guessing we won't have a real generational
garbage collector, but we will be able to know when/if a temp
is destroyed at block exit.)

Dan can say for sure since he's read the code. (nudge, nudge ;).

BTW, I think we will be able to optimize this code in some
instances to use the floating point registers instead of the
PMC registers. (This is the main reason I'm totally against
run-time modification of the current scope -- essentially
we'd have to treat *everything* as a PMC and we lose all of
our optimization possibilities.)

- Ken



Re: Should MY:: be a real symbol table?

2001-09-06 Thread Ken Fox

Bart Lateur wrote:
> On Mon, 03 Sep 2001 19:29:09 -0400, Ken Fox wrote:
> > The concept isn't the same. "local" variables are globals. 
> 
> This is nonsense.
> ...
> How are globals conceptually different than, say, globally scoped
> lexicals? Your description of global variables might just as well apply
> to file scoped lexicals.

Try this example:

  use vars qw($x);

  $x = 1;

  sub foo { ++$x }

  sub bar {
    local($x);

    $x = 2;
    foo();
    print "$x\n";
  }

  bar();
  print "$x\n";

That prints:

  3
  1

If you use lexicals, you get the behavior that was *probably*
intended. "local" merely saves a copy of the variable's current
value and arranges for the variable to be restored when the
block exits. This is *fundamentally* different than lexical
variables.

It is possible to simulate the behavior of lexicals with
globals if you generate unique symbols for each instance of
each lexical. For example:

  foreach (1..10) { my $x; push @y, \$x }

could compile to something like:

  foreach (1..10) {
    local($_temp) = gensym;
    $::{$_temp} = undef;
    push @y, \$::{$_temp}
  }

The local used for $_temp must be guaranteed to never be
changed by any other sub. Ideally $_temp would be visible
only in the scope of the original lexical, but then $_temp
would be a lexical... ;)

- Ken



Re: pads and lexicals

2001-09-06 Thread Ken Fox

Dave Mitchell wrote:
> Hmmm, except that at the hardware level, registers can store the actual
> temporary values themselves

register struct value *hardware_registers_can_be_pointers_too;

The PMC registers act like pointer-to-struct registers. Other register
sets can hold immediate values. This is exactly the same as real
hardware having registers for floating point, addresses, words, etc.
Parrot just uses more types.

- Ken



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

Dan Sugalski wrote:
> Good. It should. It's a scary feature, and hopefully that same fear will
> strike anyone else who uses it

But all it takes is one fool^Wbrave person to mess up somebody
else's perfectly working code. Basically I think this throws us
back to the bad old days when we only had "local".

Confining it to compile time would allow people to define
custom pragmas and import lexicals. Nat would be happy. Who would
be unhappy?

> Besides, I'm not the guy to talk to about restricting this. Take it up with
> the language guys. :)

I haven't seen an Apocalypse mention anything about it, so I'm
hoping this is just a misunderstanding. ;)

- Ken



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

Dan Sugalski wrote:
> Oh, it gets better. Imagine injecting a lexically scoped sub into the
> caller's lexical scope. Overriding one that's already there. (Either
> because it was global, or because it was lexically defined at the same or
> higher level)
> 
> Needless to say, this makes the optimizer's job... interesting. On the
> other hand, it does allow for some really powerful things to be done by
> code at runtime.

Frankly this scares the hell out of me. I like my safe little
world of predictable lexical variables. That's the main reason I
use them. (So I'm boring. I use strict too.)

IMHO it's a bad idea to allow run-time re-definition of the
dynamic caller's scope. I see value in allowing *compile* time
changes to the current lexical scope. For example:

  sub f {
 use Database::Connection qw($db);
 ...
  }

I see the value in creating a lexical variable named "$db"
in the current scope. (By "current" I mean the scope the BEGIN
block appears in, not the BEGIN block itself.) Since this is
done at compile time, the "use" kicks off a warning telling
me that magical stuff is going to happen. (Not a literal
warning -- just a warning when I read the code.) I even like
this as an alternative to using macros.

However, contrast that with this example:

  $next = 0; # this just shuts up the compiler. I don't
             # know why. Maybe they'll fix it in Perl 7.
  sub iter {
    begin;
    while ($next->());
    end;
  }

  sub begin {
    upvar("$next") = sub { ...; upvar("$next") = sub { ... } };
  }

  sub end {
    ...
  }

Replace upvar with whatever syntax you want. You still have
a complete mess. Where did "$next" come from? Why did the code
stop working because I put the "begin" statement in a block?

This isn't power. This is TMTOWTDI gone mad. (Ok, ok. Yeah,
that's a little harsh. But everybody has their limits, right? ;)

- Ken



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

"Bryan C. Warnock" wrote:
> Except that Perl 6 will have the ability to inject lexical variables in its
> scope, and in any dynamic parent's scope.  (It isn't clear whether that is
> write-only access or not - which it probably should be for lexicals.)
> 
> That, invariably, forces at least some run-time lookup by name, since the
> lexicals aren't there at compile time for the early resolution.

Please, please, please say that this is not like Tcl's upvar/uplevel
stuff.

I was imagining the "injection" to happen *only* at compile time
when a "use" statement replaced the scope management object the
compiler talks to.

I *really* don't like the idea of a called sub modifying its parent's
scope at run time. This fundamentally breaks the feature of understanding
code just by reading it. When I see a lexical, I want to trust the
definition.

- Ken



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

Dan Sugalski wrote:
> First, of course, runtime and compiletime are mixed on perl. String eval
> has to go walk back up the pads *at runtime* and resolve the variable names.

Sure, if you use eval, the symbol table for the current scope
needs to be available. There's no reason to have more than
one though -- all activation records for a scope share
the same symbol table.

For the common case when we aren't debugging and a sub doesn't
use string eval, it would be nice to use less memory and drop
the scope's symbol table.

> Second, Larry's decreed you'll be able to look up lexicals by name using
> the MY hash. And look up at outer levels. How do you plan to look up
> variables by name when you're peering outside your compilation unit? With a
> key that can be resolved only at runtime?

I've searched for the definition of %MY, but all I can find is
a reference to a pseudo class MY. I don't read those as being the
same thing. It seems like "pseudo class MY" is intended as a
compiler extension API. The %MY hash could sort of do that (if
we wrapped all the access with BEGIN blocks), but it doesn't
feel very consistent with lexical variables.

Is %MY (with attributes presumably) really going to be the API
to the symbol table? Why don't we just have *one* API to the
symbol table and use attributes to differentiate between dynamic
and lexical scoping?

> >That's a huge difference over emulating "my" with "temp" like
> >what was originally proposed!
> 
> That, oddly enough, is doable with enough compiler work. A silly thing, but
> doable. It's irrelevant to the question as it had evolved, namely "what's
> the difference between a stash and a pad?" We could, after all, associate a
> new stash with each level of lexical scope if we were so inclined. Or make
> stashes and pads identical under the hood and just reference them differently.

These things behave totally differently in a closure though.
A "temp" variable must have its global binding restored when
returning to the caller. But that leaves closures referencing
the wrong variable.

Anyways, since you say "oddly enough" and "silly thing", I suspect
that you're aren't doing it this way. ;)

- Ken



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

Dan Sugalski wrote:
> At 05:44 PM 9/3/2001 -0400, Ken Fox wrote:
> > Lexicals are fundamentally different from Perl's package (dynamically
> > scoped) variables.
> 
> No, actually, they're not.

How can you possibly think that lexical scoping and dynamic scoping
are not fundamentally different?!

> > Even if you could somehow hack package variables
> > to implement lexicals, it would waste space duplicating a symbol table
> > for each instance of a lexical scope.

> The big difference between lexical variables and package variables is that
> lexicals are looked up by stash offset and package variables are looked up
> by name.

Right. So how is a hash table smaller than an array?

> The real question, as I see it, is "Should we look lexicals up by name?"
> And the answer is Yes. Larry's decreed it, and it makes sense. (I'm
> half-tempted to hack up something to let it be done in perl 5--wouldn't
> take much work)

Where is the sense in this? Certainly the compiler will look things
up by name, but the run-time doesn't need to.

> The less real question, "Should pads be hashes or arrays", can be answered
> by "whichever is ultimately cheaper". My bet is we'll probably keep the
> array structure with embedded names, and do a linear search for those rare
> times you're actually looking by name.

That doesn't sound like we're looking up by name at all... It
sounds like the compiler is emitting frame pointer offsets, but
there's a pointer to the symbol table stored in the frame just
in case something wants to see the names.
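
In other words, something like this (layout invented for illustration):
compiled code uses the constant offset, and the names ride along only
for the rare by-name lookup:

  #include <string.h>

  struct value;

  struct pad {
      int            nslots;
      const char   **name;      /* name[i], kept for introspection  */
      struct value **slot;      /* slot[i], what compiled code uses */
  };

  /* the common case: a compile-time constant offset */
  #define PAD_SLOT(pad, i)  ((pad)->slot[(i)])

  /* the rare case: lookup by name is just a linear scan */
  static struct value *pad_by_name(struct pad *p, const char *want)
  {
      for (int i = 0; i < p->nslots; i++)
          if (strcmp(p->name[i], want) == 0)
              return p->slot[i];
      return 0;
  }

A debugger (or %MY, if it exists at run time) would go through
pad_by_name(); everything the compiler generates uses PAD_SLOT().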

That's a huge difference over emulating "my" with "temp" like
what was originally proposed!

- Ken



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

Brent Dax wrote:
> Ken Fox:
> # Lexicals are fundamentally different from Perl's package (dynamically
> # scoped) variables.
> 
> *How* are they "fundamentally different"?

Perl's "local" variables are dynamically scoped. This means that
they are *globally visible* -- you never know where the actual
variable you're using came from. If you set a "local" variable,
all the subroutines you call see *your* definition.

Perl's "my" variables are lexically scoped. This means that they
are *not* globally visible. Lexicals can only be seen in the scope
they are introduced and they do not get used by subroutines you
call. This is safer and a bit easier to use because you can tell
what code does just by reading it.

> But in this case the pad is actually a full symbol table.  The
> concept is the same, the data structure is different.

The concept isn't the same. "local" variables are globals. You
only have *one* of them no matter how many times you call a sub.
For example, threading or recursion might cause the same sub to
have several copies "running" at the same time. With global
variables (aka "local" variables) all the copies share the same
global variables. With "my" variables each copy of the sub gets
a brand new set of variables. This is known as an activation
record. THIS IS COMPLETELY UNRELATED TO A SYMBOL TABLE!

In both cases you have one symbol table. However, in
the case of "my" variables you have *many* values for each
variable. The particular value being used depends upon which
activation record is being used.

> There *is* run-time lookup in some contexts, such as a string eval.

String eval isn't a run-time lookup. The code is *compiled* and
then run. Also notice that string eval can't change the current
lexical scope. It can create a new inner scope, but it can't
introduce variables into the outer scope.

Basically anything that "breaks" scoping barriers goes against
the grain of lexical scoping. If an inner scope can modify its
parent, you've just destroyed one of the major advantages of
lexical scoping.

We tolerate symbol table globs with "local" variables because
we've already admitted there's no hope of understanding what
something does just by reading the code. We haven't corrupted
"my" yet -- and I don't want to start!

> In the end, all I'm doing is suggesting an alternate implementation
> which should reduce our workload and make many concepts which currently
> don't work with lexicals work correctly.

Your proposal to use "temp" with flags to implement "my" doesn't
even work, let alone achieve either of your goals.

- Ken



Re: An overview of the Parrot interpreter

2001-09-03 Thread Ken Fox

Thanks for the info. If you guys maintain this level of documentation
as the code develops, Perl 6 will be easy for first-timers to work on.
One goal down, N-1 to go... ;)

Simon Cozens wrote:
> To be more specific about the software CPU, it will contain a large
> number of registers.

The register frames store values, not pointers to values? If
there's another level of indirection with registers, I'm not sure
what the advantage is over just pointing into the heap. Also, heap
based activation records might be faster and more compact than
register files.

> As in Perl, Parrot ops will return the pointer to the next operation in
> the bytecode stream. Although ops will have a predetermined number and
> size of arguments, it's cheaper to have the individual ops skip over
> their arguments returning the next operation, rather than looking up in
> a table the number of bytes to skip over for a given opcode.

This seems to limit the implementation possibilities a lot. Won't a
TIL use direct goto's instead of returning the next op address?

I'd like to see a description of *just* the opcode stream and have a
second section describe the protocol for implementing the ops.
Won't we have separate implementations of the opcode interpreter
that are optimized for certain machines? (I'd at least like to see
that possibility!)

> =head1 Vtables
> 
> The way we achieve this abstraction is to assign to each PMC a set of
> function pointers that determine how it ought to behave when asked to do

Damian's proposals for multi-methods have got me thinking there
should be one very fast implementation of multi-method dispatching
used at the opcode level. It might help solve some of our math and
string problems dealing with mixed operand types.
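
For a binary op the fast path could be as dumb as a two-dimensional
table indexed by both operands' type tags. A sketch only -- the type
tags and PMC layout below are invented, not a proposal for the real
vtable:

  enum { T_INT, T_NUM, T_STR, T_MAX };

  struct pmc { int type; /* ... value ... */ };

  typedef void (*add_fn)(struct pmc *dst, struct pmc *a, struct pmc *b);

  /* one handler per (type, type) pair, filled in at startup */
  extern add_fn add_table[T_MAX][T_MAX];

  static void do_add(struct pmc *dst, struct pmc *a, struct pmc *b)
  {
      add_table[a->type][b->type](dst, a, b);
  }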

- Ken



Re: An overview of the Parrot interpreter

2001-09-03 Thread Ken Fox

Dan Sugalski wrote:
> For those of you worrying that parrot will be *just* low-level ops,
> don't. There will be medium and high level ops in the set as well.

I was going to cite ,
but you guys have already read that, eh? ;)

One thing I was expecting was "bytecode subroutines are given opcodes."
Will there be a difference? Can we replace "built-in" opcodes with Perl
subs?

- Ken



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

Brent Dax wrote:
> What I'm suggesting is that, instead of the padlist's AV containing
> arrays, it should contain stashes, otherwise indistinguishable from
> the ones used for global variables.

Lexicals are fundamentally different from Perl's package (dynamically
scoped) variables. Even if you could somehow hack package variables
to implement lexicals, it would waste space duplicating a symbol table
for each instance of a lexical scope.

> The simple way to emulate this is to make sure that no subroutine
> can see another's MY:: stash.

Right. Sounds a lot like a pad to me -- each instance of a scope (sub)
gets its own copy of the variables. (By instance I mean activation
record, not the symbol table definition.)

> There is a possible caveat with inner blocks--how does an outer block
> get, er, blocked from accessing an inner block's my() variables?
> However, I think this isn't really that big a problem, and can easily be
> solved with properties:

You don't understand lexically scoped variables. There isn't
any run-time name lookup on a variable -- the compiler resolves all
access to a specific memory location (or offset). All your fancy
symbol table flag twiddling is slow, unreliable and unnecessary.

- Ken



Re: Should MY:: be a real symbol table?

2001-09-02 Thread Ken Fox

Brent Dax wrote:
> Perl, however, is such a dynamic language you can't do a lot of that
> resolution at compile-time.  What happens when I say:
> $x="foo";
> $::{"x"}=\"bar";
> print $x;

Just so we're clear, $x is a global variable -- there's only
one $x and it never goes out of scope.

This is a little different from the general case of working
with a symbol table containing lexicals defined in inner scopes.

> # From: [EMAIL PROTECTED]
> # You can't store lexical variable definitions in a global symbol
> # table. What happens when you try to access a local variable that
> # doesn't exist because it isn't in scope? Also, many of the scopes
> # that contain variables aren't named. How would you differentiate
> # between the following two "$x" variables?
> #
> #   sub f() { if (...) { my $x; ... } else { my $x; ... } }
> 
> That's not terribly hard--the two ${x}es are in unrelated scopes.

Ok, great. Please answer some questions on this "easy" case
before we move on to the harder cases:

1. What are the "full" names of the two $x variables?

2. What happens if you try to access them when they
don't exist because the scope hasn't been entered?

3. If you are able to modify the definition of $x,
does the change affect the current scope only or the
current scope and all future instances of the current
scope? (Or something else entirely?)

> # If you want to grab a lexical, you *must* be in scope. If you're
> # already in scope there's no reason for "MY::" -- just use the
> # variable's real name.
> 
> But we've promised to support %MY::.

Here's Larry's intention from Apocalypse 2:

| In Perl 5, lexical scopes are unnamed and unnameable. In Perl 6,
| the current lexical scope will have a name that is visible within
| the lexical scope as the pseudo class MY, so that such a scope
| can, if it so chooses, delegate management of its lexical scope
| to some other module at compile time. In normal terms, that means

Note: no promises about run time!

| that when you use a module, you can let it import things
| lexically as well as packagely.

Where has it been determined that %MY:: will even exist, let
alone work the way you want it to? I don't even see where the
current scope will be accessible from Perl -- it might be an
internal implementation feature only.

> # It sounds like you just want a simple way of getting the variables
> # in scope -- are you writing a debugger? I agree with you that it would
> # be very nice to have a low-level debugging API that doesn't treat
> # lexicals differently than globals.
> 
> our($x)="global";
> {
> my($x)="local";
>   print join ' ', $x, ${"x"};
> }

That looks suspiciously like an ad hoc debugger to me... ;)

- Ken



Re: Deoptimizations

2001-09-02 Thread Ken Fox

"Bryan C. Warnock" wrote:
> No.  We're optimizing on what we know at the time.

Um, it looked like you were generating a graph with all possible
optimizations inserted. At what point do you commit to an optimization?
If you only commit at run-time, you need to do a *lot* of state
management. It seems to me that the over-head will be too high.

> I think the only way your going to be able to detect dynamic redefinitions
> is dynamically.  :-)

Not really. Does the program include string eval? That throws out
optimizations like folding and dead code elimination that use *any*
global. If the program re-defines a single sub using an assignment to
a symbol table slot, then only that sub limits optimization. The
optimizer starts with a set of assumptions and alters those assumptions
as it sees more of the program.

> I don't agree with eliminating re-definitions of compiled code (presumable
> "pre-compiled modules", since it's all compiled) - it's a distribution
> nightmare

It has nothing to do with distribution. It has to do with programming
style. Certain programming styles are not compatible with optimization.
All I'm saying is that the optimizer might prevent you from doing
certain things. If you want to do those things, then don't use the
optimizer.

BTW, some techniques, like memoizing, are always possible. Since the
compiler can re-build the cache when it re-defines a memoized function
the state management cost is only paid at compile time. Other things,
like dead-code elimination, are not possible because you must commit
to the optimization. How do you de-optimize eliminated code?

Sometimes it pays to be lazy. The trick is knowing when.

- Ken



Re: Deoptimizations

2001-09-02 Thread Ken Fox

Wizard wrote:
> Johan Vromans wrote:
> > 'use constant FOO => "foo"' could add some magic to never let FOO
> > being redefined (not a bad coice for a constant).
> 
> I like this idea best (for now). Perhaps 'constant sub foo' or 'sub
> foo:constant'?

I wouldn't consider this an optimization. Perl is just doing what
you tell it -- the programmer is doing the optimization.

The idea of inserting "optimization ops" seems silly too. It would
be faster to just replace the definition of the sub whenever the
sub changes. Perl 6 subs are just as fast as ops.

The optimizer can't optimize what it doesn't know. You're
asking for some sort of "Quantum::SuperPosition::Optimizer" that
stores all optimization states at once.

The only way we're going to optimize Perl is by detecting dynamic
re-definitions. We also need to eliminate re-definitions of
compiled code. If all modules are loaded as source (not bytecode or
native code), then re-definition is fine. IMHO it is too difficult
to build a compiler that records its optimization assumptions and
attempts to verify that the semantics of a re-definition are
consistent with the previous assumptions. (Hmm, maybe the optimizer
could record unoptimized definitions? Those would be safe to
replace. Dan?)

- Ken



Re: Should MY:: be a real symbol table?

2001-09-02 Thread Ken Fox

Brent Dax wrote:
> Is there any real reason my() variables are stored in a pad instead
> of a symbol table?

Simon already told you why a pad is used.

But I think you misunderstand what a symbol table is. Variables
aren't *stored* in the symbol table. Only the definitions and storage
locations of variables are stored there. During execution of a program
symbol tables usually aren't needed -- the generated code embeds the
necessary information from the symbol table. For example, if you have

  void f() { int x; x = 1; ... }

then "x" in the symbol table may be defined as "4 byte integer at stack
pointer - 4". The code for getting/setting "x" doesn't use the symbol
table because it has instructions like "mov sp[-4], 1" or something.

> Once again, why isn't MY:: stored in some sort of anonymous symbol
> table?  This would allow us to do all the things we can normally do
> with a global, without the hassles of writing a magical pseudo-package.

You can't store lexical variable definitions in a global symbol
table. What happens when you try to access a local variable that
doesn't exist because it isn't in scope? Also, many of the scopes
that contain variables aren't named. How would you differentiate
between the following two "$x" variables?

  sub f() { if (...) { my $x; ... } else { my $x; ... } }

If you want to grab a lexical, you *must* be in scope. If you're
already in scope there's no reason for "MY::" -- just use the
variable's real name.

It sounds like you just want a simple way of getting the variables
in scope -- are you writing a debugger? I agree with you that it would
be very nice to have a low-level debugging API that doesn't treat
lexicals differently than globals.

- Ken



Re: Something to hash out

2001-08-29 Thread Ken Fox

Dan wrote:
> I like it. It's a race between those and Randal's .par and .rot (for the
> bytecode) extensions.

Please say the file extensions are only needed to prevent a clash of the
source file and bytecode if they're stored in the same directory...

We can say "parrot foo.whatever" and parrot will do the right thing
regardless if foo.whatever is source, bytecode or whatever else parrot
supports. Right?

- Ken



Re: PDD for the debugger API

2001-08-26 Thread Ken Fox

Uri Guttman <[EMAIL PROTECTED]> wrote:
> > "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes:
>   DS> At 03:22 PM 8/18/2001 -0400, Uri Guttman wrote:
>   >> i didn't see any references to support debugging an external perl
>   >> process. ...
>
>   DS> Good point. ... listen on some port/pipe/doodad and set a flag
>   DS> to kick into the remote debugger if someone connects to it?
> 
> this is a tricky problem as we can't just grab the remote process like
> gdb can these days using /proc or something.

Why can't we do that?

The way I'd really like to kick in the debugger is to start up gdb
and just say:

  (gdb) call perl_debug_enable()
  (gdb) continue

It would be very useful to have a collection of gdb-callable
routines for integrating the Perl debugger with perl (internals)
debugging. I'd love to be able to do:

  (gdb) call perl_debug_break("my_sub")

The other thing that I would like is a way of pausing in Perl code
and waiting for a debugger to attach. This would make it much easier
to debug mod_perl for example.

Other than these two wishes, the current Perl debugger is fine.

It seems like overkill (and a security nightmare) to come up with a
special remote debugging solution. Is there some reason we're going
to need to debug perl on a platform without a real debugger? You
don't have any plans to compete with Java in the embedded device
space, do you Dan? ;)

- Ken



Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-15 Thread Ken Fox

Hong Zhang wrote:
> The memory barriers are always needed on SMP, whatever algorithm
> we are using.

I was just pointing out that barriers are an alternative to mutexes.
Ref count certainly would use mutexes instead of barriers.

> The memory barrier can be easily coded in assembly, or intrinsic
> function, such as __MB() on Alpha.

Perl ain't Java! We have to worry about XS code written in plain
old scary C. If we see some really amazing performance improvements
then I could imagine going with barriers, but I'm doubtful about
their portability and fragility.

Hmm. I just remembered the other GC technique that is very
fragile: ref counts. Maybe fragility isn't a problem after all. ;)

- Ken



Re: PDD 2: sample add()

2001-02-15 Thread Ken Fox

David Mitchell wrote:
> To get my head round PDD 2, I've just written the outline
> for the body of the add() method for a hypothetical integer PMC class:

[... lots of complex code ...]

I think this example is a good reason to consider only having one
argument math ops. Instead of dst->add(arg1, arg2) why not just have
dst->add(arg)? Then the PVM can generate code that does the right
thing considering the types of all values in an expression.

It doesn't affect the ability to overload at all -- we just move
overloading to an earlier stage of compilation, i.e. before we emit
PVM instructions.

Examples:

  Perl code:  $x = 1 + 2
  Parse tree: op_assign($x, op_add(1, 2))
  PVM code:   $x = 1; $x += 2

  Perl code:  $x = 1 + 2 + 3
  Parse tree: op_assign($x, op_add(op_add(1, 2), 3))
  PVM code:   new $t; $t = 1; $t += 2; $x = $t; $x += 3

It will be more work for the optimizer, but I think it will produce
much more understandable PMC objects.

- Ken



Re: Please shoot down this GC idea...

2001-02-15 Thread Ken Fox

Damien Neil wrote:
DN> {
DN>my $fh = IO::File->new("file");
DN>do_stuff($fh);
DN> }
DN>
DN> sub do_stuff { ... }

Simon Cozens wrote:
SC> No, it can't, but it can certainly put a *test* for not having
SC> references there.

Dan Sugalski wrote:
DS> Yes it can tell, actually--we do have the full bytecode to the sub
DS> available to us ...

Dataflow can tell you a lot, but the garbage collector can provide
info too. An object can never point to an object younger than itself.
If the stack is the youngest generation, then whenever something on the
stack gets stored in an older object the stack object ages.

If we still have a young $fh when do_stuff() returns, then the object is
safe to collect as long as we know that the scope owning $fh isn't
returning it. We don't need dataflow for the functions we call; we just
need dataflow for the current scope. (We also have to run a normal
traversing collection on the stack -- it isn't good enough to just
$fh->DESTROY because $fh might be pointed to by another stack object.)
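
That aging rule is just a store-time check. Roughly, with invented
structures and a hypothetical promote() in the collector:

  struct obj { int gen; /* 0 = youngest (the stack), 1 = older, ... */ };

  extern void promote(struct obj *, int to_gen);

  /* storing a young object into an older one ages the young object,
     so nothing ever points at something younger than itself */
  static void store_ref(struct obj *container, struct obj **field,
                        struct obj *val)
  {
      if (val && val->gen < container->gen)
          promote(val, container->gen);
      *field = val;
  }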

By the way, this is also a way to make finalizers useful most of the
time. We can collect (which means finalize) portions of the youngest
generation at the end of every scope. The only time you'd get hit with
a non-deterministic finalizer is if you ever saved the object in an
old generation.

By the way, a lot of people are confusing a PMC object with a Blessed
Perl Object. To the perl internals everything is an object with a vtbl.
Only some of those objects will be Blessed Perl Objects.

- Ken



Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-15 Thread Ken Fox

Alan Burlison wrote:
> I think you'll find that both GC *and* reference counting scheme will
> require the heay use of mutexes in a MT program.

There are several concurrent GC algorithms that don't use
mutexes -- but they usually depend on read or write barriers
which may be really hard for us to implement. Making them run
well always requires help from the OS memory manager and that
would hurt portability. (If we don't have OS support it means
auditing everybody's XS code to make sure they use wrappers
with barrier checks on all writes. Yuck.)

- Ken



Re: Thought for the day

2001-01-31 Thread Ken Fox

I like Linus' quote, but that spirit would probably push Perl too
far into the computer scientists' language traps. Here's a Frank
Lloyd Wright quote I think works a bit better:

  Five lines where three are enough is stupidity.
  Nine pounds where three are sufficient is stupidity.
  But to eliminate expressive words that intensify or
  vivify meaning in speaking or writing is not simplicity;
  nor is similar elimination in architecture simplicity--
  it too may be stupidity.

Jarkko Hietaniemi wrote:
> A true programmer is able to delete lines and still achieve the same
> functionality while simultaneously making the code shorter and simpler
> and therefore easier to understand and maintain.

There seems to be a perverse belief that to "add value" you must "add
code". I've had a lot of first-hand experience with this attitude --
often from people who should know better. One of my recent projects
took an 11,000 line gui client (Perl/Tk/Win32) and reduced it to 7,000
while porting to Unix, adding features, improving performance, etc. The
result? People don't say "wow, good work" they ask why the original
sucked.

Related to this point is the idea that software development processes
and quality methods matter more than developer capability. IMHO, given
all other things equal, better processes produce better software. The
problem is that teams are rarely equal. I think this is one reason why
Open Source ("bazaar methodology") gets much better ratings than it
should -- lots of free software is done by outstanding teams of people.
You replace those people with "industry norms" and everything falls
apart. (The one place where Open Source methods of all types are
vastly superior to everything else is their ability to educate.)

Sorry for the rant; the topic struck a nerve.

- Ken



Re: Guidelines for internals proposals and documentation

2000-11-17 Thread Ken Fox

I agree with Dan's proposals for PDDs. In particular I like the idea of
the WG chairs having decision power -- it should protect us somewhat from
design-by-committee syndrome.

However, I don't want to see early (premature) adoption of fundamental
pieces like the VM or parser. It makes sense to me to explore many possible
designs and pick and choose between them. Also, if we can keep external API
design separate from internal design I think we'll have more wiggle room
to experiment. (That's one of the big problems with perl 5 right now.)

One issue with the language RFC process that Larry and Mark-Jason Dominus
have discussed is inconsistency. (I still can't figure out whether it's a
strength or weakness.) We should do a better job of telling people about
how our PDDs fit into the bigger picture. I propose that all PDDs state
which other PDDs they require, which they supersede and which they conflict
with. This will make it a lot easier to identify coherent designs.

Dan Sugalski wrote:
> 2) We start counting from 2. (RFCs 0 and 1 are reserved, in case anyone
> beats me to them)

(I thought you said they were PDDs... ;)

Why don't you just quickly issue several PDDs on the subjects that you
want to reserve. They can be skeleton documents that just contain the
sections you want us to write. (I hope that the numbers 0 and 1 aren't
important -- they might be superseded by PDD 16 and 23...)

> 3) The format of the previous RFCs will be followed. The implementation
> section is optional, though strongly encouraged where appropriate.

I disagree. An implementation section is mandatory for a PDD. Anybody
that can't be bothered to take the time to sketch out a *possible* implementation
hasn't thought deeply about the subject. Anything without an implementation
section is just a draft. If a PDD is a proposal for an API, then most of
the PDD is the implementation section!

- Ken



Re: Elk - another paragon for us

2000-11-17 Thread Ken Fox

I used the Elk 2.x series extensively in several engineering
applications I wrote for Ford Motor. It was a nice system in
general, but perl's XS (and guts documentation) was much better
than Elk's. One of the biggest problems with Elk is that it
has a scanning garbage collector, not a ref count collector, and
the traversal functions for C/C++ libraries are not easy to
write! For example, I ran into problems where Elk would collect
my Motif widgets (main windows, dialogs, etc.) because I didn't
keep Scheme references to all the widgets and Elk didn't have
a traversal function for widgets. I ended up writing widget
traversal and submitting a patch. Once the code was in place it
worked, but the code was ugly -- it had to use the *internal*
widget API rather than the portable external API.

When we put scanning garbage collectors into perl, we should
provide an alternative for foreign code.

Elk 3.0 doesn't seem to have changed the API much from 2.x.
What do you think makes it a "paragon" for implementation?

- Ken



Re: Design

2000-11-05 Thread Ken Fox

John van V wrote:
> I have been predicting for a year now that perl6 will spawn another
> language which will be a marriage of perl/python/scheme/sh, hopefully
> enabling a billion or so humans in defiance of  AOL and Microsoft.

Wow! You're good! Most people were caught by surprise with the Perl 6
announcement... However, the "marriage" part of your prediction is
already *mostly* true for Perl *5*. I suspect any future Python
influence will probably be in the same vein as Lisp -- only the people
who really grok Lisp can see it.

I totally disagree with your last point though. Perl has stayed away
from Microsoft bashing. I'm not sure how many beautiful/useful ideas
have come from the Microsoft world, but at least it's nice to know we're
able to help people who use Microsoft systems.

- Ken



Re: virtual machine implementation options

2000-11-02 Thread Ken Fox

Steve Fink wrote:
> and I get the same numbers for -O (I used -O3) but different numbers
> without optimization. Maybe we should assume optimization?

The difference was probably -fomit-frame-pointer that I used in both
the -g and -O cases. Some of my code was fragile to optimization so I
wanted to make sure that it didn't depend on the frame pointer for
local variables.

- Ken



Re: virtual machine implementation options

2000-11-02 Thread Ken Fox

Jarkko Hietaniemi wrote:
> FWIW, I would like to see results for these little tests also for
> other platforms than just gcc on Intel (on Linux...) :-) I tried
> quickly hacking the test script and C source to be more portable
> but it took me more than a few minutes so I gave up...

I just tried on a Sun (dual Ultra Sparc 450 MHz running SunOS 5.6)
and gcc 2.95 compiled it without a hitch. (The TIL didn't work of
course.) BTW, only the switch and function call code is portable to
non-gcc compilers. And the TIL depends upon *both* gcc and Intel.
Anybody running gcc on Windows Intel?

Here are the times for the Sun machine:

                                      -g          -O
function call (original)           25.1 secs    16.4
switch w/ expanded opcodes         18.6 secs     8.6
computed goto w/ expanded opcodes  15.5 secs     5.5

- Ken



Re: Design

2000-11-02 Thread Ken Fox

Nathan Torkington wrote:
>   Robustness
>   Portability
>   Maintainability
>   Testability
>   Reusability
>   Speed
>   Simplicity
>   Size

Hey, whoa. Aren't we prematurely optimizing the development
process? IMHO we're still in the daydreaming "what if" mode of
development. (We don't even have a language spec yet!) Now's
the time to come up with crazy new ideas that need a lot of
debate before we decide if they're useful.

Example: I'd like to see a few different species of vtbl before
we commit to an API. BTW, where's Chip? I was *really* hoping
that he'd write up his Topaz experience and clue us in on
architectural dead ends. It might be that the family tree of
Perl 6 should be:

   Perl 5    Topaz
        \     /
         \   /
         Perl 6

- Ken



Re: virtual machine implementation options

2000-11-02 Thread Ken Fox

Steve Fink wrote:
> I just did all that because the magnitude of the difference between
> using separate functions and using the closer-to-assembly solutions
> (switch and goto) seemed too large. I can easily believe they're faster,
> but not by that much, and I didn't want to unnecessarily lose the
> flexibility, readability, and compilability of keeping everything
> cleanly separated out into separate functions.

My original purpose was just to play with dispatchers -- we really
won't know what's the best solution until more of Perl 6 starts to
evolve. The dispatcher is probably a very tiny piece of the performance
picture anyways, so I don't think it matters (much) that computed goto
is 10x faster than function call.

The only VMs sensitive to dispatch seem to be those with teeny weeny
instructions, like Java or actual hardware devices. Since perl has
*huge* instructions it doesn't spend much time in the dispatcher.

I think the best architecture would be one with two dispatch systems.
A really fast/simple one (possibly obscure) that's used for small
instructions like array indexing, counter increments, sub calls, etc.
and another flexible/general one (possibly obscure ;) that's used
for the "big" instructions like sort and eval. Maybe we could even
force all those big instructions into the vtbl.

The different function call dispatchers that you posted pretty much
round out the simple dispatch techniques. The only other one I'd like
to see is a binary decision tree rather than a sequential scan. (The
tests could be chosen to maximize branch predictions too.) Seems hard
to believe that there's a machine out there that could execute
several test and branch instructions in the time it takes to do a
table lookup though.
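
To be concrete, here's the kind of toy I mean (opcode numbers made up):
the decision tree needs a couple of well-predicted compares, the table
needs one indexed load plus an indirect call.

    /* Toy dispatch sketch; opcodes and handlers are made up. */

    enum { OP_STOP = 0, OP_PUSH = 1, OP_ADD = 2, OP_PRINT = 3 };

    static void do_push(void)  { /* ... */ }
    static void do_add(void)   { /* ... */ }
    static void do_print(void) { /* ... */ }

    /* decision tree: order the tests so frequent ops need the fewest */
    static void dispatch_tree(int op)
    {
        if (op <= OP_PUSH) {
            if (op == OP_PUSH) do_push();
            /* else OP_STOP: nothing to do */
        } else {
            if (op == OP_ADD) do_add();
            else              do_print();
        }
    }

    /* table lookup: one load, one indirect call */
    typedef void (*op_fn)(void);
    static op_fn table[] = { 0, do_push, do_add, do_print };

    static void dispatch_table(int op)
    {
        if (op != OP_STOP) table[op]();
    }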

Anways, now for something *completely* different...

I built a simple TIL and also expanded the opcode sets for the switch
and goto based dispatchers. Here's the timing (same conditions as before)
for -O only:

function call (original)            12.6 secs
switch w/ expanded opcodes           6.2 secs
naive TIL                            5.9 secs
computed goto w/ expanded opcodes    1.4 secs

That's not a typo for "native" -- my TIL is incredibly naive. I had
a tough time getting gcc to work with my dispatcher. It does a really
good job at "dead" code elimination and other optimizations that make
it really hard to interface with. Basically I'm stuck doing normal
function calls because anything more either breaks with optimization
or severely constrains coding style.

At least the execute() function has become pretty simple now. ;) All
it does is verify the code has been compiled and then:

  asm("call *%0" : : "r" (vm_executable));

The compiler (threader?) generates machine code in vm_executable and
opcode arguments in vm_data. Here's what vm_executable looks like:

(gdb) x/11i vm_executable
0x805e1c0 :   mov    $0x8054580,%eax
0x805e1c5 :   call   0x80485d4
0x805e1ca :   mov    $0x8054584,%eax
0x805e1cf :   call   0x80485d4
0x805e1d4 :   call   0x80485e8
0x805e1d9 :   mov    $0x8054588,%eax
0x805e1de :   call   0x80485d4
0x805e1e3 :   call   0x80485e8
0x805e1e8 :   mov    $0x805458c,%eax
0x805e1ed :   call   0x80485d4
0x805e1f2 :   call   0x80485e8

The register assignments you see there are for the opcodes that
take arguments. When vm_executable is built, an argument array is
also built so that vm_executable can stay pure code. Here's an
example of an opcode that takes arguments:

void op_push()
{
register int *args asm("eax");
*vm_sp++ = *args;
}
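
Roughly, emitting one opcode comes down to this -- a simplified sketch
for 32-bit x86 only (like everything else here); making the buffer
executable and laying out the vm_data slots are left out:

    /* "mov eax, imm32" is the byte 0xB8 followed by the immediate;
       "call rel32" is 0xE8 followed by a displacement relative to
       the end of the call instruction. */

    #include <string.h>

    static unsigned char *
    emit_op(unsigned char *p, void *arg_slot, void (*op)(void))
    {
        long rel;

        if (arg_slot) {                     /* mov eax, &args for this op */
            *p++ = 0xB8;
            memcpy(p, &arg_slot, 4);        /* assumes 32-bit pointers */
            p += 4;
        }

        *p++ = 0xE8;                        /* call op */
        rel = (long)op - (long)(p + 4);     /* relative to next instruction */
        memcpy(p, &rel, 4);
        p += 4;

        return p;
    }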

The TIL uses the machine stack, but it should be very easy to
swap stacks since we have to get cozy with the hardware anyways.
Light-weight threads of "pure" Perl would be super easy.

At this point I'm just surprised it works at all. I have no
idea if the instructions I'm cranking out are efficient. (Should
they be aligned? Is there a faster mov?) I think it should be
pretty simple to emit test and branch instructions directly, so
in practice I'd expect this TIL to perform better than it does
now.

It seems like a good idea to try and coordinate efforts with
another group doing native code generation (like blackdown.org).
If we did inlining or peephole optimization I'm sure we'd be
re-inventing stuff they've already started on.

Anybody have ideas for making this *fast*?

- Ken

P.S. Sorry for that inane add.dat file I posted. I should have
posted the 10 line script that generated it. Anyways, nobody
chewed me out for posting MIME attachments so I guess I'll keep
doing it...


/* This code is in the public domain. Ken Fox, Nov 2000 */

#define ITERS 10
#define USE_TIL

#include <stdio.h>
#include <stdlib.h>

typedef union {
int i;
void *p;
unsigned char b[4];
} cell;

void load(char *filename, cell *code)
{
FILE *in

virtual machine implementation options

2000-10-31 Thread Ken Fox
[add.dat attachment: the test bytecode referenced above -- alternating
lines of "2" (ADD) and "1 10" (PUSH 10), repeated hundreds of times]



/* This code is in the public domain. Ken Fox, Oct 2000 */

#define USE_GOTO

#include <stdio.h>
#include <stdlib.h>

typedef union {
int i;
void *p;
} cell;

void load(char *filename, cell *code)
{
FILE *input;
cell *pc = code;

if (input = fopen(filename, "r")) {
int i;
while (fscanf(input, " %d ", &i) != EOF) {
pc->i = i;
++pc;
}
fclose(input);
}

pc->i = 0;
}

void dump(cell *code)
{
cell *pc = code;

printf("byte code:\n");
while (pc->i) {
printf("  %5d\n", pc->i);
++pc;
}
}

enum {
STOP = 0,
PUSH = 1,
ADD = 2,
PRINT = 3
};

int count_ops(cell *code)
{
int icount = 0;

cell *pc = code;

op: switch (pc->i) {
case STOP:
return icount;

case PUSH:
++icount;
pc += 2;
goto op;

case ADD:
++icount;
pc += 1;
goto op;

ca

Re: Threaded Perl bytecode (was: Re: stackless python)

2000-10-26 Thread Ken Fox

Chaim Frenkel wrote:
> We may not even need to copy the body.

The nice thing about the "copy body and replace with stub" solution
is that it doesn't impose any costs on threaded subs that don't get
re-bound at run-time. I agree with you that there are lots of
solutions though so re-binding shouldn't be an issue regardless of
what kind of VM is used.

Anybody know why Forth doesn't use one of these solutions?

BTW, Perl 5 already "suffers" from this problem for const subs --
they can't be re-bound at run-time. Maybe a pragma could be used to
enable/disable the ability to re-bind subs. The default could
depend on whether symbol table frobbing or eval has been seen by
the compiler. (We already do that kind of stuff with $` etc.)

> (Hmm, with Q::S, it could be all of them in constant time.)

Would a Quantum::Tunneling module use zero energy? Would be cool to
have on a palmtop.

- Ken



Re: Threaded Perl bytecode (was: Re: stackless python)

2000-10-25 Thread Ken Fox

Adam Turoff wrote:
> when dealing with threaded bytecode is that the threading specifically
> eliminates the indirection in the name of speed.

Yes. Chaim was saying that for the functions that need indirection,
they could use stubs. You don't need to guess in advance which ones
need indirection because at run-time you can just copy the old code
to a new location and *write over* the old location with a "fetch pointer
and tail call" stub. All bytecode pointers stay the same -- they just
point to the stub now. The only restriction on this technique is that
no sub body can be smaller than the indirection stub. (We could
easily make a single bytecode op that does a symbol table lookup
and tail call so I don't see any practical restrictions at all.)
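
At the bytecode-cell level the overwrite is about this simple (the
layout and the opcode are made up; only the shape of the trick matters):

    #include <stdlib.h>
    #include <string.h>

    typedef union { int i; void *p; } cell;

    enum { OP_REBIND_STUB = 42 };       /* hypothetical stub opcode   */
    #define STUB_CELLS 2                /* opcode + symbol-table slot */

    void rebind_sub(cell *old_body, size_t body_cells, void *symtab_slot)
    {
        /* 1. copy the original body somewhere else */
        cell *copy = malloc(body_cells * sizeof(cell));
        memcpy(copy, old_body, body_cells * sizeof(cell));

        /* 2. point the symbol table at the copy (elided):
              symtab_update(symtab_slot, copy);              */

        /* 3. overwrite the old entry point with the stub.  Existing
              callers still jump to old_body; they just hit the stub,
              which looks up the current binding and tail-calls it.
              This is why no sub body may be smaller than STUB_CELLS. */
        old_body[0].i = OP_REBIND_STUB;
        old_body[1].p = symtab_slot;
    }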

- Ken



Re: Special syntax for numeric constants [Was: A tentative list of vtable functions]

2000-10-25 Thread Ken Fox

David Mitchell wrote:
> Well, I was assuming that there would be *a* numeric class in scope
> - as defined by the innermost lexical 'use foo'.

And that numeric class would remove int and num from the scope?

> I assumed that Perl wouldn't be clever enough to know about all available
> numberic types and automatically chose the best representation; rather
> that it was the programmer's responsibilty via 'use' or some other syntax.

Well "some other syntax" leaves it pretty wide open doesn't it. ;) IMHO we
should shoot for "clever enough" (aka DWIM) and fall back to putting the
burden on the programmer if it gets too hard for Perl.

> I'm not familiar with Scheme, I'm afraid.

Scheme directly implements the sets of numbers (the numeric tower):

  integer -> real -> complex

It's complicated by having multiple representations for the numbers:
small fixed point, fixed point, and "big", but the main idea is
that Scheme figures out when to shift both type and representation
automatically. Unfortunately different implementations usually choose
different portions of the numeric tower to implement.

- Ken



Re: Special syntax for numeric constants [Was: A tentativelist of vtable functions]

2000-10-25 Thread Ken Fox

Dan Sugalski wrote:
> Numeric constants will probably fall into two classes--those perl's parser
> knows about and can convert to, and those it doesn't and just treats as
> strings.

I'm really excited to see what magic Larry is going to cook up for
extending the lexer and parser. His talk made it pretty clear that he
wants to make small languages easy to build from the Perl core.

If Larry does what I'm hoping, we'll be able to extend the lexer to
recognize new number formats and not have to kludge things together with
strings. Am I reading too much into the Atlanta talk or is that your
take on it too?

- Ken



Re: stackless python

2000-10-21 Thread Ken Fox

Joshua N Pritikin wrote:
> http://www.oreillynet.com/pub/a/python/2000/10/04/stackless-intro.html

That talks about moving from the C stack to a frame stack, continuations,
co-routines and microthreads. None of it made much sense. (Choosing
continuations as the basis of the system would be very Scheme-ish, but
not very fast...)

Anybody have any good summaries about what Stackless Python really is?

- Ken



Semantic analysis [Was: compile-time taint checking and the halting problem]

2000-10-21 Thread Ken Fox

"David L. Nicol" wrote:
> So what would be involved in adding hooks for arbitrary semantic analysis?

At the most fundamental level all you need is a parse tree with attributes
that you can write constraints against. IMHO there isn't much difference
in technique between "semantic analysis" and "optimization". If you can do
one you can do the other.

The thing that concerns me is that this is a *lot* more involved than a
set of hooks. It might be that the interpreter, compiler (backend code
generator) and VM all have different analysis requirements. For example,
the VM doesn't care if something is meaningful or optimized, just that
it's safe. For taintedness analysis, I think running code through the
compiler might make the most sense. It would reduce the performance
requirements and could provide more information to the programmer.

> What language features can be described to allow for introduction of
> arbitrary SA passes?  At what levels?

Language features would be input to the parse tree attributes -- they
don't allow or disallow analysis. Analysis is part of implementation and
goes under the language -- under the parser in fact. Syntax macros
will probably do some local analysis. (Damian's currying operators will
have to do some analysis to figure out when to stop currying for example.)
It might make sense to have pluggable optimizers instead of hooks for
global analysis though. "use less memory" could load a memory optimizer.
"use safe io" could load the taint analyzer. Maybe these pragmas only
work in the compiler though?

> What's a good introductory glossary of SA terms? Things get confusing
> with many reinvented wheels rolling around.

The dragon book is popular (duh!). I also found

  The Art of Compiler Design
  Thomas Pittman and James Peters
  ISBN 0-13-048190-4

very helpful in understanding attribute grammars. Once you get through
those two,

  Advanced Compiler Design and Implementation
  Steven S. Muchnick
  ISBN 1-55860-320-4

will move you from theory to practice. It has *extensive* coverage
of optimization techniques that will be applicable to semantic
analysis too. (I'm still slogging through it myself.) As long as I'm
listing compiler books, I might as well give my introductory
favorites:

  Modern Compiler Implementation in ML
  Andrew W. Appel
  ISBN 0-521-58274-1

  Very nice introductory book that covers grammars, code generation,
  optimization, garbage collection, "static single assignment" form and
  a lot of other stuff. There are versions of this book for Java and C,
  but I think the ML is best.

  Compiler Design in C
  Allen I. Holub
  ISBN 0-13-155045-4

  An introductory book that's closer to the "use the source
  Luke" philosophy. I really like Holub's writing, so that may bias
  me. Compilers are tough enough subjects though that just reading
  somebody you're comfortable with might make the difference.

- Ken



Special syntax for numeric constants [Was: A tentative list of vtable functions]

2000-10-18 Thread Ken Fox

David Mitchell wrote:
> Now of course if we have the luxury of deciding that core perl 'knows'
> about complex numbers, then of course the parser can be made to recognise ...

The core doesn't need to know -- that was my point. All the core needs
is the basic constant folding rules _it_already_has_ combined with a
macro to define "i". When you "use complex" the macro would be folded
into the parser. The core doesn't need any special support (other than
decent macros... ;)

> In summary: Perl shouldn't do interpetation of numeric literals, but
> should instead delegate it to the numeric class currently in scope.

I agree with you. The complication is that there isn't *a* numeric class
in scope -- there are *many* numeric classes. We could model numeric
classes as a tower ala Scheme in which case a constant is given to the
most fundamental class first and then the classes higher up the tower.
The core might just implement this by throwing a compile-time exception
that can be caught by the classes higher up the tower. (I think the only
real problem here is handling large precision numbers -- everything else
can be handled with macros and constant folding.)

A more complete solution might be to allow numeric modules to extend
the parser's definition of numeric constants and allow back-tracking if
an attempted parse fails. This has more control and flexibility than
compile time exceptions but demands a lot more from the parser.

It'd be nice to hear the PDL folks jump in with their opinion of what
kinds of numeric constants are useful.

- Ken



Re: A tentative list of vtable functions

2000-10-17 Thread Ken Fox

David Mitchell wrote:
> Jarkko Hietaniemi wrote:
> > Assume I want
> >
> >   $a = 2 + 3i;
> >
> > to work...
> 
> Which is why I suggest we abandon attempts to make the parser do intelligent
> decisions on numeric literals, and instead just grab all the characters
> that appear to make up the string constant ...
> 
> use complex;
> my $c =  2__3;  # 2 + 3i

That's really gross. 2 + 3i is just add(integer(2), complex(0,3)) with
compile-time constant folding eliminating the add(). I would even go so
far as to say that 3i is just syntactic sugar for multiply(integer(3),sqrt(-1))
with constant folding doing the simplification. The only rules we need are
the standard ones we must have for constant folding and *one* additional
macro that says bareword "i" used after a scalar should be re-written as
"* sqrt(-1)".

- Ken



Re: A tentative list of vtable functions

2000-10-17 Thread Ken Fox

"ye, wei" wrote:
> One C++ problem I just found out is memory management.  It seems
> that it's impossible to 'new' an object from a specified memory block.
> So it's impossible to put free'd objects in memory pool and re-allocate
> them next time.

Stuff like that isn't the problem with using C++. (In fact a class can
provide its own allocator. You can even provide a global allocator if
you like to live dangerously.)

The trouble is that the object model for C++ isn't exactly the object
model (I mean at the internals level like "SV") for Perl. That means we
still have to do a lot of work to get Perl to work right (and even more
work to defeat some C++ compiler assumptions to get Perl to work fast).

Another big problem with C++ is lack of internal documentation and
object code standards. Some of Perl's dynamic module loading capability
would be complex using C++ -- and possibly impossible with code built
by different compilers.

I think the general idea is that the advantages of C++ don't move us
far enough out of our comfortable local minimum to make it worthwhile.

- Ken



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Ken Fox

Simon Cozens wrote:
> To be perfectly honest, my preferred solution would be to have the tokenizer,
> lexer and parser as a single, hand-crafted LR(k) monstrosity.

Those are hard to understand because so much extra work has to be done to
compensate for lack of top-down state when doing a bottom-up match. I've
played around with a yacc grammar for C++ and it's very difficult to introduce
new constructs without breaking existing ones. Since Perl is much more
difficult than C++ to parse, I suspect that your LR(k) monstrosity will be
limited to only a few gurus to play with.

The other down-side is that we'd be doing a whole lot of custom work designed
just for parsing Perl instead of creating something more general and powerful
that can be used for other problems as well. For example, I'd imagine the PDL
folks would much rather extend a recursive-descent parser with back-tracking
than an LR(k) monstrosity.

Maybe I'm just lazier than you... ;)

- Ken



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Ken Fox

Simon Cozens wrote:
> It's time to drag out my quote of the week:
> 
> Recursive-descent, or predictive, parsing ONLY works on grammars
> where the first terminal symbol of each subexpression provides
> enough information to choose which production to use.

Recursive-descent parsers are nice because they are *much* easier to
generate errors with. They are also much easier to generate segmented
grammars which is nice for something like Perl because there are so
many quiet shifts into several different sub-languages.

The only real problem is prediction and that is *easily* solved with
look-ahead and/or back-tracking. IMHO back-tracking is preferable,
especially if there are cut-points where the search tree can be pruned.
I think it's very powerful to think of a grammar as a declarative
program which searches for the best-fit between itself and the input
stream.
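
In toy form the back-tracking part is just save/restore of the input
position (nothing Perl-ish about the grammar, obviously):

    #include <string.h>

    static const char *in;                   /* current input position */

    static int lit(const char *s)            /* match a literal */
    {
        size_t n = strlen(s);
        if (strncmp(in, s, n) == 0) { in += n; return 1; }
        return 0;
    }

    static int expr(void) { return lit("x"); }

    /* stmt := "if" expr | expr */
    static int stmt(void)
    {
        const char *save = in;
        if (lit("if") && expr()) return 1;
        in = save;                           /* back-track */
        return expr();
    }

Cut-points would just mean refusing to restore "save" once you know the
first alternative was the only sensible one.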

> So, while I don't doubt that, with the state of Perl's regexes these
> days, it's possible to create something with enough sentience to
> tokenize Perl, I've really got to wonder whether it's sane.

I think the goal would be to increase the power of the regexes to
handle Perl grammar. This could be the coolest language tool since
yacc. (I'm intentionally not comparing Perl's regex to lex. We shouldn't
make the same stupid mistake as lex/yacc by splitting a language into
a token specification and a grammar with incompatible syntax.)

- Ken



Re: RFC 326 (v1) Symbols, symbols everywhere

2000-09-27 Thread Ken Fox

Dave Storrs wrote:
> It isn't terribly clear to me either

Well, he does give a couple references that would clear it up.
X11 Atoms are well documented.

> saying is that you can qs() a method name, get a "thingie" out, store the
> thingine in a scalar, and then that scalar becomes a direct portal to the
> method...somewhat like a coderef, but without a required deref.

Actually it's more trivial than that. When you "intern" a symbol, you're
adding a string-to-integer mapping to the compiler's symbol table. Whenever
the compiler sees the string, it replaces it with the corresponding
integer. (The type stays "symbol" though; I'm sort of mixing implementation
and semantics.) Think of it like a compile-time hash for barewords.

Dan was right to think of this as a C enum equivalent. The only real
differences being that you don't have a chance to define the integer
mapping and that the printable identity of the symbol is remembered by
the run-time.
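
The whole mechanism is about this big (a throw-away sketch: linear
search, fixed table, nothing freed):

    #include <string.h>

    static char *names[1024];       /* printable identity, kept for run-time */
    static int   n_names;

    int intern(const char *name)
    {
        int i;
        for (i = 0; i < n_names; i++)
            if (strcmp(names[i], name) == 0)
                return i;               /* same bareword, same integer */
        names[n_names] = strdup(name);
        return n_names++;
    }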

I'm not sure I'm as enthusiastic about symbols speeding things up as
the RFC author though. I guess it speeds up hash lookups and stuff, but
that could be memoized by the compiler anyways.

- Ken



Re: Perl Implementation Language

2000-09-13 Thread Ken Fox

Simon Cozens wrote:
> On Tue, Sep 12, 2000 at 03:17:47PM -0400, Ken Fox wrote:
> > That's fine for the VM and the support libraries, but I'd *really* like
> > to see the parser/front-end in Perl. There are dozens of RFCs that require
> > some non-trivial extensions to the parser. It would be nice to code these
> > in Perl
> 
> Are there any better reasons than "It would be nice?"

The dogfood theory? One of the design goals for Perl is to make text
munging easy. Parsing falls into that category and therefore we should
use it, i.e. eat our own dogfood.

There are other reasons though.

Syntax extensions for Perl should be more naturally written in Perl because
one of the prototyping tools is to design equivalent implementations of a
new feature in Perl. Damian has taken this to a new extreme with his switch
statement.

A Perl programmer who needs a syntax extension may only use Perl, so mandating
any other language is an additional constraint. It may limit the number of
people able to work on Perl.

Parsing Perl in Perl may provide new insight on how to improve the speed
and expressiveness of perl.

Perl parsing code may not suffer from the computer-science-itus of ideas
like "maximal munch" or context free grammar restrictions. The community
is rich and interesting and may produce something better than (or at least
different from) what's been done before.

- Ken



Re: A tentative list of vtable functions

2000-09-13 Thread Ken Fox

Nick Ing-Simmons wrote:
> Ken Fox <[EMAIL PROTECTED]> writes:
> >Dan Sugalski wrote:
> >> For something like:
> >>
> >>@foo = @bar || @baz;
> >>
> >> I have no problem with the call sequence looking like (pseudo-codish here):
> >>
> >> set_context(ARRAY, ASSIGN);
> >> foo->store(bar->log_or(bar, baz));
> >
> >But log_or must short circuit --
> 
> And what above suggests it does not?
> It is up to bar's log_or not to evaluate baz if bar is considered true.

Dan suggested that log_or didn't have to short circuit. I guess I was
also reading in how code like

  @foo = @bar || some_big_hairy_function();

would work. Does log_or get a lazy second argument? Are lazy arguments
going to be fast enough to implement conditionals?
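
A lazy right operand would look something like this at the vtable level
(sketch only -- the SV and is_true names are stand-ins, not the real
internals):

    typedef struct sv SV;
    typedef SV *(*thunk_fn)(void *env);

    extern int sv_is_true(SV *sv);       /* stand-in for an is_true entry */

    SV *log_or_lazy(SV *left, thunk_fn right, void *right_env)
    {
        if (sv_is_true(left))
            return left;                 /* short circuit: thunk never runs */
        return right(right_env);         /* otherwise force the thunk */
    }

The question stands: building that thunk (closure environment and all)
for every || is exactly the cost I'm worried about.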

> >I think we have to preserve that behavior
> >for all types or the (hypothetical future) optimizer might break things.
> >It might translate "if" statements into ||, or vice versa. It might do dead
> >code elimination. It might do liveness analysis.
> 
> It already does and it is a pain when you are trying to give meaning
> to && and || for overloaded objects.
> 
> I happend to have a 'need' for || / && which "short circuit later" i.e.

I absolutely agree that || and && need to be over-loadable -- even for
changing the always-short-circuit behavior to DWIM behavior.

My only point is that we shouldn't do this in the vtable. The vtable
IMHO needs consistent and well defined semantics so that other parts of
the system can safely make assumptions about what transformations can
be legally done to the code.

I'd like to see over-loading || done with a macro system. Maybe
something like:

BEGIN {
  syntax $expr : $expr || $expr
  {
 my $t = $1;
 (has_method($t, 'op_logical_or')) ? $t->op_logical_or($2)
   : ($t) ? $t : $2
  }
}

(That should be read as a tree substitution macro with $1 and $2 bound
to the parse trees for the operands of the || operator.)

There could be several macros like this defined by "use overload".

> >IMHO syntax changes (like creating non-short circuiting logicals)
> 
> Semantic not syntax.

Maybe you're right. IMHO it's one of those fuzzy boundaries between
the syntax and semantics. An over-loaded || operator is a regular
method call. The non-over-loaded || operator is a special form of
an "if" statement.

- Ken



Re: New Perl rewrite - embedded Perl

2000-09-12 Thread Ken Fox

"ye, wei" wrote:
> Tom Christiansen wrote:
> > It [miniperl] isn't substantially smaller, so that does you no good.

The socket library seems to be the poster child for what to leave
out, but that's a weak argument. If Perl 6 gets all the functionality
requested by Damian or the PDL folks, it would make sense to design a
miniperl that can dynamically load the "expensive" stuff.

> By the way, I took some time learn Guile(Gnu version of Scheme). The Guile
> is an extension language as well, which allow you embed it in your program
> very easily.

Starting guile is extremely slow. That has led to people "freezing" guile
after the core modules have been loaded. What's the point in having a super
small core if it has to be pre-loaded with a bunch of modules and frozen
in order for it to be usable?

> The most interesting thing is its "Faster Integers", which don't have to malloc
> memory to store integer.

Perl 6 will be able to fit an integer into the SV directly so malloc won't
be necessary. (We probably won't box an SV into a single machine word using
Guile's bit-packing tricks; it's not clear that packing has many benefits
on modern machines.)
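
i.e. something shaped like this -- a sketch of the layout, not the real
structure:

    /* Small values live in the SV itself; only big bodies get
       heap storage hung off the pointer member. */
    typedef struct sv {
        int type;                 /* SVt_IV, SVt_NV, SVt_PTR, ... (made up) */
        union {
            long   iv;            /* integer stored inline  */
            double nv;            /* float stored inline    */
            void  *any;           /* everything else        */
        } u;
    } SV;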

I like guile (and Scheme in general), but Perl isn't a minimalist language.
Stealing from Scheme is good. Adopting Scheme's Small Is Beautiful philosophy
isn't.

- Ken



Re: Perl Implementation Language

2000-09-12 Thread Ken Fox

Dan Sugalski wrote:
> As for the language we implement perl in (and thus ultimately need to
> translate to the compiler-target language), I'm thinking of something like
> Chip's PIL.

That's fine for the VM and the support libraries, but I'd *really* like
to see the parser/front-end in Perl. There are dozens of RFCs that require
some non-trivial extensions to the parser. It would be nice to code these
in Perl -- especially if we can co-evolve the regex engine to support the
extensions. I think this would also help us document the interface between
the front-end and back-end (which could be a compiler or the VM). IMHO
this separation is worth it even if it costs us some performance.

- Ken



Re: A tentative list of vtable functions

2000-09-12 Thread Ken Fox

Dan Sugalski wrote:
> For something like:
> 
>@foo = @bar || @baz;
> 
> I have no problem with the call sequence looking like (pseudo-codish here):
> 
> set_context(ARRAY, ASSIGN);
> foo->store(bar->log_or(bar, baz));

But log_or must short circuit -- I think we have to preserve that behavior
for all types or the (hypothetical future) optimizer might break things.
It might translate "if" statements into ||, or vice versa. It might do dead
code elimination. It might do liveness analysis.

IMHO syntax changes (like creating non-short circuiting logicals) need to
be done above the opcode level before the optimizer or compiler sees anything.
That allows the opcodes to have stable well-known semantics.

- Ken



Re: A tentative list of vtable functions

2000-09-08 Thread Ken Fox

Dan Sugalski wrote:
> At 05:30 PM 8/31/00 -0400, Ken Fox wrote:
> >   before_get_value
> >   after_get_value
> >   before_set_value
> >   after_set_value
> >
> >There ought to be specializations of get_value and set_value
> >that call these hooks if they're defined -- no sense in making the
> >normal case slow.
> 
> You could override the vtable function in that case, I think.

It'd be nice if there were standard functions for this in the core.
I don't want every type author to write debugger code (which is
probably what the before_* and after_* methods will be most useful
for.)

> >We also need GC functions:
> >   children -- nested SV *s (for live object traversals)
> 
> I'm not sure. Maybe.

How else are you going to discover the internal SV references in an
object? I don't want to have a conservative GC.

> >   move -- move object to new memory location
> 
> Don't think this is needed. The base SV structures won't move, I think, and
> we should be able to move the variable's contents (hanging off the sv_any
> analog) at will.

Different types may cache internal pointers instead of offsets. Those
pointers might need to change if an object is moved. (I agree that SVs
should not move.)
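
Roughly what I mean by those two entries (sketched signatures, not a
proposal):

    typedef struct sv SV;
    typedef void (*visit_fn)(SV *child);

    struct gc_vtbl_part {
        /* hand every internal SV reference to the collector so the
           GC can stay exact instead of conservative */
        void (*children)(SV *self, visit_fn visit);

        /* the body moved from old_body to new_body; fix up any
           cached internal pointers (offsets wouldn't need this) */
        void (*move)(SV *self, void *old_body, void *new_body);
    };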

> >   resize_granted   -- object's resize request was granted
> 
> I don't think this is needed--the guts of the set functions would handle
> this as needed.

An object might be smart enough to either compress itself or take
advantage of adjacent space becoming available. This method would let
the GC and the object discuss allocations.

> >Is is_same an identity function? Or does it check the type?
> 
> Identity. If two SVs pointed to the identical same data, this'd return true.

Shouldn't the sv_any pointers be the same then?

> >What purpose do the logical_* functions serve? Shouldn't there just be
> >one is_true function that the logical_* ops call?
> 
> In case one wants to overload
> 
>@foo || @bar
> 
> to do a piecewise logical or of the two arrays.

I don't like the idea of using the ops vtbl for syntax overloading. For
one thing there aren't individual ops for every syntax, so the vtbl alone
isn't sufficient. Secondly, I think vtbl ops should have very consistent
semantics to allow the optimizer some room to re-arrange things. Short
circuiting should not be customizable by each type for example.

- Ken



Re: the C JIT

2000-08-31 Thread Ken Fox

"David L. Nicol" wrote:
> No, I'm not, it's the direction that RFC 61 ends up if you let it
> take you there.

You seem to be confusing:

   (1) linking C code with Perl

with

   (2) compiling Perl to C code

There is a world of difference. Swig does (1) pretty well already.
If you want a first class blessed/tied interface you learn the perl
types and internals by reading perlguts. This is *not* XS.

If you want (2) then you've got a lot of work. For example, you can't
use built-in C data types or even the C function call stack because
they aren't compatible with Perl. It's possible to design a large
library of data structures to help, but then you've simply re-invented
libperl.so. The real problems of exception handling, closures, dynamic
scoping, etc. are just not possible to solve using simple C code.

- Ken



Re: A tentative list of vtable functions

2000-08-31 Thread Ken Fox

Dan Sugalski wrote:
> get_value
> set_value

Wouldn't these go on the SV and not on the inner type? Maybe I'm
thinking value when you're saying variable? The following seem useful
on variables too:

  before_get_value
  after_get_value
  before_set_value
  after_set_value

There ought to be specializations of get_value and set_value
that call these hooks if they're defined -- no sense in making the
normal case slow.
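
i.e. one stock wrapper in the core, something like this (all the names
are invented):

    typedef struct sv SV;

    struct value_vtbl {
        SV  *(*get_value)(SV *self);
        void (*before_get_value)(SV *self);          /* may be NULL */
        void (*after_get_value)(SV *self, SV *result);
    };

    /* used only when a type installs hooks; plain types keep the
       direct get_value entry and pay nothing extra */
    SV *hooked_get_value(struct value_vtbl *vt, SV *self)
    {
        SV *result;
        if (vt->before_get_value) vt->before_get_value(self);
        result = vt->get_value(self);
        if (vt->after_get_value)  vt->after_get_value(self, result);
        return result;
    }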

We also need GC functions:

  allocation_size  -- size of required (or actual) allocation
  children -- nested SV *s (for live object traversals)
  move -- move object to new memory location
  resize_granted   -- object's resize request was granted
  finalize -- release private resources before destruction

The clone function should take a depth parameter so that lazy
deep copies can be type specific.

Is is_same an identity function? Or does it check the type? Either
way it seems like that could be implemented on the SV and not on every
type.

What purpose do the logical_* functions serve? Shouldn't there just be
one is_true function that the logical_* ops call?

Persistence functions would be really useful too:

  freeze
  thaw

I'm not too worried about getting the vtbl right at the first because
it will be pretty obvious how it should go once the code starts to form.

- Ken


