Re: Stacks registers

2001-05-23 Thread Uri Guttman

 DS == Dan Sugalski [EMAIL PROTECTED] writes:

  DS Should Parrot be a register or stack-based system, and if a
  DS register-based one, should we go with typed registers?

first question: is this for the intermediate language or the back end
VM? they don't have to be the same.

since our goal is to support the polymorphic front/back end design, the
intermediate language (IL) should be easy to generate and easy to
compile to various targets. also it needs to have other features like
being modifiable for optimization, storable on disk, debuggable (for dev
types), etc.

this IL could be the VM we use in perl itself and it can also be
compiled to other VM's or c, etc.

also it could be processed/compiled to a faster perl VM for execution.


  DS My personal preference here is to make Parrot a register based
  DS system. I can wrap my head around that a lot easier than a stack
  DS system, it's something I'm comfortable with, there's lots of
  DS literature on optimizing this sort of system, and I just generally
  DS like it better. (I cut my programming teeth on 6502s... so sue me
  DS :) Stack based systems have a certain appeal--they're simpler
  DS generally, which is fine. I'm not too worried about simpler as
  DS much as I am faster, and I think we can get faster out of
  DS registers. (Or, if you prefer, named temporaries instead of
  DS registers. Whatever)

the PL/I compiler suite i hacked on used that style of IL. but is
allocated as many named temps as it needed by using N-tuples for each op
code. each OP would leave its result in a named temp which could be
referred to by later OPs. i think they were generic (as PL/I could mung
many more types of data than perl) but the compiler knew what held what
so that wasn't an issue.

  DS If (or when, I suppose, barring a Really Good Counter-Argument) we
  DS go the register route, then, should we have typed registers like
  DS most CPUs do? A set of PMC pointer registers, a set of integer
  DS registers, a set of floating-point registers, a set of string
  DS registers? And if we do go with typed registers, should they be
  DS linked together? (So that the int in Iregister 1 matches the
  DS integer value of the PMC in Pregister 1, and the float value in
  DS Fregister 1?) And if they're linked (And should they all be
  DS linked, or only some of them) should we guarantee consistency, or
  DS should the bytecode explicitly make various registers valid or
  DS invalid?

the problem with a fixed number of registers is register overflow. this
is true in all register based machines and VM's and is a pain. if we go
with a pure named temp architecture, you don't need to worry about
that. but it is slower to access and manage named temps than a fixed set
of registers (or even just a stack which is like a single reg machine).

  DS My current thoughts are this:

  DS We have a set of N registers. They're all linked. Nothing
  DS implicitly sets values in any of the registers (if you want an
  DS integer value, you need to make one). Each register has a set of
  DS validity markers for each type (int, flaot, string, PMC) that may
  DS or may not be bits. We have a stack of sorts that we can push the
  DS registers on to if we need.

that is a good start. the stack can handle the overflow but that means
more complexity in the compiler to deal with that. let's do some study
of various IL designs (including both compiled and interpreted) and see
their wins and losses. i think pure stack based is out as it is too
limiting. i have some experience with N-tuples and it does work well for
compiling as you don't care about the managing of registers at run time
then. you just keep a fat namespace table at compile time and throw it
out later. we have a compile and run phase so managing a large number of
N-tuples may be slow running but easy to generate and manipulate. that
is a typical tradeoff.

so my current best idea is a fixed size set of active N-tuples/registers
(any number larger than most long statements would work, say 100?) as
the registers. each N-tuple has its op code and argument register
indexes and which register it stores its result in. we have a stack for
the overflow handling and that is the hardest part.

we should have a small set of special registers to handle events,
booleans, the PC, signals, etc.

this makes compiling and optimization very easy. we just need to manage
the lifetime of registers for the N-tuples and deal with the (hopefully)
rare overflow. runtime interpretation is also fairly simple (not as easy
as a stack but not too bad) with decent speed. code generation to other
VM or storing to disk should be fairly straightforward.

  DS I'm definitely feeling unsure about this, so feel free (please!)
  DS to wade in with comments, criticisms, or personal attacks... :)

you are a doofus.

but a smart one. :)

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- 

RE: Stacks registers

2001-05-23 Thread Hong Zhang


 Register based. Untyped registers; I'm hoping that the vtable stuff can be
 sufficiently optimized that there'll be no major win in 
 storing multiple copies of a PMC's data in different types knocking
around. 
 
 For those yet to be convinced by the benefits of registers over stacks,
try
 grokking in fullness what op scratchpads are about. Ooh look, registers.

I think stack based =~ register based. If we don't have Java-like jsr 
and ret, every bytecode inside one method always operates on the same
stack
depth, therefore we can just treat the locals + stack as a flat register
file. A single pass can translate stack based code into register based code.

For example:
  push local #3; = move #(max_local + opcode_stack_depth), #3

  push local #3; push local #4; add; pop local #5; = add #5, #3, #4

  push local #3; push local #4; call foo; pop #6; = call_2 #6, #3, #4

As long as stack based system is carefully designed, we can easily add
linear-cost translation step to convert it into register based bytecode,
and run it faster.

Hong



Re: Stacks registers

2001-05-23 Thread Bryan C . Warnock

On Wednesday 23 May 2001 12:59, Dan Sugalski wrote:
 Okay, folks, here's the current conundrum:

 Should Parrot be a register or stack-based system, and if a register-based
 one, should we go with typed registers?

 My personal preference here is to make Parrot a register based system. I
 can wrap my head around that a lot easier than a stack system, it's
 something I'm comfortable with, there's lots of literature on optimizing
 this sort of system, and I just generally like it better. (I cut my
 programming teeth on 6502s... so sue me :) Stack based systems have a
 certain appeal--they're simpler generally, which is fine. I'm not too
 worried about simpler as much as I am faster, and I think we can get
 faster out of registers. (Or, if you prefer, named temporaries instead
 of registers. Whatever)

 If (or when, I suppose, barring a Really Good Counter-Argument) we go the
 register route, then, should we have typed registers like most CPUs do? A
 set of PMC pointer registers, a set of integer registers, a set of
 floating-point registers, a set of string registers? And if we do go with
 typed registers, should they be linked together? (So that the int in
 Iregister 1 matches the integer value of the PMC in Pregister 1, and the
 float value in Fregister 1?) And if they're linked (And should they all be
 linked, or only some of them) should we guarantee consistency, or should
 the bytecode explicitly make various registers valid or invalid?

A little backwards my answer will be.  I'm fairly sure we won't need linked 
registers.  The PMC's vtable implemention should already take care of the 
int/float/string cross-referencing - by having the linked registers, you'd 
really only be creating a parallel structure to wade through.  Worse case, 
you're doing both - using the vtable to create the values for the other 
registers.  (After checking the register first to see if it were valid?)  
That doesn't sound like a win at all.  You could bypass the PMC's vtable and 
populate the registers directly, but then why have the PMC.  I don't see any 
reason for the linking.

Now, assuming I'm not way out in the left-field stands above, I don't see 
the necessity for typed registers (at least as described).  (I mean, one of 
the major reasons for typed registers is for hardware optimization, which we 
can't really borrow.  We're emulating that win with the vtables.)  However,
another common use for registers is to save on allocation - like a hardware 
zero register.  You could certainly call PL_undef and the ilk (in its new 
PMC form) a dedicated register, which means it's sort of type.  But I think 
you still make a save if you automatically assume that every register is 
simply a PMC.  

The only use I could think of would be for constants that you know aren't 
going to need to be converted from one form to another.  But then you need 
to handle the two paths in the rest of your vtable entries.  Unless you make 
everything run through the registers.

Your not going to make opcodes {shudder} dependent on implicit registers, 
are you?  :-)

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: Stacks registers

2001-05-23 Thread Graham Barr

On Wed, May 23, 2001 at 06:06:26PM +0100, Simon Cozens wrote:
 On Wed, May 23, 2001 at 12:59:01PM -0400, Dan Sugalski wrote:
  Should Parrot be a register or stack-based system, and if a register-based 
  one, should we go with typed registers?
 
 Register based. Untyped registers; I'm hoping that the vtable stuff can be
 sufficiently optimized that there'll be no major win in storing multiple
 copies of a PMC's data in different types knocking around. 

I would agree with that too.

Graham.



Re: Stacks registers

2001-05-23 Thread Uri Guttman

 GB == Graham Barr [EMAIL PROTECTED] writes:

  GB On Wed, May 23, 2001 at 10:30:32AM -0700, Hong Zhang wrote:
   I think stack based =~ register based. If we don't have Java-like jsr 

  GB That comment reminds me of how the register file is implemented in
  GB a sun sparc. They have a large register file, but only some are accessable
  GB at any given time, say 16. When you do a sub call you place your
  GB arguments in the high registers, say 4, and shift the window pointer
  GB by 12 (in this case).  What was r12-r15 now becomes r0-r3. On return
  GB the result is placed into r0-r3 which are then available to the
  GB caller as r12-r15.

  GB This allows very efficient argument passing without having to save
  GB registers to a stack and restor them later.

there have been arguments over whether register windows win or not. on
the plus side they make passing sub args very efficient (if you can fit
them all into the window). on the minus side, you have to do major
chunks of swapping registers to/from stack. the window size is a big
issue too and some think sparc has too small a window and too few total
registers.

and besides register windows is a hardware only solution (they just mung
the virtual base in a special register). i have not heard of it being
used in pure software as the extra costs of having many registers or a
deep stack in software is nothing while in hardware is has a real (and
heavy) cost.

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11
Class and Registration info: http://www.sysarch.com/perl/OOP_class.html



Re: Stacks registers

2001-05-23 Thread Simon Cozens

On Wed, May 23, 2001 at 01:36:14PM -0400, Bryan C. Warnock wrote:
 You could certainly call PL_undef and the ilk (in its new 
 PMC form) a dedicated register

Which is, incidentally, exactly what happens in disassembled Perl 5 bytecode.

-- 
Why waste negative entropy on comments, when you could use the same
entropy to create bugs instead?
-- Steve Elias



Re: Stacks registers

2001-05-23 Thread Uri Guttman

 DS == Dan Sugalski [EMAIL PROTECTED] writes:

  DS [I'm taking these all at once, so bear with me]
  DS Simon wrote:

   Register based. Untyped registers; I'm hoping that the vtable stuff can be
   sufficiently optimized that there'll be no major win in storing multiple
   copies of a PMC's data in different types knocking around.

  DS Maybe, but I'm thinking that adding two cached integers together (the 
  DS integer piece of the register being essentially a cache) will be faster 
  DS than two calls to get_integer.

that makes sense for common stuff like integer math and maybe even basic
string stuff like join, interpolation and (perl5) ..

we could have a string accumulator (shades of pdp8!) for building up
strings. the problem occurs when you have nested interpolation:

foo$bar{qq{$key1$key2}}

the compiler would have to convert that to inside out execution.

idea: there still is a special string accumulator but it is a pmc and
that is passed around instead of the full register. then a new pmc is
used to replace it. this should lower the copying of strings as they get
passed around. 

  DS Also, by having integer bits of registers, it means we can avoid
  DS making a full PMC for integer constants, or for integers (or
  DS floats, or strings) the interpreter wants to deal with
  DS internally. (scratch ints/floats/strings/whatever)

yep.

   I'm unclear what purpose having virtual resisters for ints and nums serves:
   can't we just put those in real registers? Or perhaps I'm missing the pt?

  DS The point of having int/num/string registers is for cache and
  DS convenience reasons. If Parrot's dealing with an integer and it
  DS *knows* that it's an integer (because, for example, we're dealing
  DS with some sort of internal counter, or Python code) there seems
  DS little reason to promote it to a full PMC.

hmm, sort of like your special integer scratch registers too. but since
we are coding in C we can't force anything to real hardware registers,
we have to let the (g)cc handle that for us.

  DS Hong wrote:


   every bytecode inside one method always operates on the same stack
   depth, therefore we can just treat the locals + stack as a flat
   register file. A single pass can translate stack based code into
   register based code.

  DS Fair enough, but then registers are just oddly named stack
  DS entities, which makes their being on the stack a matter of
  DS implementation. (Whether that means anything useful is another
  DS matter)

as most compiler books will note, all/most IL's can be easily translated
to one another. our goal is to pick one that works best for our needs
and then translate to others as desired (like jvm or .net).

  DS Uri wrote:

   first question: is this for the intermediate language or the back end
   VM? they don't have to be the same.

  DS Bytecode. I want the bytecode to be what the Parrot VM eats--it
  DS will be Parrot's assembly language.

but that still doesn't answer my question. do we directly generate
bytecode or an IL and then generate bytecode from that? bytecode is not
an IL but a storable form of one. you can create bytecode for a stack or
register or N-tuple machine. i am saying that we can choose our IL
independently of the bytecode design. bytecode is harder to optimize and
manipulate than most IL's. so i say we compile to an IL which can be
interpreted directly and which can also be stored/retrieved in a
bytecode form. but the bytecode doesn't have to be directly
executable. it may need to be expanded back into the IL (which could be
very fast with a simple mapping). 

   since our goal is to support the polymorphic front/back end design,
   the intermediate language (IL) should be easy to generate and easy
   to compile to various targets. also it needs to have other features
   like being modifiable for optimization, storable on disk,
   debuggable (for dev types), etc.

  DS This is the one spot it falls down. Though arguably translating
  DS bytecodes dealing with PMCs will be more work than bytecodes that
  DS deal with ints or strings.

that was my previous point. we have to decide the tradeoff between
speed, ease of manipulation of the IL and translations (if any) to/from
bytecode.

  DS If someone thinks the whole multiple views per register thing is a bad 
  DS idea, this would be a good line of attack to start with. :)

i think it is ok but there doesn't need to be a V flag for each
type. any given N-tuple knows knows the return value of any N-tuple it
references and so fetches it directly. just like in a real compiler, you
know that memory or register is an int and don't check it each time. but
it will change over time like our registers will. a register type is
tracked at compile time only and not looked at during runtime.

   the PL/I compiler suite i hacked on used that style of IL. but is
   allocated as many named temps as it needed by using N-tuples for
   each op code. each OP would leave its result in a named temp 

Re: Stacks registers

2001-05-23 Thread Uri Guttman

 NI == Nick Ing-Simmons [EMAIL PROTECTED] writes:

  NI We need to decide where a perl6 sub's local variables are going
  NI to live (in the recursive case) - if we need a stack anyway it
  NI may make sense for VM to have ways of indexing the local frame
  NI rather than having global registers (set per thread by the way?)

i made that thread point too in my long reply to dan.

but indexing directly into a stack frame is effectively a register
window. the problem is that you need to do an indirection through the
window base for every access and that is slow in software (but free in
hardware).

  NI What I do agree with is that 
  NIpush a
  NIpush b
  NIadd
  NIpop  r 

  NI is lousy way to code r = a + b - too much pointless copying. 
  NI We want 
  NIadd #a,#b,#r
  NI were #a is a small number indexing into somewhere where a is stored.

and that is an N-tuple opcode with a fixed size register set. register
windowing is not needed here, as pushing/popping as needed works well.

the win of a stack system is a cleaner and simpler op code api with
fewer op codes. but that doesn't gain us much as we have so many complex
ops that compiling them to simpler ones and all the extra runtime is a
loss.

BTW, an N-tuple op code is easily converted to/from an asm form which is
a feature that has been mentioned as desired. you can look at higher
level CISC instruction sets (e.g. vax) for examples. hell, the vax had a
polynomial op code and high level string ops!

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11
Class and Registration info: http://www.sysarch.com/perl/OOP_class.html



Re: Stacks registers

2001-05-23 Thread Alan Burlison

Nick Ing-Simmons wrote:

 That comment reminds me of how the register file is implemented in
 a sun sparc. They have a large register file, but only some are accessable
 at any given time, say 16.
 
 32 IIRC but principle is correct.

8 global registers, 8 out registers, 8 local registers and 8 in registers. 
Some are set aside for special purposes, e.g. %r14 is stack pointer, %r15 is
called subroutine return addr etc.  Effectively you have 6 'in' and 6 'out'
registers  Extra arguments above 6 are passed in the caller's stack frame.

 1. When you call deep enough to fall off the end of the large register
file an expensive system call is needed to save some registers
at the other end to memory and wrap, and then again when you
come back to the now-in-memory registers.

Not a system call but a trap - they aren't the same thing (pedant mode off
;-).  The register spill trap handler copies the relevant registers onto the
stack - each stack frame has space allocated for this.

Alan Burlison



Re: Stacks registers

2001-05-23 Thread Uri Guttman

 AB == Alan Burlison [EMAIL PROTECTED] writes:

  AB Uri Guttman wrote:
   but indexing directly into a stack frame is effectively a register
   window. the problem is that you need to do an indirection through the
   window base for every access and that is slow in software (but free in
   hardware).

  AB Not quite - a register window system has real registers, it's just
  AB at some point they may end up in memory.  That's not the same as
  AB indexing into memory on every access.

what i meant is that the indirection to access a given windowed register
using the window base is free (speed wise) in hardware, but doing the
same thing in software costs.

but comparing the hardware design with a pure software design is not too
useful. the costs and goals are so different. some conceptual analogies
are useful but saying register windows are fast in hardware (BTW that is
risc stuff) but not a speed win in software (where parrot is leaning
towards looking cisc-like) is not relevant.

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11
Class and Registration info: http://www.sysarch.com/perl/OOP_class.html