Re: Stacks & registers

Uri Guttman Wed, 23 May 2001 11:49:13 -0700
>>>>> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes:

  DS> [I'm taking these all at once, so bear with me]
  DS> Simon wrote:

  >> Register based. Untyped registers; I'm hoping that the vtable stuff can be
  >> sufficiently optimized that there'll be no major win in storing multiple
  >> copies of a PMC's data in different types knocking around.

  DS> Maybe, but I'm thinking that adding two cached integers together (the 
  DS> integer piece of the register being essentially a cache) will be faster 
  DS> than two calls to get_integer.

that makes sense for common stuff like integer math and maybe even basic
string stuff like join, interpolation and the (perl5) . operator.

we could have a string accumulator (shades of pdp8!) for building up
strings. the problem occurs when you have nested interpolation:

        "foo$bar{qq{$key1$key2}}"

the compiler would have to convert that to inside out execution.

idea: there still is a special string accumulator but it is a pmc and
that is passed around instead of the full register. then a new pmc is
used to replace it. this should lower the copying of strings as they get
passed around. 
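
A rough sketch of the "inside out" order for the nested example above. The `StrAcc` class, the `interpolate` helper and the hash contents are made up for illustration; the only point is that the inner qq{} string gets its own accumulator and is finished before the outer one can use it as a hash key.

```python
class StrAcc:
    """Hypothetical string accumulator: collects pieces, joins them once."""

    def __init__(self):
        self.parts = []

    def append(self, piece):
        self.parts.append(str(piece))
        return self  # allow chaining

    def value(self):
        return "".join(self.parts)


def interpolate(bar, key1, key2):
    # Inner string qq{$key1$key2} has to be built first ...
    inner = StrAcc().append(key1).append(key2)
    # ... and only then can the outer string use it as the hash key.
    outer = StrAcc().append("foo").append(bar[inner.value()])
    return outer.value()


# "foo$bar{qq{$key1$key2}}" with %bar = (ab => 'BAR'), $key1 = 'a', $key2 = 'b'
result = interpolate({"ab": "BAR"}, "a", "b")
```

Each nesting level holds a fresh accumulator object, so nothing is copied until `value()` is called.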

  DS> Also, by having integer bits of registers, it means we can avoid
  DS> making a full PMC for integer constants, or for integers (or
  DS> floats, or strings) the interpreter wants to deal with
  DS> internally. (scratch ints/floats/strings/whatever)

yep.

  >> I'm unclear what purpose having virtual registers for ints and nums serves:
  >> can't we just put those in real registers? Or perhaps I'm missing the pt?

  DS> The point of having int/num/string registers is for cache and
  DS> convenience reasons. If Parrot's dealing with an integer and it
  DS> *knows* that it's an integer (because, for example, we're dealing
  DS> with some sort of internal counter, or Python code) there seems
  DS> little reason to promote it to a full PMC.

hmm, sort of like your special integer scratch registers too. but since
we are coding in C we can't force anything to real hardware registers,
we have to let the (g)cc handle that for us.

  DS> Hong wrote:


  >> every bytecode inside one method always operates on the same stack
  >> depth, therefore we can just treat the "locals + stack" as a flat
  >> register file. A single pass can translate stack based code into
  >> register based code.

  DS> Fair enough, but then registers are just oddly named stack
  DS> entities, which makes their being on the stack a matter of
  DS> implementation. (Whether that means anything useful is another
  DS> matter)

as most compiler books will note, all/most IL's can be easily translated
to one another. our goal is to pick one that works best for our needs
and then translate to others as desired (like jvm or .net).
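
To make Hong's point concrete, here is a minimal single-pass translator for a tiny, invented stack bytecode (`load`, `store`, `add`, `mul`). It assumes straight-line code and that stack slot i simply becomes register `num_locals + i`; none of the op names come from Parrot or the JVM.

```python
def stack_to_register(code, num_locals):
    """Translate a made-up stack bytecode into three-address register ops."""
    out = []
    depth = 0  # the stack depth is known statically at every instruction
    for op, *args in code:
        if op == "load":             # push a local: copy it into the next stack register
            out.append(("move", num_locals + depth, args[0]))
            depth += 1
        elif op == "store":          # pop into a local
            depth -= 1
            out.append(("move", args[0], num_locals + depth))
        elif op in ("add", "mul"):   # pop two operands, push one result
            depth -= 2
            dst = num_locals + depth
            out.append((op, dst, dst, dst + 1))
            depth += 1
        else:
            raise ValueError(f"unknown op {op!r}")
    return out


# local2 = local0 + local1
stack_code = [("load", 0), ("load", 1), ("add",), ("store", 2)]
reg_code = stack_to_register(stack_code, num_locals=3)
```

The output is naive (lots of moves); a later pass would be expected to fold those away.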

  DS> Uri wrote:

  >> first question: is this for the intermediate language or the back end
  >> VM? they don't have to be the same.

  DS> Bytecode. I want the bytecode to be what the Parrot VM eats--it
  DS> will be Parrot's assembly language.

but that still doesn't answer my question. do we directly generate
bytecode or an IL and then generate bytecode from that? bytecode is not
an IL but a storable form of one. you can create bytecode for a stack or
register or N-tuple machine. i am saying that we can choose our IL
independently of the bytecode design. bytecode is harder to optimize and
manipulate than most IL's. so i say we compile to an IL which can be
interpreted directly and which can also be stored/retrieved in a
bytecode form. but the bytecode doesn't have to be directly
executable. it may need to be expanded back into the IL (which could be
very fast with a simple mapping). 
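
A small sketch of what "stored in bytecode form and expanded back with a simple mapping" could look like. The opcode numbers, the fixed 4-field tuple shape and the record layout are all assumptions for the example, not a proposed Parrot format.

```python
import struct

OPCODES = {"add": 1, "mul": 2, "move": 3}          # made-up numbering
NAMES = {num: name for name, num in OPCODES.items()}
RECORD = struct.Struct("<BHHH")                     # opcode byte + three 16-bit operands


def il_to_bytecode(il):
    """Pack a list of (op, dst, a, b) tuples into a flat byte string."""
    return b"".join(RECORD.pack(OPCODES[op], dst, a, b) for op, dst, a, b in il)


def bytecode_to_il(blob):
    """Expand the byte string back into the same list of tuples."""
    return [(NAMES[code], dst, a, b) for code, dst, a, b in RECORD.iter_unpack(blob)]


il = [("add", 3, 0, 1), ("mul", 4, 3, 2), ("move", 5, 4, 0)]
blob = il_to_bytecode(il)
```

The in-memory IL stays easy to manipulate, and the stored form is just a dump of it.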

  >> since our goal is to support the polymorphic front/back end design,
  >> the intermediate language (IL) should be easy to generate and easy
  >> to compile to various targets. also it needs to have other features
  >> like being modifiable for optimization, storable on disk,
  >> debuggable (for dev types), etc.

  DS> This is the one spot it falls down. Though arguably translating
  DS> bytecodes dealing with PMCs will be more work than bytecodes that
  DS> deal with ints or strings.

that was my previous point. we have to decide the tradeoff between
speed, ease of manipulation of the IL and translations (if any) to/from
bytecode.

  DS> If someone thinks the whole "multiple views per register" thing is a bad 
  DS> idea, this would be a good line of attack to start with. :)

i think it is ok but there doesn't need to be a V flag for each
type. any given N-tuple knows the return value of any N-tuple it
references and so fetches it directly. just like in a real compiler, you
know that memory or register is an int and don't check it each time. but
it will change over time like our registers will. a register type is
tracked at compile time only and not looked at during runtime.
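
One way to picture compile-time-only type tracking: the compiler keeps a table of what each register currently holds and uses it to pick a typed op, so nothing is checked at run time. The `add_int`/`add_num` op names and the two-type world are invented for this sketch.

```python
def select_ops(tuples, reg_types):
    """Pick a typed opcode per N-tuple using only compile-time type info."""
    reg_types = dict(reg_types)    # the compiler's table; thrown away after compiling
    out = []
    for op, dst, a, b in tuples:
        ta, tb = reg_types[a], reg_types[b]
        result = "int" if ta == tb == "int" else "num"
        out.append((f"{op}_{result}", dst, a, b))
        reg_types[dst] = result    # a register's type may change over time
    return out


typed = select_ops(
    [("add", 2, 0, 1), ("add", 0, 2, 3)],
    {0: "int", 1: "int", 3: "num"},
)
```

Register 0 starts out as an int and ends up holding a num, but only the compiler ever needs to know that.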

  >> the PL/I compiler suite i hacked on used that style of IL. but it
  >> allocated as many named temps as it needed by using N-tuples for
  >> each op code. each OP would leave its result in a named temp which
  >> could be referred to by later OPs. i think they were generic (as
  >> PL/I could mung many more types of data than perl) but the compiler
  >> knew what held what so that wasn't an issue.

  DS> That's still reasonably stack-based, and assumes a single return
  DS> value per sub, which we don't have. (Unless we consider a list as
  DS> a single value, which is reasonable)

sure, a list is really an array ref for purposes of N-tuple return
values.

  >> the problem with a fixed number of registers is register
  >> overflow. this is true in all register based machines and VM's and
  >> is a pain. if we go with a pure named temp architecture, you don't
  >> need to worry about that. but it is slower to access and manage
  >> named temps than a fixed set of registers (or even just a stack
  >> which is like a single reg machine).

  DS>  From all the literature I've dug through, a largish register set
  DS> (like 32 or 64) gets around 90+% of the overflow issues, and we
  DS> can still have register push/pop opcodes.

and i suggested 100 which is in the same ballpark. the issue with
push/pop of registers will occur more in sub calls than in individual
statements. one idea is to make the registers map directly to PMCs so we
can allocate them from the same pool. this simplifies passing in and
returning params to subs and N-tuples. N-tuples refer to a register
number (index into the special PMC array) and we can push/pop registers
very simply and fast using the PMC management code.

also if we look at windowing modes (for simplifying of sub calls), you
still want unwindowed registers. 

but a sub call can be simply done with a N-tuple op which is given the
code ref and the list of params. it copies refs to the param PMC's (which
can be in registers or in 'memory') into the special register @_
and jumps to the first op of the code ref. saving and restoring registers
will be handled by ops generated by the compiler.
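
A sketch of such a call op, under some loud assumptions: the interpreter state is a plain dict, a code ref is just the index of the sub's first op, and a `return_pc` slot (not mentioned above) remembers where to come back to. The `PMC` class is a stand-in.

```python
class PMC:
    """Stand-in for a real PMC; just wraps a value."""

    def __init__(self, value):
        self.value = value


def call_op(interp, code_ref, param_regs):
    # Copy refs (not values) to the param PMCs into the special @_ register.
    interp["@_"] = [interp["regs"][r] for r in param_regs]
    interp["return_pc"] = interp["pc"] + 1   # assumption: where the sub returns to
    interp["pc"] = code_ref                  # jump to the first op of the sub


interp = {"regs": [PMC(10), PMC(20), PMC(30)], "@_": [], "pc": 7, "return_pc": None}
call_op(interp, code_ref=100, param_regs=[0, 2])
interp["@_"][0].value += 1   # the sub modifies its first param through @_
```

Because only refs are copied, the change made through @_ is visible in the caller's register 0, which matches how @_ aliases its arguments in perl5.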

  >> i have some experience with N-tuples and it does work well for
  >> compiling as you don't care about the managing of registers at run
  >> time then. you just keep a fat namespace table at compile time and
  >> throw it out later. we have a compile and run phase so managing a
  >> large number of N-tuples may be slow running but easy to generate
  >> and manipulate. that is a typical tradeoff.

  DS> I am worried about making the base compilation inherently
  DS> expensive, which would be bad. (/OPTIMIZE=BREATHTAKING switches
  DS> notwithstanding, as you'll get what you ask for there) OTOH, this
  DS> sounds rather like what all the compiler books I have use in some
  DS> form, so I might be fretting over nothing.

i think so too. we can't afford an open-ended register set at runtime
though. 

  >> we should have a small set of special registers to handle events,
  >> booleans, the PC, signals, etc.

  DS> Hadn't thought about that. Looks like a really good idea.

include @_ and $_ and others in that list.

by making them registers, we can easily support threads by having
different register base addresses for each thread. so a thread would
have a main structure which contains its stack and stack pointer, its
register base, and PC. no sharing by default.
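
A minimal sketch of the per-thread structure, assuming one flat register file carved into fixed-size slices. `ThreadContext`, `NUM_REGS` and the helper names are hypothetical.

```python
from dataclasses import dataclass, field

NUM_REGS = 32                         # registers visible to each thread (assumed size)
register_file = [None] * (NUM_REGS * 2)


@dataclass
class ThreadContext:
    reg_base: int                     # where this thread's registers start
    pc: int = 0
    stack: list = field(default_factory=list)   # the list length doubles as stack pointer


def set_reg(ctx, n, value):
    register_file[ctx.reg_base + n] = value


def get_reg(ctx, n):
    return register_file[ctx.reg_base + n]


t1 = ThreadContext(reg_base=0)
t2 = ThreadContext(reg_base=NUM_REGS)
set_reg(t1, 5, "one")
set_reg(t2, 5, "two")                 # same register number, different thread
```

Switching threads is then just switching which context the op subs are handed.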


  DS> Basically a register would be really fat, looking like:

  DS>     +-+-----------------+
  DS>     |V| PMC pointer     |
  DS>     +-+-----------------+
  DS>     |V| string          |
  DS>     +-+-----------------+
  DS>     |V| integer         |
  DS>     +-+-----------------+
  DS>     |V| float           |
  DS>     +-+-----------------+

  DS> With the "V" bit being a "this part is valid" marker. There'd be
  DS> opcodes like:

  DS>     makeint 1; Fills in the integer part of register 1
  DS>     makestr 1; Fills in the string part of register 1
  DS>     makepmc 1; Creates a valid PMC for register 1
  DS>     makenum 1; Fills in the float part of register 1
  DS>     makeall 1; Make all the invalid pieces of register 1 real

my idea above means no need for the V bit or the runtime slowdown. let
the compiler handle all of that.
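
For comparison, here is roughly what the V-bit caching in DS's fat register amounts to at run time. Only the integer slot is shown, `makeint` is assumed to fill it by calling the PMC's `get_integer`, and `IntPMC` is a stand-in that counts those calls.

```python
class IntPMC:
    """Stand-in PMC that counts vtable-style get_integer calls."""

    def __init__(self, value):
        self.value = value
        self.calls = 0

    def get_integer(self):
        self.calls += 1
        return int(self.value)


class FatRegister:
    def __init__(self, pmc=None):
        self.pmc = pmc
        self.integer = None
        self.int_valid = False        # the "V" bit for the integer slot

    def makeint(self):
        if not self.int_valid:        # this check happens on every use
            self.integer = self.pmc.get_integer()
            self.int_valid = True


def add_cached(a, b):
    a.makeint()
    b.makeint()
    return a.integer + b.integer


r1, r2 = FatRegister(IntPMC("20")), FatRegister(IntPMC("22"))
total = sum(add_cached(r1, r2) for _ in range(3))
```

The conversion runs once per register, but the V test runs on every add. That test is the cost that goes away if the compiler already knows the slot is valid.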

  DS> Graham wrote:

  >> That comment reminds me of how the register file is implemented in
  >> a sun sparc. They have a large register file, but only some are
  >> accessible at any given time, say 16. When you do a sub call you
  >> place your arguments in the high registers, say 4, and shift the
  >> window pointer by 12 (in this case).  What was r12-r15 now becomes
  >> r0-r3. On return the result is placed into r0-r3 which are then
  >> available to the caller as r12-r15.
  >> 
  >> This allows very efficient argument passing without having to save
  >> registers to a stack and restore them later.

  DS> Hmmm. Yeah, there is that, and if we go to register-based
  DS> parameter passing (which I know Larry intends in some cases, and I
  DS> agree as it'll make C binding cleaner) it'll be a useful
  DS> thing. Perhaps two sets of registers, the parameter passing ones
  DS> and the work ones, would be in order.

i am not sure that register windows will be a win in software. as i said
in another post, it is a hardware win (and even that has been debated). 
how would you pass in a very large list to a sub using a register
window? that causes problems in hardware and would be tricky to manage
in software too.
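
A software version of the window Graham describes, using his numbers (16 visible registers, shift by 12). The class and the 64-entry file size are assumptions; a real design would spill to a stack on overflow instead of raising.

```python
class WindowedRegs:
    VISIBLE = 16   # registers accessible at any given time
    SHIFT = 12     # how far the window moves on a sub call

    def __init__(self, total=64):
        self.file = [None] * total
        self.base = 0

    def _index(self, n):
        if not 0 <= n < self.VISIBLE:
            raise IndexError(n)
        return self.base + n

    def __getitem__(self, n):
        return self.file[self._index(n)]

    def __setitem__(self, n, value):
        self.file[self._index(n)] = value

    def call(self):
        if self.base + self.SHIFT + self.VISIBLE > len(self.file):
            raise RuntimeError("window overflow: would have to spill")
        self.base += self.SHIFT

    def ret(self):
        self.base -= self.SHIFT


regs = WindowedRegs()
regs[12] = 7            # caller puts an argument in a high register
regs.call()
seen_by_callee = regs[0]
regs[0] = regs[0] * 2   # callee leaves its result in r0
regs.ret()
```

The sketch also shows the limit: only four slots overlap, so a long param list does not fit in the window and has to go somewhere else.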

  DS> Graham wrote:
  DS> I think that's it--is there anything I missed?

threaded in line code. is this another back end translation? N-tuples
map well to TIL as you can just replace the op code with a machine level
call to the sub instead of the dispatch loop doing it for you. we make
all the op code subs have an API that works with direct calls from TIL
or the dispatch loop. they all basically can take the N-tuple as their
main arg with a list of additional args as needed (e.g. sub call args).
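
A loose analogy in a high-level language: the op subs share one API (registers plus the N-tuple), so the same subs work from a dispatch loop or from a pre-resolved list of direct calls. Real TIL would emit machine-level calls; the op names and tuple layout here are invented.

```python
def op_add(regs, t):
    regs[t[1]] = regs[t[2]] + regs[t[3]]


def op_mul(regs, t):
    regs[t[1]] = regs[t[2]] * regs[t[3]]


OPS = {"add": op_add, "mul": op_mul}


def run_dispatch(tuples, regs):
    for t in tuples:
        OPS[t[0]](regs, t)            # op code looked up on every step


def thread(tuples):
    # Resolve each op code to its sub once, ahead of time.
    return [(OPS[t[0]], t) for t in tuples]


def run_threaded(threaded, regs):
    for fn, t in threaded:
        fn(regs, t)                   # direct call, no lookup


program = [("add", 2, 0, 1), ("mul", 3, 2, 2)]
regs_a = [3, 4, 0, 0]
regs_b = [3, 4, 0, 0]
run_dispatch(program, regs_a)
run_threaded(thread(program), regs_b)
```

Both runs leave the registers in the same state; the threaded form just pays the op lookup once instead of on every op.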

this is a very good discussion in general. i think it will lead to
something very good.

uri

-- 
Uri Guttman  ---------  [EMAIL PROTECTED]  ----------  http://www.sysarch.com
SYStems ARCHitecture and Stem Development ------ http://www.stemsystems.com
Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11
Class and Registration info:     http://www.sysarch.com/perl/OOP_class.html