Re: Stacks registers
DS == Dan Sugalski [EMAIL PROTECTED] writes: DS Should Parrot be a register or stack-based system, and if a DS register-based one, should we go with typed registers? first question: is this for the intermediate language or the back end VM? they don't have to be the same. since our goal is to support the polymorphic front/back end design, the intermediate language (IL) should be easy to generate and easy to compile to various targets. also it needs to have other features like being modifiable for optimization, storable on disk, debuggable (for dev types), etc. this IL could be the VM we use in perl itself and it can also be compiled to other VM's or c, etc. also it could be processed/compiled to a faster perl VM for execution. DS My personal preference here is to make Parrot a register based DS system. I can wrap my head around that a lot easier than a stack DS system, it's something I'm comfortable with, there's lots of DS literature on optimizing this sort of system, and I just generally DS like it better. (I cut my programming teeth on 6502s... so sue me DS :) Stack based systems have a certain appeal--they're simpler DS generally, which is fine. I'm not too worried about simpler as DS much as I am faster, and I think we can get faster out of DS registers. (Or, if you prefer, named temporaries instead of DS registers. Whatever) the PL/I compiler suite i hacked on used that style of IL. but is allocated as many named temps as it needed by using N-tuples for each op code. each OP would leave its result in a named temp which could be referred to by later OPs. i think they were generic (as PL/I could mung many more types of data than perl) but the compiler knew what held what so that wasn't an issue. DS If (or when, I suppose, barring a Really Good Counter-Argument) we DS go the register route, then, should we have typed registers like DS most CPUs do? A set of PMC pointer registers, a set of integer DS registers, a set of floating-point registers, a set of string DS registers? And if we do go with typed registers, should they be DS linked together? (So that the int in Iregister 1 matches the DS integer value of the PMC in Pregister 1, and the float value in DS Fregister 1?) And if they're linked (And should they all be DS linked, or only some of them) should we guarantee consistency, or DS should the bytecode explicitly make various registers valid or DS invalid? the problem with a fixed number of registers is register overflow. this is true in all register based machines and VM's and is a pain. if we go with a pure named temp architecture, you don't need to worry about that. but it is slower to access and manage named temps than a fixed set of registers (or even just a stack which is like a single reg machine). DS My current thoughts are this: DS We have a set of N registers. They're all linked. Nothing DS implicitly sets values in any of the registers (if you want an DS integer value, you need to make one). Each register has a set of DS validity markers for each type (int, flaot, string, PMC) that may DS or may not be bits. We have a stack of sorts that we can push the DS registers on to if we need. that is a good start. the stack can handle the overflow but that means more complexity in the compiler to deal with that. let's do some study of various IL designs (including both compiled and interpreted) and see their wins and losses. i think pure stack based is out as it is too limiting. i have some experience with N-tuples and it does work well for compiling as you don't care about the managing of registers at run time then. you just keep a fat namespace table at compile time and throw it out later. we have a compile and run phase so managing a large number of N-tuples may be slow running but easy to generate and manipulate. that is a typical tradeoff. so my current best idea is a fixed size set of active N-tuples/registers (any number larger than most long statements would work, say 100?) as the registers. each N-tuple has its op code and argument register indexes and which register it stores its result in. we have a stack for the overflow handling and that is the hardest part. we should have a small set of special registers to handle events, booleans, the PC, signals, etc. this makes compiling and optimization very easy. we just need to manage the lifetime of registers for the N-tuples and deal with the (hopefully) rare overflow. runtime interpretation is also fairly simple (not as easy as a stack but not too bad) with decent speed. code generation to other VM or storing to disk should be fairly straightforward. DS I'm definitely feeling unsure about this, so feel free (please!) DS to wade in with comments, criticisms, or personal attacks... :) you are a doofus. but a smart one. :) uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development --
RE: Stacks registers
Register based. Untyped registers; I'm hoping that the vtable stuff can be sufficiently optimized that there'll be no major win in storing multiple copies of a PMC's data in different types knocking around. For those yet to be convinced by the benefits of registers over stacks, try grokking in fullness what op scratchpads are about. Ooh look, registers. I think stack based =~ register based. If we don't have Java-like jsr and ret, every bytecode inside one method always operates on the same stack depth, therefore we can just treat the locals + stack as a flat register file. A single pass can translate stack based code into register based code. For example: push local #3; = move #(max_local + opcode_stack_depth), #3 push local #3; push local #4; add; pop local #5; = add #5, #3, #4 push local #3; push local #4; call foo; pop #6; = call_2 #6, #3, #4 As long as stack based system is carefully designed, we can easily add linear-cost translation step to convert it into register based bytecode, and run it faster. Hong
Re: Stacks registers
On Wednesday 23 May 2001 12:59, Dan Sugalski wrote: Okay, folks, here's the current conundrum: Should Parrot be a register or stack-based system, and if a register-based one, should we go with typed registers? My personal preference here is to make Parrot a register based system. I can wrap my head around that a lot easier than a stack system, it's something I'm comfortable with, there's lots of literature on optimizing this sort of system, and I just generally like it better. (I cut my programming teeth on 6502s... so sue me :) Stack based systems have a certain appeal--they're simpler generally, which is fine. I'm not too worried about simpler as much as I am faster, and I think we can get faster out of registers. (Or, if you prefer, named temporaries instead of registers. Whatever) If (or when, I suppose, barring a Really Good Counter-Argument) we go the register route, then, should we have typed registers like most CPUs do? A set of PMC pointer registers, a set of integer registers, a set of floating-point registers, a set of string registers? And if we do go with typed registers, should they be linked together? (So that the int in Iregister 1 matches the integer value of the PMC in Pregister 1, and the float value in Fregister 1?) And if they're linked (And should they all be linked, or only some of them) should we guarantee consistency, or should the bytecode explicitly make various registers valid or invalid? A little backwards my answer will be. I'm fairly sure we won't need linked registers. The PMC's vtable implemention should already take care of the int/float/string cross-referencing - by having the linked registers, you'd really only be creating a parallel structure to wade through. Worse case, you're doing both - using the vtable to create the values for the other registers. (After checking the register first to see if it were valid?) That doesn't sound like a win at all. You could bypass the PMC's vtable and populate the registers directly, but then why have the PMC. I don't see any reason for the linking. Now, assuming I'm not way out in the left-field stands above, I don't see the necessity for typed registers (at least as described). (I mean, one of the major reasons for typed registers is for hardware optimization, which we can't really borrow. We're emulating that win with the vtables.) However, another common use for registers is to save on allocation - like a hardware zero register. You could certainly call PL_undef and the ilk (in its new PMC form) a dedicated register, which means it's sort of type. But I think you still make a save if you automatically assume that every register is simply a PMC. The only use I could think of would be for constants that you know aren't going to need to be converted from one form to another. But then you need to handle the two paths in the rest of your vtable entries. Unless you make everything run through the registers. Your not going to make opcodes {shudder} dependent on implicit registers, are you? :-) -- Bryan C. Warnock [EMAIL PROTECTED]
Re: Stacks registers
On Wed, May 23, 2001 at 06:06:26PM +0100, Simon Cozens wrote: On Wed, May 23, 2001 at 12:59:01PM -0400, Dan Sugalski wrote: Should Parrot be a register or stack-based system, and if a register-based one, should we go with typed registers? Register based. Untyped registers; I'm hoping that the vtable stuff can be sufficiently optimized that there'll be no major win in storing multiple copies of a PMC's data in different types knocking around. I would agree with that too. Graham.
Re: Stacks registers
GB == Graham Barr [EMAIL PROTECTED] writes: GB On Wed, May 23, 2001 at 10:30:32AM -0700, Hong Zhang wrote: I think stack based =~ register based. If we don't have Java-like jsr GB That comment reminds me of how the register file is implemented in GB a sun sparc. They have a large register file, but only some are accessable GB at any given time, say 16. When you do a sub call you place your GB arguments in the high registers, say 4, and shift the window pointer GB by 12 (in this case). What was r12-r15 now becomes r0-r3. On return GB the result is placed into r0-r3 which are then available to the GB caller as r12-r15. GB This allows very efficient argument passing without having to save GB registers to a stack and restor them later. there have been arguments over whether register windows win or not. on the plus side they make passing sub args very efficient (if you can fit them all into the window). on the minus side, you have to do major chunks of swapping registers to/from stack. the window size is a big issue too and some think sparc has too small a window and too few total registers. and besides register windows is a hardware only solution (they just mung the virtual base in a special register). i have not heard of it being used in pure software as the extra costs of having many registers or a deep stack in software is nothing while in hardware is has a real (and heavy) cost. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks registers
On Wed, May 23, 2001 at 01:36:14PM -0400, Bryan C. Warnock wrote: You could certainly call PL_undef and the ilk (in its new PMC form) a dedicated register Which is, incidentally, exactly what happens in disassembled Perl 5 bytecode. -- Why waste negative entropy on comments, when you could use the same entropy to create bugs instead? -- Steve Elias
Re: Stacks registers
DS == Dan Sugalski [EMAIL PROTECTED] writes: DS [I'm taking these all at once, so bear with me] DS Simon wrote: Register based. Untyped registers; I'm hoping that the vtable stuff can be sufficiently optimized that there'll be no major win in storing multiple copies of a PMC's data in different types knocking around. DS Maybe, but I'm thinking that adding two cached integers together (the DS integer piece of the register being essentially a cache) will be faster DS than two calls to get_integer. that makes sense for common stuff like integer math and maybe even basic string stuff like join, interpolation and (perl5) .. we could have a string accumulator (shades of pdp8!) for building up strings. the problem occurs when you have nested interpolation: foo$bar{qq{$key1$key2}} the compiler would have to convert that to inside out execution. idea: there still is a special string accumulator but it is a pmc and that is passed around instead of the full register. then a new pmc is used to replace it. this should lower the copying of strings as they get passed around. DS Also, by having integer bits of registers, it means we can avoid DS making a full PMC for integer constants, or for integers (or DS floats, or strings) the interpreter wants to deal with DS internally. (scratch ints/floats/strings/whatever) yep. I'm unclear what purpose having virtual resisters for ints and nums serves: can't we just put those in real registers? Or perhaps I'm missing the pt? DS The point of having int/num/string registers is for cache and DS convenience reasons. If Parrot's dealing with an integer and it DS *knows* that it's an integer (because, for example, we're dealing DS with some sort of internal counter, or Python code) there seems DS little reason to promote it to a full PMC. hmm, sort of like your special integer scratch registers too. but since we are coding in C we can't force anything to real hardware registers, we have to let the (g)cc handle that for us. DS Hong wrote: every bytecode inside one method always operates on the same stack depth, therefore we can just treat the locals + stack as a flat register file. A single pass can translate stack based code into register based code. DS Fair enough, but then registers are just oddly named stack DS entities, which makes their being on the stack a matter of DS implementation. (Whether that means anything useful is another DS matter) as most compiler books will note, all/most IL's can be easily translated to one another. our goal is to pick one that works best for our needs and then translate to others as desired (like jvm or .net). DS Uri wrote: first question: is this for the intermediate language or the back end VM? they don't have to be the same. DS Bytecode. I want the bytecode to be what the Parrot VM eats--it DS will be Parrot's assembly language. but that still doesn't answer my question. do we directly generate bytecode or an IL and then generate bytecode from that? bytecode is not an IL but a storable form of one. you can create bytecode for a stack or register or N-tuple machine. i am saying that we can choose our IL independently of the bytecode design. bytecode is harder to optimize and manipulate than most IL's. so i say we compile to an IL which can be interpreted directly and which can also be stored/retrieved in a bytecode form. but the bytecode doesn't have to be directly executable. it may need to be expanded back into the IL (which could be very fast with a simple mapping). since our goal is to support the polymorphic front/back end design, the intermediate language (IL) should be easy to generate and easy to compile to various targets. also it needs to have other features like being modifiable for optimization, storable on disk, debuggable (for dev types), etc. DS This is the one spot it falls down. Though arguably translating DS bytecodes dealing with PMCs will be more work than bytecodes that DS deal with ints or strings. that was my previous point. we have to decide the tradeoff between speed, ease of manipulation of the IL and translations (if any) to/from bytecode. DS If someone thinks the whole multiple views per register thing is a bad DS idea, this would be a good line of attack to start with. :) i think it is ok but there doesn't need to be a V flag for each type. any given N-tuple knows knows the return value of any N-tuple it references and so fetches it directly. just like in a real compiler, you know that memory or register is an int and don't check it each time. but it will change over time like our registers will. a register type is tracked at compile time only and not looked at during runtime. the PL/I compiler suite i hacked on used that style of IL. but is allocated as many named temps as it needed by using N-tuples for each op code. each OP would leave its result in a named temp
Re: Stacks registers
NI == Nick Ing-Simmons [EMAIL PROTECTED] writes: NI We need to decide where a perl6 sub's local variables are going NI to live (in the recursive case) - if we need a stack anyway it NI may make sense for VM to have ways of indexing the local frame NI rather than having global registers (set per thread by the way?) i made that thread point too in my long reply to dan. but indexing directly into a stack frame is effectively a register window. the problem is that you need to do an indirection through the window base for every access and that is slow in software (but free in hardware). NI What I do agree with is that NIpush a NIpush b NIadd NIpop r NI is lousy way to code r = a + b - too much pointless copying. NI We want NIadd #a,#b,#r NI were #a is a small number indexing into somewhere where a is stored. and that is an N-tuple opcode with a fixed size register set. register windowing is not needed here, as pushing/popping as needed works well. the win of a stack system is a cleaner and simpler op code api with fewer op codes. but that doesn't gain us much as we have so many complex ops that compiling them to simpler ones and all the extra runtime is a loss. BTW, an N-tuple op code is easily converted to/from an asm form which is a feature that has been mentioned as desired. you can look at higher level CISC instruction sets (e.g. vax) for examples. hell, the vax had a polynomial op code and high level string ops! uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html
Re: Stacks registers
Nick Ing-Simmons wrote: That comment reminds me of how the register file is implemented in a sun sparc. They have a large register file, but only some are accessable at any given time, say 16. 32 IIRC but principle is correct. 8 global registers, 8 out registers, 8 local registers and 8 in registers. Some are set aside for special purposes, e.g. %r14 is stack pointer, %r15 is called subroutine return addr etc. Effectively you have 6 'in' and 6 'out' registers Extra arguments above 6 are passed in the caller's stack frame. 1. When you call deep enough to fall off the end of the large register file an expensive system call is needed to save some registers at the other end to memory and wrap, and then again when you come back to the now-in-memory registers. Not a system call but a trap - they aren't the same thing (pedant mode off ;-). The register spill trap handler copies the relevant registers onto the stack - each stack frame has space allocated for this. Alan Burlison
Re: Stacks registers
AB == Alan Burlison [EMAIL PROTECTED] writes: AB Uri Guttman wrote: but indexing directly into a stack frame is effectively a register window. the problem is that you need to do an indirection through the window base for every access and that is slow in software (but free in hardware). AB Not quite - a register window system has real registers, it's just AB at some point they may end up in memory. That's not the same as AB indexing into memory on every access. what i meant is that the indirection to access a given windowed register using the window base is free (speed wise) in hardware, but doing the same thing in software costs. but comparing the hardware design with a pure software design is not too useful. the costs and goals are so different. some conceptual analogies are useful but saying register windows are fast in hardware (BTW that is risc stuff) but not a speed win in software (where parrot is leaning towards looking cisc-like) is not relevant. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11 Class and Registration info: http://www.sysarch.com/perl/OOP_class.html