At 03:41 PM 8/6/2001 -0700, Hong Zhang wrote:

> > >The branch instruction is wrong. It should be "branch #num".
> > >The offset should be part of instruction, not from register.
> >
> > Nope, because that kills the potential for computed relative
> > branches. (It's in there on purpose) Branches should work from
> > both constants and registers.
>
>Even so, the "branch #num" should have better performance, and
>it is part of any machine language. Since we already have jump
>instruction, do we really need the "branch %r", which can be
>simulated by "add %r, %pc, #num; jump %r".

Yeah, but that's two opcodes to execute rather than one.

Now, I'm not proposing that the branch opcode function check to see whether 
it's been handed a register or a constant. That would be a waste of 
cycles. Rather, there's a set of branch opcodes, and which one you get 
depends on which form you're using.
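
To make the "set of branch opcodes" idea concrete, here is a rough sketch in 
C. Everything in it is an assumption for illustration only: the opcode 
numbers, the Interp struct, the op_branch_c/op_branch_r names and the 
one-word-per-operand encoding are hypothetical, not a proposed design. The 
point is just that the assembler picks the opcode, so neither function has 
to test what kind of operand it got.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_REGS = 32 };

/* Hypothetical opcode numbers, made up for this sketch. */
enum { OP_END, OP_SET_I, OP_INC_I, OP_BRANCH_C, OP_BRANCH_R };

typedef struct {
    int32_t int_reg[NUM_REGS];   /* integer registers only, for brevity */
} Interp;

/* "branch #num": the offset is an immediate operand in the op stream. */
static const int32_t *op_branch_c(Interp *in, const int32_t *pc) {
    (void)in;
    return pc + pc[1];
}

/* "branch %r": the operand names a register that holds the offset. */
static const int32_t *op_branch_r(Interp *in, const int32_t *pc) {
    assert(pc[1] >= 0 && pc[1] < NUM_REGS);
    return pc + in->int_reg[pc[1]];
}

/* Minimal dispatch loop. Offsets are relative to the branch op itself. */
static void run(Interp *in, const int32_t *code) {
    const int32_t *pc = code;
    for (;;) {
        switch (pc[0]) {
        case OP_END:      return;
        case OP_SET_I:    in->int_reg[pc[1]] = pc[2]; pc += 3; break;
        case OP_INC_I:    in->int_reg[pc[1]] += 1;    pc += 2; break;
        case OP_BRANCH_C: pc = op_branch_c(in, pc);   break;
        case OP_BRANCH_R: pc = op_branch_r(in, pc);   break;
        default:          assert(!"unknown opcode");  return;
        }
    }
}
```

The register form costs one extra array read compared with the constant 
form, and neither one pays for a runtime type check.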

OTOH, if the opcode overhead's small enough, it may end up being worth it 
to have less code to execute, but I'm not sure that another reasonably 
small (and probably inlined on many platforms) function's going to make a 
difference. Though I've been wrong on that front before.

> > >The register set seems too big. It reduces cache efficiency
> > >and uses too much stack.
> >
> > Yeah, that's something I'm worried about. 64 may be too much.
> > 16 is too few, so we might split the difference and go with 32
> > to start.
>
>If we define caller-save and callee-save sets, the 64 registers may
>not be bad, as long as the caller-save set is small.

At least a full push without a copy to the new frame is dead cheap, so it's 
not much of a cost.

>If we don't define caller/callee save, we can still use 64
>registers. However, we need to add one tag bit to each function/
>stack frame to indicate whether it is a big frame or a small frame.
>The big frame uses 64, the small uses 16. The reg set is still
>64, but the small frame does not use anything beyond 16, so
>we don't have to save/restore them.
>
>It is not just for performance; the stack size and cache
>locality are also big issues.

Don't forget that, since we're not copying registers when we push or pop a 
full set, there's very little L1 cache involvement here. Some L2, yes, and 
I'm not thrilled about that, but little L1.

However, you do bring up a good point that often people will only want a 
few registers copied up if we use them for passing parameters. I'll add a 
semi-copy opcode to the list that handles only the low 32 registers of a 
set when a new set's allocated (if a set's more than 32 registers). We'll 
have to benchmark to see whether it's worth having a bitmap of registers to 
copy or whether we should just copy the whole lowest set (or take them in 
groups of 4 or 8 or something).
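
For comparison, here's a rough C sketch of the three candidates: a plain 
push that copies nothing, a semi-copy push that carries over the low 32 
registers, and a bitmap-driven copy. All the names and sizes (RegSet, 
RegStack, push_plain, push_copy_low, push_copy_mask, pop_set, a fixed array 
of 16 frames) are made up for illustration and aren't a committed design.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { REGS_PER_SET = 64, LOW_SET = 32, MAX_FRAMES = 16 };

typedef struct { int32_t reg[REGS_PER_SET]; } RegSet;

typedef struct {
    RegSet frames[MAX_FRAMES];
    int    top;                  /* index of the current register set */
} RegStack;

/* Plain push: bump the frame index. Nothing is copied or cleared, so the
   new set holds whatever was last left in that slot. */
static RegSet *push_plain(RegStack *s) {
    assert(s->top + 1 < MAX_FRAMES);
    s->top++;
    return &s->frames[s->top];
}

/* Semi-copy push: carry the low 32 registers over to the new set. */
static RegSet *push_copy_low(RegStack *s) {
    RegSet *old   = &s->frames[s->top];
    RegSet *fresh = push_plain(s);
    memcpy(fresh->reg, old->reg, LOW_SET * sizeof old->reg[0]);
    return fresh;
}

/* Bitmap variant: copy only the registers whose bit is set in mask. */
static RegSet *push_copy_mask(RegStack *s, uint64_t mask) {
    RegSet *old   = &s->frames[s->top];
    RegSet *fresh = push_plain(s);
    for (int i = 0; i < REGS_PER_SET; i++)
        if (mask & (UINT64_C(1) << i))
            fresh->reg[i] = old->reg[i];
    return fresh;
}

/* Pop: step back to the previous set, again without copying. */
static RegSet *pop_set(RegStack *s) {
    assert(s->top > 0);
    s->top--;
    return &s->frames[s->top];
}
```

The memcpy version does one fixed-size block copy, while the bitmap version 
tests a bit for every register. That tradeoff is what the benchmark would 
have to settle.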

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
