Dan Sugalski <[EMAIL PROTECTED]> wrote:
> Okay, at the moment I'm working on getting an implementation of
> classes and objects working. I'm also taking a look at calling speed,
> as I'd really like to not suck with our call times. :)

So first, some numbers on speed, based on calling a bare subroutine
(.Sub) with some variations:

        set I20, 1000000
        set I21, 0
        new P0, .Sub
        set_addr I22, func
        set P0, I22
        set I0, 0       # no prototype
        set I2, 0       # no PMC params
        set I3, 0       # void context
lp:
        saveall         # 1)
        invoke
        restoreall      # 2)
        inc I21
        lt I21, I20, lp
        end
func:
        ret

 1) and 2) omitted:       0.2s  (all: -O3 compiled imcc)
 1) and 2) as above:      1.2s
 1) + 2) = 4 * halfpopX:  1.0s
 perl 5.8.0:              1.25s

BTW: Putting the loop label before the "new P0, .Sub" more than doubles
the execution time (2.5s).

Cachegrind, of course, shows that the memcpy in the register push/pop is
the culprit; pushN/popN take almost twice as long as the others.
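
To make the cost concrete, here is a minimal C sketch of a copy-based
save/restore. It is not Parrot's actual code: the Regs and SaveStack
types, the fixed stack depth, and the function names are assumptions made
for illustration. The point is only that every call copies all four
register sets twice.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout: 32 registers of each of the four kinds. */
enum { NUM_REGS = 32, STACK_DEPTH = 16 };

typedef struct {
    long   i[NUM_REGS];   /* integer registers */
    double n[NUM_REGS];   /* numeric registers */
    void  *s[NUM_REGS];   /* string registers  */
    void  *p[NUM_REGS];   /* PMC registers     */
} Regs;

typedef struct {
    Regs frames[STACK_DEPTH];  /* saved copies, one per pending call */
    int  top;                  /* number of frames currently saved   */
} SaveStack;

/* Copy the whole register file onto the save stack (one memcpy). */
static void save_all(SaveStack *st, const Regs *r)
{
    assert(st->top < STACK_DEPTH);
    memcpy(&st->frames[st->top++], r, sizeof *r);
}

/* Copy the most recently saved frame back (a second memcpy). */
static void restore_all(SaveStack *st, Regs *r)
{
    assert(st->top > 0);
    memcpy(r, &st->frames[--st->top], sizeof *r);
}
```

With this layout each save_all/restore_all pair moves 2 * sizeof(Regs)
bytes per call, regardless of how many registers the caller actually
uses.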

I think there was some discussion a while ago about whether we could use
sliding register windows, e.g.:

  P0 ... P15, P16 ... P31
  ^regp       ^^^^^^^^^^^ caller fills regs+16 according to pdd03
              P0 .... P15, P16 ... P31
              ^^^^^^^^^^^ called sub receives params like pdd03
              ^regp
                          return values like  pdd03
  P0 ... P15, P16 ... P31
  ^regp                   return values are in pdd03 + 16


Or probably better:

  P0 .. P9, P10 .. P21, P22 .. P31
  incoming   local       outgoing
                        P0 .. P9, P10 .. P21, P22 .. P31
                         incoming   local       outgoing

This would need one additional indirection for register access. But for
saving and restoring registers, we would just move the register pointer
by x*sizeof(reg). A memcpy would only be necessary on register frame
boundaries - or not at all if we reallocate frames.
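
A minimal C sketch of the second layout, for one register type only. All
names here (RegFile, REG, enter_sub, leave_sub) and the fixed maximum
depth are hypothetical. The sketch also assumes that the callee leaves
its return values in its incoming registers, which the diagram does not
specify.

```c
#include <assert.h>
#include <string.h>

enum {
    WINDOW    = 32,  /* registers visible to one sub: P0 .. P31       */
    SLIDE     = 22,  /* caller's P22 becomes the callee's P0          */
    MAX_DEPTH = 8    /* assumed limit; a real VM would add frames     */
};

typedef struct {
    long  slots[SLIDE * MAX_DEPTH + WINDOW]; /* backing store          */
    long *regp;                              /* start of current window */
} RegFile;

/* Register access goes through regp: the one extra indirection. */
#define REG(rf, n) ((rf)->regp[(n)])

static void regfile_init(RegFile *rf)
{
    memset(rf->slots, 0, sizeof rf->slots);
    rf->regp = rf->slots;
}

/* "Save" on call: slide the window, no copying.
 * Returns 0 when the frame boundary is reached. */
static int enter_sub(RegFile *rf)
{
    long *limit = rf->slots + SLIDE * MAX_DEPTH;
    if (rf->regp + SLIDE > limit)
        return 0;
    rf->regp += SLIDE;
    return 1;
}

/* "Restore" on return: slide the window back. */
static void leave_sub(RegFile *rf)
{
    assert(rf->regp - rf->slots >= SLIDE);
    rf->regp -= SLIDE;
}
```

The caller writes arguments to P22 .. P31 and calls enter_sub; the callee
then sees them as P0 .. P9, and its locals P10 .. P21 lie beyond the
caller's window, so the caller's locals are never touched. When
enter_sub returns 0, the boundary case applies and a copy or a newly
allocated frame would be needed.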

leo
