Dan Sugalski <[EMAIL PROTECTED]> wrote:

> Okay, at the moment I'm working on getting an implementation of
> classes and objects working. I'm also taking a look at calling speed,
> as I'd really like to not suck with our call times. :)

So first some numbers WRT speed, based on calling a bare subroutine
(.Sub) with some variations:

    set I20, 1000000
    set I21, 0
    new P0, .Sub
    set_addr I22, func
    set P0, I22
    set I0, 0           # no prototype
    set I2, 0           # no PMC params
    set I3, 0           # void context
lp:
    saveall             # 1)
    invoke
    restoreall          # 2)
    inc I21
    lt I21, I20, lp
    end
func:
    ret

1) and 2) omitted:        0.2s   (all: -O3 compiled imcc)
1) and 2) as above:       1.2s
1) + 2) = 4 * halfpopX:   1.0s
perl 5.8.0:               1.25s

BTW: Putting the loop label before the "new P0, .Sub" more than doubles
the execution time (2.5s).

Cachegrind of course states that the memcpy in the register push/pop is
the culprit; the pushN/popN take almost double the time of the others.

I think there was some discussion a while ago about whether we could
use sliding register windows, e.g.:

P0 ... P15, P16 ... P31
^regp       ^^^^^^^^^^^  caller fills regs+16 according to pdd03

            P0 .... P15, P16 ... P31
            ^^^^^^^^^^^  called sub receives params like pdd03
            ^regp        return values like pdd03

P0 ... P15, P16 ... P31
^regp       return values are in pdd03 + 16

Or probably better:

P0 .. P9, P10 .. P21, P22 .. P31
incoming  local       outgoing

                      P0 .. P9, P10 .. P21, P22 .. P31
                      incoming  local       outgoing

This would need one additional indirection for register access. But for
saving and restoring registers, we would just move the register pointer
by x*sizeof(reg). A memcpy would only be necessary on register frame
boundaries - or not even then if we reallocate frames.

leo
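
A minimal C sketch of the first sliding-window variant above (shift by 16), just to make the pointer arithmetic concrete. Everything here is an assumption for illustration: the names RegFile, reg, window_call and window_return are hypothetical and not Parrot code, a single register set of plain longs stands in for the real I/N/S/P sets, and the frame is a fixed-size array so the frame-boundary case simply reports failure instead of spilling or reallocating.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW_SIZE  32  /* registers visible to one sub: P0..P31 */
#define WINDOW_SHIFT 16  /* caller's P16..P31 become the callee's P0..P15 */
#define MAX_DEPTH    64  /* hypothetical: call depth one frame can hold */

typedef struct {
    /* One contiguous register frame; windows overlap inside it. */
    long   regs[WINDOW_SHIFT * MAX_DEPTH + WINDOW_SIZE];
    size_t regp;         /* index of the current window's P0 */
} RegFile;

/* Every register access goes through regp: the one extra indirection. */
static long *reg(RegFile *rf, int n)
{
    assert(n >= 0 && n < WINDOW_SIZE);
    return &rf->regs[rf->regp + (size_t)n];
}

/* Replaces saveall on call: slide the window forward, no memcpy.
 * Returns 0 at the frame boundary, where a real implementation
 * would have to spill or allocate a new frame. */
static int window_call(RegFile *rf)
{
    size_t total = sizeof rf->regs / sizeof rf->regs[0];
    if (rf->regp + WINDOW_SHIFT + WINDOW_SIZE > total)
        return 0;
    rf->regp += WINDOW_SHIFT;
    return 1;
}

/* Replaces restoreall on return: slide the window back.
 * Returns 0 if there is no caller window to return to. */
static int window_return(RegFile *rf)
{
    if (rf->regp < WINDOW_SHIFT)
        return 0;
    rf->regp -= WINDOW_SHIFT;
    return 1;
}
```

With this layout a caller that stores an argument through reg(rf, 16) and then calls window_call(rf) lets the callee read it through reg(rf, 0); after window_return(rf), a value the callee left in its register 5 is visible to the caller as register 21, i.e. "pdd03 + 16".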