> If array index is a constant, then offset is constant too. If array index is 
> a variable, then offset is the product of element_size and index. So there is a 
> runtime mul in the latter case.

This is a very interesting point.

Given that what I'm currently doing is building a stack machine bytecode 
interpreter (to replace my AST-walking interpreter) - and already seeing some 
serious performance upgrade, I'm trying to use every trick available, so these 
subtle details do matter.
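
To make the quoted point concrete, here is a small sketch of the address arithmetic. The names `data` and `byteOffset` are made up for illustration and are not part of my interpreter. It only checks that the byte offset of element `i` is `i * sizeof(Value)`; whether that multiply actually costs anything depends on the C compiler and the target (with 8-byte elements on x86-64 the scaling can, as far as I know, be folded into the addressing mode).

```nim
type Value = uint64

var data: array[8, Value]   # hypothetical array, only for this demo

proc byteOffset(i: int): int =
  # distance in bytes between element 0 and element i
  cast[int](addr data[i]) - cast[int](addr data[0])

let i = 5                   # index held in a variable
# constant index: the offset can be computed at compile time
doAssert byteOffset(3) == 3 * sizeof(Value)
# variable index: the offset is index * element size
doAssert byteOffset(i) == i * sizeof(Value)
```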

Currently, my simple "stack" looks pretty much like this:
    
    
    #[######################################################
        Constants
      ======================================================]#
    
    const
        MAX_STACK_SIZE = 10_000_000
    
    #[######################################################
        Types
      ======================================================]#
    
    type
        Stack[T]    = array[MAX_STACK_SIZE,T]
        
        Value       = uint64
    
    #[######################################################
        Global variables
      ======================================================]#
    
    var
        MainStack*  : Stack[Value]   # my main stack
        MSP*        : int            # pointer to the last element
    
    #[######################################################
        Implementation
      ======================================================]#
    
    template push*(v: Value)         = inc(MSP); MainStack[MSP] = v
    template pop*(): Value           = dec(MSP); MainStack[MSP+1]
    
    template popN*(x: int)           = dec(MSP,x)
    
    template top*(x:int=0): Value    = MainStack[MSP-x]
    
    

So... normally an **ADD** instruction, in my {.computedGoto.} interpreter loop, 
would look something like this:
    
    
    case OpCode
        # other cases
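        of ADD_OP:
            # Templates substitute their arguments, so push(pop()+pop()) would
            # run inc(MSP) before the two pops and read one slot too high.
            # Popping into locals first avoids that.
            let b = pop()
            let a = pop()
            push(a + b); inc(ip)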
        # ...
    
    

which I'm optimizing further (I think... lol) by doing it like: 
    
    
    case OpCode
        # other cases
        of ADD_OP: top(1) = top(0)+top(1); dec(MSP); inc(ip)
        # MainStack[MSP-1] = MainStack[MSP]+MainStack[MSP-1]; dec(MSP)
        # ...
    
    

So... lots of different things going on...
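
To compare the two ADD variants in isolation, here is a minimal standalone sketch. It assumes a much smaller stack than my real one, and the names `stack`, `sp`, `addViaPopPush` and `addInPlace` are hypothetical, chosen just for this example. `top` returns `untyped` here so that the expansion is clearly assignable.

```nim
const MaxStackSize = 1024        # deliberately small; the post uses 10_000_000

type Value = uint64

var
  stack: array[MaxStackSize, Value]
  sp = 0                         # index of the top element; slot 0 stays unused

template push(v: Value) =
  inc(sp)
  stack[sp] = v

template top(x: int = 0): untyped = stack[sp - x]

proc addViaPopPush() =
  # pop both operands into locals, then push the sum: sp changes three times
  let b = stack[sp]
  dec(sp)
  let a = stack[sp]
  dec(sp)
  push(a + b)

proc addInPlace() =
  # overwrite the second-from-top slot with the sum: sp changes once
  top(1) = top(0) + top(1)
  dec(sp)

push(2'u64)
push(3'u64)
addViaPopPush()
doAssert sp == 1 and top() == 5'u64

push(10'u64)
addInPlace()
doAssert sp == 1 and top() == 15'u64
```

Both variants leave the stack in the same state; the in-place one just touches the stack pointer once instead of three times. Whether that shows up in a benchmark is something I'd still have to measure.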

Any ideas to make it better (and more performant) are more than welcome! :)
