> If array index is a constant, then offset is constant too. If array index is a variable, then offset is the product of element_size and index. So there is a runtime mul in the latter case.
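
To make the quoted point concrete, here is a minimal sketch of the two cases. The names `data`, `i`, `a`, and `b` are made up for illustration. For 8-byte elements such as `uint64`, x86-64 compilers can usually fold the multiply into a scaled addressing mode, so the run-time cost is often smaller than a separate `mul` instruction; checking the generated assembly is the only way to be sure.

```nim
# Hypothetical illustration; `Value`, `data`, and `i` are names chosen for this sketch.
type Value = uint64

var data: array[8, Value]
for k in 0 ..< data.len:
  data[k] = Value(k * 10)

# Constant index: the byte offset 3 * sizeof(Value) is known at compile time.
let a = data[3]

# Variable index: the address is base + i * sizeof(Value), computed at run time.
var i = 5
let b = data[i]

echo a, " ", b
```
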
This is a very interesting point.

I'm currently building a stack-machine bytecode interpreter (to replace my AST-walking interpreter) and am already seeing a serious performance improvement, so I'm trying to use every trick available, and subtle details like this do matter.

Currently, my simple "stack" looks pretty much like this:

```nim
#[######################################################
    Constants
  ======================================================]#

const
    MAX_STACK_SIZE = 10_000_000

#[######################################################
    Types
  ======================================================]#

type
    Stack[T] = array[MAX_STACK_SIZE,T]
    Value = uint64

#[######################################################
    Global variables
  ======================================================]#

var
    MainStack*  : Stack[Value]  # my main stack
    MSP*        : int           # pointer to the last element

#[######################################################
    Implementation
  ======================================================]#

template push*(v: Value) = inc(MSP); MainStack[MSP] = v
template pop*(): Value = dec(MSP); MainStack[MSP+1]
template popN*(x: int) = dec(MSP,x)
template top*(x:int=0): Value = MainStack[MSP-x]
```

So, normally an **ADD** instruction in my `{.computedGoto.}` interpreter loop would be something like this:

```nim
case OpCode
    # other cases
    of ADD_OP: push(pop()+pop()); inc(ip)
    # inc(MSP); MainStack[MSP] = ((dec(MSP); MainStack[MSP+1]) + (dec(MSP); MainStack[MSP+1]))
    # ...
```

which I'm optimizing further (I think... lol) by doing it like this:

```nim
case OpCode
    # other cases
    of ADD_OP: top(1) = top(0)+top(1); dec(MSP); inc(ip)
    # MainStack[MSP-1] = MainStack[MSP]+MainStack[MSP-1]; dec(MSP)
    # ...
```

So... lots of different things going on. Any ideas to make it better (and more performant) are more than welcome! :)
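
One thing worth checking in the first variant: Nim templates substitute their arguments into the body, so in `push(pop()+pop())` the two pops run after `inc(MSP)`, as the expanded comment shows. The first pop then reads the slot above the old top of the stack rather than the top itself. The in-place `top(1) = top(0)+top(1)` variant does not have this problem. Below is a minimal sketch of a push that evaluates its argument first; `pushSafe` is a hypothetical name, and the stack is shrunk to 16 slots only to keep the example small.

```nim
# A sketch under assumptions: a tiny stack and the hypothetical name
# `pushSafe`, which is not part of the original code.
const MAX_STACK_SIZE = 16

type Value = uint64

var
  MainStack: array[MAX_STACK_SIZE, Value]
  MSP: int   # index of the last pushed element; slot 0 stays unused

template pop(): Value =
  dec(MSP)
  MainStack[MSP+1]

template pushSafe(v: Value) =
  let tmp = v          # evaluate the argument (including any pops) first
  inc(MSP)             # only then move the stack pointer
  MainStack[MSP] = tmp

pushSafe(2'u64)
pushSafe(40'u64)
pushSafe(pop() + pop())  # ADD: pops 40 and 2, pushes their sum

echo MainStack[MSP]      # 42
```

The extra `let` is a local temporary that the C compiler can normally keep in a register, so this should cost little, but only a benchmark on the real interpreter loop can confirm that.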