> Oh. One more thing: Have no idea if that helps or not - I am no expert
> by far. I would think though that
>
>   int a = myObj.mVar;
>
> turning into
>
>   dptr = ld(myObj, FORWARDING_OFFSET) ;; myObj points to the reference
>                                       ;; object (flags, pointer, ref-count, etc.)
>   a = ld(dptr, MVAR_OFFSET) ;;
>
> is better than the branch.

Quite the opposite. A not-taken branch is *lots* faster than the indirect
load, for two reasons:

1. The chained loads here are serializing. The second one cannot proceed
   until the first completes. They could conceivably be 100 cycles each.

2. A test on the pointer value followed by a not-taken branch can
   generally be retired using otherwise unconsumed execution slots in a
   superscalar machine. If the branch is correctly predicted as not
   taken, it won't disturb the instruction pipeline, which is the main
   source of cost in a branch instruction.

shap

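For concreteness, the two access patterns being compared might look roughly like this in C. This is only a sketch under assumptions: the Handle, Data, and Obj layouts and both function names are hypothetical illustrations, not BitC's actual object representation, and the sketch assumes the "test on the pointer value" is a NULL check on a forwarding field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts for illustration only; not BitC's real ones. */
typedef struct Data {
    int mVar;
} Data;

/* Variant A: every access goes through a reference object. */
typedef struct Handle {
    unsigned flags;
    Data *dptr;              /* always points at the current data */
} Handle;

/* Variant B: the object carries a forwarding pointer that is NULL
 * unless the object has been relocated (assumed to be the rare case). */
typedef struct Obj {
    struct Obj *forward;
    int mVar;
} Obj;

/* Two chained loads: the address of the second load is the result of
 * the first, so the second cannot start until the first completes. */
int read_via_handle(const Handle *h)
{
    assert(h != NULL);
    const Data *d = h->dptr;   /* load 1 */
    return d->mVar;            /* load 2, data-dependent on load 1 */
}

/* Test plus branch: in the common case the branch is not taken, and
 * the load of mVar uses an address that is already known, so it does
 * not have to wait for the value of o->forward. */
int read_with_check(const Obj *o)
{
    assert(o != NULL);
    if (o->forward != NULL)    /* expected to be not taken */
        o = o->forward;        /* rare path: follow the forwarding pointer */
    return o->mVar;
}
```

Both functions return the same value for the same logical object. The difference is only in the dependency structure: read_via_handle always pays for two dependent loads, while read_with_check pays for the second dependent load only when the object has actually been forwarded.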
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
