Oh. One more thing:

> Have no idea if that helps or not - I am no expert by far. I would think
> though that
>
>     int a = myObj.mVar;
>
> turning into
>
>     dptr = ld(myObj, FORWARDING_OFFSET)  ;; myObj points to the reference
>                                          ;; object (flags, pointer, ref-count, etc.)
>     a = ld(dptr, MVAR_OFFSET)
>
> is better than the branch.
>
Quite the opposite. A not-taken branch is *lots* faster than the indirect
load, for two reasons:

1. The chained loads here are serializing. The second one cannot proceed
until the first completes, because its address is the result of the first.
If they miss in the cache, they could conceivably be 100 cycles each.
2. A test on the pointer value followed by a not-taken branch can generally
be retired using otherwise unconsumed execution slots in a superscalar
machine. If the branch is correctly predicted as not taken, it won't
disturb the instruction pipeline, and disturbing the pipeline is the main
source of cost in a branch instruction.
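
To make the comparison concrete, here is a rough C sketch of the two
access patterns. The struct layouts and the names Handle, Data, Obj,
read_indirect and read_checked are made up for illustration. They are not
from any actual implementation discussed on this list, and the "test" is
assumed to be a null check on a forwarding field.

```c
#include <stddef.h>

/* Hypothetical layouts, invented for this sketch. */
typedef struct Data { int mVar; } Data;

/* Variant A: every reference goes through a handle that holds the
   pointer to the real data (the FORWARDING_OFFSET load above). */
typedef struct Handle { Data *dptr; } Handle;

/* Variant B: the object holds its own fields plus a forwarding pointer
   that is assumed to stay NULL unless the object has been moved. */
typedef struct Obj { struct Obj *forward; int mVar; } Obj;

/* Two dependent loads: the address of the second load is the result of
   the first, so the second cannot issue until the first completes. */
static int read_indirect(const Handle *h) {
    const Data *d = h->dptr;   /* load 1 */
    return d->mVar;            /* load 2, waits on load 1 */
}

/* A test plus a branch that is normally not taken. In the common case
   both loads use the same base pointer and do not wait on each other. */
static int read_checked(const Obj *o) {
    if (o->forward != NULL)    /* rare case: the object was moved */
        o = o->forward;
    return o->mVar;
}
```

Whether the second form really compiles to a predicted branch rather than
a conditional move is up to the compiler, so treat this as a picture of
the idea and not as a benchmark.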


shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
