https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101891

--- Comment #9 from Arjan van de Ven <arjan at linux dot intel.com> ---
I don't have recent measurements since we did this work quite some time ago.

Basically, at the CPU level (speaking for Intel-style CPUs at least), a CPU can
eliminate (meaning: no execution resources used) 1 to 3 (depending on
generation) register-to-register moves per clock cycle. There's ALSO a path in
the hardware for optimizing XOR <reg>,<reg> sequences so that they avoid
execution resources. When we did both, we maximized the total number of these
eliminations, while with only XOR you can get bottlenecked on execution if you
have too many.
(No other instructions should depend on any of the movs, so even though they
depend on the XOR, they're still fully 'orphan' for the out-of-order engine.)
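
A minimal sketch of the kind of mixed sequence described above, assuming
x86-64 and GCC-style inline asm: one XOR zeroing idiom followed by two
register-to-register moves that copy the zero. The helper name zero_three is
hypothetical, the compiler picks the actual registers, and whether each
instruction really gets eliminated depends on the CPU generation.

```c
/* Hypothetical helper, for illustration only: zero three registers
 * with one XOR plus two moves instead of three XORs. */
void zero_three(unsigned long out[3])
{
    unsigned long a, b, c;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile(
        "xorl %k0, %k0\n\t"  /* zeroing idiom; a 32-bit write clears all 64 bits */
        "movq %0, %1\n\t"    /* reg-to-reg move: candidate for move elimination */
        "movq %0, %2"        /* second move; nothing else depends on either one */
        : "=r"(a), "=r"(b), "=r"(c)
        :
        : "cc");
#else
    /* Fallback for other targets: same result, no claim about the sequence. */
    a = b = c = 0;
#endif
    out[0] = a;
    out[1] = b;
    out[2] = c;
}
```

Both moves read the XOR result, but since nothing consumes their outputs
afterwards they do not lengthen any dependency chain.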
