Re: tooling quality and some random rant

Walter Bright Fri, 18 Feb 2011 23:56:03 -0800

nedbrek wrote:

Reordering happens in the scheduler. A simple model is "Fetch", "Schedule", "Retire". Fetch and retire are done in program order. For code that is hitting well in the cache, the biggest bottleneck is that "4" decoder (the complex instruction decoder). Reducing the number of complex instructions will be a big win here (and settling them into the 4-1-1(-1) pattern).
Of course, on anything after Core 2, the "1" decoders can handle pushes, pops, and load-ops (r+=m) (although not load-op-store (m+=r)).
Also, "macro op fusion" allows you to get a branch along with the last instruction in decode, potentially giving you 5 macroinstructions per cycle from decode. Make sure it is the flags-producing instruction (cmp-br).
(I used to work for Intel :)
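
To make the 4-1-1(-1) pattern concrete, here is a rough toy model of the decode stage, not anything taken from Intel documentation. It assumes each instruction is described only by its micro-op count, that the first decoder in a cycle accepts an instruction of up to 4 micro-ops, that each remaining decoder accepts only single-micro-op instructions, and that decode is strictly in program order. The function name decode_cycles and its simple_decoders parameter are made up for illustration; macro op fusion and microcoded instructions are not modeled.

```python
def decode_cycles(uop_counts, simple_decoders=3):
    """Count decode cycles in a toy in-order 4-1-1(-1) decoder model.

    uop_counts: micro-op count of each instruction, in program order.
    simple_decoders: number of "1" decoders (3 for 4-1-1-1, 2 for 4-1-1).
    This is an illustrative assumption, not a description of real hardware.
    """
    for count in uop_counts:
        if count < 1 or count > 4:
            # Microcoded instructions (more than 4 micro-ops) are out of scope.
            raise ValueError("each instruction must have 1 to 4 micro-ops")

    cycles = 0
    i = 0
    n = len(uop_counts)
    while i < n:
        cycles += 1
        # Slot 0: the complex ("4") decoder takes the next instruction.
        i += 1
        # Remaining slots: each simple ("1") decoder takes one single-uop
        # instruction; a multi-uop instruction ends the group for this cycle.
        used = 0
        while i < n and used < simple_decoders and uop_counts[i] == 1:
            i += 1
            used += 1
    return cycles


if __name__ == "__main__":
    interleaved = [2, 1, 1, 1, 2, 1, 1, 1]  # complex instructions spaced out
    clumped = [2, 2, 1, 1, 1, 1, 1, 1]      # same instructions, back to back
    print(decode_cycles(interleaved), decode_cycles(clumped))
```

Under these assumptions the interleaved ordering decodes in 2 cycles and the clumped ordering in 3, which is the effect described above: the same instructions cost more decode bandwidth when the complex ones queue up behind each other for the single "4" decoder.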


I can't find any Intel documentation on this. Can you point me to some?
