On 1/8/2019 3:51 PM, Guy Sotomayor Jr via cctalk wrote:
Some architectures (I’m thinking of the latest Intel CPUs) have a small loop
cache whose aim is to keep a loop entirely within that cache.  That cache
operates at the full speed of the instruction fetch/execute cycle (actually I
think it holds the already-decoded uOps), so you can’t go faster.  It avoids
both the L1 cache access penalty and the instruction decode time.

TTFN - Guy

I bet I/O loops throw everything off.
Ben.
