On 1/8/2019 3:51 PM, Guy Sotomayor Jr via cctalk wrote:
Some architectures (I'm thinking of the latest Intel CPUs) have a small loop
cache whose aim is to keep a loop entirely within that cache. That cache
operates at the full speed of the instruction fetch/execute cycle (actually I
think it holds the already-decoded uOps), i.e. you can't go faster. Both the
L1 cache access penalty and the instruction decode time are avoided.
TTFN - Guy
I bet I/O loops throw everything off.
Ben.