I don't see the point of the exercise as far as optimizing Go is concerned. 
Your experiment just shows that your compiler (GCC?) missed an optimization 
that would have reduced backend latency.

You may also find that swapping the order of some of the instructions, such as 
the second and third in the loop, reduces backend latency further.

I am not on a high-end Intel CPU now, but when I was, I found that with the 
buffer size matched to the L1 cache size (8192 32-bit words, or 32 kilobytes), 
eratspeed ran on an Intel 2700K @ 3.5 GHz at about 3.5 clock cycles per loop 
(about 405,000,000 loops for this range).
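
To put those figures in perspective, here is a rough sketch of the arithmetic 
in Go. It is not eratspeed's actual code: the names l1BufferWords, 
estimateSeconds and cullSegment are made up for illustration, and the culling 
loop is only my assumption of what a bit-packed inner loop generally looks 
like.

```go
package main

import "fmt"

const (
	l1BufferWords = 8192              // 32-bit words in the sieve buffer
	l1BufferBytes = l1BufferWords * 4 // 32768 bytes = 32 kilobytes
)

// estimateSeconds converts a loop count and a cycles-per-loop figure
// into elapsed time at the given clock rate (in Hz).
func estimateSeconds(loops, cyclesPerLoop, clockHz float64) float64 {
	return loops * cyclesPerLoop / clockHz
}

// cullSegment is a hypothetical culling loop over a bit-packed buffer:
// it sets every step-th bit beginning at start and returns the number
// of loop iterations it performed.
func cullSegment(buf []uint32, start, step int) int {
	loops := 0
	limit := len(buf) * 32 // total number of bits in the buffer
	for i := start; i < limit; i += step {
		buf[i>>5] |= 1 << uint(i&31) // word index i/32, bit index i%32
		loops++
	}
	return loops
}

func main() {
	buf := make([]uint32, l1BufferWords)
	fmt.Println("buffer bytes:", l1BufferBytes)
	fmt.Println("culls for step 3 in one buffer:", cullSegment(buf, 0, 3))
	// 405,000,000 loops at 3.5 cycles per loop on a 3.5 GHz clock.
	fmt.Printf("estimated time: %.3f s\n", estimateSeconds(405e6, 3.5, 3.5e9))
}
```

At 3.5 cycles per loop and 3.5 GHz, each loop takes about one nanosecond, so 
about 405,000,000 loops work out to roughly 0.4 seconds of culling time.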

My current AMD Bulldozer CPU has a severe cache bottleneck and falls short of 
this speed by a factor of about two.
