On Friday, June 17, 2016 at 4:48:06 PM UTC+2, gordo...@gmail.com wrote:
>
> I don't see the point to the exercise as far as optimizing golang is 
> concerned.


It is a general rule that using more registers results in faster code.
 

> Your experiment just shows that your compiler (GCC?)


My post containing the example mentions that the compiler is clang 3.9.
 

> missed an optimization as far as reducing backend latency goes.
>

It is significant because it indicates that the clang compiler might be 
completely missing the general rule which I mentioned above.

> You may also find that swapping the order of some of the instructions such 
> as the second and the third in the loop may also reduce backend latency 
> further.
>

Swapping the order of instructions in the example results in the same or 
lower performance.

> I am not on a high end Intel CPU now, but when I was I found that with a 
> buffer size adjusted to the L1 cache size (8192 32-bit words or 32 
> Kilobytes) that eratspeed ran on an Intel 2700K @ 3.5 GHz at about 3.5 
> clock cycles per loop (about 405,000,000 loops for this range). 
>
> My current AMD Bulldozer CPU has a severe cache bottleneck and can't come 
> close to this speed by a factor of about two.


Which Bulldozer version do you have: original Bulldozer, Piledriver, 
Steamroller/Kaveri or Excavator/Carrizo?
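
For reference, the figures quoted above (8192 32-bit words, about 3.5 clock 
cycles per loop at 3.5 GHz, about 405,000,000 loops) can be turned into a 
buffer size and a rough run time. This is only a sketch of the arithmetic: 
the constant and function names are mine, and it assumes the clock stays at 
3.5 GHz and that nothing outside the inner loop takes measurable time.

```go
package main

import "fmt"

// Figures taken from the quoted message; the names are illustrative only.
const (
	bufferWords   = 8192      // 32-bit words in the sieve buffer
	bytesPerWord  = 4         // a 32-bit word is 4 bytes
	loops         = 405000000 // approximate inner-loop count for the range
	cyclesPerLoop = 3.5       // quoted cycles per loop on the Intel 2700K
	clockHz       = 3.5e9     // 3.5 GHz, assumed constant (no turbo)
)

// bufferBytes returns the buffer size in bytes.
func bufferBytes() int { return bufferWords * bytesPerWord }

// estimatedSeconds returns loops * cycles-per-loop / clock frequency,
// i.e. the time spent in the inner loop alone under the assumptions above.
func estimatedSeconds() float64 { return loops * cyclesPerLoop / clockHz }

func main() {
	fmt.Printf("buffer: %d bytes (%d KiB)\n", bufferBytes(), bufferBytes()/1024)
	fmt.Printf("estimated inner-loop time: %.3f s\n", estimatedSeconds())
}
```

That comes to 32768 bytes (32 KiB) for the buffer and roughly 0.405 seconds 
for the inner loop, so a CPU that is slower by a factor of about two would 
need roughly 0.8 seconds for the same range.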
