On Saturday, June 18, 2016 at 12:21:21 AM UTC+2, gordo...@gmail.com wrote:
On Friday, June 17, 2016 at 4:48:06 PM UTC+2, gordo...@gmail.com wrote:
> > I am not on a high-end Intel CPU now, but when I was, I found that with the buffer size adjusted to the L1 cache size (8192 32-bit words, or 32 Kilobytes), eratspeed ran on an Intel 2700K @ 3.5 GHz at about 3.5 clock cycles per loop (about 405,000,000 loops for this range).
> > My current AMD Bulldozer CPU has a severe cache bottleneck and can't come close to this speed by a factor of about two.
> Which Bulldozer version do you have: original Bulldozer, Piledriver, Steamroller/Kaveri or Excavator/Carrizo?

One of the originals, an FX8120 (4 core, 8 processor) @ 3.1 GHz.

Your CPU is bdver1. My CPU is bdver3.

By the clock frequency, I assume you were running these tests on a high-end Intel CPU?

bdver3 CPU has some optimizations compared to bdver1, but I don't know whether this affects eratspeed code. Is your IPC lower than 2.80 when you compile https://play.golang.org/p/Sd6qlMQcHF with Go1.7-tip and run "perf stat --repeat=10 -- ./eratspeed-go1.7tip"?

I am on Windows 7 64-bit, so I don't have perf, which is specific to Linux. However, we can easily estimate the figure you request as follows:

1) The number of instructions is likely to be the same on your machine and mine, as we are both using the same source code with the same compiler version and the same compiler settings.
2) My run time, normalized for clock speed, is about one third longer than yours; therefore my instruction rate will be about 75% of yours, or about 2.1 instructions per cycle.
3) The difference is almost certainly mostly backend stall time, which for your machine is 25%, or 150 ms of your (roughly 600 ms) run time; my run time at your clock frequency would be 200 ms longer, giving a total of 350 ms of stalls out of 800 ms, so the backend stall time for my machine will be about 43.75%.
4) The reason bdver1 is so bad is that it has something called a Write Coalescing Cache (WCC) to take care of the write-through memory access of bd, which is fine and good; the problem is that the WCC is only 4 Kilobytes in size, which means that for a loop like this one that mostly does only writes (backend), the effective L1 cache size is only 4 Kilobytes instead of 16 Kilobytes, causing many misses and high cache latency.
5) This is even worse for multi-threading, as the WCC is shared between the two processors per core; the effective size per processor is thus only 2 Kilobytes, and much of the advantage of having so many processors is wasted.
6) Later bd versions use something other than the WCC (AFAIK undisclosed in the literature), which doesn't have as bad a bottleneck, although it is still there, as in your bdver3.
7) Intel processors do not have this problem, as they don't have cache write-through (although they have other problems for multi-processing).
8) Without this problem, our CPUs' execution time for eratspeed would be about 75% of the current time for yours and 56% for mine (not quite this good, as there would still be a small amount of latency due to instructions being too close together).
9) We AMD loyalists can only wait and hope for AMD Zen, due toward the end of this year, which is said to adopt much of the Intel design philosophy, but better (as I am sure AMD hopes, too).

The other interesting data point, from your comparison of the Clang C results and these golang results for eratspeed, indicates that **on your machine** C compiled with Clang runs in about 75% of the time of golang, or golang is about 33% slower. Part of that gain is Clang minimizing the number of (more complex) instructions, but part of it is also reducing backend latency by re-organizing the instructions.

I don't have Clang installed on my machine, but I do have Haskell, which can use the LLVM backend that Clang uses and produces comparable instruction rates. Golang eratspeed runs about 20% slower than that on my machine, but that is of course with the severe handicap of the high backend stalling. The difference will get much bigger in percentage terms on more efficient CPUs.
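
For anyone who wants to check the arithmetic in steps 1) to 3) and 8), here is a rough Go sketch of the estimate. It is only an illustration under stated assumptions: the names `Estimate` and `estimate` are made up for this example, the 600 ms reference run time is inferred from 150 ms being 25% of it, and the 4/3 run-time ratio is the "one third longer" figure from step 2). It is not a measurement of either machine.

```go
package main

import "fmt"

// Estimate holds the derived figures for the second machine.
// (Hypothetical type, introduced only for this illustration.)
type Estimate struct {
	IPC           float64 // estimated instructions per cycle
	StallFraction float64 // share of run time lost to backend stalls
	NoStallShare  float64 // run time left without stalls, as a share of the current time
}

// estimate derives figures for a second machine from a reference machine,
// assuming both execute the same number of instructions and that all of the
// extra run time (normalized to the same clock) is backend stall time.
func estimate(refIPC, refMs, refStall, ratio float64) Estimate {
	refStallMs := refMs * refStall                   // stall time on the reference machine
	otherMs := refMs * ratio                         // run time of the second machine at the same clock
	otherStallMs := refStallMs + (otherMs - refMs)   // reference stalls plus all of the extra time
	return Estimate{
		IPC:           refIPC / ratio, // same instructions over more cycles
		StallFraction: otherStallMs / otherMs,
		NoStallShare:  1 - otherStallMs/otherMs,
	}
}

func main() {
	// Assumed inputs: IPC 2.80, about 600 ms run time, 25% stalls, ratio 4/3.
	e := estimate(2.80, 600, 0.25, 4.0/3.0)
	fmt.Printf("IPC %.2f, stall %.2f%%, no-stall share %.2f%%\n",
		e.IPC, e.StallFraction*100, e.NoStallShare*100)
}
```

With those inputs the sketch reproduces the figures quoted above: about 2.1 instructions per cycle, about 43.75% backend stall time, and a stall-free run time of roughly 56% of the current one.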