go version go1.7.4 linux/amd64

BenchmarkMemclr_100-36       500000000        31.5 ns/op
BenchmarkLoop_100-36         200000000        71.7 ns/op
BenchmarkMemclr_1000-36       50000000         257 ns/op
BenchmarkLoop_1000-36         20000000         612 ns/op
BenchmarkMemclr_10000-36       5000000        2675 ns/op
BenchmarkLoop_10000-36         2000000        6280 ns/op
BenchmarkMemclr_100000-36       500000       39956 ns/op
BenchmarkLoop_100000-36         200000       66346 ns/op
BenchmarkMemclr_200000-36       200000       79805 ns/op
BenchmarkLoop_200000-36         100000      132527 ns/op
BenchmarkMemclr_300000-36       200000      119613 ns/op
BenchmarkLoop_300000-36         100000      198872 ns/op
BenchmarkMemclr_400000-36       100000      160355 ns/op
BenchmarkLoop_400000-36          50000      265406 ns/op
BenchmarkMemclr_500000-36       100000      199190 ns/op
BenchmarkLoop_500000-36          50000      331522 ns/op
BenchmarkMemclr_1000000-36       50000      398051 ns/op
BenchmarkLoop_1000000-36         20000      663510 ns/op
BenchmarkMemclr_2000000-36       20000      796084 ns/op
BenchmarkLoop_2000000-36         10000     1326865 ns/op
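The benchmark source itself isn't quoted in this thread, but a minimal reconstruction consistent with the benchmark names and with the 4-byte `MOVL ... (BX*4)` store in the disassembly below might look like the sketch here (the element type `int32` and the function names `memclrRange`/`memsetLoop` are assumptions, though `memsetLoop` does appear in the pprof output later in the thread). The point is that the gc compiler recognizes the exact `for i := range a { a[i] = 0 }` pattern and lowers it to `runtime.memclr`, while the indexed loop compiles to a scalar store loop:

```go
package main

import "fmt"

// memclrRange clears a slice with the range-to-zero idiom. The gc
// compiler recognizes this exact pattern and lowers it to
// runtime.memclr, which uses wide vector stores on amd64.
func memclrRange(a []int32) {
	for i := range a {
		a[i] = 0
	}
}

// memsetLoop writes v element by element with an indexed loop. In
// go1.7 this compiles to the scalar MOVL/INCQ/CMPQ loop shown in
// the disassembly quoted below, not to a vectorized routine.
func memsetLoop(a []int32, v int32) {
	for i := 0; i < len(a); i++ {
		a[i] = v
	}
}

func main() {
	a := make([]int32, 8)
	memsetLoop(a, 7)
	fmt.Println(a[0], a[7]) // 7 7
	memclrRange(a)
	fmt.Println(a[0], a[7]) // 0 0
}
```

The actual benchmarks presumably wrap these two bodies in `testing.B` loops, one pair per slice length (100 through 2000000), which would produce the `BenchmarkMemclr_N` / `BenchmarkLoop_N` names above.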
Uniformly better on my AWS test system:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                36
On-line CPU(s) list:   0-35
Thread(s) per core:    2
Core(s) per socket:    9
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz
Stepping:              2
CPU MHz:               3199.968
BogoMIPS:              6101.39
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-8,18-26
NUMA node1 CPU(s):     9-17,27-35

On Thu, Dec 15, 2016 at 12:18 AM, rd <rd6...@gmail.com> wrote:
> TL,
>
> Like peterGo, I was unable to reproduce your findings:
>
> uname -a
> Linux 4.8.0-30-generic #32-Ubuntu SMP Fri Dec 2 03:43:27 UTC 2016 x86_64
> x86_64 x86_64 GNU/Linux
>
> go version
> go version go1.7.4 linux/amd64
>
> cat /proc/cpuinfo
> CPU Intel(R) Core(TM) i7-6560U CPU @ 2.20GHz
>
> go test -bench=.
> [...]
> BenchmarkMemclr_2000000-4    3000    421532 ns/op
> BenchmarkLoop_2000000-4      2000    791318 ns/op
>
> So memclr is ~2x faster on my machine.
>
> In order to see what actually happens, let's use the pprof tool:
> go test -bench=. -cpuprofile test.prof
>
> Then `go tool pprof test.prof`, and `top 5` (sanity check):
>      flat  flat%   sum%        cum   cum%
>     1.69s 57.88% 57.88%      1.69s 57.88%  _/tmp/goperf.memsetLoop
>     1.22s 41.78% 99.66%      1.22s 41.78%  runtime.memclr
>
> So far so good, memsetLoop and the _runtime_ memclr are being called.
>
> Going down the rabbit hole, let's look at the assembly:
> (pprof) disasm memsetLoop
> Total: 2.92s
> ROUTINE ======================== _/tmp/goperf.memsetLoop
>     1.69s      1.69s (flat, cum) 57.88% of Total
>         .          .     46d770: MOVQ 0x10(SP), AX
>         .          .     46d775: MOVQ 0x8(SP), CX
>         .          .     46d77a: MOVL 0x20(SP), DX
>         .          .     46d77e: XORL BX, BX
>         .          .     46d780: CMPQ AX, BX
>         .          .     46d783: JGE 0x46d790
>     400ms      400ms     46d785: MOVL DX, 0(CX)(BX*4)
>     1.14s      1.14s     46d788: INCQ BX
>     150ms      150ms     46d78b: CMPQ AX, BX
>         .          .     46d78e: JL  0x46d785
>
> A standard loop, and definitely not using vectorized instructions
> (which explains the difference on my CPU).
>
> For comparison, the finely hand-tuned memclr implementation is at
> https://golang.org/src/runtime/memclr_amd64.s (my computer being fairly
> recent, it takes full advantage of the large registers available).
>
> Can you try to perform the same exercise on your hardware? It will likely
> shed some light on the peculiar results you are experiencing.
>
> Regards
> RD
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
Michael T. Jones
michael.jo...@gmail.com