TL,

Like peterGo, I was unable to reproduce your findings:
    uname -a
    Linux 4.8.0-30-generic #32-Ubuntu SMP Fri Dec 2 03:43:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

    go version
    go version go1.7.4 linux/amd64

    cat /proc/cpuinfo
    CPU Intel(R) Core(TM) i7-6560U CPU @ 2.20GHz

    go test -bench=.
    [...]
    BenchmarkMemclr_2000000-4    3000    421532 ns/op
    BenchmarkLoop_2000000-4      2000    791318 ns/op

So memclr is about 1.9x faster on my machine.

To see what actually happens, let's use the pprof tool:

    go test -bench=. -cpuprofile test.prof

Then `go tool pprof test.prof`, and `top 5` (sanity check):

    flat   flat%   sum%    cum    cum%
    1.69s  57.88%  57.88%  1.69s  57.88%  _/tmp/goperf.memsetLoop
    1.22s  41.78%  99.66%  1.22s  41.78%  runtime.memclr

So far so good: memsetLoop and the _runtime_ memclr are both being called.

Going down the rabbit hole, let's look at the assembly:

    (pprof) disasm memsetLoop
    Total: 2.92s
    ROUTINE ======================== _/tmp/goperf.memsetLoop
         1.69s      1.69s (flat, cum) 57.88% of Total
             .          .     46d770: MOVQ 0x10(SP), AX
             .          .     46d775: MOVQ 0x8(SP), CX
             .          .     46d77a: MOVL 0x20(SP), DX
             .          .     46d77e: XORL BX, BX
             .          .     46d780: CMPQ AX, BX
             .          .     46d783: JGE 0x46d790
         400ms      400ms     46d785: MOVL DX, 0(CX)(BX*4)
         1.14s      1.14s     46d788: INCQ BX
         150ms      150ms     46d78b: CMPQ AX, BX
             .          .     46d78e: JL 0x46d785

This is a standard scalar loop that stores one 4-byte element per iteration. It is definitely not using vectorized instructions, which explains the difference on my CPU.

For comparison, the finely hand-tuned memclr implementation is at https://golang.org/src/runtime/memclr_amd64.s (my computer being fairly recent, it takes full advantage of the large registers available).

Can you try the same exercise on your hardware? It will likely shed some light on the peculiar results you are seeing.

Regards
RD
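For anyone repeating the exercise, here is a minimal sketch of the kind of comparison being measured. The original test file is not quoted in this message, so everything in it is an assumption: the function bodies, the name memclrRange, the int32 element type (inferred from the 4-byte MOVL store in the disassembly above), and the slice length (taken from the _2000000 suffix of the benchmark names). It uses testing.Benchmark so that it runs as an ordinary program rather than through `go test`.

```go
package main

import (
	"fmt"
	"testing"
)

// memclrRange zeroes s using the range-loop idiom that the Go compiler
// recognizes and lowers to a call to the runtime's memory-clearing routine.
// (Hypothetical name; assumed to match what the Memclr benchmark exercises.)
func memclrRange(s []int32) {
	for i := range s {
		s[i] = 0
	}
}

// memsetLoop stores an arbitrary value v into every element of s.
// Because v is not known to be zero, the compiler cannot substitute the
// clearing routine and emits an ordinary element-by-element loop.
func memsetLoop(s []int32, v int32) {
	for i := 0; i < len(s); i++ {
		s[i] = v
	}
}

func main() {
	const n = 2000000 // assumed from the benchmark name suffix
	buf := make([]int32, n)

	clr := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			memclrRange(buf)
		}
	})
	loop := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			memsetLoop(buf, 1)
		}
	})

	// Timings depend entirely on the CPU and Go version in use.
	fmt.Println("memclr:", clr)
	fmt.Println("loop:  ", loop)
}
```

Save it as main.go and run it with `go run main.go`. For the pprof and disasm steps you still need the `go test -bench=. -cpuprofile test.prof` form, with the two functions called from ordinary Benchmark functions in a _test.go file.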