Memory is slow. While the slice fits in cache, memclr is measurably faster. Once the slice no longer fits in cache, memclr is no longer significantly faster.
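For anyone who wants to check this on their own machine, here is a rough sketch of the kind of comparison I mean. The function names (clearRange, clearLoop, timeClear) and the buffer sizes are my own choices for illustration, not taken from any existing benchmark. It assumes the compiler lowers the range form to memclr and leaves the indexed loop as plain stores; check the generated assembly to confirm for your Go version. Timing with time.Now is crude, so treat the numbers as a hint only.

```go
package main

import (
	"fmt"
	"time"
)

// clearRange uses the range idiom, which the Go compiler is expected
// to recognise and replace with a call to the runtime's memclr.
func clearRange(s []byte) {
	for i := range s {
		s[i] = 0
	}
}

// clearLoop zeroes with an indexed loop. Assumption: this form is not
// rewritten to memclr and stays a sequence of byte stores.
func clearLoop(s []byte) {
	for i := 0; i < len(s); i++ {
		s[i] = 0
	}
}

// timeClear runs zero on buf the given number of times and returns
// the average wall-clock time per call.
func timeClear(zero func([]byte), buf []byte, rounds int) time.Duration {
	start := time.Now()
	for r := 0; r < rounds; r++ {
		zero(buf)
	}
	return time.Since(start) / time.Duration(rounds)
}

func main() {
	// Sizes picked to straddle typical cache levels; adjust for your CPU.
	sizes := []int{16 << 10, 256 << 10, 4 << 20, 32 << 20}
	const totalBytes = 128 << 20 // bytes zeroed per measurement

	for _, n := range sizes {
		buf := make([]byte, n)
		rounds := totalBytes / n
		fmt.Printf("%6d KiB  range: %v  indexed: %v\n",
			n>>10,
			timeClear(clearRange, buf, rounds),
			timeClear(clearLoop, buf, rounds))
	}
}
```

If the observation above holds, the gap between the two columns should shrink as the buffer grows past the last-level cache.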
I've heard that adaptive prefetching is turned on after 3 consecutive accesses to the same cache line in increasing address order. So perhaps the optimised SSE/AVX zeroing doesn't trigger adaptive prefetching because it uses fewer memory accesses. This may also vary a lot by CPU model: newer models may have fixed adaptive prefetching, so that memclr is great again.