Go does a good job of making it easy to do the right thing, especially in tests (such as parallel benchmarks or setting env variables), and avoiding the need for tricks like package level variables or noinline directives seems a useful feature.
On Mon, 7 Jun 2021, at 4:47 PM, peterGo wrote: > Ian, > > I don't know whether it is feasible. It is unnecessary. > > A benchmark should be treated as a scientific experiment. If we do that with > Paul's benchmarks and write them in scientific form, then we get the expected > results. > > $ benchstat xpt.txt > name time/op > PopCountSimple-4 5.71ns ± 0% > PopCountSlowSimple-4 5.70ms ± 0% > PopCountAlmostSimple/BenchmarkPopCountAlmostSimple-4 5.70ns ± 0% > PopCountNearlySimple/PopCountNearlySimple-4 5.69ns ± 0% > $ > > Peter > > > On Monday, June 7, 2021 at 6:14:05 AM UTC-4 Ian Davis wrote: >> __ >> Would it be feasible for the Go tool to disable inlining and deadcode >> elimination of code within the bodies of Benchmarks and Tests? Not the code >> under test of course. It could be as crude as disabling these optimizations >> for files in _test.go files. >> >> >> >> On Sun, 6 Jun 2021, at 1:33 PM, Paul S. R. Chisholm wrote: >>> Thanks! I'd seen the "dead code elimination" comment somewhere and >>> questioned it, but not enough. >>> >>> If I worry about what some future Go compiler might optimize, I end up >>> worrying quite a lot! For example, could this code: >>> >>> func BenchmarkPopCountAlive(b *testing.B) { >>> sum = 0 >>> for i := 0; i < b.N; i++ { >>> sum += PopCount(0x1234567890abcdef) >>> } >>> } >>> >>> hypothetically be optimized to: >>> >>> func BenchmarkPopCountAlive(b *testing.B) { >>> sum = PopCount(0x1234567890abcdef) * b.N >>> } >>> >>> since PopCount() always returns the same value for the same argument? It >>> probably wouldn't, since that would break many existing benchmarks. >>> >>> Maybe more reasonably worrisome: If sum is initialized and incremented but >>> never read, could all references to it be optimized away? That's easy >>> enough to avoid, as is the optimization I worried about above: >>> >>> var sum = 0 // Avoid dead code elimination >>> const firstInput uint64 = 0x1234567890abcdef >>> >>> func BenchmarkPopCountSimple(b *testing.B) { >>> for i, input := 0, firstInput; i < b.N; i++ { >>> sum += PopCount(input) >>> input++ >>> } >>> if sum == 0 { >>> b.Error("BenchmarkPopCountSimple: sum == 0") >>> } >>> } >>> >>> Here are my new benchmark results: >>> >>> $ go version >>> go version go1.16.5 windows/amd64 >>> $ go test -cpu=1 -bench=. >>> goos: windows >>> goarch: amd64 >>> pkg: gopl.io/popcount >>> cpu: Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz >>> BenchmarkPopCountSimple 271004790 4.419 ns/op >>> BenchmarkPopCountSlowSimple 278 4235890 ns/op >>> BenchmarkPopCountAlmostSimple/BenchmarkPopCountAlmostSimple >>> 272759134 4.434 ns/op >>> BenchmarkPopCountNearlySimple/PopCountNearlySimple >>> 145613077 8.172 ns/op >>> PASS >>> ok gopl.io/popcount 7.061s >>> >>> The anomalistically-low "simple" and "almost simple" results are now more >>> reasonable, but still nearly a factor of two lower than the "nearly simple" >>> results. They're still inlining the calls, as is the "slow simple" >>> benchmark, where the "nearly simple" one isn't: >>> >>> $ go test -cpu=1 -bench=. -gcflags=-m 2>&1 | egrep 'inlining call to >>> PopCount' >>> .\popcount_test.go:10:18: inlining call to PopCount >>> .\popcount_test.go:22:19: inlining call to PopCount >>> .\popcount_test.go:34:19: inlining call to PopCount >>> >>> Overkill, right? --PSRC >>> >>> P.S.: For better or worse, I discovered I can -- but shouldn't, based on >>> the feedback I got! -- get "fancy" formatting by copying from vscode and >>> pasting into Gmail. >>> >>> >>> >>> On Tuesday, June 1, 2021 at 12:01:51 PM UTC-4 peterGo wrote: >>>> Here is a more definitive reply than my original reply. >>>> >>>> >>>> I got as far as this >>>> func BenchmarkPopCountSimple(b *testing.B) { >>>> sum := 0 // Avoid dead code elimination. >>>> for i := 0; i < b.N; i++ { >>>> sum += PopCount(0x1234567890abcdef) >>>> } >>>> } >>>> >>>> As you can see from the objdump, BenchmarkPopCountSimple dead code is >>>> eliminated. >>>> >>>> func BenchmarkPopCountSimple(b *testing.B) { >>>> 0x4e38c0 31c9 XORL CX, CX >>>> >>>> for i := 0; i < b.N; i++ { >>>> 0x4e38c2 eb03 JMP 0x4e38c7 >>>> 0x4e38c4 48ffc1 INCQ CX >>>> 0x4e38c7 48398890010000 CMPQ CX, 0x190(AX) >>>> 0x4e38ce 7ff4 JG 0x4e38c4 >>>> } >>>> 0x4e38d0 c3 RET >>>> I added an additional BenchmarkPopCountAlive benchmark. >>>> >>>> var sum = 0 // Avoid dead code elimination. >>>> >>>> func BenchmarkPopCountAlive(b *testing.B) { >>>> sum = 0 >>>> >>>> for i := 0; i < b.N; i++ { >>>> sum += PopCount(0x1234567890abcdef) >>>> } >>>> } >>>> >>>> As you can see from the objdump, BenchmarkPopCountAlive code is not >>>> eliminated. >>>> >>>> For details, see the popcount.txt attachment. >>>> >>>> Peter >>>> >>>> >>>> On Monday, May 31, 2021 at 6:29:27 PM UTC-4 Paul S. R. Chisholm wrote: >>>>> This is not a serious problem, but it surprised me. (By the way, how can >>>>> I post a message with code formatting?) >>>>> >>>>> I'd like to create table-driven benchmarks: >>>>> https://blog.golang.org/subtests. >>>>> >>>>> To that end, I started with code from *The Go Programming Language*: >>>>> >>>>> // PopCount is based on an example from chapter 2 of THE GO PROGRAMMING >>>>> LANGUAGE. >>>>> // Copyright © 2016 Alan A. A. Donovan & Brian W. Kernighan. >>>>> // License: https://creativecommons.org/licenses/by-nc-sa/4.0/ >>>>> >>>>> package popcount >>>>> >>>>> // pc[i] is the population count of i. >>>>> var pc [256]byte >>>>> >>>>> func init() { >>>>> for i := range pc { >>>>> pc[i] = pc[i/2] + byte(i&1) >>>>> } >>>>> } >>>>> >>>>> // PopCount returns the population count (number of set bits) of x. >>>>> func PopCount(x uint64) int { >>>>> return int(pc[byte(x>>(0*8))] + >>>>> pc[byte(x>>(1*8))] + >>>>> pc[byte(x>>(2*8))] + >>>>> pc[byte(x>>(3*8))] + >>>>> pc[byte(x>>(4*8))] + >>>>> pc[byte(x>>(5*8))] + >>>>> pc[byte(x>>(6*8))] + >>>>> pc[byte(x>>(7*8))]) >>>>> } >>>>> >>>>> I then wrote a simple test: >>>>> >>>>> // popcount_simple_test.go >>>>> package popcount >>>>> >>>>> import "testing" >>>>> >>>>> func BenchmarkPopCountSimple(b *testing.B) { >>>>> sum := 0 // Avoid dead code elimination. >>>>> for i := 0; i < b.N; i++ { >>>>> sum += PopCount(0x1234567890abcdef) >>>>> } >>>>> } >>>>> >>>>> Because I was paranoid about dead code elimination, I wrote an identical >>>>> test that calls the function many times (inside the loop to call it b.N >>>>> times, which should make no difference if the hypothetically-still-dead >>>>> code is being eliminated): >>>>> >>>>> // popcount_slow_simple_test.go >>>>> package popcount >>>>> >>>>> import "testing" >>>>> >>>>> func BenchmarkPopCountSlowSimple(b *testing.B) { >>>>> sum := 0 // Avoid dead code elimination. >>>>> for i := 0; i < b.N; i++ { >>>>> // Exactly as before, but call the function many times. >>>>> for j:= 0; j < 1_000_000; j++ { >>>>> sum += PopCount(0x1234567890abcdef) >>>>> } >>>>> } >>>>> } >>>>> >>>>> Then I wrote an "almost simple" test that uses testing.B.Run() but is >>>>> hardwired to call the function being benchmarked: >>>>> >>>>> // popcount_almost_simple_test.go >>>>> package popcount >>>>> >>>>> import "testing" >>>>> >>>>> func BenchmarkPopCountAlmostSimple(b *testing.B) { >>>>> b.Run("BenchmarkPopCountAlmostSimple", func(b *testing.B) { >>>>> sum := 0 // Avoid dead code elimination. >>>>> for i := 0; i < b.N; i++ { >>>>> sum += PopCount(0x1234567890abcdef) >>>>> } >>>>> }) >>>>> } >>>>> >>>>> Finally, I wrote a "nearly-simple" test that passes arguments to Run() >>>>> that could have come from a table: >>>>> >>>>> // popcount_nearly_simple.go >>>>> package popcount >>>>> >>>>> import "testing" >>>>> >>>>> func BenchmarkPopCountNearlySimple(b *testing.B) { >>>>> f := PopCount >>>>> name := "PopCountNearlySimple" >>>>> b.Run(name, func(b *testing.B) { >>>>> sum := 0 // Avoid dead code elimination. >>>>> for i := 0; i < b.N; i++ { >>>>> sum += f(0x1234567890abcdef) >>>>> } >>>>> }) >>>>> } >>>>> >>>>> The simple and almost-simple results are nearly identical (happily, the >>>>> slow results are not), but the nearly-simple results are an order of >>>>> magnitude slower: >>>>> >>>>> $ go version >>>>> go version go1.16.4 windows/amd64 >>>>> $ go test -cpu=1 -bench=. >>>>> goos: windows >>>>> goarch: amd64 >>>>> pkg: gopl.io/popcount >>>>> cpu: Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz >>>>> BenchmarkPopCountAlmostSimple/BenchmarkPopCountAlmostSimple >>>>> 1000000000 0.6102 ns/op >>>>> BenchmarkPopCountNearlySimple/PopCountNearlySimple >>>>> 194925662 6.197 ns/op >>>>> BenchmarkPopCountSimple >>>>> 1000000000 0.5994 ns/op >>>>> BenchmarkPopCountSlowSimple >>>>> 1953 606194 ns/op >>>>> PASS >>>>> ok gopl.io/popcount 4.534s >>>>> >>>>> After reading this article: >>>>>> https://medium.com/a-journey-with-go/go-inlining-strategy-limitation-6b6d7fc3b1be >>>>> I ran: >>>>> >>>>> $ go test -cpu=1 -bench=. -gcflags=-m 2>&1 | egrep 'inlining call to >>>>> PopCount' >>>>> .\popcount_almost_simple_test.go:9:19: inlining call to PopCount >>>>> .\popcount_simple_test.go:8:18: inlining call to PopCount >>>>> .\popcount_slow_simple_test.go:10:19: inlining call to PopCount >>>>> >>>>> That makes sense. 0.6 ns is less than a function call. The simple and >>>>> almost-simple benchmarks can inline the call to the hardwired function, >>>>> but the nearly-simple benchmark can't. >>>>> >>>>> In practice, this isn't much of a problem. Any function small enough to >>>>> be inlined is unlikely to be a performance bottleneck. If it ever is, a >>>>> non-table-driven benchmark can still measure it. >>>>> >>>>> Hope this helps. --PSRC >>>>> >>>>> P.S.: Out of curiosity, how can I post a message with fancy code examples >>>>> like this one? >>>>> https://groups.google.com/g/golang-nuts/c/5DgtH2Alt_I/m/hlsqdRSGAgAJ >>> >>> -- >>> You received this message because you are subscribed to the Google Groups >>> "golang-nuts" group. >>> To unsubscribe from this group and stop receiving emails from it, send an >>> email to golang-nuts...@googlegroups.com. >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/golang-nuts/CAG%3D3cjnebauo-VDiX%2BQjPVnnSji1DA0DtxsTKkOcTW9eVfTN8g%40mail.gmail.com >>> >>> <https://groups.google.com/d/msgid/golang-nuts/CAG%3D3cjnebauo-VDiX%2BQjPVnnSji1DA0DtxsTKkOcTW9eVfTN8g%40mail.gmail.com?utm_medium=email&utm_source=footer>. >>> >>> >>> *Attachments:* >>> * popcount_test.go >> > > -- > You received this message because you are subscribed to the Google Groups > "golang-nuts" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to golang-nuts+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/golang-nuts/ce3ef331-4156-4b29-97f8-c73456a7b6f7n%40googlegroups.com > > <https://groups.google.com/d/msgid/golang-nuts/ce3ef331-4156-4b29-97f8-c73456a7b6f7n%40googlegroups.com?utm_medium=email&utm_source=footer>. > > > *Attachments:* > * popcount_test.go -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/9f2292de-5aeb-4a3e-9185-2f674910b3b3%40www.fastmail.com.