Following up on this, consider how long it actually takes to compute F(40) in the normal iterative way, single threaded -- this is the natural amount of work under discussion.
iterative: BenchmarkFibonacci40-36 51974828 21.9 ns/op 0 B/op 0 allocs/op 22 nanoseconds, about a 52 millionth of a second. Doing 40 evaluations takes 40 times longer (single threaded) but let's just go with the single evaluation for now. recursive (after changing commented-out call to FibonacciR): BenchmarkFibonacci40-36 3 353790912 ns/op 0 B/op 0 allocs/op 353,790,912 nanoseconds, about a third of a second. The program is not doing more essential work down the Fibonacci path, but is doing a great deal more redundant work because of the many redundant calls with the same arguments. f(40) is 102,334,155 and requires 39 additions the first way and 204,668,309 recursive function calls (2 f(n) - 1) and 102,334,154 additions the second way (all but the near-leaf elements of which have an addition) When we divide the 353,790,912 ns/op by the 204,668,309 function calls per op (up from one in the iterative case) we get 1.73 ns per function call, including the 102,334,154 required additions. Now I realize that the original question is about C performance vs Go performance under this storm of redundant function calls and additions, and that question further complicated by various ways and means of concurrent evaluation. Still, in any benchmark it is important to understand what is actually happening, what is being tested, and especially for this kind of microbenchmark, to understand all of this well enough to decode the meaning of it all as it might apply to a real program. Here is my opinion of the meaning as one might understand from this single line of testing: We can do more than 578 million real (not inlined) stack-pushing and popping function calls per second because they take 1.72 ns each even with the if tests and the addition of stack values. This is a lot of function-call intensity and it is hard to imagine anything other than recursive Fibonacci or Ackerman function evaluation that would do so many each doing so little. If it is true that any other language is really 2x faster here (or 10x or whatever), then when the body of the function is more than an if-test and an add, say an FFT or a sin() calculation, or writing to memory, or heaven forbid an I/O or OS interaction, or anything else "real" that implicitly takes 100 or 100k, or 100m the time of the function call, then the relative speeds will be immeasurably close to 1. This is the problem with the microbenchmark. It is important, but mostly to the compiler writers and CPU architects. For people who write programs to do things, the real benchmark is timing those things on machine A vs machine B, or language A vs language B, or whatever. Also, and perhaps not obvious, is that for so-called cloud providers and operators of huge data centers for internal purposes, the real benchmark is even more indirect, generally timing of those things * watts per second or $ per second. (Both across millions of CPUs in some cases of personal knowledge.) It is just not possible to understand such things by extrapolating from microbenchmarks. Not a defense of Go just a word to the wise. Michael On Tue, Jan 28, 2020 at 8:02 AM nilsocket <nilsoc...@gmail.com> wrote: > Thanks a lot for your answer. > > Thank you. > > On Tuesday, January 28, 2020 at 9:15:58 PM UTC+5:30, Ian Lance Taylor > wrote: >> >> On Tue, Jan 28, 2020 at 7:33 AM nilsocket <nils...@gmail.com> wrote: >> >>> Attached two files main.c and main.go, >>> both try to compute fib(40), 20 times on `x` no.of threads/goroutines. >>> fib(40) is recursive. >>> >>> *Prerequisites for C:* >>> >>> 1. libuv (https://github.com/libuv/libuv) >>> >>> >>> >>> *Compile*: >>> >>> - *C :* >>> - *Unoptimized:* >>> >>> * gcc main.c -lpthread -luv * >>> >>> >>> - *Optimized:* >>> gcc-O3 main.c -lpthread -luv >>> >>> >>> - *Go :* >>> - go build >>> >>> >>> *Run:* >>> >>> - >>> >>> *C:time UV_THREADPOOL_SIZE=x ./a.out // substitute `x` with physical >>> core count * >>> - >>> >>> *GO:./temp x // substitute `x` with physical core count * >>> >>> >>> Results: >>> >>> Thread CountC (Optimized)C (Unoptimized)GO >>> 1 4.38s 11.36s 11.7s >>> 4 1.12s 3.1s 2.9s >>> 8 1.1s 2.9s 2.7s >>> >>> >>> Laptop with 4 physical cores(8 logical cores). >>> >>> Why can't go provide Optimization flag? >>> I understand go developers want compiler to be simple, but the >>> difference seems too big to leave it out. >>> >>> But isn't there any possibility to abstract the complexity and provide >>> optimization flag? >>> >> >> >> Go doesn't provide an optimization flag because it always optimizes. >> >> Your program amounts to a microbenchmark of the tail recursion >> optimization. The Go compiler doesn't optimize tail recursion, because it >> confuses stack traces and breaks runtime.Callers. The C compiler has no >> such constraints. You might be interested in >> https://golang.org/issue/22624. >> >> There will always be microbenchmarks in which C code executes faster than >> Go code. Microbenchmarks can be useful if analyzed with care, but the real >> question for performance is always how a real program behaves. >> >> Ian >> > -- > You received this message because you are subscribed to the Google Groups > "golang-nuts" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to golang-nuts+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/golang-nuts/5d21abd1-1910-4bbb-a24e-6080ceaf0c1c%40googlegroups.com > <https://groups.google.com/d/msgid/golang-nuts/5d21abd1-1910-4bbb-a24e-6080ceaf0c1c%40googlegroups.com?utm_medium=email&utm_source=footer> > . > -- *Michael T. jonesmichael.jo...@gmail.com <michael.jo...@gmail.com>* -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CALoEmQw3r9evYQpQH91TAF7R3HBuA7eNaDA7pEqBpyxjGiNmuw%40mail.gmail.com.