There is a point I want to add to the excellent discussion.

When your benchmark is small, it becomes far more sensitive to exactly
which compiler optimizations were triggered. Even small optimizations that
reduce instruction count, memory bandwidth usage, or register pressure can
produce a very large speedup. The same goes for breaking long data
dependency chains, which increases ILP in the code and lets more of the
CPU's execution units do useful work.

Small fluctuations in the instruction mix can suddenly yield very large
speedups.
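
To make the dependency-chain point concrete, here is a minimal,
hypothetical Go sketch (none of these names come from the programs in this
thread): summing a slice with a single accumulator creates one long chain
of dependent adds, while splitting the sum over four accumulators lets the
CPU overlap them. Essentially the same work, but often noticeably different
microbenchmark numbers.

    // ilp_test.go -- a hypothetical sketch, not part of the programs in
    // this thread. Run with: go test -bench=.
    package ilp

    import "testing"

    var data [4096]int64
    var sink int64 // keep results live so the compiler cannot discard the loops

    // sumSingle uses one accumulator, so every add depends on the previous
    // one: a single long dependency chain.
    func sumSingle(xs []int64) int64 {
        var s int64
        for _, x := range xs {
            s += x
        }
        return s
    }

    // sumFour splits the same work over four independent accumulators, so the
    // CPU can overlap the adds (more ILP) at roughly the same instruction count.
    func sumFour(xs []int64) int64 {
        var a, b, c, d int64
        for i := 0; i+3 < len(xs); i += 4 {
            a += xs[i]
            b += xs[i+1]
            c += xs[i+2]
            d += xs[i+3]
        }
        return a + b + c + d
    }

    func BenchmarkSumSingle(b *testing.B) {
        for i := 0; i < b.N; i++ {
            sink = sumSingle(data[:])
        }
    }

    func BenchmarkSumFour(b *testing.B) {
        for i := 0; i < b.N; i++ {
            sink = sumFour(data[:])
        }
    }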

There is another important point: parallelism vs. concurrency. This problem
is "embarrassingly parallel" and involves no concurrency at all. Yet almost
all of the hard problems, even in parallel settings, show up when you need
to communicate information between threads/goroutines/fibers/processes, and
a benchmark like this cannot really measure that. For the same reason, it is
not obvious that reaching for libuv in the C version is even warranted.
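
For reference, the overall shape of the workload is roughly the sketch
below (my own illustration, not the actual main.go from this thread; the
job count and fib are stand-ins). Each goroutine computes its result
independently and the only synchronization is waiting for all of them to
finish, so nothing is ever communicated between goroutines -- which is
exactly the hard part such a benchmark never exercises.

    // fanout.go -- a rough sketch of the workload's shape, not the actual
    // main.go from this thread.
    package main

    import (
        "fmt"
        "sync"
    )

    // fib is the naive exponential-time recursion being benchmarked.
    func fib(n int) int {
        if n < 2 {
            return n
        }
        return fib(n-1) + fib(n-2)
    }

    func main() {
        const jobs = 20 // 20 independent fib(40) evaluations, as in the thread
        results := make([]int, jobs)

        var wg sync.WaitGroup
        for i := 0; i < jobs; i++ {
            wg.Add(1)
            go func(i int) {
                defer wg.Done()
                results[i] = fib(40) // each goroutine touches only its own slot
            }(i)
        }
        wg.Wait() // the only synchronization in the whole program

        fmt.Println(results[0]) // 102334155
    }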

On Wed, Jan 29, 2020 at 10:01 PM Michael Jones <michael.jo...@gmail.com>
wrote:

> Following up on this, consider how long it actually takes to compute F(40)
> in the normal iterative way, single threaded -- this is the natural
> amount of work under discussion.
>
> iterative:
> BenchmarkFibonacci40-36    51974828    21.9 ns/op    0 B/op    0 allocs/op
>
> 22 nanoseconds, about a 52 millionth of a second. Doing 40 evaluations
> takes 40 times longer (single threaded) but let's just go with the single
> evaluation for now.
>
> recursive (after changing commented-out call to FibonacciR):
> BenchmarkFibonacci40-36    3    353790912 ns/op    0 B/op    0 allocs/op
>
> 353,790,912 nanoseconds, about a third of a second. The program is not
> doing more essential work down the Fibonacci path, but is doing a great
> deal more redundant work because of the many redundant calls with the same
> arguments.
>
> f(40) is 102,334,155 and requires 39 additions the first way, versus
> 204,668,309 recursive function calls (2 f(n) - 1) and 102,334,154 additions
> the second way (all but the near-leaf calls perform an addition). When we
> divide the 353,790,912 ns/op by the 204,668,309 function calls per op (up
> from one in the iterative case) we get 1.73 ns per function call, including
> the 102,334,154 required additions.
>
> Now I realize that the original question is about C performance vs Go
> performance under this storm of redundant function calls and additions, and
> that question is further complicated by various ways and means of concurrent
> evaluation. Still, in any benchmark it is important to understand what is
> actually happening, what is being tested, and especially for this kind of
> microbenchmark, to understand all of this well enough to decode the meaning
> of it all as it might apply to a real program.
>
> Here is my opinion of the meaning as one might understand from this single
> line of testing:
>
> We can do more than 578 million real (not inlined) stack-pushing and
> -popping function calls per second, because they take 1.72 ns each even
> with the if tests and the addition of stack values. That is a lot of
> function-call intensity, and it is hard to imagine anything other than
> recursive Fibonacci or Ackermann function evaluation that would make so
> many calls while doing so little in each one.
>
>
>
> Even if it is true that some other language is really 2x faster here (or
> 10x, or whatever), once the body of the function is more than an if-test
> and an add -- say an FFT or a sin() calculation, or writing to memory, or
> heaven forbid an I/O or OS interaction, or anything else "real" that
> implicitly takes 100x, 100,000x, or 100,000,000x the time of the function
> call -- the relative speeds will be immeasurably close to 1.
>
> This is the problem with the microbenchmark. It is important, but mostly
> to the compiler writers and CPU architects. For people who write programs
> to do things, the real benchmark is timing those things on machine A vs
> machine B, or language A vs language B, or whatever.
>
> Also, and perhaps not obvious, is that for so-called cloud providers and
> operators of huge data centers for internal purposes, the real benchmark is
> even more indirect, generally timing of those things * watts per second or
> $ per second. (Both across millions of CPUs in some cases of personal
> knowledge.) It is just not possible to understand such things by
> extrapolating from microbenchmarks. Not a defense of Go, just a word to the
> wise.
>
> Michael
>
>
> On Tue, Jan 28, 2020 at 8:02 AM nilsocket <nilsoc...@gmail.com> wrote:
>
>> Thanks a lot for your answer.
>>
>> Thank you.
>>
>> On Tuesday, January 28, 2020 at 9:15:58 PM UTC+5:30, Ian Lance Taylor
>> wrote:
>>>
>>> On Tue, Jan 28, 2020 at 7:33 AM nilsocket <nils...@gmail.com> wrote:
>>>
>>>> Attached are two files, main.c and main.go;
>>>> both compute fib(40) 20 times, spread across `x` threads/goroutines.
>>>> fib(40) is computed recursively.
>>>>
>>>> Prerequisites for C:
>>>>
>>>>    1. libuv (https://github.com/libuv/libuv)
>>>>
>>>> Compile:
>>>>
>>>>    - C:
>>>>       - Unoptimized: gcc main.c -lpthread -luv
>>>>       - Optimized:   gcc -O3 main.c -lpthread -luv
>>>>    - Go:
>>>>       - go build
>>>>
>>>> Run:
>>>>
>>>>    - C:  time UV_THREADPOOL_SIZE=x ./a.out   // substitute `x` with physical core count
>>>>    - Go: ./temp x                            // substitute `x` with physical core count
>>>>
>>>> Results:
>>>>
>>>> Thread Count | C (Optimized) | C (Unoptimized) | Go
>>>> 1            | 4.38s         | 11.36s          | 11.7s
>>>> 4            | 1.12s         | 3.1s            | 2.9s
>>>> 8            | 1.1s          | 2.9s            | 2.7s
>>>>
>>>>
>>>> Laptop with 4 physical cores (8 logical cores).
>>>>
>>>> Why can't Go provide an optimization flag?
>>>> I understand the Go developers want the compiler to be simple, but the
>>>> difference seems too big to leave it out.
>>>>
>>>> But isn't there any possibility to abstract away the complexity and
>>>> provide an optimization flag?
>>>>
>>>
>>>
>>> Go doesn't provide an optimization flag because it always optimizes.
>>>
>>> Your program amounts to a microbenchmark of the tail recursion
>>> optimization.  The Go compiler doesn't optimize tail recursion, because it
>>> confuses stack traces and breaks runtime.Callers.  The C compiler has no
>>> such constraints.  You might be interested in
>>> https://golang.org/issue/22624.
>>>
>>> There will always be microbenchmarks in which C code executes faster
>>> than Go code.  Microbenchmarks can be useful if analyzed with care, but the
>>> real question for performance is always how a real program behaves.
>>>
>>> Ian
>>>
>
>
> --
>
> Michael T. Jones <michael.jo...@gmail.com>
>


-- 
J.
