Re: the Fibonacci benchmark

2018-09-30 Thread adrianv
@cblake thanks for the explanation. I will check with the profiler. From what 
you said I tried this and get almost the same results - now for gcc and g++ and 
for clang a speed up of about 4x.


proc fib(n: int): uint64 =
  if n > 1 and n != 30: return fib(n - 1) + fib(n - 2)
  if n <= 1: return 1
  let x {.global.}: auto = fib(29) + fib(28)
  return x


echo fib(46)



Run

as a side note: reordering the ifs brought about 50% speedup


Re: the Fibonacci benchmark

2018-09-29 Thread cblake
It might be a lot of things, but I'm telling you - there is a `gcc` 
optimization, very finicky about context (and probably compiler flags, too), 
that reduces function calls by almost exactly his speed-up factor. There's 
really no need to guess if I'm right, though. Just add `-pg` to the gcc flags 
(in both cases), run `gprof` on the output, and count funcalls in both cases to 
confirm.


Re: the Fibonacci benchmark

2018-09-29 Thread mratsim
Besides constant folding, there might be [code 
alignment](https://dendibakh.github.io/blog/2018/01/18/Code_alignment_issues) 
that can cause massive perf difference.


Re: the Fibonacci benchmark

2018-09-29 Thread cblake
No `gcc` doesn't -- at least not for me with a factor of about 7X, almost 
identical to @adrianv 's results. I used both an earlier gcc-7 and a gcc-8.2 
just now.

If you don't want to disassemble your binary, that's ok. You can check this 
with `gcc -pg` and `gprof` which will count calls for you.

I also gave a complete example in C that does the optimization I mention in the 
prior thread which I will reproduce here because the new forum code hides it 
behind "load more posts". With `fib(30)` I get 118793 calls. With `fib(40)` I 
get 14612525 calls. With even a low `fib(20)` I get 950 calls. Does 950 calls 
seem like compile-time calculation to you? With `fib(46)` I get 262211393 calls.


#include 

float fibonacci(int x) {
return x < 2 ? (float)x : fibonacci(x - 1) + fibonacci(x - 2);
}

void NimMainInner() {
printf("%.0f\n", fibonacci(46));
}
int main() {
#ifdef FIB_SLOW
NimMainInner();
#else
void (*volatile inner)(void) = NimMainInner;
inner();
#endif
return 0;
}


Run

If one simply calls `NimMainInner()` directly instead of through the volatile 
function pointer `inner()`, `fib(20)` generates 10942 calls instead of 950.

I understand my example uses floating point, not integer arithmetic, but 
@adrianv 's numbers make it sure seem like the same situation. @adrianv can 
also compile with `-pg` and run `gprof` to check. Or he could adapt this C and 
compile it the same way his `nim.cfg` or whatever compiles his Nim generated C.

I'm not quite sure what more evidence you would require. It isn't hard to 
calculate how many function calls doubly recursive Fibonacci requires.


Re: the Fibonacci benchmark

2018-09-29 Thread Hlaaftana
Nim doesn't evaluate fib(46), GCC does. And it doesn't get up to fib(46), it 
evaluates up to fib(34) then does the remaining calculations at runtime. I'm 
sure this exact same occurence was mentioned in an older thread.


Re: the Fibonacci benchmark

2018-09-29 Thread cblake
I am not quite sure what "original" benchmark you mean - perhaps the 4.6 s for 
Nim vs 6.0 s for C on the original github page at the top of this thread. That 
may be small scale code layout/caching/branch predictor effects as you say.

However, the difference in this thread of 8X is _almost certainly not_ 
compile-time evaluation of fib(46), nor a coincidence, nor some generalizable 
Nim is effectively faster than C { though ;) is noted! :-) }. Compile-time 
evaluation would likely be many thousands of times or more faster than that - 
faster even than the memoized/linearized versions of the calculation which are 
already O(1000x) faster. Indeed, one gets "too many iterations" trying to 
`const x = fib(46)` from the Nim VM.

The merely ~8X faster observed twice in this thread is something else which I 
have already explained in the prior thread and in this thread linking to that. 
And that explanation/approach does generalize a little, and even across 
programming languages. Certain iterative/recursive code that does very little 
work like simple adds or swaps { permutations, I'm looking at you :-) } can 
often be sped up by either manually or automatically unpacking/unrolling the 
last few recursions so that more work is done on a per-function call basis or 
said otherwise there are fewer function calls.


Re: the Fibonacci benchmark

2018-09-29 Thread zahary
Don't get overexcited. This is probably just a coincidence coming from the 
random addresses of the compiled functions and branch instructions that affect 
things like the branch prediction unit in the CPU.


Re: the Fibonacci benchmark

2018-09-29 Thread dom96
> If Nim compiles to C Nim is not faster than C but Nim is able to produce 
> faster C code than human programmers.

Effectively making it faster than C ;)


Re: the Fibonacci benchmark

2018-09-29 Thread cblake
I also got `nim c` 's code running twice as fast as `nim cpp` 's code. See some 
previous discussion: 
[https://forum.nim-lang.org/t/1779](https://forum.nim-lang.org/t/1779) \- there 
are cases where Nim's generated C code "inspires" the C compiler to "unroll the 
last few recursions" and save 4x or 8x or maybe even 16x depending the number 
of funcalls (depending on the unroll amount). It's not quite constant folding. 
disassembly and looking for `callq` can be helpful here.

I think the result is correct by the more old school definition starting from 
"1 1 2 3 5" (with indexing starting at zero), not "0 1 1 2 3 5" as per 
[https://en.wikipedia.org/wiki/Fibonacci_number](https://en.wikipedia.org/wiki/Fibonacci_number)
 { or maybe the indexing is not what @miran expects, but he uses a ";-)" }. 
Anyway, there is sort of "more than one right answer" in a couple of ways 
depending on index origin or 0 inclusion.


Re: the Fibonacci benchmark

2018-09-29 Thread miran
On my end `nim cpp` is twice _slower_ than `nim c`.

Oh and btw, whichever version of code you use — the result is not correct ;) 
Try running `fib(2)` and you'll see what I'm talking about.


Re: the Fibonacci benchmark

2018-09-29 Thread DTxplorer
If Nim compiles to C Nim is not faster to C but Nim may produce faster C code 
than human programmers.


Re: the Fibonacci benchmark

2018-09-29 Thread Hlaaftana
I think this was posted a while ago and the answer was that G++ found a way to 
constant fold Nim's C++ output. This means it probably evaluated fib(46) before 
the program ever ran. A bit disappointing, sure, but the C output benchmark is 
still very good.


the Fibonacci benchmark

2018-09-29 Thread adrianv
Hi, on twitter I saw that Nim performed better on fibonacci benchmark than C 
and C++ ([https://github.com/drujensen/fib](https://github.com/drujensen/fib)) 
and wondered why. I did my own tests and these are quite interesting:Language| 
Time in s| compiled with  
---|---|---  
Nim| 0.456| nim cpp -d:release fib.nim  
Nim| 3.845| nim c -d:release fib.nim  
C++| 3.843| g++ -O3 -o fib fib.cpp  
C| 3.902| gcc -O3 -o fib fib.c  
  
why in the world is the Nim version compiled so much faster ?

then I tried a slightly different version of Nim code


proc fib(n: uint64): uint64 =
  if n > 2.uint64 : return fib(n - 1) + fib(n - 2)
  return n


echo fib(46)


Run

Language| Time in s| compiled with  
---|---|---  
Nim| 0.450| nim cpp -d:release fib.nim  
Nim| 3.209| nim c -d:release fib.nim  
  
its nice to see Nim to perform so well here - but why is the gap between the C 
and CPP compilation for the same code?