Re: Estimation of π using Leibniz series

2017-08-15 Thread Jehan
> Are you on Mac OS X by chance? We're calling c stdlib function for pow and 
> they may be significantly faster.

Yes, but that wouldn't explain the faster Julia code (and if it did, it 
wouldn't explain the difference between Julia and Nim that others are 
observing).


Re: Estimation of π using Leibniz series

2017-08-15 Thread nvill
Indeed, this could be a library issue: I get ~4.5x faster results if I link 
against musl instead of glibc.


Re: Estimation of π using Leibniz series

2017-08-15 Thread def
Maybe you have some special Nim settings in your nim.cfg, Jehan? For me it's 
~3.5 seconds with an i7 6700k for both gcc-5 and clang. Or are you using an 
older Nim release? I'm running the current devel branch. **Are you on Mac OS X 
by chance? We're calling c stdlib function for pow and they may be 
significantly faster. Optimizing pow(-1.0, ...) seems pretty reasonable.** 
Compiling for x86 instead of x86-64 also makes this twice as slow here.


Re: Estimation of π using Leibniz series

2017-08-15 Thread Jehan
What baffles me is how slow this seems to be for everyone. I get some .7 
seconds (plus or minus some random noise) for both Nim and Julia, both with 
clang, gcc-5, and gcc-7. Julia is a tick slower (around .78 seconds for Julia 
vs. .72 seconds for Nim), but nothing that breaks even the single second 
barrier. That's on a 2.5 GHz Core i7, a mid-2014 Mac, so hardly a 
particularly powerful system.

This is with the code copied and pasted from the first post in this thread and 
no modifications applied.


Re: Estimation of π using Leibniz series

2017-08-15 Thread LeuGim
@zolern: Ah, ok, then I didn't understand you correctly that time.


Re: Estimation of π using Leibniz series

2017-08-15 Thread zolern
@LeuGim I mean that LLVM/clang's pow is much faster than gcc's pow, not just 
in this particular case of pow(-1, n) but in general.


Re: Estimation of π using Leibniz series

2017-08-15 Thread jxy
This is no longer Nim's problem, but just for fun, (short of importing 
intrinsics)


{.passc:"-march=native".}
import times, math

proc leibniz0(terms: int): float =
  var res = 0.0
  for n in 0..terms:
    res += pow(-1,float(n))/(2*float(n)+1)
  4*res

proc leibniz1(terms: int): float =
  var res = 0.0
  for n in 0..terms:
    if (n and 1) == 0: res += 1/(2*float(n)+1)
    else: res += -1/(2*float(n)+1)
  4*res

proc leibniz2[N:static[int]](terms: int): float =
  const
    L = 1 shl N
    L2 = L shl 1
    T = 10
  var t: array[L*T, float]
  let r = (terms shr N) div T
  if (terms mod (L*T)) != 0: quit 1
  var res = 1/float(2*terms+1)
  for n in 0..

Re: Estimation of π using Leibniz series

2017-08-15 Thread LeuGim
@zolern:

My point was exactly that the library function is NOT to be considered badly 
implemented for not optimizing this case. Such optimizations come at a cost 
(additional runtime checks), and at the very least they cost their developers 
effort and time to implement, so not all possible optimizations can be done. 
Library and compiler developers should pick the more probable and saner cases 
to optimize (say, squaring -2 could also be special-cased to instantly return 
4, and squaring -2.5 to return 6.25, ..., but there's an infinity of numbers).

And this particular case (using POW for 1, -1, ...) is not one of those worth 
both the runtime checks and the library developers' effort. If the programmer 
cares about efficiency (and readability too!) even a little bit, why would he 
write it this way? So what is the point in optimizing it? GCC does better in 
this case.

Yet for such special cases (if the programmer just likes writing it this way), 
special just-in-time optimizations can be made, like


proc pow(x: static[float], y: float): float =
  when x == -1.0: float([1,-1][y.int mod 2])
  else: math.pow(x, y)


(yet faster with template), or with term rewriting, smth like


template optPowMinusOne{pow(-1.0, x)}(x: float): float =
  float([1,-1][x.int mod 2])


(this didn't work for me though, maybe someone can point out what's wrong with it).


Re: Estimation of π using Leibniz series

2017-08-15 Thread zolern
@wiffel: It is true: recent updates of LLVM and Clang claimed, among other 
things, 5x faster pow execution. Nim can do nothing about this.

@LeuGim: Yes, in this particular case POW is not the best choice, but it is 
somewhat worrying and unpleasant when your code depends on a library function 
that is unexpectedly badly implemented.


Re: Estimation of π using Leibniz series

2017-08-14 Thread LeuGim
Using exponentiation just for alternating 1, -1, 1, -1, ... is pointless (apart 
from mathematical formulas on paper) and should not be done, so it doesn't 
matter whether some compiler optimizes it.
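
A minimal sketch of the alternative, in C and with the hypothetical name `leibniz_flip`: keep the sign in a variable and negate it on each iteration, so no exponentiation is needed at all.

```c
#include <assert.h>

/* Leibniz sum over n = 0 .. terms; the sign alternates by negation
   instead of being computed with pow. */
double leibniz_flip(long terms) {
    assert(terms >= 0);
    double res = 0.0;
    double sign = 1.0;
    for (long n = 0; n <= terms; ++n) {
        res += sign / (2.0 * (double)n + 1.0);
        sign = -sign;  /* 1, -1, 1, -1, ... */
    }
    return 4.0 * res;
}
```

With `terms = 0` this returns exactly 4.0 (the first term of the series), and larger values of `terms` approach π slowly, as expected for this series.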


Re: Estimation of π using Leibniz series

2017-08-14 Thread wiffel
@zolern : I'm wondering too why the original nim version is that slow.

The Windows version of Nim on my computer (i7-6650, 3.60GHz, Windows 10 
Pro 64-bit) already runs my version of the program (see below) faster 
than running it under _windows/bash/ubuntu_ (which is what I did before). The 
same is true for the _Julia_ version.

Using _clang_ instead of _gcc_ makes it almost 3x faster. Since _Julia_ is 
using _LLVM_, probably the _clang_ version uses the same (faster?) library 
functions. I'm not sure ...

**clang version on windows/mingw**: 


nim c -r -d:release --cc:clang pi.nim
...
Elapsed time: 2.055
Pi: 3.141592663589326


**gcc version on windows/mingw**: 


nim c -r -d:release pi.nim
...
Elapsed time: 5.909
Pi: 3.141592663589326


**pi.nim**: 


import times, math

proc `/`(a, b: int): float =
  float(a) / float(b)

proc leibniz(terms: int): float =
  for i in 0 .. terms:
    result += ((-1)^i) / (2*i+1)
  result *= 4.0

let
  t0 = cpuTime()
  pi = leibniz(100_000_000)
  tt = cpuTime() - t0
echo("Elapsed time: ", tt)
echo("Pi: ", pi)



Re: Estimation of π using Leibniz series

2017-08-14 Thread zolern
I am still confused that Nim's pow is so unexpectedly slow: Nim just calls the C 
library function pow from <math.h>, wtf?

Anyway, the latest version (without pow) is just fast & furious. And Nim is 
awesome, no doubt!


Re: Estimation of π using Leibniz series

2017-08-14 Thread alfrednewman
Our **lovely Nim** is outstanding !


Re: Estimation of π using Leibniz series

2017-08-14 Thread zolern
I am pretty sure that Julia's POW checks whether the first argument is -1 and 
optimizes that case with something like MOD. You can check it: I suppose that 
the Julia code modified to use MOD will take about the same time as the code 
with POW.


Re: Estimation of π using Leibniz series

2017-08-14 Thread alfrednewman
@zolern, thank you. Your code rocks.

However, we are cheating Julia, because her code uses **POW** instead of 
**MOD**.

I will modify / rerun the .jl script just to check the result.


Re: Estimation of π using Leibniz series

2017-08-14 Thread zolern
Well, my 10 cents 


import times, math

proc leibniz(terms: int): float =
  var res = 0.0

  for n in 0..terms:
    res = res + (if n mod 2 == 0: 1.0 else: -1.0) / float(2 * n + 1)
  return 4*res

let t0 = cpuTime()
echo(leibniz(100_000_000))
let t1 = cpuTime()
echo "Elapsed time: ", $(t1 - t0)


  * With -d:release compile option: 0.381 seconds
  * Without -d:release: 2.711 seconds



Original "pow" version:

  * With -d:release: 7.253 seconds
  * Without -d:release: 10.697 seconds




Re: Estimation of π using Leibniz series

2017-08-14 Thread nvill
Julia compiles with `-march=native` by default so try passing 
`--passC:"-march=native"` to Nim. I have this in my `~/.config/nim.cfg` along 
with `--passC:"-flto"` (for release mode).


Re: Estimation of π using Leibniz series

2017-08-14 Thread Tiberium
It seems that Julia's JIT is aware of SSE extensions and uses them. GCC or 
MSVC should be emitting them too, but that depends on the C code.


Re: Estimation of π using Leibniz series

2017-08-14 Thread alfrednewman
Thank you all.

@wiffel/Nibbler, in my Julia test, I was using Pro version 0.6.1 64 bit running 
on Windows 10.

Now, running the same .jl code on Windows 8 64-bit (i5 CPU 650 @ 3.20GHz, 
3193 MHz, 2 cores, 4 logical processors) I got the following:


2.763225 seconds (1.77 k allocations: 95.291 KiB)
Pi: 3.141592663589326


After JIT compilation: 


1.863196 seconds (1.69 k allocations: 88.809 KiB)
Pi: 3.141592663589326



Re: Estimation of π using Leibniz series

2017-08-14 Thread Nibbler
I compiled approximately the same version in C with gcc optimisations on, and 
found the execution time to be roughly comparable between Nim and C. Could 
Julia's JIT be doing some sort of optimisation that shortcuts the full code 
somehow?

With the Nim version:


import times, math

proc leibniz(terms: int): float =
  var res = 0.0
  for n in 0..terms:
    res += pow(-1.0,float(n))/(2.0*float(n)+1.0)
  return 4*res

let t0 = cpuTime()
echo(leibniz(100_000_000))
let t1 = cpuTime()
echo "Elapsed time: ", $(t1 - t0)



I got these output times:

C:\projects\Nim>nim_version
3.141592663589326
Elapsed time: 6.541

C:\projects\Nim>nim_version
3.141592663589326
Elapsed time: 6.676

C:\projects\Nim>nim_version
3.141592663589326
Elapsed time: 6.594

While with the same C version:


#include <math.h>
#include <stdio.h>
#include <time.h>

double leibniz(int terms) {
    double res = 0.0;
    for (int i = 0; i < terms; ++i) {
        res += pow(-1.0, (double)i) / (2.0 * (double)i + 1.0);
    }
    return 4*res;
}

int main() {
    clock_t start = clock();
    double x = leibniz(100000000);  /* 100 million terms, as in the Nim run */
    printf("%.15f\n", x);
    printf("Time elapsed: %f\n", ((double)clock() - start) / CLOCKS_PER_SEC);
}



The times taken were (EDIT: used -Ofast instead and got faster times):

C:\projects\c>c_version
3.141592643589326
Time elapsed: 6.206000

C:\projects\c>c_version
3.141592643589326
Time elapsed: 6.204000

C:\projects\c>c_version
3.141592643589326
Time elapsed: 6.217000

I realise I actually got a slightly different decimal number with C, but to be 
honest I am not a C programmer so I am sure I did something wrong in the 
formatting.
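
The differing digits look like a loop-bound effect rather than a formatting one: Nim's `0..terms` is inclusive, while the C loop `i < terms` stops one term earlier. The omitted term is 4/(2*terms+1), about 2e-8 for 100,000,000 terms, which matches the gap between 3.141592663... and 3.141592643.... A minimal sketch, with the hypothetical names `leibniz_inclusive` and `leibniz_exclusive`, that shows the two bounds side by side:

```c
#include <assert.h>

/* Sums n = 0 .. terms inclusive, like Nim's `for n in 0..terms`. */
double leibniz_inclusive(long terms) {
    assert(terms >= 0);
    double res = 0.0;
    for (long n = 0; n <= terms; ++n)
        res += ((n & 1) ? -1.0 : 1.0) / (2.0 * (double)n + 1.0);
    return 4.0 * res;
}

/* Sums n = 0 .. terms-1, like the C loop `for (i = 0; i < terms; ++i)`. */
double leibniz_exclusive(long terms) {
    assert(terms >= 0);
    double res = 0.0;
    for (long n = 0; n < terms; ++n)
        res += ((n & 1) ? -1.0 : 1.0) / (2.0 * (double)n + 1.0);
    return 4.0 * res;
}
```

For an even `terms`, the two results differ by 4/(2*terms+1), up to rounding; changing the C loop condition to `i <= terms` should make both programs print the same digits.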


Re: Estimation of π using Leibniz series

2017-08-14 Thread wiffel
@alfrednewman

I tried to replicate your test. On my computer the following (slightly modified 
version) of the _nim_ program and the _julia_ program have the same runtime.

Whatever I do, I fail to run the _julia_ version in less than 2 seconds (as you 
had). Are you sure that test went OK?


import times, math

proc `/`(a, b: int): float =
  float(a) / float(b)

proc leibniz(terms: int): float =
  for i in 0..terms:
    result += (-1)^i / (2*i+1)
  result *= 4.0

let
  t0 = cpuTime()
  pi = leibniz(100_000_000)
  tt = cpuTime() - t0
echo("Elapsed time: ", tt)
echo("Pi: ", pi)


gives


>> nim c -d:release pi.nim
>> time ./pi
Elapsed time: 5.671875
Pi: 3.141592663589326

real    0m5.706s
user    0m5.672s
sys     0m0.000s


and


function leibniz(terms)
  res = 0.0
  for i in 0:terms
    res += (-1.0)^i/(2.0*i+1.0)
  end
  return res * 4.0
end

println("Pi: ", @time leibniz(100_000_000))


gives


>> time julia pi.jl
  5.770856 seconds (4.48 k allocations: 226.962 KB)
Pi: 3.141592663589326

real    0m6.538s
user    0m6.594s
sys     0m0.234s



Re: Estimation of π using Leibniz series

2017-08-14 Thread LeuGim
At least so: `res += float([1,-1][n mod 2])/(2.0*float(n)+1.0)`.


Re: Estimation of π using Leibniz series

2017-08-14 Thread andrea
That call to `pow` to change the sign is a possible reason for the slowdown.


Estimation of π using Leibniz series

2017-08-14 Thread alfrednewman
Hello,

How can I optimize the speed of the following proc:


import times, math

proc leibniz(terms: int): float =
  var res = 0.0
  for n in 0..terms:
    res += pow(-1.0,float(n))/(2.0*float(n)+1.0)
  return 4*res

let t0 = cpuTime()
echo(leibniz(100_000_000))
let t1 = cpuTime()
echo "Elapsed time: ", $(t1 - t0)


I get the following result on my computer: 3.141592663589326 Elapsed time: 8.23

This result is almost 5x faster than my CPython counterpart, but on the other 
hand it is around 6x slower than Julia, given the following code: 


function leibniz(terms)
  res = 0.0
  for i in 0:terms
    res += (-1.0)^i/(2.0*i+1.0)
  end
  return res *= 4.0
end

println("Pi: ", @time leibniz(100_000_000))


1.374829 seconds (1.72 k allocations: 90.561 KiB) Pi: 3.141592663589326