Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread via Digitalmars-d-learn
I'm getting faster execution on Java than dmd; gdc beats it 
though.


...although, what this topic really provides is a reason for me 
to get more RAM for my next laptop. How much do you people run 
with? I had to scale the Java version down to 300 million to 
avoid dying with 4 GB of memory.


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Daniel Kozak via Digitalmars-d-learn

On Tuesday, 23 December 2014 at 12:31:47 UTC, Iov Gherman wrote:


Btw. I just noticed a small issue with D vs. Java: you start 
measuring in D before allocation, but in Java after 
allocation.


Here is the Java result for parallel processing after moving 
the start time to the first line of main. Still the best result:


4 secs, 50 ms average


Java:

Exec time: 6 secs, 421 ms

LDC (-O3 -release -mcpu=native -singleobj -inline 
-boundscheck=off)


time: 5 secs, 321 ms, 877 μs, and 2 hnsecs

GDC(-O3 -frelease -march=native -finline -fno-bounds-check)

time: 5 secs, 237 ms, 453 μs, and 7 hnsecs

DMD(-O -release -inline -noboundscheck)
time: 5 secs, 107 ms, 931 μs, and 3 hnsecs

So all D compilers beat Java in my case,

but I have made some changes to the D version:

import std.parallelism, std.math, std.stdio, std.datetime;
import core.memory;

enum XMS = 3UL * 1024 * 1024 * 1024; // 3 GB (UL suffix avoids 32-bit int overflow)

version(GNU)
{
    real mylog(double x) pure nothrow
    {
        real result;
        double y = LN2;
        asm
        {
            "fldl   %2\n"
            "fldl   %1\n"
            "fyl2x\n"
            : "=t" (result) : "m" (x), "m" (y);
        }

        return result;
    }
}
else
{
    real mylog(double x) pure nothrow
    {
        return yl2x(x, LN2);
    }
}

void main() {

    GC.reserve(XMS);
    auto t1 = Clock.currTime();

    auto logs = new double[1_000_000_000];
    foreach(i, ref elem; taskPool.parallel(logs, 200)) {
        elem = mylog(i + 1.0);
    }

    auto t2 = Clock.currTime();
    writeln("time: ", (t2 - t1));
}




Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread via Digitalmars-d-learn

On Tuesday, 23 December 2014 at 12:26:28 UTC, Iov Gherman wrote:

And what about the single-threaded version?


Just ran the single-threaded examples after I moved the time 
start before the array allocation, thanks for that, good catch. 
Java still gets better results:


- java:
21 secs, 612 ms

- with std.math:
dmd: 23 secs, 994 ms
ldc: 31 secs, 668 ms
gdc: 52 secs, 576 ms

- with core.stdc.math:
dmd: 30 secs, 724 ms
ldc: 30 secs, 988 ms
gdc: 25 secs, 970 ms


Note that log is done in software on x86, with differing levels 
of precision and differing ability to handle corner cases. It 
is therefore a very bad benchmarking tool.
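The corner-case point is easy to see from any libm-style log; a small illustration using Java's Math.log (my own example, not code from the thread):

```java
// log() corner cases: these are exactly the inputs where software
// implementations may differ in behavior and precision.
public class LogCorners {
    public static void main(String[] args) {
        System.out.println(Math.log(0.0));   // -Infinity
        System.out.println(Math.log(-1.0));  // NaN
        System.out.println(Math.log(1.0));   // 0.0
        System.out.println(Math.log(Double.POSITIVE_INFINITY)); // Infinity
    }
}
```

An implementation that skips these checks can be faster on the happy path, which is one reason raw log throughput is a poor cross-language benchmark.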


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Iov Gherman via Digitalmars-d-learn

Forgot to mention that I pushed my changes to github.


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Iov Gherman via Digitalmars-d-learn


Btw. I just noticed a small issue with D vs. Java: you start 
measuring in D before allocation, but in Java after 
allocation.


Here is the Java result for parallel processing after moving the 
start time to the first line of main. Still the best result:


4 secs, 50 ms average


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Iov Gherman via Digitalmars-d-learn

And what about the single-threaded version?


Just ran the single-threaded examples after I moved the time 
start before the array allocation, thanks for that, good catch. 
Java still gets better results:


- java:
21 secs, 612 ms

- with std.math:
dmd: 23 secs, 994 ms
ldc: 31 secs, 668 ms
gdc: 52 secs, 576 ms

- with core.stdc.math:
dmd: 30 secs, 724 ms
ldc: 30 secs, 988 ms
gdc: 25 secs, 970 ms


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread John Colvin via Digitalmars-d-learn

On Monday, 22 December 2014 at 17:16:49 UTC, Iov Gherman wrote:

On Monday, 22 December 2014 at 17:16:05 UTC, bachmeier wrote:

On Monday, 22 December 2014 at 17:05:19 UTC, Iov Gherman wrote:

Hi Guys,

First of all, thank you all for responding so quickly, it is so 
nice to see D having such an active community.


As I said in my first post, I used no other parameters to dmd 
when compiling because I don't know too much about dmd 
compilation flags. I can't wait to try the flags Daniel 
suggested with dmd (-O -release -inline -noboundscheck) and 
the other two compilers (ldc2 and gdc). Thank you guys for 
your suggestions.


Meanwhile, I created a git repository on github and I put 
there all my code. If you find any errors please let me know. 
Because I am keeping the results in a big array, the programs 
take approximately 8 GB of RAM. If you don't have enough RAM 
feel free to decrease the size of the array. For java code 
you will also need to change 'compile-run.bsh' and use the 
right memory parameters.



Thank you all for helping,
Iov


Link to your repo?


Sorry, forgot about it:
https://github.com/ghermaniov/benchmarks


For posix-style threads, a per-thread workload of 200 calls to 
log seems rather small. It would be interesting to see a graph 
of execution time as a function of workgroup size.


Traditionally one would use a workgroup size of (nElements / 
nCores) or similar, in order to get all the cores working but 
also minimise pressure on the scheduler, inter-thread 
communication and so on.
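The (nElements / nCores) arithmetic can be sketched like this, in Java since both languages appear in the thread (the helper name is my own, not from the thread):

```java
// One chunk per core instead of a fixed work unit of 200:
// large chunks keep every core busy while minimising scheduler
// pressure and inter-thread communication.
public class ChunkSize {
    static int chunkSize(int nElements, int nCores) {
        // ceil(nElements / nCores), clamped to at least 1
        return Math.max(1, (nElements + nCores - 1) / nCores);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // e.g. 1 billion elements on 8 cores -> 125_000_000 per chunk
        System.out.println(chunkSize(1_000_000_000, cores));
    }
}
```

In practice a few chunks per core (divide by nCores * 4 or so) often balances better against uneven task durations, but for a uniform workload like log() one chunk per core is a reasonable starting point.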


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Daniel Kozak via Digitalmars-d-learn

On Tuesday, 23 December 2014 at 10:20:04 UTC, Iov Gherman wrote:

That's very different to my results.

I see no important difference between ldc and dmd when using 
std.math, but when using core.stdc.math ldc halves its time 
where dmd only manages to get to ~80%


I checked again today and the results are interesting, on my pc 
I don't see any difference between std.math and core.stdc.math 
with ldc. Here are the results with all compilers.


- with std.math:
dmd: 4 secs, 878 ms
ldc: 5 secs, 650 ms
gdc: 9 secs, 161 ms

- with core.stdc.math:
dmd: 5 secs, 991 ms
ldc: 5 secs, 572 ms
gdc: 7 secs, 957 ms


Btw. I just noticed a small issue with D vs. Java: you start 
measuring in D before allocation, but in Java after 
allocation.


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Daniel Kozak via Digitalmars-d-learn

On Tuesday, 23 December 2014 at 10:39:13 UTC, Iov Gherman wrote:


These multi-threaded benchmarks can be very sensitive to their 
environment, you should try running it with nice -20 and do 
multiple passes to get a vague idea of the variability in the 
result. Also, it's important to minimise the number of other 
running processes.


I did not use the nice parameter, but I always ran them multiple 
times and chose the average time. My system has very few 
running processes, a minimalist Arch Linux with Xfce4, so I don't 
think the running processes are affecting my tests in any way.


And what about the single-threaded version?

Btw. one reason why DMD is faster is that it uses the fyl2x 
x87 instruction.


Here is a version for the other compilers:

import std.math, std.stdio, std.datetime;

enum SIZE = 100_000_000;

version(GNU)
{
    real mylog(double x) pure nothrow
    {
        real result;
        double y = LN2;
        asm
        {
            "fldl   %2\n"
            "fldl   %1\n"
            "fyl2x"
            : "=t" (result) : "m" (x), "m" (y);
        }
        return result;
    }
}
else
{
    real mylog(double x) pure nothrow
    {
        return yl2x(x, LN2);
    }
}

void main() {

    auto t1 = Clock.currTime();
    auto logs = new double[SIZE];

    foreach (i; 0 .. SIZE)
    {
        logs[i] = mylog(i + 1.0);
    }

    auto t2 = Clock.currTime();

    writeln("time: ", (t2 - t1));
}

But it is only faster on Intel CPUs; on one of my AMD machines 
it is slower than core.stdc.math.log.
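For reference, the reason yl2x(x, LN2) computes a natural log at all is the standard identity behind the x87 fyl2x instruction (this derivation is mine, not from the thread):

```latex
\mathrm{fyl2x}(x,\,y) = y \cdot \log_2 x,
\qquad\text{so}\qquad
\mathrm{fyl2x}(x,\,\ln 2) = \ln 2 \cdot \frac{\ln x}{\ln 2} = \ln x .
```

The instruction does the scaled base-2 log in one microcoded operation, which is why DMD's std.math.log can beat a generic software log on Intel hardware.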


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Iov Gherman via Digitalmars-d-learn


These multi-threaded benchmarks can be very sensitive to their 
environment, you should try running it with nice -20 and do 
multiple passes to get a vague idea of the variability in the 
result. Also, it's important to minimise the number of other 
running processes.


I did not use the nice parameter, but I always ran them multiple 
times and chose the average time. My system has very few running 
processes, a minimalist Arch Linux with Xfce4, so I don't think 
the running processes are affecting my tests in any way.


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread John Colvin via Digitalmars-d-learn

On Tuesday, 23 December 2014 at 10:20:04 UTC, Iov Gherman wrote:

That's very different to my results.

I see no important difference between ldc and dmd when using 
std.math, but when using core.stdc.math ldc halves its time 
where dmd only manages to get to ~80%


I checked again today and the results are interesting, on my pc 
I don't see any difference between std.math and core.stdc.math 
with ldc. Here are the results with all compilers.


- with std.math:
dmd: 4 secs, 878 ms
ldc: 5 secs, 650 ms
gdc: 9 secs, 161 ms

- with core.stdc.math:
dmd: 5 secs, 991 ms
ldc: 5 secs, 572 ms
gdc: 7 secs, 957 ms


These multi-threaded benchmarks can be very sensitive to their 
environment, you should try running it with nice -20 and do 
multiple passes to get a vague idea of the variability in the 
result. Also, it's important to minimise the number of other 
running processes.


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread Iov Gherman via Digitalmars-d-learn

That's very different to my results.

I see no important difference between ldc and dmd when using 
std.math, but when using core.stdc.math ldc halves its time 
where dmd only manages to get to ~80%


I checked again today and the results are interesting, on my pc I 
don't see any difference between std.math and core.stdc.math with 
ldc. Here are the results with all compilers.


- with std.math:
dmd: 4 secs, 878 ms
ldc: 5 secs, 650 ms
gdc: 9 secs, 161 ms

- with core.stdc.math:
dmd: 5 secs, 991 ms
ldc: 5 secs, 572 ms
gdc: 7 secs, 957 ms


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-23 Thread John Colvin via Digitalmars-d-learn

On Tuesday, 23 December 2014 at 07:26:27 UTC, Daniel Kozak wrote:


That's very different to my results.

I see no important difference between ldc and dmd when using 
std.math, but when using core.stdc.math ldc halves its time 
where dmd only manages to get to ~80%


What CPU do you have? On my Intel Core i3 I have a similar 
experience to Iov Gherman's, but on my AMD FX-4200 I get the same 
results as you. It seems std.math.log is not good for my AMD CPU :)


Intel Core i5-4278U


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread Daniel Kozak via Digitalmars-d-learn


That's very different to my results.

I see no important difference between ldc and dmd when using 
std.math, but when using core.stdc.math ldc halves its time 
where dmd only manages to get to ~80%


What CPU do you have? On my Intel Core i3 I have a similar 
experience to Iov Gherman's, but on my AMD FX-4200 I get the same 
results as you. It seems std.math.log is not good for my AMD CPU :)




Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread John Colvin via Digitalmars-d-learn

On Monday, 22 December 2014 at 18:27:48 UTC, Iov Gherman wrote:

On Monday, 22 December 2014 at 17:50:20 UTC, John Colvin wrote:

On Monday, 22 December 2014 at 17:28:12 UTC, Iov Gherman wrote:
So, I did some more testing with the one processing in 
parallel:


--- dmd:
4 secs, 977 ms

--- dmd with flags: -O -release -inline -noboundscheck:
4 secs, 635 ms

--- ldc:
6 secs, 271 ms

--- gdc:
10 secs, 439 ms

I also pushed the new bash scripts to the git repository.


Flag suggestions:

ldc2 -O3 -release -mcpu=native -singleobj

gdc -O3 -frelease -march=native


Tried it, here are the results:

--- ldc:
6 secs, 271 ms

--- ldc -O3 -release -mcpu=native -singleobj:
5 secs, 686 ms

--- gdc:
10 secs, 439 ms

--- gdc -O3 -frelease -march=native:
9 secs, 180 ms


That's very different to my results.

I see no important difference between ldc and dmd when using 
std.math, but when using core.stdc.math ldc halves its time where 
dmd only manages to get to ~80%


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread via Digitalmars-d-learn

On Monday, 22 December 2014 at 18:23:29 UTC, Iov Gherman wrote:

On Monday, 22 December 2014 at 18:00:18 UTC, aldanor wrote:

On Monday, 22 December 2014 at 17:28:12 UTC, Iov Gherman wrote:
So, I did some more testing with the one processing in 
parallel:


--- dmd:
4 secs, 977 ms

--- dmd with flags: -O -release -inline -noboundscheck:
4 secs, 635 ms

--- ldc:
6 secs, 271 ms

--- gdc:
10 secs, 439 ms

I also pushed the new bash scripts to the git repository.


import std.math, std.stdio, std.datetime;

--> try replacing "std.math" with "core.stdc.math".


Tried it, it is worse:
6 secs, 78 ms, while the initial one was 4 secs, 977 ms and 
sometimes even better.


Strange... for me, core.stdc.math.log is about twice as fast as 
std.math.log.


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread Iov Gherman via Digitalmars-d-learn

On Monday, 22 December 2014 at 17:50:20 UTC, John Colvin wrote:

On Monday, 22 December 2014 at 17:28:12 UTC, Iov Gherman wrote:

So, I did some more testing with the one processing in parallel:

--- dmd:
4 secs, 977 ms

--- dmd with flags: -O -release -inline -noboundscheck:
4 secs, 635 ms

--- ldc:
6 secs, 271 ms

--- gdc:
10 secs, 439 ms

I also pushed the new bash scripts to the git repository.


Flag suggestions:

ldc2 -O3 -release -mcpu=native -singleobj

gdc -O3 -frelease -march=native


Tried it, here are the results:

--- ldc:
6 secs, 271 ms

--- ldc -O3 -release -mcpu=native -singleobj:
5 secs, 686 ms

--- gdc:
10 secs, 439 ms

--- gdc -O3 -frelease -march=native:
9 secs, 180 ms



Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread Iov Gherman via Digitalmars-d-learn

On Monday, 22 December 2014 at 18:00:18 UTC, aldanor wrote:

On Monday, 22 December 2014 at 17:28:12 UTC, Iov Gherman wrote:

So, I did some more testing with the one processing in parallel:

--- dmd:
4 secs, 977 ms

--- dmd with flags: -O -release -inline -noboundscheck:
4 secs, 635 ms

--- ldc:
6 secs, 271 ms

--- gdc:
10 secs, 439 ms

I also pushed the new bash scripts to the git repository.


import std.math, std.stdio, std.datetime;

--> try replacing "std.math" with "core.stdc.math".


Tried it, it is worse:
6 secs, 78 ms, while the initial one was 4 secs, 977 ms and 
sometimes even better.




Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread aldanor via Digitalmars-d-learn

On Monday, 22 December 2014 at 17:28:12 UTC, Iov Gherman wrote:

So, I did some more testing with the one processing in parallel:

--- dmd:
4 secs, 977 ms

--- dmd with flags: -O -release -inline -noboundscheck:
4 secs, 635 ms

--- ldc:
6 secs, 271 ms

--- gdc:
10 secs, 439 ms

I also pushed the new bash scripts to the git repository.


import std.math, std.stdio, std.datetime;

--> try replacing "std.math" with "core.stdc.math".


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread John Colvin via Digitalmars-d-learn

On Monday, 22 December 2014 at 17:28:12 UTC, Iov Gherman wrote:

So, I did some more testing with the one processing in parallel:

--- dmd:
4 secs, 977 ms

--- dmd with flags: -O -release -inline -noboundscheck:
4 secs, 635 ms

--- ldc:
6 secs, 271 ms

--- gdc:
10 secs, 439 ms

I also pushed the new bash scripts to the git repository.


Flag suggestions:

ldc2 -O3 -release -mcpu=native -singleobj

gdc -O3 -frelease -march=native


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread Iov Gherman via Digitalmars-d-learn

So, I did some more testing with the one processing in parallel:

--- dmd:
4 secs, 977 ms

--- dmd with flags: -O -release -inline -noboundscheck:
4 secs, 635 ms

--- ldc:
6 secs, 271 ms

--- gdc:
10 secs, 439 ms

I also pushed the new bash scripts to the git repository.


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread Iov Gherman via Digitalmars-d-learn

On Monday, 22 December 2014 at 17:16:05 UTC, bachmeier wrote:

On Monday, 22 December 2014 at 17:05:19 UTC, Iov Gherman wrote:

Hi Guys,

First of all, thank you all for responding so quickly, it is so 
nice to see D having such an active community.


As I said in my first post, I used no other parameters to dmd 
when compiling because I don't know too much about dmd 
compilation flags. I can't wait to try the flags Daniel 
suggested with dmd (-O -release -inline -noboundscheck) and 
the other two compilers (ldc2 and gdc). Thank you guys for 
your suggestions.


Meanwhile, I created a git repository on github and I put 
there all my code. If you find any errors please let me know. 
Because I am keeping the results in a big array, the programs 
take approximately 8 GB of RAM. If you don't have enough RAM 
feel free to decrease the size of the array. For java code you 
will also need to change 'compile-run.bsh' and use the right 
memory parameters.



Thank you all for helping,
Iov


Link to your repo?


Sorry, forgot about it:
https://github.com/ghermaniov/benchmarks



Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread bachmeier via Digitalmars-d-learn

On Monday, 22 December 2014 at 17:05:19 UTC, Iov Gherman wrote:

Hi Guys,

First of all, thank you all for responding so quickly, it is so 
nice to see D having such an active community.


As I said in my first post, I used no other parameters to dmd 
when compiling because I don't know too much about dmd 
compilation flags. I can't wait to try the flags Daniel 
suggested with dmd (-O -release -inline -noboundscheck) and the 
other two compilers (ldc2 and gdc). Thank you guys for your 
suggestions.


Meanwhile, I created a git repository on github and I put there 
all my code. If you find any errors please let me know. Because 
I am keeping the results in a big array, the programs take 
approximately 8 GB of RAM. If you don't have enough RAM feel 
free to decrease the size of the array. For java code you will 
also need to change 'compile-run.bsh' and use the right memory 
parameters.



Thank you all for helping,
Iov


Link to your repo?


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread Iov Gherman via Digitalmars-d-learn

Hi Guys,

First of all, thank you all for responding so quickly, it is so 
nice to see D having such an active community.


As I said in my first post, I used no other parameters to dmd 
when compiling because I don't know too much about dmd 
compilation flags. I can't wait to try the flags Daniel suggested 
with dmd (-O -release -inline -noboundscheck) and the other two 
compilers (ldc2 and gdc). Thank you guys for your suggestions.


Meanwhile, I created a git repository on github and I put there 
all my code. If you find any errors please let me know. Because I 
am keeping the results in a big array, the programs take 
approximately 8 GB of RAM. If you don't have enough RAM feel free 
to decrease the size of the array. For java code you will also 
need to change 'compile-run.bsh' and use the right memory 
parameters.



Thank you all for helping,
Iov


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread aldanor via Digitalmars-d-learn

On Monday, 22 December 2014 at 10:40:45 UTC, Daniel Kozak wrote:
On Monday, 22 December 2014 at 10:35:52 UTC, Daniel Kozak via 
Digitalmars-d-learn wrote:


I run Arch Linux on my PC. I compiled D programs using 
dmd-2.066 and used no compile arguments (dmd prog.d)


You should try using the arguments -O -release -inline 
-noboundscheck,

and maybe gdc or ldc; they should help with performance.

Can you post your code in all languages somewhere? I'd like 
to try it on

my machine :)


Btw. try using the C log function, maybe it would be faster:

import core.stdc.math;


Just tried it out myself (E5 Xeon / Linux):

D version: 19.64 sec (avg 3 runs)

import core.stdc.math;

void main() {
double s = 0;
foreach (i; 1 .. 1_000_000_000)
s += log(i);
}

// build flags: -O -release

C version: 19.80 sec (avg 3 runs)

#include <math.h>

int main() {
double s = 0;
long i;
for (i = 1; i < 1000000000; i++)
s += log(i);
return 0;
}

// build flags: -O3 -lm


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread aldanor via Digitalmars-d-learn

On Monday, 22 December 2014 at 11:11:07 UTC, aldanor wrote:


Just tried it out myself (E5 Xeon / Linux):

D version: 19.64 sec (avg 3 runs)

import core.stdc.math;

void main() {
double s = 0;
foreach (i; 1 .. 1_000_000_000)
s += log(i);
}

// build flags: -O -release

C version: 19.80 sec (avg 3 runs)

#include <math.h>

int main() {
double s = 0;
long i;
for (i = 1; i < 1000000000; i++)
s += log(i);
return 0;
}

// build flags: -O3 -lm


Replacing "import core.stdc.math" with "import std.math" in the D 
example increases the avg runtime from 19.64 to 23.87 seconds 
(~20% slower) which is consistent with OP's statement.


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread Russel Winder via Digitalmars-d-learn

On Mon, 2014-12-22 at 10:12 +, Iov Gherman via Digitalmars-d-learn wrote:
> […]
> - D: 24 secs, 32 ms.
> - Java: 20 secs, 881 ms.
> - C: 21 secs
> - Go: 37 secs
> 
Without the source code and the commands used to build and run, it 
is impossible to offer constructive criticism of the results. However, a 
priori the above does not surprise me. I'll wager ldc2 or gdc will 
beat dmd for CPU-bound code, so as others have said, for benchmarking 
use ldc2 or gdc with all optimization on (-O3). If you used gc for Go 
then switch to gccgo (again with -O3) and see a huge performance 
improvement on CPU-bound code.

Java beating C and C++ is fairly normal these days due to the tricks 
you can play with JIT over AOT optimization. Once Java has proper 
support for GPGPU, it will be hard for native code languages to get 
any new converts from JVM.

Put the source up and I and others will try things out.
-- 
Russel.
=
Dr Russel Winder  t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Roadm: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder



Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread Daniel Kozak via Digitalmars-d-learn
On Monday, 22 December 2014 at 10:35:52 UTC, Daniel Kozak via 
Digitalmars-d-learn wrote:


I run Arch Linux on my PC. I compiled D programs using 
dmd-2.066 and used no compile arguments (dmd prog.d)


You should try using the arguments -O -release -inline 
-noboundscheck,

and maybe gdc or ldc; they should help with performance.

Can you post your code in all languages somewhere? I'd like 
to try it on

my machine :)


Btw. try using the C log function, maybe it would be faster:

import core.stdc.math;


Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread Daniel Kozak via Digitalmars-d-learn

> I run Arch Linux on my PC. I compiled D programs using dmd-2.066 
> and used no compile arguments (dmd prog.d)

You should try using the arguments -O -release -inline -noboundscheck,
and maybe gdc or ldc; they should help with performance.

Can you post your code in all languages somewhere? I'd like to try it on
my machine :)



Re: math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread bachmeier via Digitalmars-d-learn

On Monday, 22 December 2014 at 10:12:52 UTC, Iov Gherman wrote:
Now, can anyone explain why this program ran faster in Java? I 
ran both programs multiple times and the results were always 
close to this execution times.


Can the implementation of log() function be the reason for a 
slower execution time in D?


I then decided to run the same program in a single thread, a 
simple foreach/for loop. I tried it in C and Go also. These are 
the results:

- D: 24 secs, 32 ms.
- Java: 20 secs, 881 ms.
- C: 21 secs
- Go: 37 secs

I run Arch Linux on my PC. I compiled D programs using 
dmd-2.066 and used no compile arguments (dmd prog.d).
I used Oracle's Java 8 (tried 7 and 6; it seems like with Java 6 
the performance is a bit better than 7 and 8).

To compile the C program I used: gcc 4.9.2
For Go program I used go 1.4

I really really like the built-in support in D for parallel 
processing and how easy it is to schedule tasks taking advantage 
of workUnitSize.


Thanks,
Iov


DMD is generally going to produce the slowest code. LDC and GDC 
will normally do better.


math.log() benchmark of first 1 billion int using std.parallelism

2014-12-22 Thread Iov Gherman via Digitalmars-d-learn

Hi everybody,

I am a Java developer and have used C/C++ only for some home 
projects, so I never mastered native programming.


I am currently learning D and I find it fascinating. I was 
reading the documentation about std.parallelism and I wanted to 
experiment a bit with the example "Find the logarithm of every 
number from 1 to 10_000_000 in parallel".


So, first, I changed the limit to 1 billion and ran it. I was 
blown away by the performance: the program ran in 4 secs, 670 ms, 
using a workUnitSize of 200. I have a 4th-generation i7 
processor with 8 cores.


Then I was curious to try the same test in Java, just to see how 
much slower it would be (at least that was what I expected). I 
used Java's ExecutorService with a pool of 8 threads and created 
5_000_000 tasks, each task calculating log() for 200 numbers. 
The whole program ran in 3 secs, 315 ms.
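The Java program described above could look roughly like this; a hypothetical reconstruction, since the actual source lives in the repo (class and method names are my own):

```java
import java.util.concurrent.*;

public class LogBench {
    // log(1..n), computed in parallel with `chunk` numbers per task,
    // mirroring the 200-number work units mentioned in the post.
    public static double[] computeLogs(int n, int chunk) {
        double[] logs = new double[n];
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (int start = 0; start < n; start += chunk) {
            final int s = start, e = Math.min(start + chunk, n);
            pool.execute(() -> {
                for (int i = s; i < e; i++)
                    logs[i] = Math.log(i + 1.0);
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return logs;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        computeLogs(10_000_000, 200); // scaled down from 1 billion
        System.out.printf("time: %d ms%n",
                (System.nanoTime() - t0) / 1_000_000);
    }
}
```

Note the timing starts before the array allocation here, matching the corrected measurement discussed later in the thread.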


Now, can anyone explain why this program ran faster in Java? I 
ran both programs multiple times and the results were always 
close to this execution times.


Can the implementation of log() function be the reason for a 
slower execution time in D?


I then decided to run the same program in a single thread, a 
simple foreach/for loop. I tried it in C and Go also. These are 
the results:

- D: 24 secs, 32 ms.
- Java: 20 secs, 881 ms.
- C: 21 secs
- Go: 37 secs

I run Arch Linux on my PC. I compiled D programs using dmd-2.066 
and used no compile arguments (dmd prog.d).
I used Oracle's Java 8 (tried 7 and 6; it seems like with Java 6 
the performance is a bit better than 7 and 8).

To compile the C program I used: gcc 4.9.2
For Go program I used go 1.4

I really really like the built-in support in D for parallel 
processing and how easy it is to schedule tasks taking advantage 
of workUnitSize.


Thanks,
Iov