2024-03-28 Thread Salih Dincer via Digitalmars-d-learn

On Friday, 29 March 2024 at 00:04:14 UTC, Serg Gini wrote:

On Thursday, 28 March 2024 at 23:15:26 UTC, Salih Dincer wrote:
There is no such thing as parallel programming in D anyway. At 
least it has modules for it, but I have not seen them work. 
Whenever I use the toys built into foreach() it always ends in 
disappointment

I think it just works :)
Which issues did you have with it?

A year has passed and I have tried almost everything! Either it 
went into an infinite loop or the speed did not change at all. 
Things are certainly not as simple as OpenMP on the D side! First I 
tried this code snippet, a futile attempt:

struct RowlandSequence {
  import std.numeric : gcd;
  import std.format : format;
  import std.conv : text;

  long b, r, a = 3;
  enum empty = false;

  string[] front() {
    string result = format("%s, %s", b, r);
    return [text(a), result];
  }

  void popFront() {
    long result = 1;
    while(result == 1) {
      result = gcd(r++, b);
      b += result;
    }
    a = result;
  }
}

enum BP {
  f = 1, b = 7, r = 2, a = 1, /*
  f = 109, b = 186837516, r = 62279173, //*/
  s = 5
}

void main()
{
  RowlandSequence rs;
  long start, skip;

  with(BP) {
    rs = RowlandSequence(b, r);
    start = f;
    skip = s;
  }

  import std.stdio, std.parallelism;
  import std.range : take;

  auto rsFirst128 = rs.take(128);
  foreach(r; rsFirst128.parallel)
    if(r[0].length > skip)
      start.writeln(": ", r);
}

2024-03-28 Thread Serg Gini via Digitalmars-d-learn

On Thursday, 28 March 2024 at 23:15:26 UTC, Salih Dincer wrote:
There is no such thing as parallel programming in D anyway. At 
least it has modules for it, but I have not seen them work. Whenever 
I use the toys built into foreach() it always ends in disappointment

I think it just works :)
Which issues did you have with it?

2024-03-28 Thread Salih Dincer via Digitalmars-d-learn

On Thursday, 28 March 2024 at 20:18:10 UTC, rkompass wrote:

I didn't know that OpenMP programming could be that easy.
Binary size is 16K, same order of magnitude, although somewhat 

D advantage is gone here, I would say.

There is no such thing as parallel programming in D anyway. At 
least it has modules for it, but I have not seen them work. Whenever I 
use the toys built into foreach() it always ends in disappointment :)


2024-03-28 Thread Sergey via Digitalmars-d-learn

On Thursday, 28 March 2024 at 20:18:10 UTC, rkompass wrote:

D advantage is gone here, I would say.

It's hard to compare, actually.
std.parallelism works a bit differently and is, I think, easier 
to use. The syntax is nicer.

OpenMP is a well-known and widely adopted tool, which is also 
quite flexible, but it is usually applied to code that was initially 
sequential. And the syntax is not very intuitive.

Interesting point from Dr Russel here:

However, since 2012 OpenMP has also seen further development and 
improvement, and the HPC world is pretty conservative. So it is one of 
the most popular tools in the area:
along with MPI. But with the AI and GPU revolution the balance 
will probably shift a bit towards CUDA-like technologies.

2024-03-28 Thread rkompass via Digitalmars-d-learn

On Thursday, 28 March 2024 at 14:07:43 UTC, Salih Dincer wrote:

On Thursday, 28 March 2024 at 11:50:38 UTC, rkompass wrote:

Turning back to this: Are there similarly simple libraries for 
C, that allow for

parallel computation?

You can achieve parallelism in C using libraries such as 
OpenMP, which provides a set of compiler directives and runtime 
library routines for parallel programming.

Here’s an example of how you might modify the code to use 
OpenMP for parallel processing:

 . . .

  #pragma omp parallel for reduction(+:result)
  for (int s = ITERS; s >= 0; s -= STEPS) {
    result += leibniz(s);
  }
 . . .

To compile this code with OpenMP support, you would use a 
command like gcc -fopenmp your_program.c. This tells the GCC 
compiler to enable OpenMP directives. The #pragma omp parallel 
for directive tells the compiler to parallelize the loop, and 
the reduction clause is used to safely accumulate the result 
variable across multiple threads.


Nice, thank you.
It ran endlessly until I saw that I had to correct the `for` to
  `for (int s = ITERS; s > ITERS-STEPS; s--)`
Now the result is:
Execution time: 0.212483 (seconds).
This result is sooo similar!

I didn't know that OpenMP programming could be that easy.
Binary size is 16K, same order of magnitude, although somewhat 

D advantage is gone here, I would say.

2024-03-28 Thread Salih Dincer via Digitalmars-d-learn

On Thursday, 28 March 2024 at 11:50:38 UTC, rkompass wrote:

Turning back to this: Are there similarly simple libraries for 
C, that allow for

parallel computation?

You can achieve parallelism in C using libraries such as OpenMP, 
which provides a set of compiler directives and runtime library 
routines for parallel programming.

Here’s an example of how you might modify the code to use OpenMP 
for parallel processing:


#include <stdio.h>
#include <omp.h>

#define ITERS 1000000000  // same iteration count as the D version
#define STEPS 31

double leibniz(int i) {
  double r = (i == ITERS) ? 0.5 * ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0) : 0.0;

  for (--i; i >= 0; i -= STEPS)
    r += ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
  return r * 4.0;
}

int main() {
  double start_time = omp_get_wtime();

  double result = 0.0;

  #pragma omp parallel for reduction(+:result)
  for (int s = ITERS; s >= 0; s -= STEPS) {
    result += leibniz(s);
  }

  // Calculate the time taken
  double time_taken = omp_get_wtime() - start_time;

  printf("%.16f\n", result);
  printf("%f (seconds)\n", time_taken);

  return 0;
}

To compile this code with OpenMP support, you would use a command 
like gcc -fopenmp your_program.c. This tells the GCC compiler to 
enable OpenMP directives. The #pragma omp parallel for directive 
tells the compiler to parallelize the loop, and the reduction 
clause is used to safely accumulate the result variable across 
multiple threads.
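
To make the role of the reduction clause more concrete, here is a minimal sequential sketch in plain C of what it does conceptually: every worker accumulates into its own private sum, and the private sums are combined only at the end. The names `strided_partial` and `leibniz_reduced` are made up for this illustration, and no OpenMP is involved, so it shows only the arithmetic, not any speedup.

```c
#include <assert.h>

/* Partial Leibniz sum over the indices start, start - stride,
   start - 2*stride, ... down to 0 (hypothetical helper). */
static double strided_partial(int start, int stride) {
    double r = 0.0;
    for (int i = start; i >= 0; i -= stride)
        r += ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
    return r;
}

/* Sequential emulation of reduction(+:result): each "worker" w owns a
   private accumulator; the private sums are added up afterwards. */
double leibniz_reduced(int terms, int workers) {
    double result = 0.0;
    assert(terms > 0 && workers > 0);
    for (int w = 0; w < workers; w++) {
        /* worker w covers every index i with (terms - 1 - i) % workers == w */
        double private_sum = strided_partial(terms - 1 - w, workers);
        result += private_sum;
    }
    return 4.0 * result;
}
```

Calling `leibniz_reduced(1000000, 31)` should agree with the single-worker call `leibniz_reduced(1000000, 1)` up to rounding, since both add the same terms, only in a different order.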


2024-03-28 Thread rkompass via Digitalmars-d-learn

On Thursday, 28 March 2024 at 01:09:34 UTC, Salih Dincer wrote:
Good thing you're digressing; I am 45 years old and I still 
cannot say that I am finished as a student! For me this is 
version 4 and it looks like we don't need a 3rd variable other 
than the function parameter and return value:

So we go with another digression. I discovered parallel, also 
avoided the extra variable, as suggested by Salih:

import std.range;
import std.parallelism;
import core.stdc.stdio: printf;
import std.datetime.stopwatch;

enum ITERS = 1_000_000_000;
enum STEPS = 31; // 5 is fine, even numbers (e.g. 10) may give bad precision (for math reason ???)

pure double leibniz(int i) {  // sum up the small values first
  double r = (i == ITERS) ? 0.5 * ((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0) : 0.0;

  for (--i; i >= 0; i-= STEPS)
    r += ((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
  return r * 4.0;
}

void main() {
  auto start = iota(ITERS, ITERS-STEPS, -1).array;
  auto sw = StopWatch(AutoStart.yes);
  double result = 0.0;
  foreach(s; start.parallel)
    result += leibniz(s);  // caution: unsynchronized += on a shared variable is a data race
  double total_time = sw.peek.total!"nsecs";
  printf("%.16f\n", result);
  printf("Execution time: %f\n", total_time / 1e9);
}

Execution time: 0.211667
My laptop has 6 cores and obviously 5 are used in parallel by 

The original question related to a comparison between C, D and 
Turning back to this: Are there similarly simple libraries for C, 
that allow for

parallel computation?

2024-03-27 Thread Salih Dincer via Digitalmars-d-learn

On Wednesday, 27 March 2024 at 08:22:42 UTC, rkompass wrote:
I apologize for digressing a little bit further - just to share 
insights to other learners.

Good thing you're digressing; I am 45 years old and I still 
cannot say that I am finished as a student! For me this is 
version 4 and it looks like we don't need a 3rd variable other 
than the function parameter and return value:

auto leibniz_v4(int i) @nogc pure {
  double n = 0.5*((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);

  while(--i >= 0)
    n += ((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);

  return n * 4.0;
} /*
3.141592653589 793238462643383279502884197169399375105
3.14159365359077420 (v1)
Avg execution time: 0.33


2024-03-27 Thread rkompass via Digitalmars-d-learn
I apologize for digressing a little bit further - just to share 
insights to other learners.

I had the question, why my binary was so big (> 4M), discovered 

`gdc -Wall -O2 -frelease -shared-libphobos` options (now >200K).
Then I tried to avoid the GC and just learnt this: the GC in the 
Leibniz code is there only for the writeln. With a change to 
(again standard C) printf the
`@nogc` attribute can be applied, and the binary then gets down to 
~17K, comparable in size to the C counterpart.

Another observation regarding precision:
The iteration proceeds in the wrong order. Adding small 
contributions first and bigger last leads to less loss when 
summing up the small parts below the final real/double LSB limit.
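
The effect of summation order is easier to see in single precision. The following is a minimal sketch in C, not taken from the thread: it swaps in the series 1/k^2 instead of the Leibniz series, because there the loss is large enough to show up clearly in `float`. All function names are invented for this illustration.

```c
#include <assert.h>

/* Sum 1/k^2 for k = 1..n in single precision, largest terms first. */
float sum_large_first(int n) {
    float s = 0.0f;
    assert(n > 0);
    for (int k = 1; k <= n; k++)
        s += 1.0f / ((float)k * (float)k);
    return s;
}

/* The same terms in single precision, smallest terms first. */
float sum_small_first(int n) {
    float s = 0.0f;
    assert(n > 0);
    for (int k = n; k >= 1; k--)
        s += 1.0f / ((float)k * (float)k);
    return s;
}

/* Double-precision reference, also summed smallest first. */
double sum_reference(int n) {
    double s = 0.0;
    assert(n > 0);
    for (int k = n; k >= 1; k--)
        s += 1.0 / ((double)k * (double)k);
    return s;
}
```

With a million terms, the largest-first sum stops growing once the terms fall below half a unit in the last place of the running sum, so it ends up noticeably short of the reference, while the smallest-first sum stays close to it.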

So I'm now at this code (abolishing the average of 20 iterations 
as unnecessary)

// import std.stdio;  // writeln will lead to the garbage collector to be included

import core.stdc.stdio: printf;
import std.datetime.stopwatch;

const int ITERATIONS = 1_000_000_000;

@nogc pure double leibniz(int it) {  // sum up the small values first
  double n = 0.5*((it%2) ? -1.0 : 1.0) / (it * 2.0 + 1.0);
  for (int i = it-1; i >= 0; i--)
    n += ((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
  return n * 4.0;
}

@nogc void main() {
  double result;
  double total_time = 0;
  auto sw = StopWatch(AutoStart.yes);
  result = leibniz(ITERATIONS);
  total_time = sw.peek.total!"nsecs";
  printf("%.16f\n", result);
  printf("Execution time: %f\n", total_time / 1e9);
}

Execution time: 1.068111

2024-03-26 Thread Lance Bachmeier via Digitalmars-d-learn

On Tuesday, 26 March 2024 at 14:25:53 UTC, Lance Bachmeier wrote:

On Sunday, 24 March 2024 at 19:31:19 UTC, Csaba wrote:
I know that benchmarks are always controversial and depend on 
a lot of factors. So far, I read that D performs very well in 
benchmarks, as well, if not better, as C.

I wrote a little program that approximates PI using the 
Leibniz formula. I implemented the same thing in C, D and 
Python, all of them execute 1,000,000 iterations 20 times and 
display the average time elapsed.

Here are the results:

C: 0.04s
Python: 0.33s
D: 0.73s

What the hell? D slower than Python? This cannot be real. I am 
sure I am making a mistake here. I'm sharing all 3 programs 


As you can see the function that does the job is exactly the 
same in C and D.

Here are the compile/run commands used:

C: `gcc leibniz.c -lm -oleibc`
D: `gdc leibniz.d -frelease -oleibd`
Python: `python3`

PS. my CPU is AMD A8-5500B and my OS is Ubuntu Linux, if that 

As others suggested, pow is the problem. I noticed that the C 
versions are often much faster than their D counterparts. (And 
I don't view that as a problem, since both are built into the 
language - my only thought is that the D version should call 
the C version).


Changing

import std.math:pow;

to

import core.stdc.math: pow;

and leaving everything unchanged, I get

C: Avg execution time: 0.007918
D (original): Avg execution time: 0.102612
D (using core.stdc.math): Avg execution time: 0.008134

So more or less the exact same numbers if you use core.stdc.math.

And then the other thing is changing

const int BENCHMARKS = 20;

to

enum BENCHMARKS = 20;

which should allow substitution of the constant directly into the 
rest of the program, which gives

Avg execution time: 0.007564

On my Ubuntu 22.04 machine, therefore, the LDC binary with no 
flags is slightly faster than the C code compiled with your flags.

2024-03-26 Thread Lance Bachmeier via Digitalmars-d-learn

On Sunday, 24 March 2024 at 19:31:19 UTC, Csaba wrote:
I know that benchmarks are always controversial and depend on a 
lot of factors. So far, I read that D performs very well in 
benchmarks, as well, if not better, as C.

I wrote a little program that approximates PI using the Leibniz 
formula. I implemented the same thing in C, D and Python, all 
of them execute 1,000,000 iterations 20 times and display the 
average time elapsed.

Here are the results:

C: 0.04s
Python: 0.33s
D: 0.73s

What the hell? D slower than Python? This cannot be real. I am 
sure I am making a mistake here. I'm sharing all 3 programs 


As you can see the function that does the job is exactly the 
same in C and D.

Here are the compile/run commands used:

C: `gcc leibniz.c -lm -oleibc`
D: `gdc leibniz.d -frelease -oleibd`
Python: `python3`

PS. my CPU is AMD A8-5500B and my OS is Ubuntu Linux, if that 

As others suggested, pow is the problem. I noticed that the C 
versions are often much faster than their D counterparts. (And I 
don't view that as a problem, since both are built into the 
language - my only thought is that the D version should call the 
C version).


Changing

import std.math:pow;

to

import core.stdc.math: pow;

and leaving everything unchanged, I get

C: Avg execution time: 0.007918
D (original): Avg execution time: 0.102612
D (using core.stdc.math): Avg execution time: 0.008134

So more or less the exact same numbers if you use core.stdc.math.

2024-03-26 Thread Salih Dincer via Digitalmars-d-learn

On Monday, 25 March 2024 at 14:02:08 UTC, rkompass wrote:

Of course you may also combine the up(+) and down(-) step to 

1/i - 1/(i+2) = 2/(i*(i+2))

double leibniz(int iter) {
  double n = 0.0;
  for (int i = 1; i < iter; i+=4)
    n += 2.0 / (i * (i+2.0));
  return n * 4.0;
}

or even combine both approaches. But, of course, mathematically 
much more is possible. This was not about approximating pi as 
fast as possible...

The above first approach still works with the original speed, 
only makes the result a little bit nicer.

It's obvious that you are a good mathematician. You used sequence 
A005563.  First of all, I must apologize to the questioner for 
digressing from the topic. But I saw that there is a calculation 
difference between real and double. My goal was to see if there 
would be a change in speed.  For example, with 250 million cycles 
(iter/4) I got the following result:

3.14159265158976691 (250 million with real)
3.14159264457621568 (250 million with double)
3.14159265358979324 (std.math.constants.PI)

First of all, my question is: Why do we see this calculation 
error with double?  Could the changes I made to the algorithm 
have caused this?  Here's an executable code snippet:

enum step = 4;
enum loop = 250_000_000;

auto leibniz(T)(int iter)
{
  T n = 2/3.0;
  for(int i = 5; i < iter; i += step)
  {
    T a = (2.0 + i) * i; //
    n += 2/a;
  }
  return n * step;
}

import std.stdio : writefln;

void main()
{
  enum iter = loop * step-10;

  iter.leibniz!double.writefln!"%.17f (double)";
  iter.leibniz!real.writefln!"%.17f (real)";

  imported!"std.math".PI.writefln!"%.17f (enum)";
} /* Prints:

3.14159264457621568 (double)
3.14159265158976689 (real)
3.14159265358979324 (enum)
*/

In fact, there are algorithms that calculate accurately up to 12 
decimal places with fewer cycles. (e.g. )
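
Regarding the gap between double and real above: one likely contributor is rounding that accumulates over 250 million additions, since each new term is tiny compared with the running sum. The following is a minimal sketch in C, assuming compensated (Kahan) summation as one way to recover the lost low-order bits. This is a swapped-in technique, not something used in the thread, and the function names are invented here.

```c
#include <assert.h>

/* Paired Leibniz terms 2/(i*(i+2)) for i = 1, 5, 9, ... < iter,
   added in plain double precision, largest terms first. */
double leibniz_pairs_naive(int iter) {
    double sum = 0.0;
    assert(iter > 0 && iter < 2000000000);   /* keeps i += 4 from overflowing */
    for (int i = 1; i < iter; i += 4)
        sum += 2.0 / (i * (i + 2.0));
    return 4.0 * sum;
}

/* The same terms with Kahan compensation: c carries the low-order
   bits that the addition sum + y had to round away. */
double leibniz_pairs_kahan(int iter) {
    double sum = 0.0, c = 0.0;
    assert(iter > 0 && iter < 2000000000);
    for (int i = 1; i < iter; i += 4) {
        double term = 2.0 / (i * (i + 2.0));
        double y = term - c;      /* apply the correction from the last step */
        double t = sum + y;       /* low bits of y are lost here ... */
        c = (t - sum) - y;        /* ... and recovered here */
        sum = t;
    }
    return 4.0 * sum;
}
```

Note that aggressive floating-point options such as `-ffast-math` may allow the compiler to simplify the compensation away, so the sketch assumes default floating-point semantics.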


2024-03-26 Thread Csaba via Digitalmars-d-learn

On Sunday, 24 March 2024 at 21:21:13 UTC, kdevel wrote:

Usually you do not translate mathematical expressions directly 
into code:

   n += pow(-1.0, i - 1.0) / (i * 2.0 - 1.0);

The term containing the `pow` invocation computes the 
alternating sequence -1, 1, -1, ..., which can be replaced by 

   immutable int [2] sign = [-1, 1];
   n += sign [i & 1] / (i * 2.0 - 1.0);

This saves the expensive call to the pow function.

I know that the code can be simplified/optimized, I just wanted 
to compare the same expression in C and D.

2024-03-25 Thread rkompass via Digitalmars-d-learn

On Sunday, 24 March 2024 at 23:02:19 UTC, Sergey wrote:

On Sunday, 24 March 2024 at 22:16:06 UTC, rkompass wrote:
Are there some simple switches / settings to get a smaller 

1) If possible you can use "betterC" - to disable runtime
2) otherwise
--release --O3 --flto=full -fvisibility=hidden 
-defaultlib=phobos2-ldc-lto,druntime-ldc-lto -L=-dead_strip 
-L=-x -L=-S -L=-lz


Thank you. I succeeded with `gdc -Wall -O2 -frelease 

A little remark:
The approximation converges to pi slowly, and the partial sums 
oscillate above and below the limit by much more than the error of 
their average. So averaging two consecutive steps gives many 
more precise digits. We can simulate this by doing a last step 
with half the size:

double leibniz(int it) {
  double n = 1.0;
  for (int i = 1; i < it; i++)
    n += ((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
  n += 0.5*((it%2) ? -1.0 : 1.0) / (it * 2.0 + 1.0);
  return n * 4.0;
}

Of course you may also combine the up(+) and down(-) step to one:

1/i - 1/(i+2) = 2/(i*(i+2))

double leibniz(int iter) {
  double n = 0.0;
  for (int i = 1; i < iter; i+=4)
    n += 2.0 / (i * (i+2.0));
  return n * 4.0;
}

or even combine both approaches. But, of course, mathematically 
much more is possible. This was not about approximating pi as 
fast as possible...

The above first approach still works with the original speed, 
only makes the result a little bit nicer.
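
As a rough check of the half-step idea, here is a minimal sketch in C that puts the plain partial sum next to the version with a final half-size step. The loop bodies follow the first snippet above; the two function names are chosen only for this illustration.

```c
#include <assert.h>

/* Plain Leibniz partial sum using the terms with index 0 .. it-1. */
double leibniz_plain(int it) {
    double n = 1.0;
    assert(it > 0);
    for (int i = 1; i < it; i++)
        n += ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
    return n * 4.0;
}

/* Same sum plus half of the next term, which amounts to averaging
   two consecutive partial sums. */
double leibniz_half_step(int it) {
    double n = 1.0;
    assert(it > 0);
    for (int i = 1; i < it; i++)
        n += ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
    n += 0.5 * ((it % 2) ? -1.0 : 1.0) / (it * 2.0 + 1.0);
    return n * 4.0;
}
```

For `it = 1000` the plain sum is off by roughly 1/it, while the half-step version is off by roughly 1/(2*it*it), at the cost of one extra term.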

2024-03-24 Thread Salih Dincer via Digitalmars-d-learn

On Sunday, 24 March 2024 at 22:16:06 UTC, Kdevel wrote:
The term containing the `pow` invocation computes the 
alternating sequence -1, 1, -1, ..., which can be replaced by 

   immutable int [2] sign = [-1, 1];
   n += sign [i & 1] / (i * 2.0 - 1.0);

This saves the expensive call to the pow function.

I also used this code:
import std.stdio : writefln;
import std.datetime.stopwatch;

enum ITERATIONS = 1_000_000;
enum BENCHMARKS = 20;

auto leibniz(bool speed = true)(int iter) {
  double n = 1.0;

  static if(speed) const sign = [-1, 1];

  for(int i = 2; i < iter; i++) {
    static if(speed) {
      const m = i << 1;
      n += sign [i & 1] / (m - 1.0);
    } else {
      n += pow(-1, i - 1) / (i * 2.0 - 1.0);
    }
  }
  return n * 4.0;
}

auto pow(F, G)(F x, G n) @nogc @trusted pure nothrow {
    import std.traits : Unsigned, Unqual;

    real p = 1.0, v = void;
    Unsigned!(Unqual!G) m = n;

    if(n < 0) {
        if(n == -1) return 1 / x;
        m = cast(typeof(m))(0 - n);
        v = p / x;
    } else {
        switch(n) {
          case 0: return 1.0;
          case 1: return x;
          case 2: return x * x;
          default:
        }
        v = x;
    }
    while(true) {
        if(m & 1) p *= v;
        m >>= 1;
        if(!m) break;
        v *= v;
    }
    return p;
}

void main()
{
    double result;
    long total_time = 0;

    for(int i = 0; i < BENCHMARKS; i++)
    {
        auto sw = StopWatch(AutoStart.yes);

        result = ITERATIONS.leibniz;//!false;

        total_time += sw.peek.total!"nsecs";
    }

    writefln("Avg execution time: %f\n", total_time / BENCHMARKS / 1e9);
}


and results:

dmd -run "leibnizTest.d"
Avg execution time: 0.002005

If I compile with leibniz!false(ITERATIONS) the average execution 
time increases slightly:

Avg execution time: 0.044435

However, if you pay attention, it is not connected to an external 
library and a power function that works with integers is used. 
Normally the following function of the library should be called:

Unqual!(Largest!(F, G)) pow(F, G)(F x, G y) @nogc @trusted pure 

if (isFloatingPoint!(F) && isFloatingPoint!(G))

Now the person asking the question will, rightly, ask why it is slow 
even though we use exactly the same code in C. You may think 
that the more watermelons you carry in your arms, the slower you 
naturally become. I think the important thing is not to drop the 
watermelons :)


2024-03-24 Thread Sergey via Digitalmars-d-learn

On Sunday, 24 March 2024 at 22:16:06 UTC, rkompass wrote:
Are there some simple switches / settings to get a smaller 

1) If possible you can use "betterC" - to disable runtime
2) otherwise
--release --O3 --flto=full -fvisibility=hidden 
-defaultlib=phobos2-ldc-lto,druntime-ldc-lto -L=-dead_strip -L=-x 
-L=-S -L=-lz


2024-03-24 Thread rkompass via Digitalmars-d-learn
The term containing the `pow` invocation computes the 
alternating sequence -1, 1, -1, ..., which can be replaced by 

   immutable int [2] sign = [-1, 1];
   n += sign [i & 1] / (i * 2.0 - 1.0);

This saves the expensive call to the pow function.

I used the loop:
  for (int i = 1; i < iter; i++)
    n += ((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
in both C and D, with gcc and gdc and got average execution times:

     original:   loop replacement:   -O2:
C    0.009989    0.003198            0.001335
D    0.230346    0.003083            0.001309

almost no difference.

But the D binary is much larger on my Linux:
 4600920 bytes instead of 15504 bytes for the C version.

Are there some simple switches / settings to get a smaller binary?

2024-03-24 Thread kdevel via Digitalmars-d-learn

On Sunday, 24 March 2024 at 19:31:19 UTC, Csaba wrote:
I know that benchmarks are always controversial and depend on a 
lot of factors. So far, I read that D performs very well in 
benchmarks, as well, if not better, as C.

I wrote a little program that approximates PI using the Leibniz 
formula. I implemented the same thing in C, D and Python, all 
of them execute 1,000,000 iterations 20 times and display the 
average time elapsed.

Here are the results:

C: 0.04s
Python: 0.33s
D: 0.73s

What the hell? D slower than Python? This cannot be real. I am 
sure I am making a mistake here. I'm sharing all 3 programs 


Usually you do not translate mathematical expressions directly 
into code:

   n += pow(-1.0, i - 1.0) / (i * 2.0 - 1.0);

The term containing the `pow` invocation computes the alternating 
sequence -1, 1, -1, ..., which can be replaced by e.g.

   immutable int [2] sign = [-1, 1];
   n += sign [i & 1] / (i * 2.0 - 1.0);

This saves the expensive call to the pow function.
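
For comparison, the same replacement can be sketched in C. The loop bounds below are an assumption, since the original program is not reproduced in this thread, and both function names are made up for the example. The second variant shows another common way to get the alternating sign without calling pow.

```c
#include <assert.h>

/* Leibniz partial sum using a two-entry sign table instead of pow(). */
double leibniz_sign_table(int terms) {
    static const int sign[2] = {-1, 1};   /* odd i -> +1, even i -> -1 */
    double n = 0.0;
    assert(terms > 0);
    for (int i = 1; i <= terms; i++)
        n += sign[i & 1] / (i * 2.0 - 1.0);
    return 4.0 * n;
}

/* Equivalent variant that flips a running sign on every iteration. */
double leibniz_sign_flip(int terms) {
    double n = 0.0, s = 1.0;
    assert(terms > 0);
    for (int i = 1; i <= terms; i++) {
        n += s / (i * 2.0 - 1.0);
        s = -s;
    }
    return 4.0 * n;
}
```

Both variants add the same terms in the same order, so they should produce the same approximation of pi for a given number of terms.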

2024-03-24 Thread Sergey via Digitalmars-d-learn

On Sunday, 24 March 2024 at 19:31:19 UTC, Csaba wrote:
As you can see the function that does the job is exactly the 
same in C and D.

Not really..

The speed of Leibniz algo is mostly the same. You can check the 
code in this benchmark for example:

What you could fix in your code:
* you can use enum for BENCHMARKS and ITERATIONS
* use pow from core.stdc.math
* use sw.reset() in a loop

So the main part could look like this:
auto sw = StopWatch(AutoStart.yes);
foreach (i; 0..BENCHMARKS) {
    sw.reset();
    result += leibniz(ITERATIONS);
    total_time += sw.peek.total!"nsecs";
}

2024-03-24 Thread matheus via Digitalmars-d-learn

On Sunday, 24 March 2024 at 19:31:19 UTC, Csaba wrote:


Here are the results:

C: 0.04s
Python: 0.33s
D: 0.73s


I think a few things could be going on, but one way forward is to try 
optimization flags like "-O2" and run again.

But anyway, looking through Assembly generated:


The Leibniz functions are very close to each other, except for one 
thing: the "pow" function on the D side. It's a template, so maybe you 
should start from there; in fact I'd try the pow from C to see 
what happens.


Why is this code slow?

2024-03-24 Thread Csaba via Digitalmars-d-learn
I know that benchmarks are always controversial and depend on a 
lot of factors. So far, I have read that D performs very well in 
benchmarks, as well as, if not better than, C.

I wrote a little program that approximates PI using the Leibniz 
formula. I implemented the same thing in C, D and Python, all of 
them execute 1,000,000 iterations 20 times and display the 
average time elapsed.

Here are the results:

C: 0.04s
Python: 0.33s
D: 0.73s

What the hell? D slower than Python? This cannot be real. I am 
sure I am making a mistake here. I'm sharing all 3 programs here:


As you can see the function that does the job is exactly the same 
in C and D.

Here are the compile/run commands used:

C: `gcc leibniz.c -lm -oleibc`
D: `gdc leibniz.d -frelease -oleibd`
Python: `python3`

PS. my CPU is AMD A8-5500B and my OS is Ubuntu Linux, if that 