Re: Error running concurrent process and storing results in array

2020-05-05 Thread drug via Digitalmars-d-learn

06.05.2020 09:43, data pulverizer writes:

On Wednesday, 6 May 2020 at 05:50:23 UTC, drug wrote:
General advice - try to avoid using `array` and `new` in hot code. 
Memory allocation is slow in general, unless you use carefully 
crafted custom memory allocators. And it can easily be the reason for 
40% CPU usage: the cores are waiting on the memory subsystem.


I changed the Matrix object from class to struct and the timing went 
from about 19 seconds with ldc2 and the `-O5` flag to 13.69 seconds, but 
CPU usage is still at ~40% while using `taskPool.parallel(iota(n))`. The 
`.array` method is my own method on the Matrix object that just returns 
the internal data array, so it shouldn't copy. Julia is now at about 
34 seconds (D was at about 30 seconds when just using dmd with no 
optimizations). To make things more interesting, I also did an 
implementation in Chapel, which is now at around 9 seconds with the 
`--fast` flag.


Things are really interesting. So there is room to improve performance 
by 2.5x :-)
Yes, `array` is smart enough: if you call it on another array, it is a 
no-op.
What does `--fast` mean in Chapel? Did you try `--fast-math` in ldc? I 
don't know if `-O5` uses this flag.


Re: Error running concurrent process and storing results in array

2020-05-05 Thread drug via Digitalmars-d-learn

06.05.2020 09:24, data pulverizer writes:

On Wednesday, 6 May 2020 at 05:44:47 UTC, drug wrote:


`proc` is already a delegate, so `&proc` is a pointer to the delegate; 
just pass `proc` itself.


Thanks, done that, but now I am getting a range violation on `z` which 
was not there before.


```
core.exception.RangeError@onlineapp.d(3): Range violation

??:? _d_arrayboundsp [0x55de2d83a6b5]
onlineapp.d:3 void onlineapp.process(double, double, long, shared(double[])) [0x55de2d8234fd]

onlineapp.d:16 void onlineapp.main().__lambda1() [0x55de2d823658]
??:? void core.thread.osthread.Thread.run() [0x55de2d83bdf9]
??:? thread_entryPoint [0x55de2d85303d]
??:? [0x7fc1d6088668]
```



Confirmed. I think that's because the `proc` delegate captures the `i` 
variable of the `for` loop. I managed to get rid of the range violation 
by using `foreach`:

```
foreach(i; 0..n) // instead of for(long i = 0; i < n; ++i)
```
I guess the `proc` delegate can't capture the `i` variable of a 
`foreach` loop, so the range violation doesn't happen.


You use the `proc` delegate to pass arguments to the `process` function. 
For this purpose I would recommend deriving a class from `Thread` 
instead; then you can pass the arguments in the constructor of the 
derived class, like:

```
foreach(long i; 0..n)
    new DerivedThread(cast(double)(i), cast(double)(i + 1), i, z).start();
thread_joinAll();
```

Untested example of the derived thread:
```
class DerivedThread : Thread
{
    this(double x, double y, long i, shared(double[]) z)
    {
        this.x = x;
        this.y = y;
        this.i = i;
        this.z = z;
        super(&run);
    }

private:
    void run()
    {
        process(x, y, i, z);
    }

    double x, y;
    long i;
    shared(double[]) z;
}
```



Re: Error running concurrent process and storing results in array

2020-05-05 Thread data pulverizer via Digitalmars-d-learn

On Wednesday, 6 May 2020 at 05:50:23 UTC, drug wrote:
General advice - try to avoid using `array` and `new` in hot 
code. Memory allocation is slow in general, unless you use 
carefully crafted custom memory allocators. And it can easily 
be the reason for 40% CPU usage: the cores are waiting on the 
memory subsystem.


I changed the Matrix object from class to struct and the timing went 
from about 19 seconds with ldc2 and the `-O5` flag to 13.69 seconds, 
but CPU usage is still at ~40% while using 
`taskPool.parallel(iota(n))`. The `.array` method is my own method 
on the Matrix object that just returns the internal data array, 
so it shouldn't copy. Julia is now at about 34 seconds (D was at 
about 30 seconds when just using dmd with no optimizations). To 
make things more interesting, I also did an implementation in 
Chapel, which is now at around 9 seconds with the `--fast` flag.


Re: Error running concurrent process and storing results in array

2020-05-05 Thread data pulverizer via Digitalmars-d-learn

On Wednesday, 6 May 2020 at 05:44:47 UTC, drug wrote:


`proc` is already a delegate, so `&proc` is a pointer to the 
delegate; just pass `proc` itself.


Thanks, done that, but now I am getting a range violation on `z` which 
was not there before.


```
core.exception.RangeError@onlineapp.d(3): Range violation

??:? _d_arrayboundsp [0x55de2d83a6b5]
onlineapp.d:3 void onlineapp.process(double, double, long, shared(double[])) [0x55de2d8234fd]

onlineapp.d:16 void onlineapp.main().__lambda1() [0x55de2d823658]
??:? void core.thread.osthread.Thread.run() [0x55de2d83bdf9]
??:? thread_entryPoint [0x55de2d85303d]
??:? [0x7fc1d6088668]
```



Re: Error running concurrent process and storing results in array

2020-05-05 Thread drug via Digitalmars-d-learn

06.05.2020 07:52, data pulverizer writes:

On Wednesday, 6 May 2020 at 04:04:14 UTC, Mathias LANG wrote:

On Wednesday, 6 May 2020 at 03:41:11 UTC, data pulverizer wrote:
Yes, that's exactly what I want; the actual computation I'm running is 
much more expensive and much larger. It shouldn't matter if I have 
like 100_000_000 threads, should it? The threads should just be queued 
until the CPU works on them?


It does matter quite a bit. Each thread has its own resources 
allocated to it, and some part of the language will need to interact 
with *all* threads, e.g. the GC.
In general, if you want to parallelize something, you should aim to 
have as many threads as you have cores. Having 100M threads will mean 
you have to do a lot of context switches. You might want to look up 
the difference between tasks and threads.


Sorry, I meant 10_000, not 100_000_000; I squared the number by mistake 
because I'm calculating a 10_000 x 10_000 matrix. It's only 10_000 tasks, 
so one task does 10_000 calculations. The actual bit of code I'm 
parallelising is here:


```
auto calculateKernelMatrix(T)(AbstractKernel!(T) K, Matrix!(T) data)
{
    long n = data.ncol;
    auto mat = new Matrix!(T)(n, n);

    foreach(j; taskPool.parallel(iota(n)))
    {
        auto arrj = data.refColumnSelect(j).array;
        for(long i = j; i < n; ++i)
        {
            mat[i, j] = K.kernel(data.refColumnSelect(i).array, arrj);
            mat[j, i] = mat[i, j];
        }
    }
    return mat;
}
```

At the moment this code is running a little bit faster than threaded, 
SIMD-optimised Julia code, but as I said in an earlier reply to Ali, 
when I look at my system monitor I can see that all the D threads are 
active and running at ~40% usage, meaning that they are mostly doing 
nothing. The Julia code runs all threads at 100% and is still a tiny 
bit slower, so my (maybe incorrect?) assumption is that I could get 
more performance from D. The method `refColumnSelect(j).array` is 
(trying to) reference a column of the matrix (a 1D array with computed 
index referencing), which I select from the matrix using:


```
return new Matrix!(T)(data[startIndex..(startIndex + nrow)], [nrow, 1]);
```

If I use the above code, am I wrong in assuming that the sliced data 
(T[]) is referenced rather than copied? That is, if I do:


```
auto myData = data[5..10];
```

myData references elements [5..10] of data rather than creating a new 
array with the elements data[5..10] copied?


General advice - try to avoid using `array` and `new` in hot code. 
Memory allocation is slow in general, unless you use carefully 
crafted custom memory allocators. And it can easily be the reason for 
40% CPU usage: the cores are waiting on the memory subsystem.
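
As a concrete (untested) sketch of that advice: std.parallelism's 
`workerLocalStorage` gives each pool worker a private scratch buffer, 
so the hot loop itself stops allocating. The buffer size and the 
column-copy step below are placeholders, not part of the original 
Matrix API:

```
import std.parallelism : taskPool;
import std.range : iota;

void main()
{
    enum n = 1000; // placeholder for data.ncol
    // one scratch buffer per worker thread, allocated once per worker
    auto scratch = taskPool.workerLocalStorage(new double[](n));

    foreach (j; taskPool.parallel(iota(n)))
    {
        auto buf = scratch.get; // this worker's private buffer, reused for every j
        buf[] = cast(double) j; // stand-in for copying column j into buf
        // ... use buf where refColumnSelect(j).array was used before
    }
}
```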


Re: Error running concurrent process and storing results in array

2020-05-05 Thread data pulverizer via Digitalmars-d-learn

On Wednesday, 6 May 2020 at 04:52:30 UTC, data pulverizer wrote:
myData references elements [5..10] of data rather than creating 
a new array with the elements data[5..10] copied?


Just checked this and can confirm that the data is not being 
copied, so that is not the source of the CPU idling: 
https://ddili.org/ders/d.en/slices.html
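
A self-contained check along the lines of that page, showing the slice 
aliases the original memory:

```
void main()
{
    auto data = new double[20];
    auto myData = data[5 .. 10]; // slicing takes a view, not a copy

    myData[0] = 42.0;
    assert(data[5] == 42.0);            // the write is visible through data
    assert(myData.ptr == data.ptr + 5); // both slices share the same memory
}
```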




Re: Error running concurrent process and storing results in array

2020-05-05 Thread drug via Digitalmars-d-learn

06.05.2020 07:25, data pulverizer writes:

On Wednesday, 6 May 2020 at 03:56:04 UTC, Ali Çehreli wrote:
On 5/5/20 8:41 PM, data pulverizer wrote:
> On Wednesday, 6 May 2020 at 03:33:12 UTC, Mathias LANG wrote:
>> On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer wrote:

> Is there something I need to do to wait for each thread to finish
> computation?

thread_joinAll(). I have an example here:

http://ddili.org/ders/d.en/concurrency.html#ix_concurrency.thread_joinAll


This worked nicely, thank you very much.

... I want to point out that there is also std.parallelism, which may 
be better suited in many cases.


I actually started off using std.parallelism and it worked well, but the 
CPU usage on all the threads was less than half on my system monitor, 
meaning there is more performance to be wrung out of my computer, which 
is why I am now looking into spawn. When you suggested using 
thread_joinAll() I saw that it is in the `core.thread.osthread` module. 
It might be yak shaving at this point, but I have tried using `Thread` 
instead of `spawn`:


```
void process(double x, double y, long i, shared(double[]) z)
{
  z[i] = x*y;
}

void main()
{
  import core.thread.osthread;
  import std.stdio: writeln;

  long n = 100;
  shared(double[]) z = new double[n];
  for(long i = 0; i < n; ++i)
  {
    auto proc = (){
      process(cast(double)(i), cast(double)(i + 1), i, z);
      return;
    };
```

`proc` is already a delegate, so `&proc` is a pointer to the delegate; 
just pass `proc` itself.

```
    new Thread(&proc).start();
  }
  thread_joinAll();
  writeln("z: ", z);
}
```
and I am getting the following error:

```
onlineapp.d(20): Error: none of the overloads of this are callable using argument types (void delegate() @system*), candidates are:
/dlang/dmd/linux/bin64/../../src/druntime/import/core/thread/osthread.d(646): core.thread.osthread.Thread.this(void function() fn, ulong sz = 0LU)
/dlang/dmd/linux/bin64/../../src/druntime/import/core/thread/osthread.d(671): core.thread.osthread.Thread.this(void delegate() dg, ulong sz = 0LU)
/dlang/dmd/linux/bin64/../../src/druntime/import/core/thread/osthread.d(1540): core.thread.osthread.Thread.this(ulong sz = 0LU)
```







Re: Error running concurrent process and storing results in array

2020-05-05 Thread data pulverizer via Digitalmars-d-learn

On Wednesday, 6 May 2020 at 04:04:14 UTC, Mathias LANG wrote:

On Wednesday, 6 May 2020 at 03:41:11 UTC, data pulverizer wrote:
Yes, that's exactly what I want; the actual computation I'm 
running is much more expensive and much larger. It shouldn't 
matter if I have like 100_000_000 threads, should it? The 
threads should just be queued until the CPU works on them?


It does matter quite a bit. Each thread has its own resources 
allocated to it, and some part of the language will need to 
interact with *all* threads, e.g. the GC.
In general, if you want to parallelize something, you should 
aim to have as many threads as you have cores. Having 100M 
threads will mean you have to do a lot of context switches. You 
might want to look up the difference between tasks and threads.


Sorry, I meant 10_000, not 100_000_000; I squared the number by 
mistake because I'm calculating a 10_000 x 10_000 matrix. It's 
only 10_000 tasks, so one task does 10_000 calculations. The actual 
bit of code I'm parallelising is here:


```
auto calculateKernelMatrix(T)(AbstractKernel!(T) K, Matrix!(T) data)
{
    long n = data.ncol;
    auto mat = new Matrix!(T)(n, n);

    foreach(j; taskPool.parallel(iota(n)))
    {
        auto arrj = data.refColumnSelect(j).array;
        for(long i = j; i < n; ++i)
        {
            mat[i, j] = K.kernel(data.refColumnSelect(i).array, arrj);
            mat[j, i] = mat[i, j];
        }
    }
    return mat;
}
```

At the moment this code is running a little bit faster than 
threaded, SIMD-optimised Julia code, but as I said in an earlier 
reply to Ali, when I look at my system monitor I can see that all 
the D threads are active and running at ~40% usage, meaning that 
they are mostly doing nothing. The Julia code runs all threads at 
100% and is still a tiny bit slower, so my (maybe incorrect?) 
assumption is that I could get more performance from D. The 
method `refColumnSelect(j).array` is (trying to) reference a 
column of the matrix (a 1D array with computed index referencing), 
which I select from the matrix using:


```
return new Matrix!(T)(data[startIndex..(startIndex + nrow)], [nrow, 1]);
```

If I use the above code, am I wrong in assuming that the sliced 
data (T[]) is referenced rather than copied? That is, if I do:


```
auto myData = data[5..10];
```

myData references elements [5..10] of data rather than creating a 
new array with the elements data[5..10] copied?


Re: Error running concurrent process and storing results in array

2020-05-05 Thread data pulverizer via Digitalmars-d-learn

On Wednesday, 6 May 2020 at 03:56:04 UTC, Ali Çehreli wrote:
On 5/5/20 8:41 PM, data pulverizer wrote:
> On Wednesday, 6 May 2020 at 03:33:12 UTC, Mathias LANG wrote:
>> On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer wrote:

> Is there something I need to do to wait for each thread to finish
> computation?

thread_joinAll(). I have an example here:

  http://ddili.org/ders/d.en/concurrency.html#ix_concurrency.thread_joinAll

This worked nicely, thank you very much.

... I want to point out that there is also std.parallelism, 
which may be better suited in many cases.


I actually started off using std.parallelism and it worked well, 
but the CPU usage on all the threads was less than half on my 
system monitor, meaning there is more performance to be wrung out 
of my computer, which is why I am now looking into spawn. When 
you suggested using thread_joinAll() I saw that it is in the 
`core.thread.osthread` module. It might be yak shaving at this 
point, but I have tried using `Thread` instead of `spawn`:


```
void process(double x, double y, long i, shared(double[]) z)
{
  z[i] = x*y;
}

void main()
{
  import core.thread.osthread;
  import std.stdio: writeln;

  long n = 100;
  shared(double[]) z = new double[n];
  for(long i = 0; i < n; ++i)
  {
    auto proc = (){
      process(cast(double)(i), cast(double)(i + 1), i, z);
      return;
    };
    new Thread(&proc).start();
  }
  thread_joinAll();
  writeln("z: ", z);
}
```
and I am getting the following error:

```
onlineapp.d(20): Error: none of the overloads of this are callable using argument types (void delegate() @system*), candidates are:
/dlang/dmd/linux/bin64/../../src/druntime/import/core/thread/osthread.d(646): core.thread.osthread.Thread.this(void function() fn, ulong sz = 0LU)
/dlang/dmd/linux/bin64/../../src/druntime/import/core/thread/osthread.d(671): core.thread.osthread.Thread.this(void delegate() dg, ulong sz = 0LU)
/dlang/dmd/linux/bin64/../../src/druntime/import/core/thread/osthread.d(1540): core.thread.osthread.Thread.this(ulong sz = 0LU)
```





Re: Error running concurrent process and storing results in array

2020-05-05 Thread Mathias LANG via Digitalmars-d-learn

On Wednesday, 6 May 2020 at 03:41:11 UTC, data pulverizer wrote:


Is there something I need to do to wait for each thread to 
finish computation?


Yeah, you need to synchronize so that your main thread waits for 
all the other threads to finish.

Look up `Thread.join`.
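
A minimal sketch of that join pattern, assuming a module-level `work` 
function standing in for the real computation:

```
import core.thread : Thread;

void work() { /* stand-in for the expensive computation */ }

void main()
{
    Thread[] threads;
    foreach (i; 0 .. 4)
    {
        auto t = new Thread(&work);
        t.start();
        threads ~= t;
    }
    foreach (t; threads)
        t.join(); // main blocks here until that thread has finished
}
```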

Yes, that's exactly what I want; the actual computation I'm 
running is much more expensive and much larger. It shouldn't 
matter if I have like 100_000_000 threads, should it? The 
threads should just be queued until the CPU works on them?


It does matter quite a bit. Each thread has its own resources 
allocated to it, and some part of the language will need to 
interact with *all* threads, e.g. the GC.
In general, if you want to parallelize something, you should aim 
to have as many threads as you have cores. Having 100M threads 
will mean you have to do a lot of context switches. You might 
want to look up the difference between tasks and threads.
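
For comparison, a minimal sketch (not from the original message) of the 
task-based approach with std.parallelism: the pool sizes itself to 
roughly one worker per core, and the loop iterations become tasks 
rather than OS threads:

```
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    auto z = new double[100];
    // the pool has roughly one worker thread per core; the iterations
    // are distributed over those workers as tasks
    foreach (i; iota(z.length).parallel)
        z[i] = cast(double) i * (i + 1);
    writeln("z: ", z);
}
```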


Re: Error running concurrent process and storing results in array

2020-05-05 Thread Ali Çehreli via Digitalmars-d-learn
On 5/5/20 8:41 PM, data pulverizer wrote:
> On Wednesday, 6 May 2020 at 03:33:12 UTC, Mathias LANG wrote:
>> On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer wrote:

> Is there something I need to do to wait for each thread to finish
> computation?

thread_joinAll(). I have an example here:

  http://ddili.org/ders/d.en/concurrency.html#ix_concurrency.thread_joinAll

Although I understand that you're experimenting with std.concurrency, I 
want to point out that there is also std.parallelism, which may be 
better suited in many cases. Again, here are some examples:


  http://ddili.org/ders/d.en/parallelism.html

Ali



Re: Error running concurrent process and storing results in array

2020-05-05 Thread data pulverizer via Digitalmars-d-learn

On Wednesday, 6 May 2020 at 03:33:12 UTC, Mathias LANG wrote:

On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer wrote:

[...]


The problem here is that `process` is a delegate, not a 
function. The compiler *should* know it's a function, but for 
some reason it does not. Making the function static, or moving 
it outside of the scope of main, will fix it.


I moved the `process` function out of main and it is now running, 
but it prints out:


```
z: [nan, 2, nan, 12, 20, nan, nan, nan, nan, 90, nan, 132, nan, 
nan, 210, nan, nan, nan, nan, nan, nan, nan, nan, nan, 600, nan, 
nan, nan, nan, nan, 930, 992, 1056, nan, 1190, nan, nan, nan, 
nan, nan, 1640, 1722, nan, nan, nan, nan, nan, nan, nan, nan, 
nan, nan, nan, nan, nan, 3080, nan, nan, 3422, 3540, nan, nan, 
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 
nan, 8010, nan, nan, nan, nan, nan, nan, 9312, nan, nan, 9900]

```
Is there something I need to do to wait for each thread to finish 
computation?


For reference, this will spawn 100 threads to do a simple 
computation, so it's probably not what you would want, I expect. 
But I suppose this is just example code and the underlying 
computation is much more expensive?


Yes, that's exactly what I want; the actual computation I'm 
running is much more expensive and much larger. It shouldn't 
matter if I have like 100_000_000 threads, should it? The threads 
should just be queued until the CPU works on them?


Thanks


Re: Error running concurrent process and storing results in array

2020-05-05 Thread drug via Digitalmars-d-learn

06.05.2020 06:25, data pulverizer writes:


```
onlineapp.d(14): Error: template std.concurrency.spawn cannot deduce function from argument types !()(void delegate(double x, double y, long i, shared(double[]) z) pure nothrow @nogc @safe, double, double, long, shared(double[])), candidates are:
/dlang/dmd/linux/bin64/../../src/phobos/std/concurrency.d(460): spawn(F, T...)(F fn, T args)
  with F = void delegate(double, double, long, shared(double[])) pure nothrow @nogc @safe,
       T = (double, double, long, shared(double[]))
  must satisfy the following constraint:
       isSpawnable!(F, T)
```



I think the problem is in the attributes of `process` (the error message 
you posted is strange; is it the full message?).
Make your `process` function a template to let the compiler deduce its 
attributes, or set them manually.
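
A sketch of that suggestion: the empty compile-time parameter list turns 
`process` into a function template, so its attributes are inferred at 
the instantiation site instead of being fixed up front:

```
void process()(double x, double y, long i, shared(double[]) z)
{
    // same body as before; attributes are now deduced per instantiation
    z[i] = x * y;
}
```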


Re: Error running concurrent process and storing results in array

2020-05-05 Thread Mathias LANG via Digitalmars-d-learn

On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer wrote:

[...]


The problem here is that `process` is a delegate, not a function. 
The compiler *should* know it's a function, but for some reason 
it does not. Making the function static, or moving it outside of 
the scope of main, will fix it.


For reference, this will spawn 100 threads to do a simple 
computation, so it's probably not what you would want, I expect. But I 
suppose this is just example code and the underlying computation 
is much more expensive?
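
Combining Mathias's fix with the `thread_joinAll` synchronization that 
comes up elsewhere in the thread, an untested sketch of a working 
version:

```
import core.thread : thread_joinAll;
import std.concurrency : spawn;
import std.stdio : writeln;

// at module scope, &process is a plain function pointer, which
// satisfies spawn's isSpawnable constraint
void process(double x, double y, long i, shared(double[]) z)
{
  z[i] = x*y;
}

void main()
{
  long n = 100;
  shared(double[]) z = new double[n];
  foreach (i; 0 .. n)
    spawn(&process, cast(double) i, cast(double)(i + 1), i, z);
  thread_joinAll(); // wait for every spawned thread before reading z
  writeln("z: ", z);
}
```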


Error running concurrent process and storing results in array

2020-05-05 Thread data pulverizer via Digitalmars-d-learn
I have been using std.parallelism and that has worked quite 
nicely, but it is not fully utilising all the CPU resources in my 
computation, so I thought it could be good to run it concurrently 
to see if I can get better performance. However I am very new to 
std.concurrency, and this is the baby version of the code I am 
trying to run:


```
void main()
{
  import std.concurrency;
  import std.stdio: writeln;

  void process(double x, double y, long i, shared(double[]) z)
  {
    z[i] = x*y;
  }

  long n = 100;
  shared(double[]) z = new double[n];
  for(long i = 0; i < n; ++i)
  {
    spawn(&process, cast(double)(i), cast(double)(i + 1), i, z);
  }
  writeln("z: ", z);
}
```


This elicits the following error:

```
onlineapp.d(14): Error: template std.concurrency.spawn cannot deduce function from argument types !()(void delegate(double x, double y, long i, shared(double[]) z) pure nothrow @nogc @safe, double, double, long, shared(double[])), candidates are:
/dlang/dmd/linux/bin64/../../src/phobos/std/concurrency.d(460): spawn(F, T...)(F fn, T args)
  with F = void delegate(double, double, long, shared(double[])) pure nothrow @nogc @safe,
       T = (double, double, long, shared(double[]))
  must satisfy the following constraint:
       isSpawnable!(F, T)
```




Re: Beginner's Comparison Benchmark

2020-05-05 Thread Steven Schveighoffer via Digitalmars-d-learn

On 5/5/20 4:07 PM, RegeleIONESCU wrote:

Hello!

I made a little test (counting to 1 billion by adding 1) to compare 
execution speed of a small counting for loop in C, D, Julia and Python.
=========================================
The C version:

    #include <stdio.h>
    int a=0;
    int main(){
        int i;
        for(i=0; i<=bil; i++){
            a=a+1;
        }
        printf("%d", a);
    }

The D version:

    import std.stdio;
    int main(){
        int a = 0;
        for(int i=0; i<=bil; i++){
            a=a+1;
        }
        write(a);
        return 0;
    }

The Julia version:

    function counter()
        z = 0
        for i=1:bil
            z=z+1
        end
        print(z)
    end
    counter()

The Python version:

    def counter():
        z = 0
        for i in range(1, bil):
            z=z+1
        print(z)
    counter()
=========================================


Test Results without optimization:
C  |DLANG   |JULIA  | Python
real 0m2,981s  | real 0m3,051s  | real 0m0,413s | real 2m19,501s
user 0m2,973s  | user 0m2,975s  | user 0m0,270s | user 2m18,095s
sys  0m0,001s  | sys  0m0,006s  | sys  0m0,181s | sys 0m0,033s
=========================================


Test Results with optimization:
C - GCC -O3    |DLANG LDC2 --O3 |JULIA --optimize=3 | Python -O
real 0m0,002s  | real 0m0,006s  | real 0m0,408s | real 2m21,801s
user 0m0,001s  | user 0m0,003s  | user 0m0,269s | user 2m19,964s
sys  0m0,001s  | sys  0m0,003s  | sys  0m0,177s | sys 0m0,050s
=========================================


bil is the shortcut for 10^9 (1_000_000_000)
gcc 9.3.0
ldc2 1.21.0
python 3.8.2
julia 1.4.1
all on Ubuntu 20.04 - 64bit
Host CPU: k8-sse3

Unoptimized C and D are slow compared with Julia. Optimization increases 
the execution speed very much for C and D but has almost no effect on 
Julia.

Python, the slowest of all, when optimized, runs even slower :)))

Although I see some times are better than others, I do not really know 
the difference between user and sys, and I do not know which one is the 
time the app actually ran.


I am just a beginner, I am not a specialist. I made it just out of 
curiosity. If there is any error in my method please let me know.


1: you are interested in "real" time, that's how much time the whole 
thing took.
2: if you want to run benchmarks, you want to run multiple tests, and 
throw out the outliers, or use an average.
3: with simple things like this, the compiler is smarter than you ;) It 
doesn't really take 0.002s to do what you wrote; what happens is that 
the optimizer recognizes what you are doing and changes your code to:


writeln(1_000_000_001);

(yes, you can use underscores to make literals more readable in D)

doing benchmarks like this is really tricky.

Julia probably recognizes the thing too, but has to optimize at runtime? 
Not sure.


-Steve
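
One way to make such a microbenchmark harder to fold away (a sketch, not 
from the original posts) is to feed the loop bound in at runtime; note 
that a good backend can still reduce this trivial loop to the closed 
form a = bil + 1:

```
import std.conv : to;
import std.stdio : writeln;

void main(string[] args)
{
    // a runtime bound keeps the value out of reach of
    // compile-time constant folding
    immutable bil = args.length > 1 ? args[1].to!long : 1_000_000_000;
    long a = 0;
    for (long i = 0; i <= bil; ++i)
        a = a + 1;
    writeln(a);
}
```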


Beginner's Comparison Benchmark

2020-05-05 Thread RegeleIONESCU via Digitalmars-d-learn

Hello!

I made a little test (counting to 1 billion by adding 1) to compare 
execution speed of a small counting for loop in C, D, Julia and 
Python.

=========================================
The C version:

    #include <stdio.h>
    int a=0;
    int main(){
        int i;
        for(i=0; i<=bil; i++){
            a=a+1;
        }
        printf("%d", a);
    }

The D version:

    import std.stdio;
    int main(){
        int a = 0;
        for(int i=0; i<=bil; i++){
            a=a+1;
        }
        write(a);
        return 0;
    }

The Julia version:

    function counter()
        z = 0
        for i=1:bil
            z=z+1
        end
        print(z)
    end
    counter()

The Python version:

    def counter():
        z = 0
        for i in range(1, bil):
            z=z+1
        print(z)
    counter()
=========================================
Test Results without optimization:
C              |DLANG           |JULIA          | Python
real 0m2,981s  | real 0m3,051s  | real 0m0,413s | real 2m19,501s
user 0m2,973s  | user 0m2,975s  | user 0m0,270s | user 2m18,095s
sys  0m0,001s  | sys  0m0,006s  | sys  0m0,181s | sys  0m0,033s

=========================================
Test Results with optimization:
C - GCC -O3    |DLANG LDC2 --O3 |JULIA --optimize=3 | Python -O
real 0m0,002s  | real 0m0,006s  | real 0m0,408s     | real 2m21,801s
user 0m0,001s  | user 0m0,003s  | user 0m0,269s     | user 2m19,964s
sys  0m0,001s  | sys  0m0,003s  | sys  0m0,177s     | sys  0m0,050s
=========================================
bil is the shortcut for 10^9 (1_000_000_000)
gcc 9.3.0
ldc2 1.21.0
python 3.8.2
julia 1.4.1
all on Ubuntu 20.04 - 64bit
Host CPU: k8-sse3

Unoptimized C and D are slow compared with Julia. Optimization 
increases the execution speed very much for C and D but has 
almost no effect on Julia.

Python, the slowest of all, when optimized, runs even slower :)))

Although I see some times are better than others, I do not really 
know the difference between user and sys, and I do not know which one 
is the time the app actually ran.


I am just a beginner, I am not a specialist. I made it just out 
of curiosity. If there is any error in my method please let me 
know.


Re: std.uni, std.ascii, std.encoding, std.utf ugh!

2020-05-05 Thread WebFreak001 via Digitalmars-d-learn

On Tuesday, 5 May 2020 at 18:41:50 UTC, learner wrote:

Good morning,

Trying to do this:

```
bool foo(string s) nothrow { return s.all!isDigit; }
```

I realised that the conversion from char to dchar could throw.

I need to validate and operate over ascii strings and utf8 
strings, possibly in separate functions, what's the best way to 
transition between:


```
immutable(ubyte)[] -> validate utf8 -> string -> nothrow usage 
-> isDigit etc
immutable(ubyte)[] -> validate ascii -> AsciiString? -> nothrow 
usage -> isDigit etc
string -> validate ascii -> AsciiString? -> nothrow 
usage -> isDigit etc

```

Thank you


if you want nothrow operations on the sequence of characters 
(bytes) of the strings, use `str.representation` to get 
`immutable(ubyte)[]` and work on that. This is useful for example 
for doing indexOf (countUntil), startsWith, endsWith, etc. Make 
sure at least one of your inputs is validated though to avoid 
potentially handling or cutting off unfinished code points. I 
think this is the best way to go if you want to do simple things.
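
A minimal sketch of the `representation` approach; the explicit 
attributes should compile since iterating `immutable(ubyte)[]` never 
decodes and so never throws:

```
import std.algorithm.searching : all;
import std.ascii : isDigit;
import std.string : representation;

bool allAsciiDigits(string s) nothrow @nogc @safe
{
    // representation reinterprets the string as immutable(ubyte)[],
    // so no auto-decoding happens and nothing can throw
    return s.representation.all!isDigit;
}
```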


If your algorithm is sufficiently complex that you would like to 
still decode but not crash, you can also manually call .decode 
with UseReplacementDchar.yes to make it emit \uFFFD for invalid 
characters.


To get the best of both worlds, use `.byUTF!dchar` which gives 
you an input range to iterate over and defaults to using 
replacement dchar. You can then call the various algorithm & 
array functions on it.
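
And a sketch of the `byUTF` route; with the default 
`UseReplacementDchar.yes`, malformed sequences come out as U+FFFD 
instead of throwing, so nothrow should be inferred:

```
import std.algorithm.searching : all;
import std.ascii : isDigit;
import std.utf : byUTF;

bool allDigitsDecoded(string s) nothrow
{
    // byUTF!dchar lazily decodes and substitutes U+FFFD for invalid
    // sequences, so the range can be consumed without throwing
    return s.byUTF!dchar.all!isDigit;
}
```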


Unless you are working with different encodings than UTF-8 (like 
doing file or network operations) you shouldn't be needing 
std.encoding.


Also short explanation about the different modules:
std.ascii - simple functions to check and modify ASCII characters 
for various properties. Very easy to memorize everything inside 
it, you could easily rewrite what you need from scratch yourself. 
But of course this only handles all the basic ASCII characters, 
meaning it's only really useful for doing low-level almost binary 
file handling, not good for user facing parts which need to be 
international.


std.utf - ONLY encoding/decoding of unicode code points to UTF-8 
/ UTF-16 / UTF-32 byte representation. Doesn't have any idea what 
the characters actually mean, only checks for format and has 
limits on code point values. You could still reasonably rewrite 
this from scratch if you ever choose to.


std.uni - All the categorization of every character into all the 
different unicode types and algorithms modifying / combining / 
normalizing / etc. codepoints into other codepoints. Doesn't do 
anything with UTF encoding. I honestly wouldn't want to be the 
one who rewrites this or ports this to another language.


Re: real operations imprecise?

2020-05-05 Thread WebFreak001 via Digitalmars-d-learn

On Tuesday, 5 May 2020 at 14:15:03 UTC, H. S. Teoh wrote:
On Tue, May 05, 2020 at 01:44:18PM +, WebFreak001 via 
Digitalmars-d-learn wrote:

[...]


Whoa, hold your horses right there!  What does `pragma(msg, 
real.dig);`

output on your machine?

[...]


You are right, I probably should have double-checked; my fault. No 
idea why the numbers are different, though, if I specify it 
manually or divide by 2.


I will check real.dig on that machine tomorrow, but will only post a 
reply if it's something other than 18.


std.uni, std.ascii, std.encoding, std.utf ugh!

2020-05-05 Thread learner via Digitalmars-d-learn

Good morning,

Trying to do this:

```
bool foo(string s) nothrow { return s.all!isDigit; }
```

I realised that the conversion from char to dchar could throw.

I need to validate and operate over ascii strings and utf8 
strings, possibly in separate functions, what's the best way to 
transition between:


```
immutable(ubyte)[] -> validate utf8 -> string -> nothrow usage -> 
isDigit etc
immutable(ubyte)[] -> validate ascii -> AsciiString? -> nothrow 
usage -> isDigit etc
string -> validate ascii -> AsciiString? -> nothrow 
usage -> isDigit etc

```

Thank you




Re: Retrieve the return type of the current function

2020-05-05 Thread Meta via Digitalmars-d-learn

On Tuesday, 5 May 2020 at 18:19:00 UTC, Meta wrote:

```
mixin template magic()
{
    alias CallerRet = typeof(return);
    CallerRet magic()
    {
        return CallerRet.init;
    }
}
```


Small edit: you can remove the "CallerRet" alias by doing the 
following:


```
mixin template magic()
{
    typeof(return) magic()
    {
        return typeof(return).init;
    }
}
```


Though I wouldn't really recommend it, as it's very confusing, 
IMO. This works because `typeof(return)` in the return position 
here refers to the caller's scope, while `typeof(return)` inside 
the function refers to the function's scope.




Re: Retrieve the return type of the current function

2020-05-05 Thread Meta via Digitalmars-d-learn

On Tuesday, 5 May 2020 at 17:11:53 UTC, learner wrote:

On Tuesday, 5 May 2020 at 16:41:06 UTC, Adam D. Ruppe wrote:


typeof(return)


Thank you, that was indeed easy!

Is it possible to retrieve also the caller return type? 
Something like:


```
int foo() {
return magic();
}

auto magic(maybesomedefaulttemplateargs = ??)() {
alias R = __traits(???); // --> int!
}
```

Mixin templates maybe?


You *can* use mixin templates to access the caller's scope, which 
means typeof(return) will refer to the caller's return type, 
instead of the callee's. However, there's no way to both mixin 
and call the mixin template in a single line, so it's not DRY:


```
int foo()
{
    mixin magic;
    return magic();
}

mixin template magic()
{
    alias CallerRet = typeof(return);
    CallerRet magic()
    {
        return CallerRet.init;
    }
}

void main()
{
    foo();
}
```

void main()
{
foo();
}

Maybe somebody else knows a way to get around having to first 
mixin magic.


Re: Retrieve the return type of the current function

2020-05-05 Thread Jonathan M Davis via Digitalmars-d-learn
On Tuesday, May 5, 2020 11:11:53 AM MDT learner via Digitalmars-d-learn 
wrote:
> On Tuesday, 5 May 2020 at 16:41:06 UTC, Adam D. Ruppe wrote:
> > typeof(return)
>
> Thank you, that was indeed easy!
>
> Is it possible to retrieve also the caller return type? Something
> like:
>
> ```
> int foo() {
>  return magic();
> }
>
> auto magic(maybesomedefaulttemplateargs = ??)() {
>  alias R = __traits(???); // --> int!
> }
> ```
>
> Mixin templates maybe?

A function is compiled completely independently of where it's used, and it's
the same regardless of where it's used. So, it won't ever have access to any
information about where it's called unless it's explicitly given that
information.

A function template will be compiled differently depending on its template
arguments, but that still doesn't depend on the caller at all beyond what it
explicitly passes to the function, and if the same instantiation is used in
multiple places, then that caller will use exactly the same function
regardless of whether the callers are doing anything even vaguely similar
with it.

So, if you want a function to have any kind of information about its caller,
then you're going to have to either explicitly give it that information via
a template argument or outright generate a different function with a string
mixin every time you use it. So, you could do something like

```
auto foo(T)(int i)
{
    ...
}

string bar(string s, int i)
{
    return foo!string(i);
}
```

or

```
string bar(string s, int i)
{
    return foo!(typeof(return))(i);
}
```

but you're not going to be able to have foo figure out anything about 
its caller on its own.

- Jonathan M Davis





Re: Retrieve the return type of the current function

2020-05-05 Thread learner via Digitalmars-d-learn

On Tuesday, 5 May 2020 at 16:41:06 UTC, Adam D. Ruppe wrote:


typeof(return)


Thank you, that was indeed easy!

Is it possible to retrieve also the caller return type? Something 
like:


```
int foo() {
return magic();
}

auto magic(maybesomedefaulttemplateargs = ??)() {
alias R = __traits(???); // --> int!
}
```

Mixin templates maybe?


Re: Retrieve the return type of the current function

2020-05-05 Thread Adam D. Ruppe via Digitalmars-d-learn

On Tuesday, 5 May 2020 at 16:36:48 UTC, learner wrote:
I mean, without using the function name in the body, like 
ReturnType!foo ?


even easier:

typeof(return)


Retrieve the return type of the current function

2020-05-05 Thread learner via Digitalmars-d-learn

Good morning,

Is it possible something like this?

```
int foo() {
    __traits(some_trait, some_generic_this) theInt = 0;
}
```

I mean, without using the function name in the body, like 
ReturnType!foo ?




Re: real operations imprecise?

2020-05-05 Thread kinke via Digitalmars-d-learn
I can't even reproduce the 'missing' digits. On run.dlang.io, 
i.e., on Linux x64 (and so x87 real), I get identical output 
for both DMD and LDC:


```
void main()
{
    import core.stdc.stdio, std.math;
    printf("%.70Lf\n", PI);
    printf("%.70Lf\n", PI_2);
    printf("%La\n", PI);
    printf("%La\n", PI_2);
}
```

=>

3.14159265358979323851280895940618620443274267017841339111328125
1.570796326794896619256404479703093102216371335089206695556640625000
0xc.90fdaa22168c235p-2
0xc.90fdaa22168c235p-3


Re: real operations imprecise?

2020-05-05 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, May 05, 2020 at 01:44:18PM +, WebFreak001 via Digitalmars-d-learn 
wrote:
> I was dumping the full PI value on my machine with the highest
> precision it could get and got:
> 
> $ rdmd --eval='printf("%.70llf\n", PI)'
> 3.14159265358979323851280895940618620443274267017841339111328125

Whoa, hold your horses right there!  What does `pragma(msg, real.dig);`
output on your machine?

On my machine, it's 18, i.e., `real` is capable of holding only ~18
digits of meaningful information. 70 digits is WAY beyond anything
that's actually represented in a `real`. And if you check online for the
actual digits of pi, you'll see that after about the 19th digit of your
output above, the rest of the digits are just pure garbage.  What you
see is just meaningless output from the typesetting algorithm from
information that isn't actually there.


> now this all looks good, but when I tried to print PI_2 I got
> 
> $ rdmd --eval='printf("%.70llf\n", PI_2)'
> 1.57079632679489655799898173427209258079528808593750
> 
> note how many more 0's there are.
> 
> When manually writing down the identifier as it is defined for PI but
> with the exponent reduced by one it looks good again:
> $ rdmd --eval='printf("%.70llf\n",
> cast(real)0x1.921fb54442d18469898cc51701b84p+0L)'
> 1.570796326794896619256404479703093102216371335089206695556640625000

Again, what's the value of `real.dig` on your machine?  Don't be
deceived by the number of trailing zeroes; most of the digits that come
before it are complete garbage long before it dwindled to zero.  After
about the 17th digit, the two printouts above have already diverged.
Looks to me like it's a difference of just 1 ulp or so in the actual
representation, assuming you're on x86 where `real` has about 18 digits
of precision.


> I would expect that the manifest constant PI_2 being defined as PI/2
> would simply modify the exponent and not the actual value in such a
> drastic manner.
> 
> While I don't need it myself right now, is there some compiler switch to
> make real operations like this division more precise to not lose all these
> bits of information? (LDC 1.20.1)

At the most, the above difference is only *one* bit of information. Most
of your trailing digits are meaningless garbage because they don't
actually exist in the representation of `real`.  Unless you have a
256-bit representation of `real`, you can't expect to get that many
digits out of it!!


> Sure, 10^-50 precision might not be everyone's typical use-case but I
> think there is potential improvement either in the compiler or
> std.math to be done here :p

Um, no, if you need 1^-50 precision maybe you should be looking at
arbitrary-precision float libraries, like MPFR (but be prepared for a
big performance hit once you move away from hardware floats). The
hardware `real` type simply does not have that many bits to store that
many digits. You're asking for more digits (*way* more) than are
actually stored in the type, so your test results are invalid.  D's
floating-point types have a .dig property for a reason.  Use it! ;-)


T

-- 
The problem with the world is that everybody else is stupid.


real operations imprecise?

2020-05-05 Thread WebFreak001 via Digitalmars-d-learn
I was dumping the full PI value on my machine with the highest 
precision it could get and got:


$ rdmd --eval='printf("%.70llf\n", PI)'
3.14159265358979323851280895940618620443274267017841339111328125

now this all looks good, but when I tried to print PI_2 I got

$ rdmd --eval='printf("%.70llf\n", PI_2)'
1.57079632679489655799898173427209258079528808593750

note how many more 0's there are.

When manually writing down the identifier as it is defined for PI 
but with the exponent reduced by one it looks good again:
$ rdmd --eval='printf("%.70llf\n", 
cast(real)0x1.921fb54442d18469898cc51701b84p+0L)'

1.570796326794896619256404479703093102216371335089206695556640625000

I would expect that the manifest constant PI_2, being defined as 
PI/2, would simply modify the exponent and not change the actual value 
in such a drastic manner.


While I don't need it myself right now, is there some compiler 
switch to make real operations like this division more precise to 
not lose all these bits of information? (LDC 1.20.1)


Sure, 10^-50 precision might not be everyone's typical use-case, but 
I think there is potential improvement either in the compiler or 
std.math to be done here :p


Re: Compilation memory use

2020-05-05 Thread Patrick Schluter via Digitalmars-d-learn

On Monday, 4 May 2020 at 17:00:21 UTC, Anonymouse wrote:
TL;DR: Is there a way to tell what module or other section of a 
codebase is eating memory when compiling?


[...]


maybe with the massif tool of valgrind?
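
For example (file names are hypothetical; massif writes its profile to 
massif.out.<pid>, which ms_print turns into a readable report):

```
$ valgrind --tool=massif dmd -c bigmodule.d
$ ms_print massif.out.12345
```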


Re: Bug?

2020-05-05 Thread RazvanN via Digitalmars-d-learn

On Tuesday, 5 May 2020 at 05:37:08 UTC, Simen Kjærås wrote:

On Tuesday, 5 May 2020 at 04:02:06 UTC, RazvanN wrote:

[...]


Surely the above code, which silently discards the exception, 
does not print "hello"?


Regardless, I ran your code with writeln inside the catch(), 
and without the try-catch entirely, with and without nothrow on 
K's destructor. I am unable to replicate the issue on my 
computer with DMD 2.091.0, as well as on run.dlang.io. Is 
something missing in your code here?


--
  Simen


Ah, sorry! I was on a branch where I had some other modifications. 
Indeed, in git master the issue does not manifest.