Re: DMD 2.090.1: SIGILL, Illegal instruction on (ahem) intel Pentium III

2020-02-26 Thread Seb via Digitalmars-d-learn

On Thursday, 27 February 2020 at 00:36:49 UTC, kdevel wrote:

```test.d
void main ()
{
   int [] v = new int [10];
}
```

$ [...]linux/bin32/dmd test
$ gdb test
[...]
(gdb) r
[...]
Program received signal SIGILL, Illegal instruction.
0x0809ad14 in 
_D2gc4impl12conservativeQw3Gcx10smallAllocMFNbkKkkxC8TypeInfoZPv ()

[...]
(gdb) disass
[...]
0x0809ad14 
<_D2gc4impl12conservativeQw3Gcx10smallAllocMFNbkKkkxC8TypeInfoZPv+172>:	movsd  -0x58(%ebp),%xmm0


Does this exception relate to [1] and shall I file a bug or do 
I have to decommission my PIII?


I think so.

Have you tried:
1) LDC on your machine?
2) the DMD version before this change?




Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-26 Thread Basile B. via Digitalmars-d-learn
On Thursday, 27 February 2020 at 03:58:15 UTC, Bruce Carneal 
wrote:


Maybe you talked about another implementation of 
decimalLength9 ?


Yes.  It's one I wrote after I saw your post. Psuedo-code here:

  auto d9_branchless(uint v) { return 1 + (v >= 10) + (v >= 
100) ... }


Using ldc to target an x86 with the above yields a series of 
cmpl, seta instruction pairs in the function body followed by a 
summing and a return.  No branching.


Let me know if the above is unclear or insufficient.


No thanks, it's crystal clear now.


Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-26 Thread Bruce Carneal via Digitalmars-d-learn
On Thursday, 27 February 2020 at 03:58:15 UTC, Bruce Carneal 
wrote:

On Wednesday, 26 February 2020 at 23:09:34 UTC, Basile B. wrote:
On Wednesday, 26 February 2020 at 20:44:31 UTC, Bruce Carneal 
wrote:


After shuffling the input, branchless wins by 2.4X (240%).




snip

Let me know if the above is unclear or insufficient.


The 2.4X improvement came when using a shuffled uniform digits 
distribution.  Equal numbers of 1 digit values, 2 digit values, 
... put in to an array and shuffled.


Branchless *loses* by 8.5% or so on my Zen1 when the array is not 
shuffled (when branch predictions are nearly perfect).













Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-26 Thread Bruce Carneal via Digitalmars-d-learn

On Wednesday, 26 February 2020 at 23:09:34 UTC, Basile B. wrote:
On Wednesday, 26 February 2020 at 20:44:31 UTC, Bruce Carneal 
wrote:


After shuffling the input, branchless wins by 2.4X (240%).


I've replaced the input by the front of a rndGen (that pops for 
count times and starting with a custom seed) and I never see 
the decimalLength9_3 (which seems to be the closest to the 
original in term of performances) doing better.


The uniform random uint distribution yields a highly skewed 
digits distribution.  Note, for example, that only 10 of the 
2**32 possible outputs from a uint PRNG can be represented by a 
single decimal digit.


Your current winner will be very hard to beat if the inputs are 
uniform random.  That implementation's high-to-low cascade of 
early exit ifs aligns with PRNG output.


If you aren't sure what the input distribution is, or want 
guaranteed good worst case behavior, then branchless may be a 
better way to go.




Maybe you talked about another implementation of decimalLength9 
?


Yes.  It's one I wrote after I saw your post. Psuedo-code here:

  auto d9_branchless(uint v) { return 1 + (v >= 10) + (v >= 100) 
... }


Using ldc to target an x86 with the above yields a series of 
cmpl, seta instruction pairs in the function body followed by a 
summing and a return.  No branching.


Let me know if the above is unclear or insufficient.























DMD 2.090.1: SIGILL, Illegal instruction on (ahem) intel Pentium III

2020-02-26 Thread kdevel via Digitalmars-d-learn

```test.d
void main ()
{
   int [] v = new int [10];
}
```

$ [...]linux/bin32/dmd test
$ gdb test
[...]
(gdb) r
[...]
Program received signal SIGILL, Illegal instruction.
0x0809ad14 in 
_D2gc4impl12conservativeQw3Gcx10smallAllocMFNbkKkkxC8TypeInfoZPv 
()

[...]
(gdb) disass
[...]
0x0809ad14 
<_D2gc4impl12conservativeQw3Gcx10smallAllocMFNbkKkkxC8TypeInfoZPv+172>:	movsd  -0x58(%ebp),%xmm0


Does this exception relate to [1] and shall I file a bug or do I 
have to decommission my PIII?


Stefan

[1] https://dlang.org/changelog/2.087.0.html#xmm-linux-changelog



Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-26 Thread Basile B. via Digitalmars-d-learn
On Wednesday, 26 February 2020 at 20:44:31 UTC, Bruce Carneal 
wrote:
The winning function implementation lines up with that 
distribution.  It would not fare as well with higher entropy 
input.


Using sorted equi-probable inputs (N 1 digit numbers, N 2 digit 
numbers, ...) decimalLength9_0 beats a simple branchless 
implementation by about 10%.


After shuffling the input, branchless wins by 2.4X (240%).


I've replaced the input by the front of a rndGen (that pops for 
count times and starting with a custom seed) and I never see the 
decimalLength9_3 (which seems to be the closest to the original 
in term of performances) doing better.


Maybe you talked about another implementation of decimalLength9 ?


Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-26 Thread Basile B. via Digitalmars-d-learn

On Wednesday, 26 February 2020 at 22:07:30 UTC, Johan wrote:

On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote:

[...]


Hi Basile,
  I recently saw this presentation: 
https://www.youtube.com/watch?v=Czr5dBfs72U
It has some ideas that may help you make sure your measurements 
are good and may give you ideas to find the performance 
bottleneck or where to optimize.
llvm-mca is featured on godbolt.org: 
https://mca.godbolt.org/z/YWp3yv


cheers,
  Johan


yes llvm-mca looks excellent, although I don't know if it worth 
continuing... You see this function is certainly not a 
bottleneck, it's just that I wanted to try better than the naive 
implementation.


Fundamentatlly the problem is that
1. the original is smaller, faster to decode
2. the alternatives (esp. the 3rd) is conceptually better but the 
cost of the jump table + lzcnt wastes it.


Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-26 Thread Johan via Digitalmars-d-learn

On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote:
So after reading the translation of RYU I was interested too 
see if the decimalLength() function can be written to be 
faster, as it cascades up to 8 CMP.


...

Then bad surprise. Even with ldmd (so ldc2 basically) feeded 
with the args from the script line. Maybe the fdecimalLength9 
version is slightly faster. Only *slightly*. Good news, I've 
lost my time. So I try an alternative version that uses a table 
of delegates instead of a switch (ffdecimalLength9) and 
surprise, "tada", it is like **100x** slower then the two 
others.


How is that possible ?


Hi Basile,
  I recently saw this presentation: 
https://www.youtube.com/watch?v=Czr5dBfs72U
It has some ideas that may help you make sure your measurements 
are good and may give you ideas to find the performance 
bottleneck or where to optimize.
llvm-mca is featured on godbolt.org: 
https://mca.godbolt.org/z/YWp3yv


cheers,
  Johan



Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-26 Thread Bruce Carneal via Digitalmars-d-learn
On Wednesday, 26 February 2020 at 19:44:05 UTC, Bruce Carneal 
wrote:

On Wednesday, 26 February 2020 at 13:50:11 UTC, Basile B. wrote:
On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. 
wrote:

...
foreach (i; 0 .. count)
sum += funcs[func](i);


The input stream is highly predictable and strongly skewed 
towards higher digits.


The winning function implementation lines up with that 
distribution.  It would not fare as well with higher entropy 
input.


Using sorted equi-probable inputs (N 1 digit numbers, N 2 digit 
numbers, ...) decimalLength9_0 beats a simple branchless 
implementation by about 10%.


After shuffling the input, branchless wins by 2.4X (240%).



Re: SQLite 3 support?

2020-02-26 Thread Jordan Wilson via Digitalmars-d-learn

On Wednesday, 26 February 2020 at 20:06:20 UTC, mark wrote:
There seems to be some support for SQLite 3 in std. lib. etc 
when looking at the stable docs:

https://dlang.org/phobos/etc_c_sqlite3.html

But this isn't visible when looking at stable (ddox).

Is this the best SQLite 3 library to use or is a third-party 
library best?

For example https://github.com/biozic/d2sqlite3


I use d2sqlite3 regularly, no problems at all with it.
I have no experience with the std. lib one.

Jordan


Re: SQLite 3 support?

2020-02-26 Thread tchaloupka via Digitalmars-d-learn

On Wednesday, 26 February 2020 at 20:06:20 UTC, mark wrote:
There seems to be some support for SQLite 3 in std. lib. etc 
when looking at the stable docs:

https://dlang.org/phobos/etc_c_sqlite3.html

But this isn't visible when looking at stable (ddox).

Is this the best SQLite 3 library to use or is a third-party 
library best?

For example https://github.com/biozic/d2sqlite3


What's in the Phobos is just D binding to the sqlite3 C api. 
Probably not much up to date too. I've been using d2sqlite3 a lot 
some time ago and I'd definitelly recommend it. It also provides 
nice higher level API that helps to work with it in a more D 
friendly and productive way.


But I'm not following sqlite3 updates much nowadays.


SQLite 3 support?

2020-02-26 Thread mark via Digitalmars-d-learn
There seems to be some support for SQLite 3 in std. lib. etc when 
looking at the stable docs:

https://dlang.org/phobos/etc_c_sqlite3.html

But this isn't visible when looking at stable (ddox).

Is this the best SQLite 3 library to use or is a third-party 
library best?

For example https://github.com/biozic/d2sqlite3




Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-26 Thread Bruce Carneal via Digitalmars-d-learn

On Wednesday, 26 February 2020 at 13:50:11 UTC, Basile B. wrote:

On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote:
...
foreach (i; 0 .. count)
sum += funcs[func](i);


The input stream is highly predictable and strongly skewed 
towards higher digits.


The winning function implementation lines up with that 
distribution.  It would not fare as well with higher entropy 
input.










Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-26 Thread Basile B. via Digitalmars-d-learn

On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote:

How is that possible ?


It turns out that there's a problem with the benchmarking method. 
With command line argument the different optimization passes of 
LLVM don't fuck up with the literal constants.


It appears that none of the alternatives based on the "most 
significant bit trick" are faster (at least when covering a 
decent range of numbers):


---
#!ldmd -boundscheck=off -release -inline -O -mcpu=native 
-mattr=+sse2,+sse3,+sse4.1,+sse4.2,+fast-lzcnt,+avx,+avx2,+cmov,+bmi,+bmi2

import core.memory;
import core.bitop;
import std.stdio;
import std.range;
import std.algorithm;
import std.getopt;

ubyte decimalLength9_0(const uint v)
{
if (v >= 1) { return 9; }
if (v >= 1000) { return 8; }
if (v >= 100) { return 7; }
if (v >= 10) { return 6; }
if (v >= 1) { return 5; }
if (v >= 1000) { return 4; }
if (v >= 100) { return 3; }
if (v >= 10) { return 2; }
return 1;
}

ubyte decimalLength9_1(const uint v) pure nothrow
{
if (v == 0) // BSR and LZCNT UB when input is 0
return 1;
const ubyte lzc = cast(ubyte) bsr(v);
ubyte result;
switch (lzc)
{
case 0 :
case 1 :
case 2 : result =  1; break;
case 3 : result =  v >= 10 ? 2 : 1; break;
case 4 :
case 5 : result =  2; break;
case 6 : result =  v >= 100 ? 3 : 2; break;
case 7 :
case 8 : result =  3; break;
case 9 : result =  v >= 1000 ? 4 : 3; break;
case 10:
case 11:
case 12: result =  4; break;
case 13: result =  v >= 1 ? 5 : 4; break;
case 14:
case 15: result =  5; break;
case 16: result =  v >= 10 ? 6 : 5; break;
case 17:
case 18: result =  6; break;
case 19: result =  v >= 100 ? 7 : 6; break;
case 20:
case 21:
case 22: result =  7; break;
case 23: result =  v >= 1000 ? 8 : 7; break;
case 24:
case 25: result =  8; break;
case 26: result =  v >= 1 ? 9 : 8; break;
case 27:
case 28:
case 29:
case 30:
case 31: result =  9; break;

default: assert(false);
}
return result;
}

private ubyte decimalLength9_2(const uint v) pure nothrow
{
if (v == 0) // BSR and LZCNT UB when input is 0
return 1;
const ubyte lzc = cast(ubyte) bsr(v);
static immutable pure nothrow ubyte function(uint)[32] tbl =
[
0 : (uint a) => ubyte(1),
1 : (uint a) => ubyte(1),
2 : (uint a) => ubyte(1),
3 : (uint a) => a >= 10  ? ubyte(2) : ubyte(1),
4 : (uint a) => ubyte(2),
5 : (uint a) => ubyte(2),
6 : (uint a) => a >= 100  ? ubyte(3) : ubyte(2),
7 : (uint a) => ubyte(3),
8 : (uint a) => ubyte(3),
9 : (uint a) => a >= 1000  ? ubyte(4) : ubyte(3),
10: (uint a) => ubyte(4),
11: (uint a) => ubyte(4),
12: (uint a) => ubyte(4),
13: (uint a) => a >= 1  ? ubyte(5) : ubyte(4),
14: (uint a) => ubyte(5),
15: (uint a) => ubyte(5),
16: (uint a) => a >= 10  ? ubyte(6) : ubyte(5),
17: (uint a) => ubyte(6),
18: (uint a) => ubyte(6),
19: (uint a) => a >= 100  ? ubyte(7) : ubyte(6),
20: (uint a) => ubyte(7),
21: (uint a) => ubyte(7),
22: (uint a) => ubyte(7),
23: (uint a) => a >= 1000  ? ubyte(8) : ubyte(7),
24: (uint a) => ubyte(8),
25: (uint a) => ubyte(8),
26: (uint a) => a >= 1  ? ubyte(9) : ubyte(8),
27: (uint a) => ubyte(9),
28: (uint a) => ubyte(9),
29: (uint a) => ubyte(9),
30: (uint a) => ubyte(9),
31: (uint a) => ubyte(9),
];
return tbl[lzc](v);
}

ubyte decimalLength9_3(const uint v) pure nothrow
{
if (v == 0) // BSR and LZCNT UB when input is 0
return 1;
ubyte result;
const ubyte lzc = cast(ubyte) bsr(v);
const ubyte[32] decimalLength9LUT =
[
0 : ubyte(1),
1 : ubyte(1),
2 : ubyte(1),
3 : ubyte(10),
4 : ubyte(2),
5 : ubyte(2),
6 : ubyte(11),
7 : ubyte(3),
8 : ubyte(3),
9 : ubyte(12),
10: ubyte(4),
11: ubyte(4),
12: ubyte(4),
13: ubyte(12),
14: ubyte(5),
15: ubyte(5),
16: ubyte(13),
17: ubyte(6),
18: ubyte(6),
19: ubyte(14),
20: ubyte(7),
21: ubyte(7),
22: ubyte(7),
23: ubyte(15),
24: ubyte(8),
25: ubyte(8),
26: ubyte(16),
27: ubyte(9),
28: ubyte(9),
29: ubyte(9),
30: ubyte(9),
31: ubyte(9),
];
ubyte resultOrSelector = decimalLength9LUT[lzc];
if (resultOrSelector < 10)
result = resultOrSelector;
else switch (lzc)
{
case 3 : result =  v >= 10 ? 2 : 1; break;
case 6 : result =  v >= 100 ? 3 : 2; break;
case 9 : result =  v >= 1000 ? 4 : 3; break;
case 13: result =  v >= 1 ? 5 : 4; break;
case 16: result =  v >= 10 ? 6 : 5; break;
case 19: result =  

Re: Strange counter-performance in an alternative `decimalLength9` function

2020-02-26 Thread Stefan Koch via Digitalmars-d-learn

On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote:
So after reading the translation of RYU I was interested too 
see if the decimalLength() function can be written to be 
faster, as it cascades up to 8 CMP.


[...]


It can be made faster using binary search. Not by much though.

You'll get ceil(log2(numberofplaces)) cmps.


Re: String switch is odd using betterC

2020-02-26 Thread Simen Kjærås via Digitalmars-d-learn

On Wednesday, 26 February 2020 at 08:32:50 UTC, Abby wrote:

On Wednesday, 26 February 2020 at 08:25:00 UTC, Abby wrote:

Any idea why?


Ok so this is enough to produce the same result, it seems that 
there is a problem in string switch when there is more the 6 
cases.


extern(C) void main()
{
auto s = "F";
final switch(s)
{
case "A": break;
case "B": break;
case "C": break;
case "D": break;
case "E": break;
case "F": break;
case "G": break;
}
}


The explanation can be found in 
druntime/import/core/internal/switch_.d: the __switch template 
does a simple binary search for less than 7 cases, but calls 
.idup on each case label for >= 7 cases.


There's a comment there about why it's being done, but it seems 
to be a far more complicated fix than necessary - static 
immutable cases = [caseLabels]; works just as well, it seems.


Anyway, the current code was added in this commit: 
https://github.com/dlang/druntime/commit/fa665f6618af7dbc09ed5ba1333f385017b7ece8.


Anyways, reported here: 
https://issues.dlang.org/show_bug.cgi?id=20613


--
  Simen


Re: String switch is odd using betterC

2020-02-26 Thread Petar via Digitalmars-d-learn

On Wednesday, 26 February 2020 at 08:32:50 UTC, Abby wrote:

On Wednesday, 26 February 2020 at 08:25:00 UTC, Abby wrote:

Any idea why?


Ok so this is enough to produce the same result, it seems that 
there is a problem in string switch when there is more the 6 
cases.


extern(C) void main()
{
auto s = "F";
final switch(s)
{
case "A": break;
case "B": break;
case "C": break;
case "D": break;
case "E": break;
case "F": break;
case "G": break;
}
}


This looks like a possible cause:
https://github.com/dlang/druntime/blob/e018a72084e54ecc7466e97c96e4557b96b7f905/src/core/internal/switch_.d#L34



Re: String switch is odd using betterC

2020-02-26 Thread Abby via Digitalmars-d-learn

On Wednesday, 26 February 2020 at 08:25:00 UTC, Abby wrote:

Any idea why?


Ok so this is enough to produce the same result, it seems that 
there is a problem in string switch when there is more the 6 
cases.


extern(C) void main()
{
auto s = "F";
final switch(s)
{
case "A": break;
case "B": break;
case "C": break;
case "D": break;
case "E": break;
case "F": break;
case "G": break;
}
}


String switch is odd using betterC

2020-02-26 Thread Abby via Digitalmars-d-learn

I have a simple enum of strings like so:

enum Alphabet : string
{
a = "A",
b = "B",
c = "C",
d = "D",
e = "E",
f = "F",
g = "G"
}

and then simple final switch like so:

extern(C) void main()
{
auto s = Alphabet.f;
final switch(s)
{
case Alphabet.a: break;
case Alphabet.b: break;
case Alphabet.c: break;
case Alphabet.d: break;
case Alphabet.e: break;
case Alphabet.f: break;
case Alphabet.g: break;
}
}

The problem I have is that this causes:

/dlang/dmd/linux/bin64/../../src/druntime/import/object.d(2999): 
Error: TypeInfo cannot be used with -betterC


Odd think is that wehen I remove g from my enum it compiles just 
fine, so it seems that this compilation error occurs only when my 
enum has more then 6 members.


Any idea why?


Re: String switch is odd using betterC

2020-02-26 Thread Abby via Digitalmars-d-learn

On Wednesday, 26 February 2020 at 08:25:00 UTC, Abby wrote:

/dlang/dmd/linux/bin64/../../src/druntime/import/object.d(2999): Error: 
TypeInfo cannot be used with -betterC

Odd think is that wehen I remove g from my enum it compiles 
just fine, so it seems that this compilation error occurs only 
when my enum has more then 6 members.


Any idea why?


Ok so this is enough to reproduce the problem

extern(C) void main()
{
auto s = "F";
final switch(s)
{
case "A": break;
case "B": break;
case "C": break;
case "D": break;
case "E": break;
case "F": break;
case "G": break;
}
}