Re: Switch ignores case (?)

2016-11-25 Thread Steven Schveighoffer via Digitalmars-d-learn

On 11/24/16 7:16 AM, ketmar wrote:

On Thursday, 24 November 2016 at 11:40:24 UTC, Antonio Corbi wrote:

Could it be possible to ping M. Nowak to include the fix for this bug
(due to its importance) in the final 2.072.1 release?


there is no real fix yet. what i provided is a quick hack, not tested on
anything except GNU/Linux, and without unitests. somebody should create
a real fix first. and it seems that we doesn't even have a bug opened in
bugzilla yet! ;-)


Yeah, there is: https://issues.dlang.org/show_bug.cgi?id=16739

Having a fix is obviously possible, we can do it very slowly on Windows 
:) slow is better than incorrect!


-Steve


Re: Switch ignores case (?)

2016-11-24 Thread ketmar via Digitalmars-d-learn
On Thursday, 24 November 2016 at 11:40:24 UTC, Antonio Corbi 
wrote:
Could it be possible to ping M. Nowak to include the fix for 
this bug (due to its importance) in the final 2.072.1 release?


there is no real fix yet. what i provided is a quick hack, not 
tested on anything except GNU/Linux, and without unitests. 
somebody should create a real fix first. and it seems that we 
doesn't even have a bug opened in bugzilla yet! ;-)


Re: Switch ignores case (?)

2016-11-24 Thread Antonio Corbi via Digitalmars-d-learn

On Wednesday, 23 November 2016 at 22:13:38 UTC, ketmar wrote:
On Wednesday, 23 November 2016 at 22:00:58 UTC, Steven 
Schveighoffer wrote:
I can't see why you need to deal with the glue layer at all -- 
just tell the glue layer that it's a list of strings and not 
dstrings ;)


'cause that is how s2ir.d is done in dmd. ;-) it actually sorts 
*objects*, and objects knows how to order 'emselves. at this 
stage it is not trivial to treat objects as byte arrays (easy, 
but not a one-liner).


sure, other compilers may do that differently, and it doesn't 
really matter how exactly the code for switch is generated 
until it works correctly. ;-)


Could it be possible to ping M. Nowak to include the fix for this 
bug (due to its importance) in the final 2.072.1 release?


Antonio


Re: Switch ignores case (?)

2016-11-24 Thread Chris via Digitalmars-d-learn

On Thursday, 24 November 2016 at 10:12:40 UTC, ketmar wrote:


thanks. tbh, i am surprised myself -- it is completely fubared, 
but nobody noticed. maybe that says something about real-world 
useability of dstring/wstring... ;-)


Well, I came across it, because I wanted to work around 
autodecode, our old "friend" (who need enemies, if you have a 
friend like that?). Except, it doesn't work as expected.


For my needs, I can work around it with `(c == "\u2019")`, 
because switch works for the rest of the (common) cases like full 
stop, question mark etc. The whole string handling issue in D 
needs to be fixed asap, because text processing is one of _the_ 
big things in IT these days. Think of all the data harvesting and 
evaluation and whatnot.


Re: Switch ignores case (?)

2016-11-24 Thread ketmar via Digitalmars-d-learn

On Thursday, 24 November 2016 at 09:56:25 UTC, Chris wrote:
Great job, ketmar! I'm only surprised that this bug wasn't 
discovered earlier, I mean it goes back to (at least) dmd 2.040.


thanks. tbh, i am surprised myself -- it is completely fubared, 
but nobody noticed. maybe that says something about real-world 
useability of dstring/wstring... ;-)


Re: Switch ignores case (?)

2016-11-24 Thread Chris via Digitalmars-d-learn

On Wednesday, 23 November 2016 at 22:13:38 UTC, ketmar wrote:
On Wednesday, 23 November 2016 at 22:00:58 UTC, Steven 
Schveighoffer wrote:
I can't see why you need to deal with the glue layer at all -- 
just tell the glue layer that it's a list of strings and not 
dstrings ;)


'cause that is how s2ir.d is done in dmd. ;-) it actually sorts 
*objects*, and objects knows how to order 'emselves. at this 
stage it is not trivial to treat objects as byte arrays (easy, 
but not a one-liner).


sure, other compilers may do that differently, and it doesn't 
really matter how exactly the code for switch is generated 
until it works correctly. ;-)


Great job, ketmar! I'm only surprised that this bug wasn't 
discovered earlier, I mean it goes back to (at least) dmd 2.040.


Re: Switch ignores case (?)

2016-11-23 Thread ketmar via Digitalmars-d-learn
On Wednesday, 23 November 2016 at 22:00:58 UTC, Steven 
Schveighoffer wrote:
I can't see why you need to deal with the glue layer at all -- 
just tell the glue layer that it's a list of strings and not 
dstrings ;)


'cause that is how s2ir.d is done in dmd. ;-) it actually sorts 
*objects*, and objects knows how to order 'emselves. at this 
stage it is not trivial to treat objects as byte arrays (easy, 
but not a one-liner).


sure, other compilers may do that differently, and it doesn't 
really matter how exactly the code for switch is generated until 
it works correctly. ;-)


Re: Switch ignores case (?)

2016-11-23 Thread Steven Schveighoffer via Digitalmars-d-learn

On 11/23/16 4:44 PM, ketmar wrote:

On Wednesday, 23 November 2016 at 21:40:52 UTC, Steven Schveighoffer wrote:

So the better way is to sort based on byte array, and just use memcmp
for everything.


i am not completely sure that this is really better. sorting and
generating tables is done in glue layer, which can be different for
different codegens. and `.compare` method, that is used for sorting
strings may be used in other places too. by introducing something like
`switchCompare()` we will add unnecessary complexity to the compiler,
will make sort order less obvious, and won't really get significant
speedup, as binary search doesn't really do alot of comparisons anyway.


I'm not certain of how difficult this will be, or how much risk there 
is. What I am certain of is that the user doesn't care that the compiler 
really isn't sorting the strings alphabetically, and that he wants the 
fastest code possible for a switch statement.


I can't see why you need to deal with the glue layer at all -- just tell 
the glue layer that it's a list of strings and not dstrings ;)


I also don't actually know that memcmp is faster than wmemcmp, so maybe 
there is even an advantage to changing behavior for dstring searching.



i thing it is better to be fixed in druntime.


This can be a solution, for sure. And in reality, it's not *that* much 
slower -- you are still doing a binary search.


I wonder if there is a "binary search among arrays" algorithm that can 
be optimized for this.


-Steve


Re: Switch ignores case (?)

2016-11-23 Thread ketmar via Digitalmars-d-learn
On Wednesday, 23 November 2016 at 21:40:52 UTC, Steven 
Schveighoffer wrote:
So the better way is to sort based on byte array, and just use 
memcmp for everything.


i am not completely sure that this is really better. sorting and 
generating tables is done in glue layer, which can be different 
for different codegens. and `.compare` method, that is used for 
sorting strings may be used in other places too. by introducing 
something like `switchCompare()` we will add unnecessary 
complexity to the compiler, will make sort order less obvious, 
and won't really get significant speedup, as binary search 
doesn't really do alot of comparisons anyway.


i thing it is better to be fixed in druntime.


Re: Switch ignores case (?)

2016-11-23 Thread ketmar via Digitalmars-d-learn
On Wednesday, 23 November 2016 at 21:31:08 UTC, Steven 
Schveighoffer wrote:
I think it makes the most sense to remove the memcmp, and do 
binary search based on actual char values.


yeah. there is wmemcmp, which can be used to speed up one of the 
cases ('cause wchar_t has different size on windows and 
GNU/Linux), as i did in my hackfix. yet i am not sure that 
wmemcmp is really there on windows in all C runtimes (hence 
HACKfix ;-).


Re: Switch ignores case (?)

2016-11-23 Thread Steven Schveighoffer via Digitalmars-d-learn

On 11/23/16 2:30 PM, ketmar wrote:

On Wednesday, 23 November 2016 at 19:07:49 UTC, Chris wrote:

It has something to do with the smart quote, e.g.:


it is wrong binary search in `_d_switch_string()`.

strings for switch are lexically sorted, and compiler calls
`_d_switch_string()` to select one. the thing is that comparison in
`_d_switch_string()` is done with `memcmp()`. still not clear? ok, let's
see how cases are encoded:

body _d_switch_dstring()
   'U0027' (ca)
table[0] = 1, 'U0027'
table[1] = 1, 'U2019'

or, in memory:

table[0] = 1, 0x27, 0x00
table[1] = 1, 0x19, 0x20

so, memcmp for `table[1]` returns... 1! 'cause 0x27 is greater than
0x19. and binsearch is broken from here on. the same is true for
`_d_switch_ustring()`, of course.

this can be fixed either by using slow char-by-char comparisons in
druntime, or by fixing codegen, so it would sort strings as byte arrays.


Oh wow, so this is really an endian issue. On a big endian machine, the 
code would work. Interesting!


I think it makes the most sense to remove the memcmp, and do binary 
search based on actual char values.


Thanks for finding this.

-Steve



Re: Switch ignores case (?)

2016-11-23 Thread ketmar via Digitalmars-d-learn

quickfix:


diff --git a/src/rt/switch_.d b/src/rt/switch_.d
index ec124e3..83572fe 100644
--- a/src/rt/switch_.d
+++ b/src/rt/switch_.d
@@ -27,6 +27,32 @@ private import core.stdc.string;

 extern (C):

+import core.stdc.wchar_ : wchar_t, wmemcmp;
+
+
+static if (wchar_t.sizeof == wchar.sizeof) {
+  alias memcmpw = wmemcmp;
+  int memcmpd (const(void)* s0, const(void)* s1, size_t len) {
+while (len--) {
+  if (int r = *cast(const(int)*)s0-*cast(const(int)*)s1) 
return (r < 0 ? -1 : 1);

+  s0 += dchar.sizeof;
+  s1 += dchar.sizeof;
+}
+return 0;
+  }
+} else static if (wchar_t.sizeof == dchar.sizeof) {
+  int memcmpw (const(void)* s0, const(void)* s1, size_t len) {
+while (len--) {
+  if (int r = 
*cast(const(ushort)*)s0-*cast(const(ushort)*)s1) return (r < 0 ? 
-1 : 1);

+  s0 += wchar.sizeof;
+  s1 += wchar.sizeof;
+}
+return 0;
+  }
+  alias memcmpd = wmemcmp;
+} else static assert(0, "wut?!");
+
+
 int _d_switch_string(char[][] table, char[] ca)
 in
 {
@@ -189,7 +215,7 @@ in
 {
 int c;

-c = memcmp(table[j - 1].ptr, table[j].ptr, len1 * 
wchar.sizeof);

+c = memcmpw(table[j - 1].ptr, table[j].ptr, len1);
 assert(c < 0);  // c==0 means a duplicate
 }
 }
@@ -205,7 +231,7 @@ out (result)
 for (auto i = 0u; i < table.length; i++)
 {
 if (table[i].length == ca.length)
-{   c = memcmp(table[i].ptr, ca.ptr, ca.length * 
wchar.sizeof);

+{   c = memcmpw(table[i].ptr, ca.ptr, ca.length);
 assert(c != 0);
 }
 }
@@ -218,7 +244,7 @@ out (result)
 assert(i < table.length);
 if (table[i].length == ca.length)
 {
-c = memcmp(table[i].ptr, ca.ptr, ca.length * 
wchar.sizeof);

+c = memcmpw(table[i].ptr, ca.ptr, ca.length);
 if (c == 0)
 {
 assert(i == result);
@@ -253,7 +279,7 @@ body
 auto c = cast(sizediff_t)(ca.length - pca.length);
 if (c == 0)
 {
-c = memcmp(ca.ptr, pca.ptr, ca.length * 
wchar.sizeof);

+c = memcmpw(ca.ptr, pca.ptr, ca.length);
 if (c == 0)
 {   //printf("found %d\n", mid);
 return cast(int)mid;
@@ -317,7 +343,7 @@ in
 assert(len1 <= len2);
 if (len1 == len2)
 {
-auto c = memcmp(table[j - 1].ptr, table[j].ptr, len1 
* dchar.sizeof);
+auto c = memcmpd(table[j - 1].ptr, table[j].ptr, 
len1);

 assert(c < 0);  // c==0 means a duplicate
 }
 }
@@ -331,7 +357,7 @@ out (result)
 for (auto i = 0u; i < table.length; i++)
 {
 if (table[i].length == ca.length)
-{   auto c = memcmp(table[i].ptr, ca.ptr, ca.length 
* dchar.sizeof);
+{   auto c = memcmpd(table[i].ptr, ca.ptr, 
ca.length);

 assert(c != 0);
 }
 }
@@ -344,7 +370,7 @@ out (result)
 assert(i < table.length);
 if (table[i].length == ca.length)
 {
-auto c = memcmp(table[i].ptr, ca.ptr, ca.length 
* dchar.sizeof);
+auto c = memcmpd(table[i].ptr, ca.ptr, 
ca.length);

 if (c == 0)
 {
 assert(i == result);
@@ -379,7 +405,7 @@ body
 auto c = cast(sizediff_t)(ca.length - pca.length);
 if (c == 0)
 {
-c = memcmp(ca.ptr, pca.ptr, ca.length * 
dchar.sizeof);

+c = memcmpd(ca.ptr, pca.ptr, ca.length);
 if (c == 0)
 {   //printf("found %d\n", mid);
 return cast(int)mid;
--
2.9.2


Re: Switch ignores case (?)

2016-11-23 Thread ketmar via Digitalmars-d-learn

On Wednesday, 23 November 2016 at 20:49:01 UTC, Chris wrote:
Actually, I came across a compiler message that gave me 
something like \x19\x20 which I found odd. This sure needs 
fixing. After all, it's quite a basic feature. So it's back to 
the old `if` again (for now).


yeah, until this is fixed, `switch` over wstring/dstring can be 
considered completely broken, and better be avoided. `switch` 
over normal string is unaffected, of course.


Re: Switch ignores case (?)

2016-11-23 Thread Chris via Digitalmars-d-learn

On Wednesday, 23 November 2016 at 19:30:01 UTC, ketmar wrote:

On Wednesday, 23 November 2016 at 19:07:49 UTC, Chris wrote:

It has something to do with the smart quote, e.g.:


it is wrong binary search in `_d_switch_string()`.

strings for switch are lexically sorted, and compiler calls 
`_d_switch_string()` to select one. the thing is that 
comparison in `_d_switch_string()` is done with `memcmp()`. 
still not clear? ok, let's see how cases are encoded:


body _d_switch_dstring()
   'U0027' (ca)
table[0] = 1, 'U0027'
table[1] = 1, 'U2019'

or, in memory:

table[0] = 1, 0x27, 0x00
table[1] = 1, 0x19, 0x20

so, memcmp for `table[1]` returns... 1! 'cause 0x27 is greater 
than 0x19. and binsearch is broken from here on. the same is 
true for `_d_switch_ustring()`, of course.


this can be fixed either by using slow char-by-char comparisons 
in druntime, or by fixing codegen, so it would sort strings as 
byte arrays.


Actually, I came across a compiler message that gave me something 
like \x19\x20 which I found odd. This sure needs fixing. After 
all, it's quite a basic feature. So it's back to the old `if` 
again (for now).


Re: Switch ignores case (?)

2016-11-23 Thread ketmar via Digitalmars-d-learn

On Wednesday, 23 November 2016 at 19:07:49 UTC, Chris wrote:

It has something to do with the smart quote, e.g.:


it is wrong binary search in `_d_switch_string()`.

strings for switch are lexically sorted, and compiler calls 
`_d_switch_string()` to select one. the thing is that comparison 
in `_d_switch_string()` is done with `memcmp()`. still not clear? 
ok, let's see how cases are encoded:


body _d_switch_dstring()
   'U0027' (ca)
table[0] = 1, 'U0027'
table[1] = 1, 'U2019'

or, in memory:

table[0] = 1, 0x27, 0x00
table[1] = 1, 0x19, 0x20

so, memcmp for `table[1]` returns... 1! 'cause 0x27 is greater 
than 0x19. and binsearch is broken from here on. the same is true 
for `_d_switch_ustring()`, of course.


this can be fixed either by using slow char-by-char comparisons 
in druntime, or by fixing codegen, so it would sort strings as 
byte arrays.


Re: Switch ignores case (?)

2016-11-23 Thread Jonathan M Davis via Digitalmars-d-learn
On Wednesday, November 23, 2016 19:07:49 Chris via Digitalmars-d-learn 
wrote:
> (I think I marked it as "P1" (default option), maybe it's "P2",
> didn't know what to choose)

AFAIK, there are no devs that pay any attention to those. Some attention is
paid to regression vs enhancement vs etc. But the priority field really
doesn't mean anything for how D bugs get fixed or who fixes what when.

- Jonathan M Davis



Re: Switch ignores case (?)

2016-11-23 Thread Chris via Digitalmars-d-learn
On Wednesday, 23 November 2016 at 18:34:28 UTC, Steven 
Schveighoffer wrote:




Please file here: https://issues.dlang.org/enter_bug.cgi

I think this has been there forever. Happens in 2.040 too (the 
earliest dmd I have on my system).


-Steve


Here it is:

https://issues.dlang.org/show_bug.cgi?id=16739

(I think I marked it as "P1" (default option), maybe it's "P2", 
didn't know what to choose)


It has something to do with the smart quote, e.g.:

import std.array;
import std.conv;
import std.stdio;

void main()
{
  auto tokens = to!(dchar[][])(["D"d, "’"d, "Addario"d, "'"d, 
"."d]);

  // Or use this below:
  //~ dstring[] tokens = ["D"d, "’"d, "Addario"d, "'"d];
  while (!tokens.empty)
  {
switch (tokens[0])
{
  case "\u2019"d:
writeln("Apostrophe smart " ~ tokens[0]);
break;
  case "\u0027"d:
writeln("Apostrophe straight " ~ tokens[0]);
break;
  case "D"d:
writeln("Letter 'D'");
break;
  case "."d:
writeln("fullstop " ~ tokens[0]);
break;
  default:
writeln("Other " ~ tokens[0]);
break;
}
tokens = tokens[1..$];
  }
}

prints:

Letter 'D'
Other ’  // <== not expected
Other Addario
Apostrophe straight '
fullstop .

Both LDC and DMD produce the same error. I discovered this while 
writing a tokenizer. It is a bit upsetting, because it is really 
an essential thing. The workaround for now would be


if (token[0] == "\u2019"d)
 goto Wherever;

which works, by the way.


Re: Switch ignores case (?)

2016-11-23 Thread Steven Schveighoffer via Digitalmars-d-learn

On 11/23/16 1:03 PM, Chris wrote:

On Wednesday, 23 November 2016 at 17:33:04 UTC, Steven Schveighoffer wrote:



I tested this locally with different ideas. This definitely looks like
a codegen bug.

I was able to reduce it to:

void main()
{
switch("'"d)
{
case "'"d:
writeln("a");
break;
case "’"d:
writeln("b");
break;
default:
writeln("default");
}
}

prints "default"

What seems to fix it is removing the case statement for the "smart
apostrophe". So I'd look there for where the bug is triggering.



Yep, removing one of the two cases works. I tried it with different
versions of dmd back to 2.070.0, and it always gives me the same (wrong)
result. I haven't tried ldc yet.



Please file here: https://issues.dlang.org/enter_bug.cgi

I think this has been there forever. Happens in 2.040 too (the earliest 
dmd I have on my system).


-Steve


Re: Switch ignores case (?)

2016-11-23 Thread Chris via Digitalmars-d-learn
On Wednesday, 23 November 2016 at 17:33:04 UTC, Steven 
Schveighoffer wrote:




I tested this locally with different ideas. This definitely 
looks like a codegen bug.


I was able to reduce it to:

void main()
{
switch("'"d)
{
case "'"d:
writeln("a");
break;
case "’"d:
writeln("b");
break;
default:
writeln("default");
}
}

prints "default"

What seems to fix it is removing the case statement for the 
"smart apostrophe". So I'd look there for where the bug is 
triggering.


-Steve


Yep, removing one of the two cases works. I tried it with 
different versions of dmd back to 2.070.0, and it always gives me 
the same (wrong) result. I haven't tried ldc yet.




Re: Switch ignores case (?)

2016-11-23 Thread Steven Schveighoffer via Digitalmars-d-learn

On 11/23/16 11:28 AM, Chris wrote:

Only one of the two cases is considered. What am I doing wrong?

`
import std.array;
import std.conv;
import std.stdio;

void main()
{
  auto tokens = to!(dchar[][])(["D"d, "’"d, "Addario"d, "'"d]);
  // Or use this below:
  //~ dstring[] tokens = ["D"d, "’"d, "Addario"d, "'"d];
  while (!tokens.empty)
  {
switch (tokens[0])
{
  case "\u2019"d:
writeln("Apostrophe smart " ~ tokens[0]);
break;
  case "\u0027"d:
writeln("Apostrophe straight " ~ tokens[0]);
break;
  default:
writeln("Other " ~ tokens[0]);
break;
}
tokens = tokens[1..$];
  }
}
`
Prints:

Other D
Apostrophe smart ’
Other Addario
Other '

I would have expected:

Other D
Apostrophe smart ’
Other Addario
Apostrophe straight '  <== expected


I tested this locally with different ideas. This definitely looks like a 
codegen bug.


I was able to reduce it to:

void main()
{
switch("'"d)
{
case "'"d:
writeln("a");
break;
case "’"d:
writeln("b");
break;
default:
writeln("default");
}
}

prints "default"

What seems to fix it is removing the case statement for the "smart 
apostrophe". So I'd look there for where the bug is triggering.


-Steve