On Saturday, 18 April 2015 at 11:35:47 UTC, Jacob Carlborg wrote:
On 2015-04-18 12:27, Walter Bright wrote:

That doesn't make sense to me, because the umlauts and the accented e
all have Unicode code point assignments.

This code snippet demonstrates the problem:

import std.stdio;

void main ()
{
    dstring a = "e\u0301"; // decomposed: 'e' followed by a combining acute accent
    dstring b = "é";       // precomposed: the single code point U+00E9
    assert(a != b);        // unequal when compared as code point sequences
    assert(a.length == 2);
    assert(b.length == 1);
    writefln("%s %s", a, b);
}

If you run the above code, all asserts will pass. If your system correctly supports Unicode (it works on OS X 10.10), the two printed characters should look exactly the same.

\u0301 is the "combining acute accent" [1].

[1] http://www.fileformat.info/info/unicode/char/0301/index.htm

Yep, this was the cause of some bugs I had in my program. The thing is, you never know whether a piece of text is composed or decomposed, so you have to be prepared for "é" to have length 2 or 1. On OS X these characters are decomposed by default, so if you pipe an "é" (length 1) through the system it automatically becomes "e\u0301" (length 2). The same goes for file names on OS X. I've had to find a workaround for this more than once.
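
One possible workaround, sketched below, is to bring both strings to the same normalization form before comparing them. This is only a minimal sketch and assumes the normalize function and the NFC/NFD forms from std.uni in Phobos:

import std.uni : normalize, NFC, NFD;

void main ()
{
    dstring a = "e\u0301"; // decomposed: 'e' + combining acute accent
    dstring b = "é";       // precomposed single code point

    // After normalizing both strings to a common form they compare equal,
    // regardless of how the input happened to arrive.
    assert(normalize!NFC(a) == normalize!NFC(b)); // both precomposed
    assert(normalize!NFD(a) == normalize!NFD(b)); // both decomposed
}

For file names coming back from OS X, normalizing to one form right after reading them is usually enough to make comparisons against string literals in source code behave as expected.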
