On Monday, 20 April 2015 at 11:04:58 UTC, Panke wrote:

> Yes, again and again I encountered length related bugs with Unicode characters. Normalization is not 100% reliable.

I think it is 100% reliable; it just doesn't make the problems go away. It guarantees that two strings normalized to the same form are binary equal iff they are equal in the Unicode sense. It says nothing about columns, string length, or grapheme count.
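A minimal sketch of that guarantee: the two canonical spellings of "é" compare unequal byte-for-byte until both are normalized to the same form (variable names here are mine):

```d
import std.uni : normalize, NFC;

void main() {
    // Two canonically equivalent spellings of "é":
    dstring precomposed = "\u00E9";   // U+00E9 LATIN SMALL LETTER E WITH ACUTE
    dstring decomposed  = "e\u0301";  // 'e' followed by U+0301 COMBINING ACUTE ACCENT

    // Not binary equal as-is...
    assert(precomposed != decomposed);

    // ...but binary equal once both are normalized to the same form.
    assert(normalize!NFC(precomposed) == normalize!NFC(decomposed));
}
```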

The problem is not normalization as such; the problem is with string (UTF-8, where .length counts code units) as opposed to dstring (UTF-32, where it counts code points):

import std.uni : normalize, NFC;
void main() {

  // dstring is UTF-32, so .length counts code points.
  dstring de_one = "é";        // precomposed U+00E9
  dstring de_two = "e\u0301";  // 'e' plus combining acute accent

  assert(de_one.length == 1);
  assert(de_two.length == 2);

  // string is UTF-8, so .length counts code units (bytes).
  string e_one = "é";          // 2 bytes in UTF-8
  string e_two = "e\u0301";    // 1 + 2 bytes in UTF-8

  string random = "ab";

  assert(e_one.length == 2);
  assert(e_two.length == 3);
  assert(e_one.length == random.length); // "é" is as "long" as "ab"!

  // NFC composes 'e' + U+0301 into U+00E9, but .length is still a byte count.
  assert(normalize!NFC(e_one).length == 2);
  assert(normalize!NFC(e_two).length == 2);
}

This can lead to subtle bugs; compare the lengths of random and e_one. You have to convert everything to dstring to get the "expected" (code-point) result. However, this is not always desirable.
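For what it's worth, if the user-perceived character count is what you're after, neither .length nor a dstring conversion gives it; counting graphemes does. A sketch using std.uni.byGrapheme (variable names are mine):

```d
import std.uni : byGrapheme;
import std.range : walkLength;

void main() {
    string e_one  = "\u00E9";   // precomposed é: 1 grapheme, 2 UTF-8 code units
    string e_two  = "e\u0301";  // decomposed é:  1 grapheme, 3 UTF-8 code units
    string random = "ab";       // 2 graphemes, 2 code units

    // .length disagrees across the two spellings of the same character...
    assert(e_one.length != e_two.length);

    // ...while counting graphemes gives the user-perceived count either way.
    assert(e_one.byGrapheme.walkLength == 1);
    assert(e_two.byGrapheme.walkLength == 1);
    assert(random.byGrapheme.walkLength == 2);
}
```

This works on string directly, so there is no need to round-trip through dstring, though grapheme iteration is of course slower than reading .length.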
