Re: Unicode handling comparison

Wyatt Wed, 27 Nov 2013 12:41:31 -0800

On Wednesday, 27 November 2013 at 17:22:43 UTC, Jakob Ovrum wrote:


i18nString sounds like a range of graphemes to me.

Maybe. If I had called it...say, "normalisedString"? Would you still think that? That was an off-the-cuff name because my morning brain imagined that this sort of thing would be useful for user input where you can't make assumptions about its form.

I would like a convenient function in std.uni to get such a range of graphemes from a range of points, but I wouldn't want to elevate it to any particular status; that would be a knee-jerk reaction. D's granularity when it comes to Unicode is because there is an appropriate level of representation for each domain. Shoe-horning everything into a range of graphemes is something we should avoid.

Okay, hold up. It's a bit late to prevent everyone from diving down this rabbit hole, but let me be clear:

This really isn't about graphemes. Not really. They may be involved, but I think focusing on that obscures the point.

If you recall the original article, I don't think he's being unfair in expecting "noël" to have a length of four no matter how it was composed. I don't think it's unfair to expect that "noël".take(3) returns "noë", and I don't think it's unfair that reversing it should be "lëon". All the places where his expectations were defied (and more!) are implementation details.
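For concreteness, here is a minimal sketch of those three expectations in D. It assumes a Phobos version whose std.uni provides byGrapheme and byCodePoint, and it spells "noël" in its decomposed form (e followed by U+0308), which is the form that trips up naive code:

```d
import std.algorithm : equal;
import std.array : array;
import std.range : retro, take, walkLength;
import std.uni : byCodePoint, byGrapheme;

void main()
{
    // "noël" written as e + U+0308 COMBINING DIAERESIS (decomposed form).
    string decomposed = "noe\u0308l";

    // Walking the string by code point sees 5 elements;
    // walking it by grapheme sees the 4 a reader would count.
    assert(decomposed.walkLength == 5);
    assert(decomposed.byGrapheme.walkLength == 4);

    // Taking 3 graphemes keeps the diaeresis attached to its base letter.
    assert(decomposed.byGrapheme.take(3).byCodePoint.equal("noe\u0308"));

    // Reversing grapheme by grapheme gives "lëon" rather than
    // moving the diaeresis onto the wrong letter.
    assert(decomposed.byGrapheme.array.retro.byCodePoint.equal("le\u0308on"));
}
```

So the pieces are there; the question is only whether someone has to know to reach for them.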

While I stated before that I don't necessarily have anything against people learning more about Unicode, neither do I fundamentally believe that's something a lot of people _need_ to worry about. I'm not saying the default string in D should change or anything crazy like that. All I'm suggesting is maybe, rather than telling people they should read a small book about the most arcane stuff imaginable and then explaining which tool does what when that doesn't take, we could just tell them "Here, use this library type where you need it" with the admonishment that it may be too slow if abused. I think THAT could be useful.
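To make the idea less abstract, a rough sketch of what such a type might look like follows. NormalisedString, firstN, and reversed are hypothetical names made up for illustration, not anything in Phobos; the sketch assumes std.uni's normalize, byGrapheme, and byCodePoint, and every operation is O(n), which is exactly the "may be too slow if abused" caveat:

```d
import std.array : array;
import std.conv : to;
import std.range : retro, take, walkLength;
import std.uni : NFC, byCodePoint, byGrapheme, normalize;

/// Hypothetical convenience type: normalises on construction and
/// exposes grapheme-level operations. Not part of Phobos.
struct NormalisedString
{
    private string data;

    this(string s)
    {
        // Store everything in NFC so equal-looking input compares equal.
        data = normalize!NFC(s);
    }

    /// Number of graphemes; walks the whole string each call.
    @property size_t length()
    {
        return data.byGrapheme.walkLength;
    }

    /// The first n graphemes as a UTF-8 string.
    string firstN(size_t n)
    {
        return data.byGrapheme.take(n).byCodePoint.array.to!string;
    }

    /// The graphemes in reverse order as a UTF-8 string.
    string reversed()
    {
        return data.byGrapheme.array.retro.byCodePoint.array.to!string;
    }
}

void main()
{
    auto composed = NormalisedString("no\u00EBl");    // precomposed ë
    auto decomposed = NormalisedString("noe\u0308l"); // e + combining diaeresis

    // Same answers no matter how the input was composed.
    assert(composed.length == 4 && decomposed.length == 4);
    assert(decomposed.firstN(3) == "no\u00EB");
    assert(decomposed.reversed() == "l\u00EBon");
}
```

The user never has to choose between code units, code points, and graphemes; they pay for that in speed and allocations.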

In D, we can write code that is both Unicode-correct and highly performant, while still being simple and pleasant to read. To write such code, one must have a modicum of understanding of how Unicode works (in order to choose the right tools from the toolbox), but I think it's a novel compromise.

See, this sways me only a little bit. The reason for that is, often, convenience greatly trumps elegance or performance. Sure I COULD write something in C to look for obvious bad stuff in my syslog, but would I bother when I have a shell with pipes, grep, cut, and sed? This all isn't to say I don't LIKE performance and elegance; but I live, work, and play on both sides of this spectrum, and I'd like to think they can peacefully coexist without too much fuss.


-Wyatt
