Re: The Case Against Autodecode

default0 via Digitalmars-d Thu, 02 Jun 2016 14:12:25 -0700

On Thursday, 2 June 2016 at 20:52:29 UTC, ag0aep6g wrote:

On 06/02/2016 10:36 PM, Andrei Alexandrescu wrote:
By whom? The "support level 1" folks yonder at the Unicode standard? :o)
-- Andrei
Do they say that level 1 should be the default, and do they give a rationale for that? Would you kindly link or quote that?

The level 2 support description noted that it should be opt-in because it's slow.

Arguably it should be easier to operate on code units if you know it's safe to do so, but always working on code units by default seems too broken too often, and always working on graphemes by default seems too slow too often.
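To make the trade-off concrete, here is a minimal sketch of the three ways to count one string in D. The helper names `codeUnits`, `codePoints` and `graphemes` are made up for illustration; the Phobos calls (`byCodeUnit`, `walkLength`, `byGrapheme`) are the existing ones.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

/// UTF-8 code units; for a string this equals s.length.
size_t codeUnits(string s) { return s.byCodeUnit.walkLength; }

/// Code points; walking a string as a range auto-decodes to dchar.
size_t codePoints(string s) { return s.walkLength; }

/// Grapheme clusters, i.e. user-perceived characters (the slow, opt-in level).
size_t graphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    // "noël" spelled as 'e' followed by U+0308 COMBINING DIAERESIS.
    string s = "noe\u0308l";
    writeln(codeUnits(s), " ", codePoints(s), " ", graphemes(s)); // prints "6 5 4"
}
```

All three answers are "the length" depending on who is asking, which is why neither extreme works well as the one default.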

Now one can argue either consistency for code units (because then we can treat char[] and friends as a slice) or correctness for graphemes, but really, the more I think about it the more I think there is no good default and you need to learn Unicode anyway.

The only sad parts here are that 1) we hijacked an array type for strings, which sucks, and 2) we don't have an API that is actually good at teaching the user what it does and doesn't do.
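A small sketch of what "hijacked an array type" looks like from generic code. The static asserts reflect how Phobos treats narrow strings while auto-decoding is in place; `firstUnit` is a hypothetical helper, not something from Phobos.

```d
import std.range.primitives : ElementType, front, hasLength;
import std.traits : isNarrowString;
import std.utf : byCodeUnit;

// As an array, a string indexes by code unit...
static assert(is(typeof(string.init[0]) == immutable(char)));
// ...but as a range its element type is dchar and it reports no length.
static assert(is(ElementType!string == dchar));
static assert(!hasLength!string);
// Every other array keeps its natural element type and length.
static assert(is(ElementType!(int[]) == int));
static assert(hasLength!(int[]));

/// Hypothetical generic helper: first element, without decoding narrow strings.
auto firstUnit(R)(R r)
{
    static if (isNarrowString!R)
        return r.byCodeUnit.front; // opt out of auto-decoding
    else
        return r.front;
}

void main()
{
    assert("\u00E9".front == '\u00E9');   // range view: decoded code point
    assert(firstUnit("\u00E9") == 0xC3);  // array view: first UTF-8 code unit
    assert(firstUnit([1, 2, 3]) == 1);    // non-string arrays are unaffected
}
```

The `static if (isNarrowString!R)` branch is the kind of special case that generic code ends up carrying.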

The consequence of 1 is that generic code that also wants to deal with strings will want to special-case them to get rid of auto-decoding; the consequence of 2 is that we will have tons of not-actually-correct string handling.

I would assume that almost all string handling code out in the wild is broken anyway (in code I have encountered I have never seen attempts to normalize or do other things before or after comparisons, searching, etc.), unless of course YOU or one of your colleagues wrote it (consider that checking the length of a string in Java or C# to validate that it is no longer than X characters is often done and wrong, because .length()/.Length is the number of UTF-16 code units in those languages) :o)
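For illustration, a minimal sketch of both pitfalls in D. `sameText` is a hypothetical helper; `normalize` and `NFC` come from std.uni. D's wstring stands in here for the UTF-16 strings of Java and C#.

```d
import std.uni : NFC, normalize;

/// Hypothetical helper: compare two strings after normalizing both to NFC.
bool sameText(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string precomposed = "\u00E9";  // 'é' as a single code point
    string decomposed  = "e\u0301"; // 'e' + U+0301 COMBINING ACUTE ACCENT

    // A plain comparison looks at code units and sees two different strings.
    assert(precomposed != decomposed);
    // After normalization they compare equal.
    assert(sameText(precomposed, decomposed));

    // The length-check trap: one code point outside the BMP takes
    // two UTF-16 code units, so .length reports 2 for one "character".
    wstring emoji = "\U0001F600"w;
    assert(emoji.length == 2);
}
```

Normalization only addresses canonical equivalence; case folding, collation and the like are separate problems.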

So really, as bad and alarming as "incorrect string handling" by default seems, in practice it has not prevented people from writing working (internationalized!) applications in other languages that get used way more than D.

One could say we should do it better than them, but I would be inclined to believe that RCStr provides our opportunity to do so. Having char[] be what it is is an annoying wart, and maybe at some point we can deprecate/remove that behaviour, but for now I'd rather see if RCStr is viable than attempt to change the semantics of all string handling code in D.
