On Mon, 28 Mar 2016 16:29:50 -0700, "H. S. Teoh via Digitalmars-d-learn" <[email protected]> wrote:
> […] your diacritics may get randomly reattached to
> stuff they weren't originally attached to, or you may end up with wrong
> sequences of Unicode code points (e.g. diacritics not attached to any
> grapheme). Using filter() on Korean text, even with autodecoding, will
> pretty much produce garbage. And so on.

I'm on the same page here. If it ain't ASCII parsable, you *have* to make a conscious decision about whether you need code units or graphemes (see the sketch below). I've yet to find a use case for auto-decoded code points, though.

> So in short, we're paying a performance cost for something that's only
> arguably better but still not quite there, and this cost is attached to
> almost *everything* you do with strings, regardless of whether you need
> to (e.g., when you know you're dealing with pure ASCII data).

An unconscious decision made by the library, yielding the result that is least likely to be intended or expected? Let me think ... mhhh ... that's worse than iterating by char. No talking back :p.

--
Marco
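
For the archives, a minimal sketch of what that conscious decision looks like. The sample string is my own invention; byCodeUnit (std.utf), byGrapheme (std.uni) and walkLength (std.range) are plain Phobos:

import std.stdio : writeln;
import std.utf : byCodeUnit;
import std.uni : byGrapheme;
import std.range : walkLength;

void main()
{
    // "e" followed by a combining diaeresis (U+0308), so the three
    // views of the same string all report different lengths.
    string s = "noe\u0308l";

    writeln(s.byCodeUnit.walkLength);  // 6 UTF-8 code units
    writeln(s.walkLength);             // 5 auto-decoded code points
    writeln(s.byGrapheme.walkLength);  // 4 graphemes, what a reader sees
}

Pick byCodeUnit when you know the data is ASCII, byGrapheme when human-visible "characters" matter; the auto-decoded middle ground is what you get without asking.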
