On Mon, 28 Mar 2016 16:29:50 -0700, "H. S. Teoh via Digitalmars-d-learn" <[email protected]> wrote:
> […] your diacritics may get randomly reattached to
> stuff they weren't originally attached to, or you may end up with wrong
> sequences of Unicode code points (e.g. diacritics not attached to any
> grapheme). Using filter() on Korean text, even with autodecoding, will
> pretty much produce garbage. And so on.

I'm on the same page here. If it ain't ASCII parsable, you *have* to make a conscious decision about whether you need code units or graphemes (see the sketch below). I've yet to find a use case for auto-decoded code points, though.

> So in short, we're paying a performance cost for something that's only
> arguably better but still not quite there, and this cost is attached to
> almost *everything* you do with strings, regardless of whether you need
> to (e.g., when you know you're dealing with pure ASCII data).

An unconscious decision made by the library, yielding the result that is least likely to be intended or expected? Let me think ... mhhh ... that's worse than iterating by char. No talking back :p.

--
Marco
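
For the archives, a minimal sketch of what that conscious decision looks like. The sample string is my own invention; byCodeUnit (std.utf), byGrapheme (std.uni) and walkLength (std.range) are plain Phobos:

import std.stdio : writeln;
import std.utf : byCodeUnit;
import std.uni : byGrapheme;
import std.range : walkLength;

void main()
{
    // "e" followed by a combining diaeresis (U+0308), so the three
    // views of the same string all report different lengths.
    string s = "noe\u0308l";

    writeln(s.byCodeUnit.walkLength);  // 6 UTF-8 code units
    writeln(s.walkLength);             // 5 auto-decoded code points
    writeln(s.byGrapheme.walkLength);  // 4 graphemes, what a reader sees
}

Pick byCodeUnit when you know the data is ASCII, byGrapheme when human-visible "characters" matter; the auto-decoded middle ground is what you get without asking.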
