Dmitry Olshansky wrote in message (digitalmars.D:147415):
> Assuming language support stays on stage of "codepoint is a character"
> it's totally expected to ignore modifiers and compare identically
> normalized UTF without decoding. Yes, it risks to hit certain issues.

Seeing a string as a range of code points (dchar) is already awful enough. Seeing strings as ranges of displayable characters just does not make sense. Unicode is too complicated to allow this for general-purpose string manipulation. All the transformations to displayable characters can only be done when displaying characters!

Just like fiancé is hidden if you write fiance' (with the appropriate Unicode character to have the ' placed over the 'e'). You can hide any word by using delete characters. You have to make assumptions about the input, and you have to put limitations on the algorithm, because in any case you can get unexpected behavior. And I can assure you there is less unexpected behavior if you treat strings as a dchar range, or even as char[], than if you treat them as displayable characters.

> It's a complete mess even with proper decoding ;)

Sure, that's why we'd better not decode.
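
To make the fiancé example concrete, here is a minimal sketch. It is written in Python rather than D only because Python's standard `unicodedata` module keeps it self-contained; the names `composed`, `decomposed`, `describe` and `equal_without_decoding` are made up for illustration. It shows that the two spellings differ at both the byte and the code point level, and that once both sides are normalized the same way, a plain byte comparison is enough.

```python
import unicodedata

# Two spellings that display identically (hypothetical sample data):
composed = "fianc\u00e9"      # 'é' as the single code point U+00E9
decomposed = "fiance\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT


def describe(s: str) -> dict:
    """Count UTF-8 code units (like char[]) and code points (like a dchar range)."""
    return {"bytes": len(s.encode("utf-8")), "code_points": len(s)}


def equal_without_decoding(a: bytes, b: bytes) -> bool:
    """Compare two UTF-8 buffers byte by byte, with no decoding step.

    Only meaningful if both buffers were normalized the same way beforehand.
    """
    return a == b


# Normalize both to NFC once, up front, then work on raw bytes.
nfc_a = unicodedata.normalize("NFC", composed).encode("utf-8")
nfc_b = unicodedata.normalize("NFC", decomposed).encode("utf-8")

print(describe(composed))     # byte and code point counts of the composed form
print(describe(decomposed))   # the decomposed form is longer on both counts
print(composed == decomposed)                # raw comparison: the word is "hidden"
print(equal_without_decoding(nfc_a, nfc_b))  # same normalization: bytes match
```

Neither view gives "six displayable characters" for both inputs without extra work. The normalization here is an assumption made about the input before comparing, not something the comparison itself does.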