On 2011-10-26 11:50:32 +0000, "Steven Schveighoffer" <schvei...@yahoo.com> said:

> It's even easier than this:
>
> a) you want to do a proper string comparison not knowing what state the
> unicode strings are in, use the full-fledged decode-when-needed string
> type, and its associated str.find method.
> b) you know they are both the same normalized form and want to optimize,
> use std.algorithm.find(haystack.asArray, needle.asArray).

Well, treating the string as an array of dchar doesn't work in the general case. Even with both strings normalized the same way, your fiancé example can break. So I should never treat them as plain arrays unless I'm sure the string contains no combining marks.
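
To make that concrete, here's a minimal sketch (my own illustration, not something from the thread) of the failure at the code-point level: both strings are in the same normalization form (NFD here), yet std.algorithm reports a match that is wrong at the grapheme level.

import std.algorithm : canFind;
import std.stdio;

void main()
{
    // Both strings are NFD: the é in the haystack is 'e' followed by
    // the combining acute accent U+0301.
    auto haystack = "fiance\u0301"d; // "fiancé", decomposed
    auto needle   = "fiance"d;

    // A code-point-level search finds "fiance", but grapheme-wise the
    // last user-perceived character of the haystack is "é", not "e".
    writeln(canFind(haystack, needle)); // prints: true (a spurious match)
}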

I'm not opposed to having a new string type being developed, but I'm skeptical about its inclusion in the language. We already have three string types which can be assumed to contain valid UTF sequences. I think the first thing to do is not to develop a new string type, but to develop the normalization and grapheme-splitting algorithms, plus a substring search built on them, all working on the existing char[], wchar[] and dchar[] types. Then write a program with proper handling of Unicode using those and hand-optimize it. If that proves to be a pain (it might well be), write a new string type, rewrite the program using it, do some benchmarks, and then we'll know whether it's a good idea and be able to quantify the drawbacks.
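
For illustration only, roughly the shape such a substring search could take on top of the existing string types; this sketch assumes std.uni provides normalize and byGrapheme primitives (today's Phobos has them, but they didn't exist at the time of this post), and the name graphemeFind is made up for the example.

import std.algorithm : canFind;
import std.array : array;
import std.uni : byGrapheme, normalize, NFC;

// Grapheme-aware substring search built directly on char[]: bring both
// strings to the same normalization form, then compare grapheme by grapheme.
bool graphemeFind(string haystack, string needle)
{
    auto h = haystack.normalize!NFC.byGrapheme.array;
    auto n = needle.normalize!NFC.byGrapheme.array;
    return canFind(h, n);
}

unittest
{
    // The decomposed and precomposed spellings of "fiancé" match...
    assert(graphemeFind("fiance\u0301", "fiancé"));
    // ...but "fiance" does not: the final grapheme of the haystack is "é".
    assert(!graphemeFind("fiance\u0301", "fiance"));
}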

But right now, all this arguing for or against a new string type is just stacking one hypothesis against another; it won't lead anywhere.

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
