On Mon, 10 Mar 2014 17:44:22 -0400, Nick Sabalausky <seewebsitetocontac...@semitwist.com> wrote:

> On 3/7/2014 8:40 AM, Michel Fortin wrote:
> > On 2014-03-07 03:59:55 +0000, "bearophile" <bearophileh...@lycos.com> said:
> >
> >> Walter Bright:
> >>
> >>> I understand this all too well. (Note that we currently have a
> >>> different silent problem: unnoticed large performance problems.)
> >>
> >> On the other hand your change could introduce Unicode-related bugs in
> >> future code (that the current Phobos avoids) (and here I am not
> >> talking about code breakage).
> >
> > The way Phobos works isn't any more correct than dealing with code
> > units. Many graphemes span on multiple code points -- because of
> > combined diacritics or character variant modifiers -- and decoding at
> > the code-point level is thus often insufficient for correctness.
> >
>
> Well, it is *more* correct, as many western languages are more likely in
> current Phobos to "just work" in most cases. It's just that things still
> aren't completely correct overall.
>
> > From my experience, I'd suggest these basic operations for a "string
> > range" instead of the regular range interface:
> >
> >     .empty
> >     .frontCodeUnit
> >     .frontCodePoint
> >     .frontGrapheme
> >     .popFrontCodeUnit
> >     .popFrontCodePoint
> >     .popFrontGrapheme
> >     .codeUnitLength (aka length)
> >     .codePointLength (for dchar[] only)
> >     .codePointLengthLinear
> >     .graphemeLengthLinear
> >
> > Someone should be able to mix all the three 'front' and 'pop' function
> > variants above in any code dealing with a string type. In my XML parser
> > for instance I regularly use frontCodeUnit to avoid the decoding penalty
> > when matching the next character with an ASCII one such as '<' or '&'.
> > An API like the one above forces you to be aware of the level you're
> > working on, making bugs and inefficiencies stand out (as long as you're
> > familiar with each representation).
> >
> > If someone wants to use a generic array/range algorithm with a string,
> > my opinion is that he should have to wrap it in a range type that maps
> > front and popFront to one of the above variant. Having to do that should
> > make it obvious that there's an inefficiency there, as you're using an
> > algorithm that wasn't tailored to work with strings and that more
> > decoding than strictly necessary is being done.
> >
>
> I actually like this suggestion quite a bit.

+1
Reminds me of my proposal for Rust
(https://github.com/mozilla/rust/issues/7043#issuecomment-19187984)

-- 
Marco
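
P.S. To make the three levels in Michel's list a bit more concrete, here is a rough sketch in D of what such a wrapper might look like for a UTF-8 `string`. Treat it as an illustration under my own assumptions rather than a proposal for Phobos: the struct name `StringRange` is hypothetical, the member names are simply lifted from the quoted list, and `.codePointLength` is left out because it only applies to `dchar[]`. The actual work is delegated to `std.utf.decode`, `std.utf.stride` and `std.uni.graphemeStride`.

```d
import std.uni : graphemeStride;
import std.utf : decode, stride;

/// Hypothetical wrapper around a UTF-8 string; not an existing Phobos type.
struct StringRange
{
    string s;

    @property bool empty() { return s.length == 0; }

    // Code-unit level: no decoding, just the next UTF-8 byte.
    @property char frontCodeUnit() { return s[0]; }
    void popFrontCodeUnit() { s = s[1 .. $]; }

    // Code-point level: decode one UTF-8 sequence.
    @property dchar frontCodePoint()
    {
        size_t i = 0;
        return decode(s, i);
    }
    void popFrontCodePoint() { s = s[stride(s, 0) .. $]; }

    // Grapheme level: a slice covering one whole grapheme cluster.
    @property string frontGrapheme() { return s[0 .. graphemeStride(s, 0)]; }
    void popFrontGrapheme() { s = s[graphemeStride(s, 0) .. $]; }

    // O(1): number of code units.
    @property size_t codeUnitLength() { return s.length; }

    // O(n): walk the string one code point at a time.
    size_t codePointLengthLinear()
    {
        size_t n = 0;
        for (size_t i = 0; i < s.length; i += stride(s, i))
            ++n;
        return n;
    }

    // O(n): walk the string one grapheme cluster at a time.
    size_t graphemeLengthLinear()
    {
        size_t n = 0;
        for (size_t i = 0; i < s.length; i += graphemeStride(s, i))
            ++n;
        return n;
    }
}

void main()
{
    // 'e' followed by U+0301 (combining acute accent), then "<b>".
    auto r = StringRange("e\u0301<b>");

    assert(r.codeUnitLength == 6);        // 1 + 2 + 3 UTF-8 code units
    assert(r.codePointLengthLinear == 5); // e, U+0301, <, b, >
    assert(r.graphemeLengthLinear == 4);  // "e + accent", <, b, >

    assert(r.frontCodePoint == 'e');      // the accent is lost at this level
    assert(r.frontGrapheme == "e\u0301");

    r.popFrontGrapheme();
    assert(r.frontCodeUnit == '<');       // ASCII match without any decoding
    r.popFrontCodeUnit();
    assert(r.frontCodePoint == 'b');
}
```

The `main` above mixes all three levels on one string, which is the point of the suggestion: the grapheme pop skips the accented letter as a unit, and the `'<'` check afterwards is a plain byte comparison, as in the XML parser example.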