On 3/7/2014 8:40 AM, Michel Fortin wrote:
On 2014-03-07 03:59:55 +0000, "bearophile" <bearophileh...@lycos.com> said:

Walter Bright:

I understand this all too well. (Note that we currently have a
different silent problem: unnoticed large performance problems.)

On the other hand your change could introduce Unicode-related bugs in
future code (that the current Phobos avoids) (and here I am not
talking about code breakage).

The way Phobos works isn't any more correct than dealing with code
units. Many graphemes span multiple code points -- because of
combining diacritics or character variant modifiers -- and decoding at
the code-point level is thus often insufficient for correctness.
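
To make the three levels concrete, here is a minimal sketch in D. The helper names (codeUnitCount, codePointCount, graphemeCount) are made up for illustration; walkLength and byGrapheme are the Phobos functions from std.range and std.uni. It assumes the current behaviour where walking a narrow string decodes it to code points.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

/// Number of UTF-8 code units (bytes) in s. O(1).
size_t codeUnitCount(string s) { return s.length; }

/// Number of code points. walkLength decodes the narrow string, so O(n).
size_t codePointCount(string s) { return s.walkLength; }

/// Number of graphemes (user-perceived characters). Also O(n).
size_t graphemeCount(string s) { return s.byGrapheme.walkLength; }
```

For "e\u0301" ('e' followed by U+0301 COMBINING ACUTE ACCENT) these give 3 code units, 2 code points and 1 grapheme, so counting code points still disagrees with what a reader sees.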


Well, it is *more* correct: with current Phobos, text in many Western languages is more likely to "just work" in most cases. It's just that things still aren't completely correct overall.

From my experience, I'd suggest these basic operations for a "string
range" instead of the regular range interface:

.empty
.frontCodeUnit
.frontCodePoint
.frontGrapheme
.popFrontCodeUnit
.popFrontCodePoint
.popFrontGrapheme
.codeUnitLength (aka length)
.codePointLength (for dchar[] only)
.codePointLengthLinear
.graphemeLengthLinear

Someone should be able to mix all three 'front' and 'pop' function
variants above in any code dealing with a string type. In my XML parser
for instance I regularly use frontCodeUnit to avoid the decoding penalty
when matching the next character with an ASCII one such as '<' or '&'.
An API like the one above forces you to be aware of the level you're
working on, making bugs and inefficiencies stand out (as long as you're
familiar with each representation).
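
A rough sketch of what part of such an interface could look like for a UTF-8 string follows. StringRange and countOpenAngles are hypothetical names, not an existing Phobos API, and the linear length functions are left out for brevity. The decoding itself relies on std.utf.decode, std.utf.stride and std.uni.graphemeStride.

```d
import std.uni : graphemeStride;
import std.utf : decode, stride;

/// Hypothetical wrapper over a UTF-8 string exposing all three levels.
struct StringRange
{
    string data;

    @property bool empty() { return data.length == 0; }

    // Code unit level: no decoding at all.
    @property char frontCodeUnit() { return data[0]; }
    void popFrontCodeUnit() { data = data[1 .. $]; }

    // Code point level: decode one UTF-8 sequence.
    @property dchar frontCodePoint()
    {
        size_t i = 0;
        return decode(data, i);
    }
    void popFrontCodePoint() { data = data[stride(data, 0) .. $]; }

    // Grapheme level: the slice covering one user-perceived character.
    @property string frontGrapheme() { return data[0 .. graphemeStride(data, 0)]; }
    void popFrontGrapheme() { data = data[graphemeStride(data, 0) .. $]; }

    @property size_t codeUnitLength() { return data.length; }
}

/// Counts '<' characters, decoding only when the lead byte is non-ASCII.
size_t countOpenAngles(string text)
{
    auto r = StringRange(text);
    size_t n = 0;
    while (!r.empty)
    {
        if (r.frontCodeUnit < 0x80)
        {
            // ASCII fast path: compare the raw code unit.
            if (r.frontCodeUnit == '<') ++n;
            r.popFrontCodeUnit();
        }
        else
            r.popFrontCodePoint(); // skip a whole multi-byte sequence
    }
    return n;
}
```

countOpenAngles mixes the code-unit and code-point variants on the same range, which is the kind of mixing described above: the level being worked on is visible at each call site.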

If someone wants to use a generic array/range algorithm with a string,
my opinion is that he should have to wrap it in a range type that maps
front and popFront to one of the above variants. Having to do that should
make it obvious that there's an inefficiency there, as you're using an
algorithm that wasn't tailored to work with strings and that more
decoding than strictly necessary is being done.
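
As an illustration of that wrapping, here is a small sketch with two hypothetical adapter types, ByCodePoint and ByCodeUnit, each mapping front and popFront to one level. countChar stands in for a generic algorithm that only knows the regular range interface. All three names are assumptions made for this example.

```d
import std.utf : decode, stride;

/// Hypothetical adapter: iterates a UTF-8 string by code point.
struct ByCodePoint
{
    string data;
    @property bool empty() { return data.length == 0; }
    @property dchar front() { size_t i = 0; return decode(data, i); }
    void popFront() { data = data[stride(data, 0) .. $]; }
}

/// Hypothetical adapter: iterates by code unit, with no decoding.
struct ByCodeUnit
{
    string data;
    @property bool empty() { return data.length == 0; }
    @property char front() { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

/// Stand-in for a generic range algorithm: counts elements equal to needle.
size_t countChar(R)(R r, dchar needle)
{
    size_t n = 0;
    for (; !r.empty; r.popFront())
        if (r.front == needle) ++n;
    return n;
}
```

With this, countChar(ByCodePoint(s), '\u00E9') finds an 'é' in s, while countChar(ByCodeUnit(s), '\u00E9') never matches because it compares single UTF-8 bytes. The caller has to pick a level explicitly, and the cost of decoding shows up in the choice of wrapper.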


I actually like this suggestion quite a bit.

