On Sun, 07 Sep 2014 10:45:22 +0000
via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> For western text strings utf-8 is much better due to cache 
> efficiency. You can speed it up using SSE or dedicated 
> datastructures.
that's what i call efficiency! using SIMD for string indexing!

> The point of having unique immutable strings is that they compare 
> by reference only and that you can have auxiliary datastructures 
> that classify them if needed.
and this will fail with a compacting gc. heh.

> I think the D approach to strings is unpleasant. You should not 
> have slices of strings, only slices of ubyte arrays.
oh, no, thanks. casting strings back and forth for slicing is not fun.
and writing parsers using string slicing is fun.

> If you want real speedups for streams of symbols you have to move 
> into the landscape of huffman-encoding, tries, dedicated 
> datastructures…
or just ditch utf-8 and use ucs-4. this will speed up the most
frequent string operations: correct indexing and slicing.
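
a minimal sketch of the difference, assuming D's built-in `string`
(utf-8) and `dstring` (utf-32) types and a precomposed U+00E9. the
sample text and variable names are made up for illustration. note that
"correct" here means per code point, not per grapheme.

```d
import std.stdio : writeln;

void main()
{
    string  s = "h\u00E9llo";   // utf-8: U+00E9 takes two code units
    dstring d = "h\u00E9llo"d;  // utf-32: one code unit per code point

    assert(s.length == 6);      // counts bytes, not characters
    assert(d.length == 5);      // counts code points

    assert(d[1] == '\u00E9');   // O(1) indexing yields the code point
    assert(s[1] == 0xC3);       // utf-8 indexing yields only the lead byte

    assert(d[1 .. 3] == "\u00E9l"d); // slice bounds are code point indices
    assert(s[1 .. 3] == "\u00E9");   // same bounds cover just U+00E9 in utf-8

    writeln("ok");
}
```

with utf-8 the slice bounds must be byte offsets that happen to fall on
code point boundaries. with utf-32 any index is a valid boundary.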

> Having uniform string support in libraries (i.e. only supporting 
> utf-8) is a clear advantage IMO, that will allow for APIs that 
> are SSE backed and performant.
utf-8 was not invented as an encoding for internal string representation.
it's merely for data interchange. i myself believe that a language should
not do any encoding/decoding on a given string without being asked explicitly.
i.e. `foreach (dchar ch; s)` must be the same as `foreach (char ch; s)`
when s is `string`. for any decoding i must use `foreach (ch; s.byUtf8Char)`.
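
for reference, a small sketch of what D does today, assuming a recent
Phobos. `byUtf8Char` above is only a name for the explicit decoder i'd
like to have. the sketch uses `std.utf.byCodeUnit` and `std.utf.byDchar`
instead, which make both choices explicit.

```d
import std.utf : byCodeUnit, byDchar;

void main()
{
    string s = "h\u00E9"; // 3 utf-8 code units, 2 code points

    size_t units, points;
    foreach (char ch; s)  ++units;   // no decoding: one step per code unit
    foreach (dchar ch; s) ++points;  // the language decodes utf-8 implicitly
    assert(units == 3);
    assert(points == 2);

    // the same two iterations, spelled out explicitly via std.utf
    size_t u2, p2;
    foreach (ch; s.byCodeUnit) ++u2; // code units, never decodes
    foreach (ch; s.byDchar)    ++p2; // decodes to dchar on request
    assert(u2 == 3 && p2 == 2);
}
```

so the element type written in the `foreach` silently selects whether
decoding happens, and that is the behaviour i'm objecting to.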

the whole "let's use utf-8 as internal string representation" was a
mistake. and i'm not talking about D here.
