On Sunday, 28 September 2014 at 00:13:59 UTC, Andrei Alexandrescu wrote:
On 9/27/14, 3:40 PM, H. S. Teoh via Digitalmars-d wrote:
If we can get Andrei on board, I'm all for killing off autodecoding.

That's rather vague; it's unclear what would replace it. -- Andrei

I believe that removing autodecoding will make things even worse. As far as I understand, if we remove it from the front() function that operates on narrow strings, then front() will return just a single byte of a char. I believe that processing narrow strings by *user-perceived characters* (graphemes) is the more common use case. Operating on the individual bytes of a multibyte character is an uncommon task, and you can already do that via direct indexing of the char[] array. The number of bytes in a *user-perceived character* is an internal detail of the UTF-8 encoding, and it should not matter for common tasks such as parsing, searching, and replacing text. If you need the byte representation of a string, you should cast it to ubyte[] and work with it using the same range functions, without autodecoding.
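To illustrate the three levels on current D (a minimal sketch, assuming a reasonably recent Phobos): .length counts UTF-8 code units, the autodecoding range primitives count code points, std.uni.byGrapheme counts user-perceived characters, and casting to ubyte[] opts out of autodecoding entirely:

```d
import std.range : front, walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    // 'ë' written as 'e' + combining diaeresis (U+0308)
    string s = "noe\u0308l";

    writeln(s.length);                // 6 -- UTF-8 code units (bytes)
    writeln(s.walkLength);            // 5 -- code points (autodecoding)
    writeln(s.byGrapheme.walkLength); // 4 -- user-perceived characters

    // front autodecodes: it yields a dchar, not a single byte
    writeln(s.front); // 'n'

    // Casting to ubyte[] gives the raw bytes, no autodecoding:
    auto bytes = cast(immutable(ubyte)[]) s;
    writeln(bytes.front); // 110, the byte value of 'n'
}
```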

The main problem I see is that a programmer inexperienced in D can be confused about whether he is operating on bytes or on graphemes. This is especially likely when he migrates from C# or Python, where a string is not treated as an array of its bytes. A *char* in D is not a character; it is a piece of a character, not the whole character. That is the main inconsistency.
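A small sketch of the pitfall (the Cyrillic string is just an arbitrary multibyte example):

```d
import std.stdio : writeln;

void main()
{
    string s = "привет"; // 6 Cyrillic letters, 12 UTF-8 bytes

    // In Python 3, len("привет") is 6; in D, .length counts bytes:
    writeln(s.length); // 12

    // Indexing yields a char holding only the lead byte of 'п':
    writeln(cast(ubyte) s[0]); // 208 (0xD0)

    // Naive slicing can even cut a character in half:
    string half = s[0 .. 1]; // compiles, but is not valid UTF-8
    writeln(half.length);    // 1
}
```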

A possible solution is to provide a class or struct implementation of string that hides the internal representation of narrow strings from users who don't need to operate on the individual bytes of UTF-8 characters. I believe it's the best way to kill all the rabbits at once. :) We could give this String class a method returning ubyte[] (the better way) or char[] that exposes the internal representation for those who need it. A sketch of what that might look like follows.
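A minimal sketch of such a wrapper (entirely hypothetical; the names String, byChar, and bytes are mine, not anything that exists in Phobos):

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

/// Hypothetical wrapper: iteration is by user-perceived character;
/// the raw representation is only reachable through an explicit method.
struct String
{
    private string data;

    this(string s) { data = s; }

    /// Default view: a range of graphemes.
    auto byChar() { return data.byGrapheme; }

    /// Escape hatch for byte-level work.
    immutable(ubyte)[] bytes() { return cast(immutable(ubyte)[]) data; }
}

void main()
{
    auto s = String("noe\u0308l");
    writeln(s.byChar().walkLength); // 4 user-perceived characters
    writeln(s.bytes().length);      // 6 raw UTF-8 bytes
}
```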

A question: can you list some languages that represent UTF-8 narrow strings as arrays of single bytes?
