On Sunday, 28 September 2014 at 00:13:59 UTC, Andrei Alexandrescu wrote:
On 9/27/14, 3:40 PM, H. S. Teoh via Digitalmars-d wrote:
If we can get Andrei on board, I'm all for killing off autodecoding.

That's rather vague; it's unclear what would replace it. -- Andrei

I believe that removing autodecoding will make things even worse. As far as I understand, if we remove it from the front() function that operates on narrow strings, then front() will return just a single byte of a char. I believe that processing narrow strings by *user-perceived characters* (graphemes) is the more common use case. Operating on the individual bytes of a multibyte character is an uncommon task, and you can already do that via direct indexing of the char[] array. The number of bytes in a *user-perceived character* is an internal detail of the UTF-8 encoding, and it should not matter for common tasks such as parsing, searching, and replacing text. If you need the byte representation of a string, you should cast it to ubyte[] and work with it using the same range functions, without autodecoding.
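To illustrate the three levels on current D (a minimal sketch, assuming a reasonably recent Phobos): .length counts UTF-8 code units, the autodecoding range primitives count code points, std.uni.byGrapheme counts user-perceived characters, and casting to ubyte[] opts out of autodecoding entirely:

```d
import std.range : front, walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    // 'ë' written as 'e' + combining diaeresis (U+0308)
    string s = "noe\u0308l";

    writeln(s.length);                // 6 -- UTF-8 code units (bytes)
    writeln(s.walkLength);            // 5 -- code points (autodecoding)
    writeln(s.byGrapheme.walkLength); // 4 -- user-perceived characters

    // front autodecodes: it yields a dchar, not a single byte
    writeln(s.front); // 'n'

    // Casting to ubyte[] gives the raw bytes, no autodecoding:
    auto bytes = cast(immutable(ubyte)[]) s;
    writeln(bytes.front); // 110, the byte value of 'n'
}
```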

The main problem I see is that a programmer inexperienced in D can be confused about whether he is operating on bytes or on graphemes. This is especially likely when he migrates from C# or Python, where a string is not treated as an array of its bytes. A *char* in D is not a character; it is a piece of a character, not the whole character. That is the main inconsistency.
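A small sketch of the pitfall (the Cyrillic string is just an arbitrary multibyte example):

```d
import std.stdio : writeln;

void main()
{
    string s = "привет"; // 6 Cyrillic letters, 12 UTF-8 bytes

    // In Python 3, len("привет") is 6; in D, .length counts bytes:
    writeln(s.length); // 12

    // Indexing yields a char holding only the lead byte of 'п':
    writeln(cast(ubyte) s[0]); // 208 (0xD0)

    // Naive slicing can even cut a character in half:
    string half = s[0 .. 1]; // compiles, but is not valid UTF-8
    writeln(half.length);    // 1
}
```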

A possible solution is to provide a class or struct implementation of string that hides the internal representation of narrow strings from users who don't need to operate on the individual bytes of UTF-8 characters. I believe it's the best way to kill all the rabbits at once. :) We could give this String class a method returning ubyte[] (the better way) or char[] that exposes the internal representation for those who need it. A sketch of what that might look like follows.
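A minimal sketch of such a wrapper (entirely hypothetical; the names String, byChar, and bytes are mine, not anything that exists in Phobos):

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

/// Hypothetical wrapper: iteration is by user-perceived character;
/// the raw representation is only reachable through an explicit method.
struct String
{
    private string data;

    this(string s) { data = s; }

    /// Default view: a range of graphemes.
    auto byChar() { return data.byGrapheme; }

    /// Escape hatch for byte-level work.
    immutable(ubyte)[] bytes() { return cast(immutable(ubyte)[]) data; }
}

void main()
{
    auto s = String("noe\u0308l");
    writeln(s.byChar().walkLength); // 4 user-perceived characters
    writeln(s.bytes().length);      // 6 raw UTF-8 bytes
}
```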

A question: can you list some languages that represent UTF-8 narrow strings as arrays of single bytes?
