On Saturday, 25 May 2013 at 19:58:25 UTC, Dmitry Olshansky wrote:
Runs away in horror :) It's a mess even before you've got to the details.
Perhaps it's fatally flawed, but I don't see an argument for why, so I'll assume you can't find such a flaw. It is still _much less_ messy than UTF-8; that is the critical distinction.

Another point about sometimes using a 2-byte encoding: welcome to the nice world of BigEndian/LittleEndian, i.e. the very trap UTF-16 stepped into.
I don't think this is a sizable obstacle. It takes some coordination, but it is a minor issue.
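The endianness trap being referred to is easy to demonstrate. A minimal Python sketch (not from the thread) showing why any multi-byte code unit forces a byte-order decision, while a one-byte code unit does not:

```python
# The same code point serializes differently under the two UTF-16 byte
# orders, which is why UTF-16 streams need a BOM or an explicit LE/BE label.
s = "A"  # U+0041
le = s.encode("utf-16-le")
be = s.encode("utf-16-be")
assert le == b"A\x00"   # 0x41 0x00
assert be == b"\x00A"   # 0x00 0x41

# A one-byte code unit (as in UTF-8 for ASCII) has no byte-order ambiguity.
assert s.encode("utf-8") == b"A"
```

Coordinating on a byte order (or a BOM convention) resolves this, which is the "minor issue" point above.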

On Saturday, 25 May 2013 at 20:20:11 UTC, Juan Manuel Cabo wrote:
You obviously are not thinking it through. Such an encoding would have O(n^2) complexity for appending a character/symbol in a different language to the string, since you would have to update the beginning of the string and move the contents forward to make room. Not to mention that it wouldn't be backwards compatible with ASCII routines, and the complexity of such a header would have to be carried all the way to the font rendering routines in the OS.
You obviously have not read the rest of the thread, both your non-font-related assertions have been addressed earlier. I see no reason why a single-byte encoding of UCS would have to be carried to "font rendering routines" but UTF-8 wouldn't be.

Multiple languages/symbols in one string is a blessing of modern humane computing. It is the norm more than the exception in most of the world.
I disagree, but in any case, most of this thread refers to multi-language strings. The argument is about how best to encode them.

On Saturday, 25 May 2013 at 20:47:25 UTC, Peter Alexander wrote:
On Saturday, 25 May 2013 at 14:58:02 UTC, Joakim wrote:
On Saturday, 25 May 2013 at 14:16:21 UTC, Peter Alexander wrote:
I suggest you read up on UTF-8. You really don't understand it. There is no need to decode, you just treat the UTF-8 string as if it is an ASCII string.
Not being aware of this shortcut doesn't mean not understanding UTF-8.

It's not just a shortcut, it is absolutely fundamental to the design of UTF-8. It's like saying you understand Lisp without being aware that everything is a list.
It is an incidental shortcut of the encoding scheme chosen for UTF-8 and, as I've noted, still less efficient than the same search over a single-byte encoding. The fact that you keep trumpeting this detail as somehow "fundamental" suggests you have no idea what you're talking about.
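For readers following along, the property being argued over is UTF-8's self-synchronization: ASCII bytes (0x00-0x7F) never occur inside a multi-byte sequence, so a byte-level search for an ASCII pattern cannot produce a false match. A minimal Python sketch (mine, not from the thread):

```python
# In UTF-8, continuation bytes are always 0x80-0xBF and lead bytes of
# multi-byte sequences are >= 0xC2, so any byte < 0x80 is a complete
# ASCII character. An ASCII substring search can therefore run on the
# raw bytes with no decoding.
hay = "naïve café search".encode("utf-8")
needle = b"caf"
i = hay.find(needle)  # plain byte search
assert i != -1
assert hay[i:i + len(needle)] == b"caf"
```

Whether this counts as "fundamental to the design" or an "incidental shortcut" is exactly the disagreement above; the mechanics themselves are not in dispute.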

Also, you continually state disadvantages of UTF-8 that are completely false, like "slicing does require decoding". Again, completely missing the point of UTF-8. I cannot conceive how you can claim to understand how UTF-8 works yet repeatedly demonstrate that you do not.
Slicing on code points requires decoding, I'm not sure how you don't know that. If you mean slicing by byte, that is not only useless, but _every_ encoding can do that. I cannot conceive how you claim to defend UTF-8, yet keep making such stupid points that you don't even bother backing up.
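Both sides of this exchange are half right, and a short Python sketch (mine, not from the thread) shows the distinction: byte slicing is O(1) but can cut a multi-byte sequence in half, while slicing at a code-point index requires scanning from the start of a variable-width encoding.

```python
# Byte slicing is cheap but encoding-unaware.
data = "héllo".encode("utf-8")  # é encodes as two bytes: 0xC3 0xA9
bad = data[:2]                  # ends mid-sequence, inside the é
try:
    bad.decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False            # the byte slice produced invalid UTF-8
assert not split_ok

# A code-point slice must account for variable width: decode (a linear
# scan in a variable-width encoding), slice, and re-encode.
first_two = "héllo"[:2].encode("utf-8")
assert first_two == b"h\xc3\xa9"
```

In a fixed-width single-byte encoding the two operations coincide, which is the efficiency claim being made here; the earlier posts dispute whether that gain justifies the other costs.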

You are either ignorant or a successful troll. In either case, I'm done here.
Must be nice to just insult someone who has demolished your arguments and leave. Good riddance, you weren't adding anything.
