On Saturday, 25 May 2013 at 19:58:25 UTC, Dmitry Olshansky wrote:
> Runs away in horror :) It's a mess even before you've got to the
> details.
Perhaps it's fatally flawed, but I don't see an argument for why,
so I'll assume you can't find such a flaw. It is still _much
less_ messy than UTF-8; that is the critical distinction.
> Another point about sometimes using a 2-byte encoding: welcome
> to the nice world of big-endian/little-endian, i.e. the very
> trap UTF-16 has stepped into.
I don't think this is a sizable obstacle. It takes some
coordination, but it is a minor issue.
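For context on the endianness trap being referenced: the same UTF-16 code unit serializes to different byte orders on different platforms, which is why the standard defines a byte-order mark (BOM). A minimal sketch of the coordination involved (my illustration, not anything proposed in the thread):

```python
def detect_utf16_endianness(data: bytes) -> str:
    """Return 'big' or 'little' based on a leading BOM.

    The BOM is code point U+FEFF; its byte order reveals the
    encoder's endianness. Per the Unicode standard, big-endian
    is the default when no BOM is present.
    """
    if data.startswith(b'\xfe\xff'):
        return 'big'      # U+FEFF serialized big-endian
    if data.startswith(b'\xff\xfe'):
        return 'little'   # U+FEFF serialized little-endian
    return 'big'          # spec default: no BOM means big-endian

# The same text yields different bytes depending on endianness:
be = "hi".encode("utf-16-be")   # b'\x00h\x00i'
le = "hi".encode("utf-16-le")   # b'h\x00i\x00'
assert be != le
assert detect_utf16_endianness(b'\xff\xfe' + le) == 'little'
```

This is the coordination cost: every producer and consumer of the byte stream must agree on (or signal) byte order, a problem pure byte-oriented encodings like UTF-8 avoid entirely.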
On Saturday, 25 May 2013 at 20:20:11 UTC, Juan Manuel Cabo wrote:
> You obviously are not thinking it through. Such an encoding would
> have O(n^2) complexity for appending a character/symbol in a
> different language to the string, since you would have to
> update the beginning of the string and move the contents
> forward to make room. Not to mention that it wouldn't be
> backwards compatible with ASCII routines, and the complexity of
> such a header would have to be carried all the way to font
> rendering routines in the OS.
You obviously have not read the rest of the thread, both your
non-font-related assertions have been addressed earlier. I see
no reason why a single-byte encoding of UCS would have to be
carried to "font rendering routines" but UTF-8 wouldn't be.
> Multiple languages/symbols in one string is a blessing of
> modern humane computing. It is the norm more than the exception
> in most of the world.
I disagree, but in any case, most of this thread refers to
multi-language strings. The argument is about how best to encode
them.
On Saturday, 25 May 2013 at 20:47:25 UTC, Peter Alexander wrote:
> On Saturday, 25 May 2013 at 14:58:02 UTC, Joakim wrote:
>> On Saturday, 25 May 2013 at 14:16:21 UTC, Peter Alexander wrote:
>>> I suggest you read up on UTF-8. You really don't understand
>>> it. There is no need to decode; you just treat the UTF-8
>>> string as if it is an ASCII string.
>> Not being aware of this shortcut doesn't mean not
>> understanding UTF-8.
> It's not just a shortcut, it is absolutely fundamental to the
> design of UTF-8. It's like saying you understand Lisp without
> being aware that everything is a list.
It is an accidental shortcut because of the encoding scheme
chosen for UTF-8 and, as I've noted, still less efficient than
similarly searching a single-byte encoding. The fact that you
keep trumpeting this silly detail as somehow "fundamental"
suggests you have no idea what you're talking about.
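For readers following the dispute: the "shortcut" both sides are arguing about is UTF-8's self-synchronizing design. Every byte of a multi-byte sequence has its high bit set, so a plain byte-level search for an ASCII pattern can never match inside a multi-byte character. A brief sketch (my illustration, not from either poster):

```python
def find_ascii(haystack: bytes, needle: bytes) -> int:
    """Byte-level search, safe for ASCII needles in UTF-8 haystacks.

    UTF-8 lead and continuation bytes of multi-byte sequences are
    all >= 0x80, so they can never equal an ASCII byte (0x00-0x7F);
    a raw byte search cannot produce a false match mid-character.
    """
    assert all(b < 0x80 for b in needle), "needle must be pure ASCII"
    return haystack.find(needle)  # no decoding step needed

text = "naïve café?".encode("utf-8")
# 'ï' and 'é' each occupy two bytes, yet the byte search still
# finds the ASCII substring correctly (at byte offset 7 here).
assert find_ascii(text, b"caf") == 7
```

Note the result is a byte offset, not a code-point index; the two diverge as soon as any multi-byte character precedes the match, which is part of what the efficiency argument in this thread turns on.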
> Also, you keep stating disadvantages to UTF-8 that
> are completely false, like "slicing does require decoding".
> Again, completely missing the point of UTF-8. I cannot conceive
> how you can claim to understand how UTF-8 works yet repeatedly
> demonstrate that you do not.
Slicing on code points requires decoding, I'm not sure how you
don't know that. If you mean slicing by byte, that is not only
useless, but _every_ encoding can do that. I cannot conceive how
you claim to defend UTF-8, yet keep making such stupid points
that you don't even bother backing up.
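To make the claim concrete: finding the byte offset for a code-point index in UTF-8 requires scanning from the start, reading each lead byte to learn the sequence length, i.e. O(n), whereas a fixed-width encoding computes the offset directly in O(1). A sketch of that scan (mine, for illustration only):

```python
def codepoint_slice(s: bytes, start: int, end: int) -> bytes:
    """Slice a UTF-8 byte string by code-point indices.

    The byte offset of code point k is unknown until we have
    scanned past the k preceding (variable-width) sequences.
    """
    def advance(pos: int, count: int) -> int:
        while count and pos < len(s):
            b = s[pos]
            # The lead byte encodes the sequence length.
            if b < 0x80:
                step = 1          # ASCII
            elif b >> 5 == 0b110:
                step = 2          # 2-byte sequence
            elif b >> 4 == 0b1110:
                step = 3          # 3-byte sequence
            else:
                step = 4          # 4-byte sequence
            pos += step
            count -= 1
        return pos

    lo = advance(0, start)
    hi = advance(lo, end - start)
    return s[lo:hi]

data = "héllo".encode("utf-8")   # 6 bytes, 5 code points
assert codepoint_slice(data, 1, 3).decode("utf-8") == "él"
```

With a fixed single-byte encoding the same slice would be plain pointer arithmetic, `s[start:end]`, which is the efficiency difference the preceding posts are arguing over.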
> You are either ignorant or a successful troll. In either case,
> I'm done here.
Must be nice to just insult someone who has demolished your
arguments and leave. Good riddance, you weren't adding anything.