Re: Why UTF-8/16 character encodings?

Joakim Fri, 24 May 2013 13:50:31 -0700

On Friday, 24 May 2013 at 20:37:58 UTC, Joakim wrote:

3. Even if I have a string that is 99% ASCII then I have to pay extra bytes for every character just because 1% wasn't ASCII. With UTF-8, I only pay the extra bytes when needed.
I don't understand what you mean here. If your string has a thousand non-ASCII characters, the UTF-8 version will have one or two thousand more characters, ie 1 or 2 KB more. My format would add a couple bytes in the header for each non-ASCII language character used, that's it. It's a clear win for my format.

Sorry, I was a bit imprecise.  Here's what I meant to write:


I don't understand what you mean here.  If your string has a
thousand non-ASCII characters, the UTF-8 version will have one
or two thousand more bytes, i.e. 1 or 2 KB more.  My format
would add a couple bytes in the header for each non-ASCII
language used, that's it.  It's a clear win for my format.
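
To make the arithmetic above concrete, here is a rough Python sketch.
The UTF-8 side uses only the standard str.encode call.  The header side
is purely hypothetical: the names HEADER_BYTES_PER_LANGUAGE and
header_overhead are made up for illustration, and the 2-byte figure is
an assumption standing in for "a couple bytes".  It models only the
header cost, not how the payload itself would be encoded.

```python
def utf8_overhead(text: str) -> int:
    """Extra bytes UTF-8 uses beyond one byte per code point."""
    return len(text.encode("utf-8")) - len(text)


# Assumption: the proposed format spends 2 header bytes per non-ASCII
# language used in the string ("a couple bytes" in the post).
HEADER_BYTES_PER_LANGUAGE = 2


def header_overhead(num_languages: int) -> int:
    """Hypothetical header cost of the proposed format."""
    return HEADER_BYTES_PER_LANGUAGE * num_languages


cyrillic = "я" * 1000  # U+044F takes 2 bytes in UTF-8
cjk = "中" * 1000      # U+4E2D takes 3 bytes in UTF-8

print(utf8_overhead(cyrillic))  # 1000 extra bytes, about 1 KB
print(utf8_overhead(cjk))       # 2000 extra bytes, about 2 KB
print(header_overhead(1))       # 2 bytes under the assumption above
```

The "one or two thousand more bytes" corresponds to characters that
take two or three bytes each in UTF-8, measured against a baseline of
one byte per character.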
