On Friday, 24 May 2013 at 20:37:58 UTC, Joakim wrote:
3. Even if I have a string that is 99% ASCII then I have to pay extra bytes for every character just because 1% wasn't ASCII. With UTF-8, I only pay the extra bytes when needed.
I don't understand what you mean here. If your string has a thousand non-ASCII characters, the UTF-8 version will have one or two thousand more characters, ie 1 or 2 KB more. My format would add a couple bytes in the header for each non-ASCII language character used, that's it. It's a clear win for my format.
Sorry, I was a bit imprecise.  Here's what I meant to write:

I don't understand what you mean here.  If your string has a
thousand non-ASCII characters, the UTF-8 version will have one
or two thousand more bytes, i.e. 1 or 2 KB more.  My format
would add a couple of bytes in the header for each non-ASCII
language used, and that's it.  It's a clear win for my format.
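
To put rough numbers on the comparison above, here is a small Python sketch. The UTF-8 side is measured directly. The header side is only an estimate: `HYPOTHETICAL_HEADER_BYTES_PER_LANGUAGE` and `header_overhead` are made-up names, and the value 2 just stands in for "a couple of bytes", since the header layout isn't specified here.

```python
def utf8_overhead(text: str) -> int:
    """Extra bytes UTF-8 needs beyond one byte per code point."""
    return len(text.encode("utf-8")) - len(text)


# Assumption: 2 bytes of header per non-ASCII language, standing in
# for "a couple of bytes". The real header layout is not given here.
HYPOTHETICAL_HEADER_BYTES_PER_LANGUAGE = 2


def header_overhead(language_count: int) -> int:
    """Estimated fixed cost of a per-language header (hypothetical)."""
    return language_count * HYPOTHETICAL_HEADER_BYTES_PER_LANGUAGE


greek = "\u03b1" * 1000   # Greek alpha: 2 bytes per character in UTF-8
hindi = "\u0915" * 1000   # Devanagari ka: 3 bytes per character in UTF-8

print(utf8_overhead(greek))   # 1000 extra bytes
print(utf8_overhead(hindi))   # 2000 extra bytes
print(header_overhead(1))     # 2 extra bytes under the assumption above
```

The UTF-8 cost grows with the number of non-ASCII characters, while the header cost in this sketch depends only on how many languages appear in the string.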
