>>> I'd prefer to have an option to use UTF-16 (treated as a 2-byte
>>> character set with surrogate pairs) as that will only halve the
>>> maximum allowed number of characters.

Unicode allows about 1.1 million code points (U+0000 through U+10FFFF).
Every one of them can be represented in either UTF-8 or UTF-16.

>> Nope. If you take into account surrogates, UTF-16 will have the
>> same maximum of 4 bytes per character.

You should think of that not as 4 bytes but as two 16-bit words.

> You are missing my point. There are two ways to consider UTF-16, one is
> your interpretation where each character is 2-4 bytes, or as 2 byte 
> 'characters', where some codepoints are built from a surrogate pair 
> (which essentially means that some codepoints require two 'characters', 
> which in isolation don't make much sense).

I don't get your point. UTF-16 is a standard that uses one or two
16-bit words to represent one Unicode character (code point). That's
the only way to consider it. (UCS-2 uses one 16-bit word, which is
only usable for BMP characters, making it completely useless today.)
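
To make the "one or two 16-bit words" rule concrete, here is a minimal
Python sketch of the standard UTF-16 encoding arithmetic. The function
name utf16_code_units is only illustrative and is not taken from any
Firebird code.

```python
def utf16_code_units(code_point: int) -> list:
    """Return the UTF-16 code units (16-bit words) for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # BMP character: a single 16-bit word, identical to UCS-2.
        return [code_point]
    # Supplementary character: split the 20-bit offset into two halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


# 'A' fits in one word, U+1F600 needs a surrogate pair.
print([hex(u) for u in utf16_code_units(0x41)])
print([hex(u) for u in utf16_code_units(0x1F600)])
```

Neither half of a surrogate pair is a character by itself, which is why
counting 16-bit words is not the same as counting characters.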

> As most languages don't need those surrogate pairs for their
> codepoints/glyphs, it is easier to consider UTF-16 to be 2 byte. As far
> as I know this is how most UTF-16 implementations handle it.

You mix up words like "byte", "character", "codepoint", and "glyph".
In the good old ASCII days we had a 1:1:1 relationship between
"bytes", "characters" and "glyphs". Today there is no such
relationship anymore.

In the Unicode system, you usually need more than one byte to represent a
character. You may need more than one character to represent a glyph.
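
A small Python sketch of that difference, using only the standard
library. The helper name describe is made up for this illustration; it
counts code points, UTF-8 bytes, and UTF-16 16-bit words for a string.

```python
import unicodedata


def describe(text: str) -> tuple:
    """Return (code points, UTF-8 bytes, UTF-16 16-bit words) for text."""
    code_points = len(text)  # Python str is indexed by code point
    utf8_bytes = len(text.encode("utf-8"))
    utf16_words = len(text.encode("utf-16-le")) // 2
    return code_points, utf8_bytes, utf16_words


samples = {
    "ASCII letter A": "A",
    "precomposed e-acute (U+00E9)": "\u00e9",
    "e + combining acute (U+0065 U+0301)": "e\u0301",
    "emoji U+1F600": "\U0001F600",
}

for label, text in samples.items():
    print(label, describe(text))

# Both e-acute spellings render as the same glyph, and NFC normalization
# maps the two-code-point form to the single precomposed code point.
print(unicodedata.normalize("NFC", "e\u0301") == "\u00e9")
```

The two e-acute samples look identical on screen but differ in every
count, so a length limit has to say which unit it is counting.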

Regards

Stefan



Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
