Re: [Firebird-devel] Unicode UTF-16 etc

Mark Rotteveel Fri, 09 Aug 2013 23:50:29 -0700

On 9-8-2013 01:18, Adriano dos Santos Fernandes wrote:
> On 08-08-2013 13:30, Mark Rotteveel wrote:
>> Looking in the source of intl_builtin.cpp I noticed that there is
>> support for UTF16, UTF32 and UNICODE_UCS2, for UNICODE_UCS2 there is
>> also a constant (=8) defined in charsets.h
>>
>> These definitions are missing from RDB$CHARACTER_SETS. Can these be used
>> as a connection or column character set? If not, what are they for?
>>
>
> These are for internal usage only.
>
> I doubt someone can make UTF16/32 works as connection charset, it's too
> much work.
>
> For columns, with some work may be possible. But why? UTF-8 uses 1-4
> bytes per char, UTF-16 is also multibyte, using 2-4, and UTF-32 always 4
> bytes per char.
>
> I don't see how they might be preferred over UTF-8.


Although UTF-16 is technically 2-4 bytes per character, for most languages
you can treat each 'character' as 2 bytes, with some characters being
composed of two 16-bit 'characters' (a surrogate pair). If you document
that as a limitation, then a VARCHAR could have a limit of 16382 instead
of the 8191-character limit that applies with UTF-8, with the proviso
about surrogate pairs. This is, for example, how Java handles it internally:
http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#unicode
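
To make the numbers concrete, here is a rough Java sketch of the arithmetic and of the surrogate-pair caveat. It assumes a maximum VARCHAR size of 32765 bytes, which is not stated in this thread but is what reproduces the 8191 and 16382 figures; the class and method names are purely illustrative and not part of Firebird or any driver.

```java
// Illustrative sketch only; names and the byte limit are assumptions.
final class Utf16Limits {
    // Assumed maximum VARCHAR size in bytes (not stated in the thread).
    static final int MAX_VARCHAR_BYTES = 32765;

    // Largest declared length that always fits when every unit
    // may need up to maxBytesPerUnit bytes.
    static int maxLength(int maxBytesPerUnit) {
        return MAX_VARCHAR_BYTES / maxBytesPerUnit;
    }

    // Number of UTF-16 code units (Java chars) needed for one code point.
    static int utf16Units(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        System.out.println(maxLength(4)); // UTF-8, worst case 4 bytes per character
        System.out.println(maxLength(2)); // UTF-16, counted in 2-byte code units

        // U+1D11E (musical symbol G clef) lies outside the BMP,
        // so UTF-16 encodes it as a surrogate pair.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.length());                         // code units
        System.out.println(clef.codePointCount(0, clef.length())); // code points
        System.out.println(Character.isHighSurrogate(clef.charAt(0)));
    }
}
```

The point of the proviso is visible in the last lines: a string holding a single character outside the BMP reports a length of two code units, so a limit expressed in code units holds fewer actual characters whenever surrogate pairs occur.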

Mark
-- 
Mark Rotteveel

Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
