On 9-8-2013 01:18, Adriano dos Santos Fernandes wrote:
> On 08-08-2013 13:30, Mark Rotteveel wrote:
>> Looking in the source of intl_builtin.cpp I noticed that there is
>> support for UTF16, UTF32 and UNICODE_UCS2, for UNICODE_UCS2 there is
>> also a constant (=8) defined in charsets.h
>>
>> These definitions are missing from RDB$CHARACTER_SETS. Can these be used
>> as a connection or column character set? If not, what are they for?
>>
>
> These are for internal usage only.
>
> I doubt someone can make UTF16/32 works as connection charset, it's too
> much work.
>
> For columns, with some work may be possible. But why? UTF-8 uses 1-4
> bytes per char, UTF-16 is also multibyte, using 2-4, and UTF-32 always 4
> bytes per char.
>
> I don't see how they might be preferred over UTF-8.

Although UTF-16 is technically 2-4 bytes per character, for most languages
you can treat each 'character' as 2 bytes, with some characters being
composed of two such 'characters' (surrogate pairs). If you document that
as a limitation, then a VARCHAR could have a limit of 16382 instead of the
8191 characters it has with UTF-8, with the proviso about surrogate pairs.

This is, for example, how Java handles it internally:
http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#unicode

Mark
-- 
Mark Rotteveel

Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel
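
P.S. To make the arithmetic above concrete, here is a minimal Python sketch.
It is only an illustration, not Firebird code: the 32765-byte VARCHAR maximum
is an assumption (it is the value that yields the 8191 and 16382 figures), and
the helper names are hypothetical.

```python
# Assumption: a VARCHAR can hold at most 32765 bytes.
MAX_VARCHAR_BYTES = 32765


def max_declared_length(bytes_per_unit):
    """Largest VARCHAR length if every unit reserves bytes_per_unit bytes."""
    return MAX_VARCHAR_BYTES // bytes_per_unit


def utf16_code_units(text):
    """Count 16-bit code units, i.e. the 2-byte 'characters' of UTF-16."""
    # "utf-16-le" avoids the byte order mark that plain "utf-16" prepends.
    return len(text.encode("utf-16-le")) // 2


print(max_declared_length(4))  # UTF-8, 4 bytes reserved per character: 8191
print(max_declared_length(2))  # UTF-16, one 2-byte code unit each: 16382

# Arbitrary sample characters: ASCII, a BMP symbol, and one outside the BMP.
for ch in ("A", "\u20ac", "\U0001F600"):
    print(
        f"U+{ord(ch):04X}:",
        len(ch.encode("utf-8")), "UTF-8 byte(s),",
        utf16_code_units(ch), "UTF-16 code unit(s)",
    )
```

The last sample is the proviso in practice: a character outside the Basic
Multilingual Plane takes two code units, so a column declared with 16382
'characters' holds fewer real characters when such text is stored.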