Re: [HACKERS] Reducing the overhead of NUMERIC data

Gregory Maxwell Fri, 04 Nov 2005 16:18:30 -0800

On 11/4/05, Martijn van Oosterhout <[email protected]> wrote:
[snip]
> : ICU does not use UCS-2. UCS-2 is a subset of UTF-16. UCS-2 does not
> : support surrogates, and UTF-16 does support surrogates. This means
> : that UCS-2 only supports UTF-16's Base Multilingual Plane (BMP). The
> : notion of UCS-2 is deprecated and dead. Unicode 2.0 in 1996 changed
> : its default encoding to UTF-16.
> <snip>


This means it's fine: ICU's use of UTF-16 will not break our support
for all of Unicode. Conversion to and from UTF-16 isn't cheap,
however, if you're doing it all the time. Storing ASCII in UTF-16 is
pretty lame, since every character takes two bytes instead of one.
Widespread use of UTF-16 also tends to hide bugs in the handling of
non-BMP characters, because code that treats one 16-bit unit as one
character works fine until a surrogate pair shows up. ...  I would be
somewhat surprised to see a substantial performance difference in
working with UTF-16 data over UTF-8, but then again ... they'd know
and I wouldn't.
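
To make the size and surrogate points concrete, here is a rough Python
sketch. It is only an illustration, not anything from ICU or from our
tree: the helper names encoded_sizes and utf16_units are made up for
this example, and the sample strings are arbitrary.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no BOM counted."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


def utf16_units(text):
    """Return the 16-bit code units that UTF-16 uses for text."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


# Plain ASCII doubles in size when stored as UTF-16.
print(encoded_sizes("SELECT 1"))            # (8, 16)

# U+1D11E (musical G clef) lies outside the BMP, so UTF-16 needs a
# surrogate pair: two code units for a single character.
clef = "\U0001D11E"
print([hex(u) for u in utf16_units(clef)])  # ['0xd834', '0xdd1e']
print(encoded_sizes(clef))                  # (4, 4)
```

Code that counts or indexes by 16-bit unit would see two "characters"
in that last string, which is the kind of bug that stays hidden as long
as the test data is all BMP.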

Another lame aspect of using Unicode encodings other than UTF-8
internally is that it's harder to figure out what is text in GDB
output and such, which can make debugging more difficult.
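
A small Python sketch of why this hurts, under the assumption that the
debugger is printing the buffer as an ordinary NUL-terminated C string.
The helper as_c_string is hypothetical and only approximates that
behaviour; it is not a GDB API.

```python
def as_c_string(data):
    """Return the bytes up to the first NUL, as a char * reader would."""
    return data.split(b"\x00", 1)[0]


text = "numeric"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

print(utf8)                # b'numeric'
print(utf16)               # b'n\x00u\x00m\x00e\x00r\x00i\x00c\x00'

# UTF-8 ASCII text reads straight off the buffer; the UTF-16 form is
# cut short at the zero high byte of the first character.
print(as_c_string(utf8))   # b'numeric'
print(as_c_string(utf16))  # b'n'
```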

