Gregory Maxwell <[EMAIL PROTECTED]> writes:
> Another way to look at this is in the context of compression: With
> unicode, characters are really 32bit values... But only a small range
> of these values is common. So we store and work with them in a
> compressed format, UTF-8.
> As such it might be more interesting to ask some other questions like:
> are we using the best compression algorithm for the application, and,
> why do we sometimes stack two compression algorithms?

Actually, the real reason we use UTF-8 and not any of the
sorta-fixed-size representations of Unicode is that the backend is by
and large an ASCII, null-terminated-string engine. *All* of the
supported backend encodings are ASCII-superset codes. Making
everything null-safe in order to allow use of UCS2 or UCS4 would be
a huge amount of work, and the benefit is at best questionable.

regards, tom lane
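
To make the null-safety point concrete, here is a minimal sketch in Python. It is not backend code: the helper c_strlen and the sample string are assumptions made up for illustration, and "utf-16-le" stands in for UCS2, which it matches only for characters in the Basic Multilingual Plane. The sketch encodes one string three ways and measures each buffer the way a null-terminated-string engine would, by stopping at the first zero byte.

```python
def c_strlen(buf: bytes) -> int:
    """Hypothetical helper: mimic C strlen() by counting bytes
    up to (not including) the first zero byte."""
    end = buf.find(b"\x00")
    return len(buf) if end == -1 else end


text = "naïve"  # four ASCII characters plus one non-ASCII character

utf8 = text.encode("utf-8")
ucs2 = text.encode("utf-16-le")  # stand-in for UCS2 (assumption, BMP only)
ucs4 = text.encode("utf-32-le")  # stand-in for UCS4

for name, buf in (("UTF-8", utf8), ("UCS2", ucs2), ("UCS4", ucs4)):
    # Compare the real buffer size with what a strlen-style scan sees.
    print(f"{name}: {len(buf)} bytes, strlen-style length {c_strlen(buf)}")

# ASCII-superset property: pure ASCII text is byte-for-byte unchanged.
assert "abc".encode("utf-8") == b"abc"

# Every byte of a multi-byte UTF-8 sequence has the high bit set,
# so it can never be mistaken for a zero terminator or an ASCII byte.
assert all(b >= 0x80 for b in "ï".encode("utf-8"))
```

In the UTF-8 buffer the scan reaches the true end of the data. In the two fixed-width encodings it stops after the first byte, because every ASCII-range character carries zero bytes as padding, which is why code that assumes null-terminated strings would have to be reworked before it could handle them.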