On Fri, Nov 04, 2005 at 08:38:38AM -0500, [EMAIL PROTECTED] wrote:
> On Thu, Nov 03, 2005 at 09:17:43PM -0500, Tom Lane wrote:
> > Actually, the real reason we use UTF-8 and not any of the
> > sorta-fixed-size representations of Unicode is that the backend is by
> > and large an ASCII, null-terminated-string engine. *All* of the
> > supported backend encodings are ASCII-superset codes. Making
> > everything null-safe in order to allow use of UCS2 or UCS4 would be
> > a huge amount of work, and the benefit is at best questionable.
>
> Perhaps on a side note - my intuition (which sometimes lies) would tell
> me that, if the above is true, the backend is doing unnecessary copies
> of read-only data, if only, to insert a '\0' at the end of the strings.
> Is this true?

It's not quite that bad. Obviously, zero bytes are allowed in all on-disk
datatypes. Bit strings, arrays, timestamps and numerics can all have
embedded nulls, and the variable-length ones carry a length header. Where
this becomes an issue is for things like table names, field names,
encoding names, etc. The "name" type is a fixed-length string which is
kept in such a way that it can be treated as a C string. If these could
contain null characters it would get messy.

I can conceive of the backend supporting a UTF-16 datatype which would be
indexable and have various support functions. But as soon as it came to
talking to clients, it would be converted back to UTF-8, because libpq
treats all strings coming back as null-terminated. Similarly, queries sent
couldn't be anything other than UTF-8 either. And if users can't send or
receive UTF-16 text, why should the backend store it that way?

> I'm thinking along the lines of the other threads that speak of PostgreSQL
> being CPU or I/O bound, not disk bound, for many sorts of operations. Is
> PostgreSQL unnecessary copying string data around (and other data, I would
> assume).

Well, there is a bit of copying around while creating tuples and such,
but it's not to add null terminators.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
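
To make the point about embedded zero bytes concrete, here is a minimal sketch in Python. It is not PostgreSQL or libpq code: the helper names and the 4-byte little-endian length header are assumptions made up for illustration. It shows that UTF-16 text contains zero bytes that truncate the value when it is read as a C string, whereas UTF-8 text survives, and that a length header sidesteps the problem.

```python
import ctypes
import struct

text = "AB"
utf8 = text.encode("utf-8")       # no zero bytes for ASCII-range text
utf16 = text.encode("utf-16-le")  # every second byte is zero here


def as_c_string(raw: bytes) -> bytes:
    """Read raw the way strlen()-style C code would: stop at the first zero byte."""
    buf = ctypes.create_string_buffer(raw)  # copies raw and appends a NUL
    return buf.value


def with_length_header(raw: bytes) -> bytes:
    """Hypothetical layout: 4-byte little-endian length, then the payload."""
    return struct.pack("<I", len(raw)) + raw


def read_length_header(packed: bytes) -> bytes:
    """Recover the payload using the length header, ignoring zero bytes."""
    (n,) = struct.unpack_from("<I", packed)
    return packed[4:4 + n]


if __name__ == "__main__":
    print(as_c_string(utf8))    # whole string survives
    print(as_c_string(utf16))   # cut off after the first character
    print(read_length_header(with_length_header(utf16)) == utf16)
```

The UTF-8 value comes back intact from the C-string view, while the UTF-16 value is cut after its first byte; only the length-headed form round-trips the UTF-16 bytes.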