* Florian Pflug:

> On Nov24, 2011, at 10:54 , Florian Weimer wrote:
>>> Or is it not only about being able to *store* NULs in a text field?
>>
>> No, the entire core should be NUL-transparent.
>
> That's unlikely to happen.

Yes, with the type input/output functions tied to NUL-terminated
strings, that seems indeed unlikely to happen.

> A more realistic approach would be to solve this only for UTF-8
> encoded strings by encoding the NUL character not as a single 0 byte,
> but as a sequence of non-0 bytes.

0xFF cannot occur in valid UTF-8, so that's one possibility.

> Java, for example, seems to use it to serialize Strings (which may contain
> NUL characters) to UTF-8.

Only internally in the VM.  The UTF-8 I/O encoders/decoders produce
and consume NUL bytes.

> Should you try to add a new encoding which supports that, you might also
> want to allow CESU-8-style encoding of UTF-16 surrogate pairs. This means
> that code points representable by UTF-16 surrogate pairs may be encoded by
> separately encoding the two surrogate characters in UTF-8.

I'm not sure if this is a good idea.  The motivation behind CESU-8 is
that it sorts byte-encoded strings in the same order as UTF-16, which
is a completely separate concern.

--
Florian Weimer                <fwei...@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
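
For illustration only, here is a rough Python sketch of the two encoding ideas discussed in this message: representing NUL as a sequence of non-0 bytes, and CESU-8-style encoding of surrogate pairs. The function name encode_modified_utf8 is hypothetical, and the choice of the overlong pair C0 80 for U+0000 is an assumption taken from Java's "modified UTF-8"; the thread itself names no byte sequence other than the 0xFF suggestion. It is not a proposal for how PostgreSQL should do this.

```python
def encode_modified_utf8(text: str) -> bytes:
    """Encode text so that the result never contains a 0x00 byte.

    Assumptions (not taken from the thread):
      * U+0000 is written as the overlong two-byte sequence C0 80.
      * Code points above U+FFFF are split into a UTF-16 surrogate pair
        and each half is written as a three-byte sequence (CESU-8 style).
    Everything else is ordinary UTF-8. Lone surrogates in the input
    raise UnicodeEncodeError, as with Python's strict UTF-8 codec.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # NUL as a sequence of non-0 bytes.
            out += b"\xc0\x80"
        elif cp > 0xFFFF:
            # Split into high and low surrogate, then encode each one
            # with the regular three-byte UTF-8 bit layout.
            v = cp - 0x10000
            for unit in (0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)):
                out += bytes((
                    0xE0 | (unit >> 12),
                    0x80 | ((unit >> 6) & 0x3F),
                    0x80 | (unit & 0x3F),
                ))
        else:
            out += ch.encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    sample = "a\x00b"
    print(sample.encode("utf-8").hex())        # standard UTF-8 keeps the 0 byte
    print(encode_modified_utf8(sample).hex())  # no 0 byte in the output
```

Note that a strict UTF-8 decoder rejects C0 80 as an overlong form, so data encoded this way would have to be treated as a separate encoding rather than as UTF-8 proper.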