Robert Haas <robertmh...@gmail.com> writes:
> The thing that makes me doubt that is this comment from Tatsuo Ishii:
> TI> COPY explicitly specifies the encoding (to be UTF-8 in this case). So
> TI> I think we should not regard U+FEFF as "BOM" in COPY, rather we should
> TI> regard U+FEFF as "ZERO WIDTH NO-BREAK SPACE".

Yeah, that's a reasonable argument for rejecting the patch altogether.
I'm not qualified to decide whether it outweighs the "we need to be able
to read Notepad output" argument. I do observe that
http://en.wikipedia.org/wiki/Byte_order_mark
says Unicode 3.2 has deprecated the no-break-space interpretation, but
on the other hand you're right that we can't really assume that the
character is not present in people's data.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)