On Mon, Sep 26, 2011 at 2:38 PM, Peter Eisentraut <pete...@gmx.net> wrote:
> On mån, 2011-09-26 at 13:19 -0400, Robert Haas wrote:
>> The thing that makes me doubt that is this comment from Tatsuo Ishii:
>>
>> TI> COPY explicitly specifies the encoding (to be UTF-8 in this case). So
>> TI> I think we should not regard U+FEFF as "BOM" in COPY, rather we should
>> TI> regard U+FEFF as "ZERO WIDTH NO-BREAK SPACE".
>>
>> If a BOM is confusable with valid data, then I think recognizing it
>> and discarding it unconditionally is no good - you could end up where
>> COPY OUT, TRUNCATE, COPY IN changes the table contents.
>
> We did recently accept a patch for psql -f to skip over a UTF-8
> byte-order mark.  We had a lot of this same discussion there.

But that case is different, because zero-width, non-breaking space has
no particular meaning in an SQL script - it's either going to be
ignored as a BOM, ignored as whitespace, or an error.  But inside a
file being subjected to COPY it might be confusable with data that the
user wanted to end up in some table.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
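
To make the round-trip concern concrete, here is a minimal Python sketch. It is not PostgreSQL's COPY code; copy_out and copy_in_stripping_bom are hypothetical stand-ins for a dump step and for a loader that unconditionally discards a leading U+FEFF. It assumes a one-column table whose first value happens to begin with U+FEFF.

```python
import codecs

# UTF-8 encoding of U+FEFF: the same three bytes whether the character is
# meant as a byte-order mark or as ZERO WIDTH NO-BREAK SPACE data.
BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf'


def copy_out(rows):
    """Hypothetical dump step: one text value per line, encoded as UTF-8."""
    return "".join(value + "\n" for value in rows).encode("utf-8")


def copy_in_stripping_bom(data):
    """Hypothetical loader that always drops a leading BOM before decoding."""
    if data.startswith(BOM):
        data = data[len(BOM):]
    text = data.decode("utf-8")
    # Every value is newline-terminated, so the final split element is empty.
    return text.split("\n")[:-1]


# Assumed table contents: the first value legitimately starts with U+FEFF.
original = ["\ufeffabc", "def"]

dumped = copy_out(original)
restored = copy_in_stripping_bom(dumped)

# The loader cannot tell the data character from a BOM, so the value changes.
print(original == restored)
```

Because the file bytes are identical in both readings, the loader has no way to distinguish "BOM written by an editor" from "data that starts with U+FEFF"; only the psql -f case can safely ignore it, since there the character carries no meaning.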