Re: [HACKERS] Support UTF-8 files with BOM in COPY FROM

Andrew Dunstan Mon, 26 Sep 2011 11:47:52 -0700


On 09/26/2011 02:38 PM, Peter Eisentraut wrote:

On mån, 2011-09-26 at 13:19 -0400, Robert Haas wrote:

The thing that makes me doubt that is this comment from Tatsuo Ishii:

TI>  COPY explicitly specifies the encoding (to be UTF-8 in this case). So
TI>  I think we should not regard U+FEFF as "BOM" in COPY, rather we should
TI>  regard U+FEFF as "ZERO WIDTH NO-BREAK SPACE".

If a BOM is confusable with valid data, then I think recognizing it
and discarding it unconditionally is no good - you could end up where
COPY OUT, TRUNCATE, COPY IN changes the table contents.

We did recently accept a patch for psql -f to skip over a UTF-8
byte-order mark.  We had a lot of this same discussion there.

Yes, but wasn't part of the rationale that this was safe because a
leading BOM could not possibly be mistaken for anything else legitimate
in an SQL source file? That's quite different from a data file, ISTM.
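
To make the round-trip concern concrete, here is a minimal sketch in Python. It is not PostgreSQL code: copy_out and copy_in are hypothetical stand-ins for COPY TO and COPY FROM, and the one-value-per-line format is an assumption made only for illustration. It shows that if a leading U+FEFF is discarded unconditionally on input, a value that legitimately begins with ZERO WIDTH NO-BREAK SPACE does not survive a dump and reload.

```python
import codecs

# The UTF-8 encoding of U+FEFF, i.e. the bytes EF BB BF.
BOM = codecs.BOM_UTF8


def copy_out(rows):
    """Hypothetical stand-in for COPY TO: one UTF-8 value per line."""
    return "".join(row + "\n" for row in rows).encode("utf-8")


def copy_in(data, strip_bom):
    """Hypothetical stand-in for COPY FROM.

    When strip_bom is true, a leading EF BB BF is discarded
    unconditionally, which is the behaviour being questioned.
    """
    if strip_bom and data.startswith(BOM):
        data = data[len(BOM):]
    # The plain "utf-8" codec keeps U+FEFF as an ordinary character.
    text = data.decode("utf-8")
    # Every row is newline-terminated, so drop the trailing empty piece.
    return text.split("\n")[:-1]


# The first value legitimately starts with ZERO WIDTH NO-BREAK SPACE.
rows = ["\ufeffabc", "def"]
dumped = copy_out(rows)

kept = copy_in(dumped, strip_bom=False)
stripped = copy_in(dumped, strip_bom=True)

print(kept == rows)      # contents preserved
print(stripped == rows)  # contents changed by the reload
```

In the second case the first value comes back without its leading character, which could not happen in an SQL source file where a leading U+FEFF has no other legitimate meaning.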


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
