Re: [HACKERS] Support UTF-8 files with BOM in COPY FROM

Robert Haas Mon, 26 Sep 2011 10:19:53 -0700

On Mon, Sep 26, 2011 at 1:15 PM, Tom Lane <[email protected]> wrote:
> Robert Haas <[email protected]> writes:
>> On Mon, Sep 26, 2011 at 11:09 AM, Tatsuo Ishii <[email protected]> wrote:
>>> Suppose a user uses a brain-dead editor, which does not accept UTF-8
>>> without a BOM.
>
>> Maybe this needs to be an optional behavior, controlled by some COPY option.
>
> I'm not excited about emitting non-standards-conformant output on the
> strength of a hypothetical argument about users and editors that may or
> may not exist.  I believe that there's a use-case for reading BOMs, but
> I have seen no field complaints demonstrating that we need to write
> them.  Even if we had a couple, "use a less brain dead editor" might be
> the best response.  We cannot promise to be compatible with arbitrarily
> broken software.


The thing that makes me doubt that is this comment from Tatsuo Ishii:

TI> COPY explicitly specifies the encoding (to be UTF-8 in this case).  So
TI> I think we should not regard U+FEFF as "BOM" in COPY, rather we should
TI> regard U+FEFF as "ZERO WIDTH NO-BREAK SPACE".

If a BOM is confusable with valid data, then I think recognizing it
and discarding it unconditionally is no good - a leading U+FEFF that
is really data would be dropped on input, so you could end up in a
situation where COPY OUT, TRUNCATE, COPY IN changes the table contents.
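
To make the round-trip concern concrete, here is a minimal Python sketch.
It is not PostgreSQL code: copy_out and copy_in_strip_bom are hypothetical
stand-ins for COPY TO and for a COPY FROM that unconditionally strips a
leading U+FEFF, and the one-column, one-line-per-row format is a
simplifying assumption.

```python
# U+FEFF is a BOM or a ZERO WIDTH NO-BREAK SPACE, depending on context.
BOM = "\ufeff"


def copy_out(rows):
    """Hypothetical stand-in for COPY TO: one line per row, UTF-8, no BOM added."""
    return "\n".join(rows).encode("utf-8")


def copy_in_strip_bom(data):
    """Hypothetical stand-in for COPY FROM that always drops a leading U+FEFF."""
    text = data.decode("utf-8")
    if text.startswith(BOM):
        text = text[len(BOM):]
    return text.split("\n")


# The first value of the first row legitimately starts with U+FEFF.
original = [BOM + "first", "second"]

# COPY OUT, TRUNCATE, COPY IN: the leading character does not survive.
restored = copy_in_strip_bom(copy_out(original))
print(restored == original)
```

Rows that do not begin with U+FEFF round-trip unchanged in this sketch, so
the loss would only show up for data that happens to start with that
character.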

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
