Re: [HACKERS] UTF8 with BOM support in psql

Itagaki Takahiro Mon, 16 Nov 2009 21:19:47 -0800

Tom Lane <[email protected]> wrote:

> Itagaki Takahiro <[email protected]> writes:
> > If encoding setting is reverted, 
> >> "Eat BOM at beginning of file and <<set client encoding to UTF-8>>"
> > will be much safer.
> 
> This isn't going to happen, so please stop wasting our time arguing
> about it.


Ok, sorry. But I still cannot accept this restriction.
>> - Only when client encoding is UTF-8 --> please fix that

The attached patch is a new proposal for the feature.
When we find a BOM at the beginning of a file, we set "expected_encoding" to UTF8.
Before every execution of a query, if pset.encoding is not UTF8, we check that
the query string does not contain any non-ASCII characters and throw an error
if any are found. Encoding declarations are typically written using only ASCII
characters, so we can postpone encoding checking until non-ASCII characters appear.
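
As a rough illustration of the two checks described above (this is only a
sketch, not the code in the attached patch; the helper names
starts_with_utf8_bom and query_is_ascii_only are hypothetical), BOM detection
and the non-ASCII scan could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true if the buffer starts with the UTF-8 BOM,
 * i.e. the three bytes EF BB BF.
 */
static bool
starts_with_utf8_bom(const char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0;
}

/*
 * Hypothetical helper: true if the query consists only of 7-bit ASCII
 * bytes. Any byte with the high bit set counts as non-ASCII.
 */
static bool
query_is_ascii_only(const char *query)
{
    const unsigned char *p;

    assert(query != NULL);
    for (p = (const unsigned char *) query; *p != '\0'; p++)
    {
        if (*p & 0x80)
            return false;
    }
    return true;
}
```

A statement such as SET client_encoding = 'UTF8' passes the ASCII scan, which
is why the encoding check can be postponed until the first non-ASCII byte.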

The default value of expected_encoding is SQL_ASCII, which passes all
characters through, so the patch does nothing to scripts without a BOM.
(There is no code that sets expected_encoding other than BOM detection.)
If the client encoding is UTF8, the BOM is skipped and the script body is
unaffected. BOMs are also skipped when the client encoding is not set to UTF8,
but an error can be thrown if there is no explicit encoding declaration.
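
The cases above can be summarized as a small decision function. This is a
sketch under assumptions, not the patch itself: the enum SketchEncoding and
the function sketch_query_allowed are hypothetical names, and "other" stands
for any client encoding that is not UTF8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the encodings discussed here. */
typedef enum
{
    SKETCH_SQL_ASCII,   /* default expected_encoding: no BOM was seen */
    SKETCH_UTF8,
    SKETCH_OTHER        /* any client encoding other than UTF8 */
} SketchEncoding;

/*
 * Returns true if the query may be executed, false if an error should
 * be thrown because of an encoding mismatch.
 */
static bool
sketch_query_allowed(SketchEncoding expected, SketchEncoding client,
                     const char *query)
{
    const unsigned char *p;

    assert(query != NULL);

    /* No BOM seen: SQL_ASCII passes all characters through. */
    if (expected == SKETCH_SQL_ASCII)
        return true;

    /* BOM seen and client encoding already matches: nothing to check. */
    if (client == expected)
        return true;

    /* BOM seen but client encoding differs: only pure ASCII is allowed. */
    for (p = (const unsigned char *) query; *p != '\0'; p++)
    {
        if (*p & 0x80)
            return false;
    }
    return true;
}
```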

AFAIC, the patch can solve most of the problems raised in the discussion
constructively. Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

psql-utf8bom_20091117.patch
Description: Binary data
