[GENERAL] Client Encoding and Latin characters

2009-11-24 Thread Lee Hachadoorian
My database is encoded UTF8. I recently was uploading (via COPY) some
census data which included place names with ñ, é, ü, and other such
characters. The upload choked on the Latin characters. Following the
docs, I was able to fix this with:

SET CLIENT_ENCODING TO 'LATIN1';
COPY table FROM 'filename';

After which I

SET CLIENT_ENCODING TO 'UTF8';

I typically use COPY FROM to bulk load data. My question is, is there
any disadvantage to setting the default client_encoding as LATIN1? I
expect to never be dealing with Asian languages, or most of the other
LATINx languages. If I ever try to COPY FROM data incompatible with
LATIN1, the command will just choke, and I can pick an appropriate
encoding and try again, right?

Thanks,
--Lee

-- 
Lee Hachadoorian
PhD Student, Geography
Program in Earth & Environmental Sciences
CUNY Graduate Center

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Client Encoding and Latin characters

2009-11-24 Thread Tom Lane
Lee Hachadoorian lee.hachadoor...@gmail.com writes:
> My database is encoded UTF8. I recently was uploading (via COPY) some
> census data which included place names with ñ, é, ü, and other such
> characters. The upload choked on the Latin characters. Following the
> docs, I was able to fix this with:
>
> SET CLIENT_ENCODING TO 'LATIN1';
> COPY table FROM 'filename';
>
> After which I
>
> SET CLIENT_ENCODING TO 'UTF8';
>
> I typically use COPY FROM to bulk load data. My question is, is there
> any disadvantage to setting the default client_encoding as LATIN1? I
> expect to never be dealing with Asian languages, or most of the other
> LATINx languages. If I ever try to COPY FROM data incompatible with
> LATIN1, the command will just choke, and I can pick an appropriate
> encoding and try again, right?

Uh, no.  You can pretty much assume that LATIN1 will take any random
byte string; likewise for any other single-byte encoding.  UTF8 as a
default is a bit safer because it's significantly more likely that it
will be able to detect non-UTF8 input.
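
The asymmetry described above can be seen outside the database too. The
following is a minimal sketch in Python, assuming Python's built-in codecs
as a stand-in for the server's encoding validation (PostgreSQL itself is
not called, and the helper name accepts is hypothetical):

```python
# Illustration only: Python's codecs stand in for the server's
# encoding validation; PostgreSQL's own checks are not called here.

def accepts(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes without error in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# The character "ñ" as a single LATIN1 byte and as a UTF-8 byte pair.
latin1_bytes = "ñ".encode("latin-1")   # b'\xf1'
utf8_bytes = "ñ".encode("utf-8")       # b'\xc3\xb1'

# Every possible byte value maps to some LATIN1 character,
# so a single-byte decoder has nothing it can reject.
all_bytes = bytes(range(256))

print(accepts(all_bytes, "latin-1"))    # True
print(accepts(latin1_bytes, "utf-8"))   # False: a lone 0xF1 is not valid UTF-8
print(utf8_bytes.decode("latin-1"))     # Ã± (wrong text, but no error raised)
```

The last line is the case to worry about: UTF-8 data read as LATIN1 is
accepted without complaint and stored as the wrong characters.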

regards, tom lane



Re: [GENERAL] Client Encoding and Latin characters

2009-11-24 Thread Lee Hachadoorian
> Uh, no.  You can pretty much assume that LATIN1 will take any random
> byte string; likewise for any other single-byte encoding.  UTF8 as a
> default is a bit safer because it's significantly more likely that it
> will be able to detect non-UTF8 input.
>
>                         regards, tom lane


So, IIUC, the general approach is:

* Leave the default client_encoding equal to server_encoding (UTF8 in this case)
* Have the client change client_encoding per session only, when a load needs it
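
A per-session load along those lines might be sketched as below. This is
only an illustration: the helper copy_with_encoding, the table name places,
and the file path are hypothetical, and the code just builds the statements
rather than running them against a server.

```python
# Hypothetical helper: builds the statements for one per-session load.
# The table name, file path, and function name are placeholders.

def copy_with_encoding(table: str, path: str, encoding: str) -> list:
    """Return SQL that loads `path` under `encoding`, then restores the default."""
    # Naive string formatting for illustration; inputs are assumed trusted.
    return [
        f"SET client_encoding TO '{encoding}';",
        f"COPY {table} FROM '{path}';",
        "RESET client_encoding;",  # back to the session default
    ]

for statement in copy_with_encoding("places", "/tmp/places.txt", "LATIN1"):
    print(statement)
```

Using RESET instead of a second SET means the session goes back to whatever
its default was, without hard-coding UTF8.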

Thanks,
--Lee

-- 
Lee Hachadoorian
PhD Student, Geography
Program in Earth & Environmental Sciences
CUNY Graduate Center
