[GENERAL] Client Encoding and Latin characters
My database is encoded UTF8. I recently was uploading (via COPY) some census data which included place names with ñ, é, ü, and other such characters. The upload choked on the Latin characters. Following the docs, I was able to fix this with:

    SET CLIENT_ENCODING TO 'LATIN1';
    COPY table FROM 'filename';

After which I SET CLIENT_ENCODING TO 'UTF8';

I typically use COPY FROM to bulk load data. My question is, is there any disadvantage to setting the default client_encoding as LATIN1? I expect to never be dealing with Asian languages, or most of the other LATINx languages. If I ever try to COPY FROM data incompatible with LATIN1, the command will just choke, and I can pick an appropriate encoding and try again, right?

Thanks,
--Lee

--
Lee Hachadoorian
PhD Student, Geography
Program in Earth Environmental Sciences
CUNY Graduate Center

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Client Encoding and Latin characters
Lee Hachadoorian lee.hachadoor...@gmail.com writes:
> My question is, is there any disadvantage to setting the default client_encoding as LATIN1? I expect to never be dealing with Asian languages, or most of the other LATINx languages. If I ever try to COPY FROM data incompatible with LATIN1, the command will just choke, and I can pick an appropriate encoding and try again, right?

Uh, no. You can pretty much assume that LATIN1 will take any random byte string; likewise for any other single-byte encoding. UTF8 as a default is a bit safer because it's significantly more likely that it will be able to detect non-UTF8 input.

regards, tom lane
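The point about single-byte encodings can be illustrated outside the database. The following is a minimal sketch, assuming Python's built-in `latin-1` and `utf-8` codecs behave closely enough to PostgreSQL's LATIN1 and UTF8 conversions for the purpose of the demonstration; the helper `try_decode` and the sample place name are hypothetical and not from the thread.

```python
def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short note if the bytes are rejected."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"rejected: {exc.reason}"


# Every possible byte value decodes under LATIN1, so bad input is never rejected.
all_bytes = bytes(range(256))
latin1_text = all_bytes.decode("latin-1")

# A place name as it would be stored in a LATIN1 file (ñ is the single byte 0xF1).
latin1_bytes = "Cañon".encode("latin-1")

# UTF-8 has structural rules, so the lone 0xF1 byte is detected as invalid.
utf8_attempt = try_decode(latin1_bytes, "utf-8")

# The risky direction: genuine UTF-8 bytes read as LATIN1 "succeed" silently
# and yield the wrong characters instead of an error.
utf8_bytes = "Cañon".encode("utf-8")
misread = try_decode(utf8_bytes, "latin-1")

print(len(latin1_text))
print(utf8_attempt)
print(misread)
```

In this sketch the UTF-8 attempt fails loudly, while the LATIN1 reading of UTF-8 data returns garbled text without complaint, which is why a LATIN1 default would not "just choke" on incompatible input.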
Re: [GENERAL] Client Encoding and Latin characters
> Uh, no. You can pretty much assume that LATIN1 will take any random byte string; likewise for any other single-byte encoding. UTF8 as a default is a bit safer because it's significantly more likely that it will be able to detect non-UTF8 input.
>
> regards, tom lane

So, IIUC, the general approach is:

* Leave the default client_encoding = server_encoding (in this case UTF8)
* Rely on the client to change client_encoding on a session basis only

Thanks,
--Lee

--
Lee Hachadoorian
PhD Student, Geography
Program in Earth Environmental Sciences
CUNY Graduate Center