Re: [SQL] Understanding Encoding
> Hello All,
>
> I am not able to understand how the encoding is handled. I would be happy
> if someone can tell what is happening in the following scenario:
>
> 1. I have created a database with EUC_KR encoding and created a table and
> inserted some korean value into it.
>
> =# CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr'
> LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
>
> =# \c korean
>
> korean=# SHOW client_encoding;
>  client_encoding
> -----------------
>  UTF8
> (1 row)
>
> korean=# CREATE TABLE tbl (doc text);
>
> korean=# INSERT INTO tbl VALUES ('그레스');
>
>
> 2. If I insert non-korean values it throws error:
>
> korean=# INSERT INTO tbl VALUES ('データベース');
> ERROR: character with byte sequence 0xe3 0x83 0xbc in encoding "UTF8" has
> no equivalent in encoding "EUC_KR"

The error message says it all. PostgreSQL accepted 'データベース' encoded in
UTF-8 then tried to convert to EUC_KR but failed, because EUC_KR does not
accept languages other than Korean (and ASCII). What else did you expect?

> korean=# SELECT * FROM tbl;
>  doc
> --------
>  그레스
> (1 row)
>
>
> 3. I change the client encoding to EUC_KR and try inserting the same korean
> characters and it throws an error:
>
> korean=# SET client_encoding = 'EUC_KR';
> SET
> korean=# INSERT INTO tbl VALUES ('그레스');
> ERROR: invalid byte sequence for encoding "EUC_KR": 0xa0 0x88

0xa0 is definitely not part of EUC_KR. That's why PostgreSQL throws an
error. I guess you are using UHC (Unified Hangul Code), rather than EUC_KR.
They are different encodings. You should do either:

1) Make sure that your terminal encoding is EUC_KR.

2) set client_encoding = 'uhc';

> Even the SELECT statement displays something different. I am not able to
> understand why?
>
> korean=# SELECT * FROM tbl;
>  doc
> --------
>  ����
> (1 row)

This is for the same reason as above.

> Can someone please help me.
>
> Thanks you,
>
> Beena Emerson

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
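For readers who want to see which bytes are involved in the two errors above, here is a minimal Python sketch. It is not PostgreSQL's code: the helper names (`hexs`, `first_invalid_euckr_pair`) are hypothetical, and the validator is a simplified assumption (ASCII is one byte, anything else is a pair whose bytes must both lie in 0xA1..0xFE). It compares the UTF-8 and EUC-KR bytes of '그레스' and decodes the byte sequence quoted in the first error message.

```python
def hexs(data: bytes) -> str:
    """Format bytes as space-separated hex, e.g. 'a0 88'."""
    return " ".join(f"{b:02x}" for b in data)


def first_invalid_euckr_pair(data: bytes):
    """Simplified EUC-KR check (an assumption, not PostgreSQL's verifier).

    Returns the first non-ASCII byte pair that is not in the range
    0xA1..0xFE for both bytes, or None if every pair is in range.
    """
    i = 0
    while i < len(data):
        if data[i] < 0x80:          # plain ASCII, one byte
            i += 1
            continue
        pair = data[i:i + 2]        # a non-ASCII character takes two bytes
        if len(pair) < 2 or not all(0xA1 <= b <= 0xFE for b in pair):
            return pair
        i += 2
    return None


korean = "그레스"
utf8_bytes = korean.encode("utf-8")
euckr_bytes = korean.encode("euc_kr")

print("UTF-8 :", hexs(utf8_bytes))
print("EUC-KR:", hexs(euckr_bytes))

# Treating the UTF-8 bytes as if they were EUC-KR fails on this pair:
print(hexs(first_invalid_euckr_pair(utf8_bytes)))   # a0 88

# The byte sequence from the first error message is a single character:
ch = bytes.fromhex("e383bc").decode("utf-8")
print(ch, f"U+{ord(ch):04X}")
```

Under these assumptions, the first out-of-range pair in the UTF-8 bytes is the same `0xa0 0x88` that the error reports, which would be consistent with a terminal that sends UTF-8 while `client_encoding` is set to EUC_KR.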
Re: [SQL] [NOVICE] Understanding Encoding
On Fri, Sep 6, 2013 at 3:47 PM, Beena Emerson wrote:
>> I wonder if you have tried changing your "locale" to ko_KR; something
>> like:
>>
>> LANG=ko_KR LC_ALL=ko_KR \
>> psql -d korean
>
> Hi,
>
> It still gives same result:
>
> $ LANG=ko_KR LC_ALL=ko_KR
> $ psql -d korean
>
> korean=# SHOW client_encoding;
>  client_encoding
> -----------------
>  EUC_KR
> (1 row)
>
> korean=# INSERT INTO tbl VALUES ('그레스');
> ERROR: invalid byte sequence for encoding "EUC_KR": 0xa0 0x88

I changed the encoding of the terminal emulator (GNOME Terminal 2.31.3)
using the Terminal menu as:

Terminal -> Set Character Encoding -> Korean (EUC-KR)

Note that, if the menu only lists UTF-8, you'd have to add EUC-KR using
"Add or Remove".

And it seems to work; could you try the same?

--
Amit Langote
Re: [SQL] [NOVICE] Understanding Encoding
On Fri, Sep 6, 2013 at 12:29 PM, Tom Lane wrote:
> Beena Emerson writes:
> > It still gives same result:
> >
> > $ LANG=ko_KR LC_ALL=ko_KR
> > $ psql -d korean
> >
> > korean=# SHOW client_encoding;
> >  client_encoding
> > -----------------
> >  EUC_KR
> > (1 row)
> >
> > korean=# INSERT INTO tbl VALUES ('그레스');
> > ERROR: invalid byte sequence for encoding "EUC_KR": 0xa0 0x88
>
> What you need to figure out is what encoding the text you are typing
> is in. You're telling psql it's EUC_KR but it evidently isn't.
> If you're typing these characters manually then it's probably determined
> by a setting of the terminal-emulator program you're using. But if
> you're copying-and-pasting then things get more complicated.
>
> Also, what you did above is not what Amit suggested: he wanted you to put
> the variable assignments on the same command line as the psql invocation,
> so that they'd affect the environment passed to psql. I'm suspicious of
> his solution because I'd have thought the terminal program would set up
> the right environment ... but you might as well try it.

I tried with both the assignment and the invocation on the same line. Again
it gave the same result. Maybe the problem is with copy-paste. I will look
into it.

Thank you.
Re: [SQL] Understanding Encoding
Hi,

Tip: To identify what encoding you enter in the psql command interpreter:

1) Open a file with vim
2) Type in your SQL or copy/paste
3) Save the file and quit vim
4) $ file <the file you just saved>

This should give you the encoding of that text file.

For example:

sf@orca:~$ echo $LC_ALL
en_US.UTF-8
sf@orca:~$ cat /tmp/xx
abcdefé
sf@orca:~$ file /tmp/xx
/tmp/xx: UTF-8 Unicode text

Seb

On 09/06/2013 09:03 AM, Tatsuo Ishii wrote:
> [...]
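A rough programmatic counterpart to the `file` tip, as a Python sketch under stated assumptions: the function name `guess_encoding` and the candidate list are hypothetical choices for illustration, and "decodes without error" is only a heuristic, since a byte string can be valid in more than one encoding. `cp949` is Python's codec name for UHC.

```python
def guess_encoding(data: bytes,
                   candidates=("ascii", "utf-8", "euc_kr", "cp949")):
    """Return the candidate codecs that decode `data` without error.

    This is a heuristic only: several encodings may accept the same bytes,
    so the result narrows the possibilities rather than proving one.
    """
    accepted = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue            # this codec rejects the bytes
        accepted.append(name)
    return accepted


# The same text as it would arrive from a UTF-8 terminal and from an
# EUC-KR terminal.
sample_utf8 = "그레스".encode("utf-8")
sample_euckr = "그레스".encode("euc_kr")

print("from a UTF-8 terminal :", guess_encoding(sample_utf8))
print("from an EUC-KR terminal:", guess_encoding(sample_euckr))
```

To check real input, the bytes could be read from a saved file with `open(path, "rb").read()` and passed to the function; the bytes that reach psql are whatever the terminal or the clipboard produced, and `client_encoding` has to name that encoding.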
Re: [SQL] Understanding Encoding
Hello,

Thank you all.

Amit, changing the encoding of the terminal emulator worked.
Sebastiean, the tip was helpful.

--
Beena Emerson