Re: [SQL] Understanding Encoding

2013-09-06 Thread Tatsuo Ishii
> Hello All,
> 
> I am not able to understand how the encoding is handled. I would be happy
> if someone could tell me what is happening in the following scenario:
> 
> 1. I have created a database with EUC_KR encoding, created a table, and
> inserted a Korean value into it.
> 
> =# CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr'
> LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
> 
> =# \c korean
> 
> korean=# SHOW client_encoding;
>  client_encoding
> -----------------
>  UTF8
> (1 row)
> 
> korean=# CREATE TABLE tbl (doc text);
> 
> korean=# INSERT INTO tbl VALUES ('그레스');
> 
> 
> 2. If I insert non-Korean values, it throws an error:
> 
> korean=# INSERT INTO tbl VALUES ('データベース');
> ERROR:  character with byte sequence 0xe3 0x83 0xbc in encoding "UTF8" has
> no equivalent in encoding "EUC_KR"

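The error message says it all. PostgreSQL accepted 'データベース'
encoded in UTF-8 and then tried to convert it to EUC_KR, but failed
because the character with bytes 0xe3 0x83 0xbc ('ー') has no EUC_KR
equivalent. EUC_KR covers Korean, ASCII, and a limited set of other
characters, not arbitrary Japanese text. What else did you expect?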
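
The conversion failure can be reproduced outside the database. This is
only a sketch: it assumes Python's "euc_kr" codec behaves like
PostgreSQL's UTF8-to-EUC_KR conversion for these characters, and the
helper name first_unencodable is made up for illustration.

```python
# Sketch: find the first character that EUC-KR cannot represent.
# Assumption: Python's "euc_kr" codec stands in for PostgreSQL's conversion.

def first_unencodable(text, encoding="euc_kr"):
    """Return the first character of text that cannot be encoded, or None."""
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            return ch
    return None

bad = first_unencodable("データベース")
if bad is not None:
    # Report the code point and its UTF-8 bytes, in the style of the error message.
    utf8_bytes = " ".join(f"0x{b:02x}" for b in bad.encode("utf-8"))
    print(f"U+{ord(bad):04X} has no EUC-KR equivalent; UTF-8 bytes: {utf8_bytes}")
```

The same helper returns None for the Korean string from step 1, which is
why that INSERT succeeded.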

> korean=# SELECT * FROM tbl;
>   doc
> 
>  그레스
> (1 row)
> 
> 
> 3. I change the client encoding to EUC_KR and try inserting the same Korean
> characters, and it throws an error:
> 
> korean=# SET client_encoding = 'EUC_KR';
> SET
> korean=# INSERT INTO tbl VALUES ('그레스');
> ERROR:  invalid byte sequence for encoding "EUC_KR": 0xa0 0x88

0xa0 is definitely not a valid byte in EUC_KR. That's why PostgreSQL
throws an error. I guess you are using UHC (Unified Hangul Code) rather
than EUC_KR. They are different encodings. You should do one of the
following:

1) Make sure that your terminal encoding is EUC_KR.

2) set client_encoding = 'uhc';
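
The bytes in the error are also consistent with the terminal still
sending UTF-8 while client_encoding says EUC_KR. The sketch below checks
that reading. It assumes a UTF-8 terminal and uses Python's "euc_kr"
codec as a stand-in for the server-side validation.

```python
# Sketch: what happens when UTF-8 bytes are labelled as EUC-KR.
# Assumptions: the terminal sent UTF-8; Python's "euc_kr" codec stands in
# for PostgreSQL's validation of the incoming bytes.

typed = "그레스"
sent = typed.encode("utf-8")      # bytes a UTF-8 terminal would send
print(sent.hex(" "))

try:
    sent.decode("euc_kr")         # what client_encoding = 'EUC_KR' claims they are
    failure = None
except UnicodeDecodeError as exc:
    # Keep the byte pair starting at the first invalid position.
    failure = sent[exc.start:exc.start + 2]
    print("invalid sequence:", " ".join(f"0x{b:02x}" for b in failure))
```

The first two byte pairs happen to be valid EUC-KR, so the check only
fails at the third pair, which is the 0xa0 0x88 reported by the server.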

> Even the SELECT statement displays something different. I am not able to
> understand why?
> 
> korean=# SELECT * FROM tbl;
>   doc
> 
>  �׷���
> (1 row)

This is for the same reason as above.
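
The garbled SELECT output is the reverse case: the server sends EUC_KR
bytes and the terminal renders them as something else. A sketch,
assuming the terminal was decoding as UTF-8 and showing a replacement
character for each invalid byte:

```python
# Sketch: EUC-KR bytes rendered by a terminal that expects UTF-8.
# Assumption: the terminal substitutes U+FFFD for every invalid byte,
# which is what errors="replace" does here.

stored = "그레스".encode("euc_kr")                 # what the server sends as EUC_KR
shown = stored.decode("utf-8", errors="replace")   # what a UTF-8 terminal would display

print(stored.hex(" "))
print(ascii(shown))   # ascii() keeps the output printable on any terminal
```

Only the byte pair 0xd7 0xb7 happens to form a valid UTF-8 sequence, so
one stray letter appears among the replacement characters, as in the
quoted output.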

> Can someone please help me.
> 
> Thank you,
> 
> Beena Emerson
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] [NOVICE] Understanding Encoding

2013-09-06 Thread Amit Langote
On Fri, Sep 6, 2013 at 3:47 PM, Beena Emerson wrote:
>
>>
>> I wonder if you have tried changing your "locale" to ko_KR; something
>> like:
>>
>> LANG=ko_KR LC_ALL=ko_KR \
>> psql -d korean
>>
>
> Hi,
>
> It still gives the same result:
>
> $ LANG=ko_KR LC_ALL=ko_KR
> $ psql -d korean
>
> korean=# SHOW client_encoding;
>  client_encoding
> -----------------
>  EUC_KR
> (1 row)
>
> korean=# INSERT INTO tbl VALUES ('그레스');
> ERROR:  invalid byte sequence for encoding "EUC_KR": 0xa0 0x88


I changed the encoding of the terminal emulator (GNOME Terminal
2.31.3) using the Terminal menu as:

Terminal -> Set Character Encoding -> Korean (EUC-KR)

Note that, if the menu only lists UTF-8, you'd have to add EUC-KR
using "Add or Remove".

And it seems to work; could you try the same?

--
Amit Langote




Re: [SQL] [NOVICE] Understanding Encoding

2013-09-06 Thread Beena Emerson
On Fri, Sep 6, 2013 at 12:29 PM, Tom Lane wrote:

> Beena Emerson writes:
> > It still gives the same result:
>
> > $ LANG=ko_KR LC_ALL=ko_KR
> > $ psql -d korean
>
> > korean=# SHOW client_encoding;
> >  client_encoding
> > -----------------
> >  EUC_KR
> > (1 row)
>
> > korean=# INSERT INTO tbl VALUES ('그레스');
> > ERROR:  invalid byte sequence for encoding "EUC_KR": 0xa0 0x88
>
> What you need to figure out is what encoding the text you are typing
> is in.  You're telling psql it's EUC_KR but it evidently isn't.
> If you're typing these characters manually then it's probably determined
> by a setting of the terminal-emulator program you're using.  But if
> you're copying-and-pasting then things get more complicated.
>
> Also, what you did above is not what Amit suggested: he wanted you to put
> the variable assignments on the same command line as the psql invocation,
> so that they'd affect the environment passed to psql.  I'm suspicious of
> his solution because I'd have thought the terminal program would set up
> the right environment ... but you might as well try it.
>

I tried with both the assignments and the psql invocation on the same
line. It gave the same result again.
Maybe the problem is with copy-and-paste. I will look into it.
Thank you.


Re: [SQL] Understanding Encoding

2013-09-06 Thread Sebastien FLAESCH

Hi,

Tip:

To identify the encoding of the text you enter in the psql command interpreter:

1) Open a file with vim
2) Type in your SQL or copy/paste it
3) Save the file and quit vim
4) $ file 

This should report the encoding of that text file.

For example:

sf@orca:~$ echo $LC_ALL
en_US.UTF-8

sf@orca:~$ cat /tmp/xx
abcdefé

sf@orca:~$ file /tmp/xx
/tmp/xx: UTF-8 Unicode text
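
If file is not available, a similar check can be sketched in Python by
trying a few encodings on the raw bytes. The helper name
candidate_encodings and the list of encodings are assumptions for
illustration. A clean decode only shows the bytes are valid in that
encoding, not that it is the one intended.

```python
# Sketch: list the encodings under which some bytes decode cleanly.
# The encoding list is an assumption; extend it as needed.

def candidate_encodings(data, encodings=("utf-8", "euc_kr", "cp949")):
    """Return the encodings under which data decodes without error."""
    ok = []
    for name in encodings:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok

# Bytes as a UTF-8 terminal and as an EUC-KR terminal would produce them.
from_utf8_terminal = "그레스".encode("utf-8")
from_euckr_terminal = "그레스".encode("euc_kr")

print(candidate_encodings(from_utf8_terminal))
print(candidate_encodings(from_euckr_terminal))
```

For a saved file, read it in binary mode (open(path, "rb")) and pass the
bytes to the helper.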


Seb




Re: [SQL] Understanding Encoding

2013-09-06 Thread Beena Emerson
Hello,

Thank you all.

Amit, changing the encoding of the terminal emulator worked.

Sebastien, the tip was helpful.

--

Beena Emerson