[HACKERS] questionable item in HISTORY

2005-09-24 Thread Tatsuo Ishii
The following item in HISTORY:

 * Add support for 3 and 4-byte UTF8 characters (John Hansen)
   Previously only one and two-byte UTF8 characters were supported.
   This is particularly important for support for some Chinese
   characters.

is wrong, since 3-byte UTF-8 characters have been supported ever since
UTF-8 support was added to PostgreSQL. The correct description would be:

 * Add support for 4-byte UTF8 characters (John Hansen)
   Previously only up to three-byte UTF8 characters were supported.
   This is particularly important for support for some Chinese
   characters.
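
For readers who want to check the byte counts behind this correction,
here is a minimal sketch in Python. It is only an illustration of how
UTF-8 sequence length depends on the code point; it is not taken from
the patch or from PostgreSQL's source, and the function name
utf8_length is made up for the example.

```python
# Illustrative only: utf8_length is a hypothetical helper, not PostgreSQL code.

def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses to encode one Unicode code point."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x80:       # U+0000..U+007F: ASCII
        return 1
    if code_point < 0x800:      # U+0080..U+07FF
        return 2
    if code_point < 0x10000:    # U+0800..U+FFFF: rest of the BMP, incl. most CJK
        return 3
    return 4                    # U+10000..U+10FFFF: supplementary planes


# One sample per length: Latin, accented Latin, a common CJK ideograph,
# and a CJK Extension B ideograph outside the BMP.
samples = ["A", "\u00e9", "\u4e2d", "\U00020000"]

lengths = [utf8_length(ord(ch)) for ch in samples]
encoded_lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```

Most Chinese characters sit in the U+0800..U+FFFF range and so need
three bytes; only characters beyond U+FFFF need the 4-byte form.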

In the meantime, I wonder whether we need to update the UTF-8 <-->
locale encoding maps. The author of the patch stated that "This is
particularly important for support for some Chinese characters." I
have no idea which encoding he is referring to, but I wonder whether the
latest Chinese encoding standard, GB18030, needs 4-byte UTF-8 mappings.
If so, we surely need to update utf8_to_gb18030.map.
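
One quick way to probe the question, sketched here with Python's
built-in "gb18030" codec: take a character outside the BMP and see
whether it has both a 4-byte UTF-8 form and a GB18030 encoding. This
says nothing about what utf8_to_gb18030.map currently contains; it only
checks the encodings themselves, and the sample character U+20000 is an
arbitrary choice for the example.

```python
# Illustrative check using Python's standard codecs, not PostgreSQL's maps.

ch = "\U00020000"  # a CJK Extension B ideograph, outside the BMP

utf8_bytes = ch.encode("utf-8")
gb_bytes = ch.encode("gb18030")      # raises UnicodeEncodeError if unmappable
roundtrip = gb_bytes.decode("gb18030")

print("UTF-8  :", utf8_bytes.hex(), f"({len(utf8_bytes)} bytes)")
print("GB18030:", gb_bytes.hex(), f"({len(gb_bytes)} bytes)")
print("round trip ok:", roundtrip == ch)
```

If the character round-trips, then GB18030 does cover code points whose
UTF-8 form is four bytes long, which is the case the conversion map
would have to handle.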

Anybody familiar with GB18030/UTF-8?
--
SRA OSS, Inc. Japan
Tatsuo Ishii



Re: [HACKERS] questionable item in HISTORY

2005-09-24 Thread Bruce Momjian
Tatsuo Ishii wrote:
> The following item in HISTORY:
>
>  * Add support for 3 and 4-byte UTF8 characters (John Hansen)
>    Previously only one and two-byte UTF8 characters were supported.
>    This is particularly important for support for some Chinese
>    characters.
>
> is wrong, since 3-byte UTF-8 characters have been supported ever since
> UTF-8 support was added to PostgreSQL. The correct description would be:
>
>  * Add support for 4-byte UTF8 characters (John Hansen)
>    Previously only up to three-byte UTF8 characters were supported.
>    This is particularly important for support for some Chinese
>    characters.

Release notes updated.

> In the meantime, I wonder whether we need to update the UTF-8 <-->
> locale encoding maps. The author of the patch stated that "This is
> particularly important for support for some Chinese characters." I
> have no idea which encoding he is referring to, but I wonder whether the
> latest Chinese encoding standard, GB18030, needs 4-byte UTF-8 mappings.
> If so, we surely need to update utf8_to_gb18030.map.
>
> Anybody familiar with GB18030/UTF-8?

Good question.  The report we got in the past was that some UTF-8
characters, mostly Chinese, were being rejected even though they were
valid.  I have no idea how they map to the GB* character sets.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
