> -----Original Message-----
> From: Tatsuo Ishii [mailto:[EMAIL PROTECTED]
> Sent: Sunday, May 08, 2005 11:08 PM
> To: John Hansen
> Cc: pgman@candle.pha.pa.us; [EMAIL PROTECTED];
> pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Patch for collation using ICU
>
> > > I don't buy it. If current conversion tables does the right thing,
> > > why we need to replace. Or if conversion tables are not correct, why
> > > don't you fix it? I think the rule of character conversion will not
> > > change frequently, especially for LATIN languages. Thus maintaining
> > > cost is not too high.
> >
> > I never said we need to, but if we're going to implement ICU, then we
> > might as well go all the way.
>
> So you admit there's no benefit using ICU for replacing
> existing conversions?
>
> Besides ICU does not support all existing conversions, I
> think ICU has serious flaw for using conversion. If I
> understand correctly, ICU uses UNICODE internally to do the
> conversion. For example, to implement SJIS->EUC_JP conversion,
> ICU first converts SJIS to UNICODE then converts UNICODE to
> EUC_JP. Problem is these conversion is not round trip
> (conversion between SJIS/EUC_JP and UNICODE will lose some
> information). Thus SJIS->EUC_JP->SJIS conversion using ICU
> does not preserve original text.

Just for the record, I fetched a web page encoded in SJIS, converted it to EUC-JP and back using uconv from ICU 3.2, and the round-tripped file is identical to the original:

    uconv -f Shift_JIS -t EUC-JP -o index.html.euc index.html
    uconv -f EUC-JP -t Shift_JIS -o index.html.sjis index.html.euc
    diff index.html index.html.sjis

...
John
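
For anyone who wants to repeat this kind of check without ICU installed, here is a minimal sketch of the same round-trip test that pivots through Unicode explicitly. It assumes Python's built-in shift_jis and euc_jp codecs, whose mapping tables are not ICU's, so it illustrates the method rather than ICU's behaviour. The names pivot_convert and round_trips and the sample string are made up for the example. A passing result only covers the characters in the sample, not every character in either encoding.

```python
def pivot_convert(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from encoding src to encoding dst via Unicode.

    Decoding yields a Python str (Unicode); encoding maps it to dst.
    Raises UnicodeError if a character has no mapping in either step.
    """
    return data.decode(src).encode(dst)


def round_trips(data: bytes, a: str = "shift_jis", b: str = "euc_jp") -> bool:
    """Return True if a -> b -> a reproduces the original bytes exactly."""
    there = pivot_convert(data, a, b)
    back = pivot_convert(there, b, a)
    return back == data


# Hypothetical sample: kanji, kana and ASCII that exist in both encodings.
sample = "日本語のテキスト ABC 123".encode("shift_jis")
print(round_trips(sample))
```

To test a real file instead of the sample, read it in binary mode and pass the bytes to round_trips; this corresponds to the two uconv calls followed by diff.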