> -----Original Message-----
> From: Tatsuo Ishii [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, May 08, 2005 11:08 PM
> To: John Hansen
> Cc: pgman@candle.pha.pa.us; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Patch for collation using ICU
> 
> > > I don't buy it. If the current conversion tables do the right thing,
> > > why do we need to replace them? Or if the conversion tables are not
> > > correct, why don't you fix them? I think the rules of character
> > > conversion will not change frequently, especially for LATIN languages.
> > > Thus the maintenance cost is not too high.
> > 
> > I never said we need to, but if we're going to implement ICU, then we
> > might as well go all the way.
> 
> So you admit there's no benefit in using ICU to replace the existing
> conversions?
> 
> Besides the fact that ICU does not support all existing conversions, I
> think ICU has a serious flaw when used for conversion. If I understand
> correctly, ICU uses UNICODE internally to do the conversion. For example,
> to implement SJIS->EUC_JP conversion, ICU first converts SJIS to UNICODE
> and then converts UNICODE to EUC_JP. The problem is that these conversions
> are not round trip (conversion between SJIS/EUC_JP and UNICODE loses some
> information). Thus an SJIS->EUC_JP->SJIS conversion using ICU does not
> preserve the original text.

Just for the record, I fetched a web page encoded in Shift_JIS, converted
it to EUC-JP and back using uconv from ICU 3.2, and the round-tripped file
is identical to the original:

 uconv -f Shift_JIS -t EUC-JP -o index.html.euc index.html
 uconv -f EUC-JP -t Shift_JIS -o index.html.sjis index.html.euc
 diff index.html index.html.sjis
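For anyone without ICU at hand, the same round-trip check can be sketched
in-process. This is a minimal sketch using Python's built-in "shift_jis" and
"euc_jp" codecs, NOT ICU; their mapping tables may differ from ICU's, so a
result here does not predict what uconv does. The helper names
(sjis_to_eucjp, eucjp_to_sjis, round_trips, file_round_trips) are
hypothetical. Like the ICU behaviour described in the quoted message, each
conversion pivots through Unicode: bytes are decoded to a str and then
re-encoded.

```python
# Sketch only: Python's built-in codecs, not ICU. Each conversion pivots
# through Unicode (decode to str, then encode in the target encoding).


def sjis_to_eucjp(data: bytes) -> bytes:
    """Convert Shift_JIS bytes to EUC-JP by way of a Unicode str."""
    return data.decode("shift_jis").encode("euc_jp")


def eucjp_to_sjis(data: bytes) -> bytes:
    """Convert EUC-JP bytes to Shift_JIS by way of a Unicode str."""
    return data.decode("euc_jp").encode("shift_jis")


def round_trips(data: bytes) -> bool:
    """Return True if SJIS -> EUC-JP -> SJIS reproduces `data` byte for byte.

    Strict error handling is used, so input that cannot be decoded or
    re-encoded counts as a failed round trip.
    """
    try:
        return eucjp_to_sjis(sjis_to_eucjp(data)) == data
    except UnicodeError:
        return False


def file_round_trips(path: str) -> bool:
    """Rough in-process equivalent of the uconv + diff check on one file."""
    with open(path, "rb") as f:
        return round_trips(f.read())
```

For example, file_round_trips("index.html") plays the role of the three
commands above. Note that a clean result, here or with uconv, only shows
that the byte sequences actually present in that input survive the trip; it
says nothing about characters the input does not contain.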

... John
