Re: [HACKERS] Patch for collation using ICU

John Hansen Sat, 07 May 2005 07:21:40 -0700

Bruce Momjian wrote:
> Palle Girgensohn wrote:
> > 
> > --On Saturday, May 07, 2005 23.15.29 +1000 John Hansen 
> > <[EMAIL PROTECTED]>
> > wrote:
> > 
> > > Btw, I had been planning to propose replacing every single one of 
> > > the built in charset conversion functions with calls to ICU (thus 
> > > making pg _depend_ on ICU), as this would seem like a cleaner 
> > > solution than for us to maintain our own conversion tables.
> > >
> > > ICU also has a fair few conversions that we do not have at present.
> 
> That is a much larger issue, similar to our shipping our own 
> timezone database.  What does it buy us?
>       
>       o  Do we ship it in our tarball?
>       o  Is the license compatible?
>       o  Does it remove utils/mb conversions?
>       o  Does it allow us to index LIKE (next high char)?
>       o  Does it allow us to support multiple encodings in
>          a single database easier?
>       o  performance?
> 
> > I just had a similar thought. And why use ICU only for multibyte charsets?
> > If I use LATIN1, I still expect upper('ß') => SS, and I don't get it...
> > Same for the Turkish example.
> 
> We assume the native toupper() can handle single-byte 
> character encodings.  We use towupper() only for wide character sets.


That assumption is wrong:

Encoding latin1
Locale <> de*

Select Upper('ß'); (lowercase German SS)
Should return SS, but returns ß
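
To illustrate why a one-character-in, one-character-out mapping cannot produce SS, here is a minimal sketch in Python (not PostgreSQL code). It assumes Python's ASCII-only bytes.upper() as a rough stand-in for a byte-wise toupper() on LATIN1 data, and str.upper() as a stand-in for a full Unicode case mapping of the kind being asked for. The helper names upper_bytewise and upper_full are hypothetical, chosen only for this example.

```python
# Sketch only: contrasts a byte-wise (1:1) uppercase with a full case mapping.
# bytes.upper() touches ASCII letters only and never changes the length,
# which is the limitation under discussion; it is NOT an exact model of a
# locale-aware C toupper(), which may map other single-byte letters.

SHARP_S = "\u00df"  # LATIN SMALL LETTER SHARP S, byte 0xDF in LATIN1


def upper_bytewise(text: str, encoding: str = "latin-1") -> str:
    """Uppercase one byte at a time; output length equals input length."""
    raw = text.encode(encoding)
    return raw.upper().decode(encoding)


def upper_full(text: str) -> str:
    """Uppercase using Unicode full case mapping, which may expand characters."""
    return text.upper()


if __name__ == "__main__":
    word = "stra" + SHARP_S + "e"
    print(repr(upper_bytewise(word)))  # sharp s is left unchanged
    print(repr(upper_full(word)))      # sharp s expands to two letters
```

The point is only that any function returning exactly one character per input character has no way to return SS, whatever the locale.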

... John


