Re: [HACKERS] UPPER()/LOWER() and UTF-8

Hannu Krosing Sun, 09 Nov 2003 15:03:11 -0800

On Wed, 05.11.2003 at 17:11, Alexey Mahotkin wrote:
> Aha, that's in src/backend/utils/adt/formatting.c, right?
> 
> Yes, I see, it goes byte by byte and uses toupper().  I believe we
> could look at the locale, and if it is UTF-8, then use (or copy)
> e.g. g_utf8_strup/strdown, right?
> 
> http://developer.gnome.org/doc/API/2.0/glib/glib-Unicode-Manipulation.html#g-utf8-strup
> 
> I believe that patch could be written in a matter of hours.
> 
> 
>     TL> There has been some discussion of using <wctype.h> where
>     TL> available, but this has a number of issues, notably figuring
>     TL> out the correct mapping from the server string encoding (eg
>     TL> UTF-8) to unpacked wide characters.  At minimum we'd need to
>     TL> know which charset the locale setting is expecting, and there
>     TL> doesn't seem to be a portable way to find that out.
> 
>     TL> IIRC, Peter thinks we must abandon use of libc's locale
>     TL> functionality altogether and write our own locale layer before
>     TL> we can really have all the locale-specific functionality we
>     TL> want.
> 
> I believe that native Unicode strings (together with human language
> handling) should be introduced as (almost) separate data type (which
> have nothing to do with locale), but that's bluesky maybe.


They should have nothing to do with the _system_ locale, but you can
do neither UPPER()/LOWER() nor ORDER BY unless you know the locale. The
locale should simply be either a property of the column or given
explicitly in the SQL statement.
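
To illustrate why case mapping needs a locale, here is a rough sketch in
Python (chosen only for brevity). The names upper_in() and lower_in() are
made up for this example, and it handles just the Turkish/Azeri
dotted/dotless i rule; a real implementation would presumably take its
rules from a library such as ICU. The point is that the locale is an
explicit argument, not something read from the system.

```python
# Hypothetical helpers: the language is passed in explicitly instead of
# being taken from the system locale. Only one locale-specific rule is
# modelled here (Turkish/Azeri dotted and dotless i).
TURKIC = {"tr", "az"}

def upper_in(text, lang):
    """Uppercase text using the case rules of the given language."""
    if lang in TURKIC:
        # In Turkish, 'i' uppercases to U+0130 (capital I with dot above).
        text = text.replace("i", "\u0130")
    return text.upper()

def lower_in(text, lang):
    """Lowercase text using the case rules of the given language."""
    if lang in TURKIC:
        # U+0130 lowercases to plain 'i'; 'I' lowercases to U+0131 (dotless i).
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()
```

With these, upper_in("istanbul", "en") and upper_in("istanbul", "tr")
give different strings, which is why neither the server's system locale
nor the encoding alone can settle the answer.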

I guess one could write UCHAR, UVARCHAR, UTEXT types based on ICU.

-------------
Hannu


