I think the question is how often you are passing data around or storing it _in_ your application, versus how often you are processing it.
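To put rough numbers on the storage side of that question, here is a minimal sketch in C. It walks a UTF-8 string once and reports how many bytes the same text would occupy as UTF-8, as 16-bit wide characters (UTF-16), and as 4-byte wide characters. The names `utf8_decode`, `measure_text` and `struct text_sizes` are made up for illustration and are not PostgreSQL or libc functions. The decoder is deliberately simplified: it rejects truncated sequences and bad continuation bytes, but does not check for overlong forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from the UTF-8 bytes at s.
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the lead byte is invalid or the sequence is truncated/malformed.
 * Simplification: overlong encodings are not rejected. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    if (len == 0)
        return 0;

    unsigned char b = s[0];
    size_t n;
    if (b < 0x80)                { *cp = b;        n = 1; }
    else if ((b & 0xE0) == 0xC0) { *cp = b & 0x1F; n = 2; }
    else if ((b & 0xF0) == 0xE0) { *cp = b & 0x0F; n = 3; }
    else if ((b & 0xF8) == 0xF0) { *cp = b & 0x07; n = 4; }
    else
        return 0;               /* stray continuation byte or invalid lead */

    if (n > len)
        return 0;               /* sequence runs past the end of the buffer */

    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return n;
}

/* Storage cost of the same text in three representations. */
struct text_sizes {
    size_t code_points;
    size_t utf8_bytes;   /* variable width, 1 to 4 bytes per code point */
    size_t utf16_bytes;  /* 2 bytes, or 4 for a surrogate pair */
    size_t utf32_bytes;  /* fixed 4 bytes per code point */
};

/* Returns 0 on success, -1 if the input is not decodable UTF-8. */
int measure_text(const unsigned char *s, size_t len, struct text_sizes *out)
{
    struct text_sizes t = {0, 0, 0, 0};
    size_t pos = 0;

    while (pos < len) {
        uint32_t cp;
        size_t n = utf8_decode(s + pos, len - pos, &cp);
        if (n == 0)
            return -1;
        pos += n;
        t.code_points++;
        /* Code points above U+FFFF need a surrogate pair in UTF-16,
         * so 16-bit "wide" characters are not truly fixed width. */
        t.utf16_bytes += (cp >= 0x10000) ? 4 : 2;
        t.utf32_bytes += 4;
    }
    t.utf8_bytes = len;
    *out = t;
    return 0;
}
```

For mostly-ASCII text such as "Björklund" this gives 10 bytes in UTF-8 against 36 bytes as 4-byte wide characters, which is the memory overhead mentioned in the thread below. A character outside the Basic Multilingual Plane takes 4 bytes in all three forms, and in UTF-16 that means two 16-bit units. Whether the per-character decoding cost outweighs the extra memory traffic depends on the workload, so treat this as a way to measure, not as an answer.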
---------------------------------------------------------------------------

Dennis Gearon wrote:
> I agree with all of that except for one caveat:
>
> all my reading, and just general off the cuff thinking, says that processing
> variable width characters SIGNIFICANTLY slows an application. It seems better to
> PROCESS fixed width characters (1,2,4 byte), and TRANSMIT variable width characters
> (avoiding the null problem.)
>
> Gianni Mariani wrote:
>
> > Dennis Gearon wrote:
> >
> >> Got a link to that section of the standard, or better yet, to a
> >> 'interpreted' version of the standard? :-)
> >>
> >> Stephan Szabo wrote:
> >>
> >>> On Wed, 13 Aug 2003, Dennis Gearon wrote:
> >>>
> >>>> Dennis Björklund wrote:
> >>>>
> >>>>> In the future we need indexes that depend on the locale (and a lot
> >>>>> of other changes).
> >>>>
> >>>> I agree. I've been looking at the web on this subject a lot lately. I
> >>>> am **NOT** a microslop fan, but SQL-SERVER even lets a user define a
> >>>> language (maybe encoding) down to the column level!
> >>>>
> >>>> I've been reading on GNU-C and on languages, encoding, and
> >>>> localization.
> >>>>
> >>>> http://pauillac.inria.fr/~lang/hotlist/free/licence/fsf96/drepper/paper-1.html
> >>>>
> >>>> http://h21007.www2.hp.com/dspp/tech/tech_TechSingleTipDetailPage_IDX/1,2366,1222,00.html
> >>>>
> >>>> There are three basic approaches to doing different languages in
> >>>> computerized text:
> >>>>
> >>>> A/ various adaptations of the 8 bit character set, I.E. the
> >>>>    ISO-8859-x series.
> >>>> B/ wide characters
> >>>>    ********This should be how Postgres stores data internally.********
> >>>> C/ Multibyte characters
> >>>>    ********This is how Postgres should default to sending data OUT
> >>>>    of the application, i.e. to the display or the web, or other system
> >>>>    applications********
> >>>
> >>> SQL has a system for defining character set specifications,
> >>> collations and such (per column/literal in some cases). We should
> >>> probably look at it before making decisions on how to do things.
> >>
> >
> > I thought UNIX (SCOTM) systems also had a way of being able to define
> > collation order.
> >
> > see:
> > ftp://dkuug.dk/i18n/WG15-collection/locales
> >
> > for a collection of all ISO standardized locales (the WG15 ISO work
> > group's stuff).
> >
> > Do a "man localedef" on most Linuxen or UNIXen.
> >
> > As for wide characters vs multibyte, there is no clear winner. The
> > right answer DEPENDS on the situation.
> >
> > Wide characters on some platforms are 16 bit which means that when you
> > do Unicode you'll still have problems with surrogate pairs (meaning that
> > it's still multi (wide) char) so you still have all the problems of
> > multi-byte encodings.
> >
> > You could decide to process everything in a PG specific 4 byte wide char
> > and do all text in Unicode but the overhead in processing 4 times the
> > data is quite significant. The other option is to store all data in
> > utf-8 and have all text code become utf-8 aware.
> >
> > I have found in practice that the utf-8 option is significantly easier
> > to implement, 100% Unicode compliant and the best performer (because of
> > reduced memory requirements).
> >
> > The POSIX APIs for locales are not very good for modern day programs.
> > I'm not sure where the "mbr*" and the "wcr*" APIs are in the
> > standardization process, but if these are not well supported, you're on
> > your own and will need to implement similar functionality from scratch.
> > For that matter, the collation functions all operate on a "current"
> > locale, which is really difficult to work with in multi-locale applications.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org