I think the question is how often you are passing data around or storing it
_in_ your application versus how often you are processing it.
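
To put rough numbers on the storage side of that trade-off, here is a minimal sketch in C. It is an illustration only, not anything from the PostgreSQL source: the function names utf8_codepoints and fixed4_bytes are made up for this example, and the code assumes its input is already valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count Unicode code points in a NUL-terminated UTF-8 string.
 * Every byte that is not a continuation byte (10xxxxxx) starts a new
 * code point. Assumption: the input is valid UTF-8; nothing is validated.
 * (Hypothetical helper name, for illustration only.)
 */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/*
 * Bytes needed to hold the same text as fixed-width 4-byte characters
 * (terminator not counted). (Hypothetical helper name.)
 */
size_t fixed4_bytes(const char *s)
{
    return 4 * utf8_codepoints(s);
}
```

Comparing strlen(s) against fixed4_bytes(s) for a given string shows how much more memory the fixed-width form moves around. The flip side is that finding the Nth character in the UTF-8 form needs a scan like the loop above, while the fixed-width form can index directly.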

---------------------------------------------------------------------------

Dennis Gearon wrote:
> I agree with all of that except for one caveat:
> 
>       All my reading, and just general off-the-cuff thinking, says that processing 
> variable-width characters SIGNIFICANTLY slows an application. It seems better to 
> PROCESS fixed-width characters (1, 2, or 4 bytes), and TRANSMIT variable-width characters 
> (avoiding the null problem).
> 
> Gianni Mariani wrote:
> 
> > Dennis Gearon wrote:
> > 
> >> Got a link to that section of the standard, or better yet, to a 
> >> 'interpreted' version of the standard? :-)
> >>
> >> Stephan Szabo wrote:
> >>
> >>> On Wed, 13 Aug 2003, Dennis Gearon wrote:
> >>>
> >>>
> >>>> Dennis Björklund wrote:
> >>>>
> >>>>
> >>>>> In the future we need indexes that depend on the locale (and a lot 
> >>>>> of other changes).
> >>>>>
> >>>>
> >>>> I agree. I've been looking at the web on this subject a lot lately. I
> >>>> am **NOT** a microslop fan, but SQL-SERVER even lets a user define a
> >>>> language (maybe encoding) down to the column level!
> >>>>
> >>>> I've been reading on GNU-C and on languages, encoding, and 
> >>>> localization.
> >>>>
> >>>> http://pauillac.inria.fr/~lang/hotlist/free/licence/fsf96/drepper/paper-1.html 
> >>>>
> >>>> http://h21007.www2.hp.com/dspp/tech/tech_TechSingleTipDetailPage_IDX/1,2366,1222,00.html
> >>>>  
> >>>>
> >>>>
> >>>>
> >>>> There are three basic approaches to handling different languages in 
> >>>> computerized text:
> >>>>
> >>>>    A/ various adaptations of the 8 bit character set, i.e. the 
> >>>> ISO-8859-x series.
> >>>>    B/ wide characters
> >>>>    ********This should be how Postgres stores data internally.********
> >>>>    C/ Multibyte characters
> >>>>    ********This is how Postgres should default to sending data OUT 
> >>>> of the application,
> >>>>            i.e. to the display or the web, or other system 
> >>>> applications********
> >>>
> >>>
> >>>
> >>>
> >>> SQL has a system for defining character set specifications, 
> >>> collations and
> >>> such (per column/literal in some cases).  We should probably look at it
> >>> before making decisions on how to do things.
> >>
> >>
> > 
> > I thought UNIX (SCOTM) systems also had a way of being able to define 
> > collation order.
> > 
> > see:
> >    ftp://dkuug.dk/i18n/WG15-collection/locales
> > 
> > for a collection of all ISO standardized locales (the WG15 ISO work 
> > group's stuff).
> > 
> > Do a "man localedef" on most Linuxen or UNIXen.
> > 
> > As for wide characters vs multibyte, there is no clear winner.  The 
> > right answer DEPENDS on the situation.
> > 
> > Wide characters on some platforms are 16 bits, which means that when you 
> > do Unicode you still have to deal with surrogate pairs (so the encoding 
> > is still variable-width, just in wide chars) and you keep all the 
> > problems of multi-byte encodings.
> > 
> > You could decide to process everything in a PG specific 4 byte wide char 
> > and do all text in Unicode but the overhead in processing 4 times the 
> > data is quite significant.  The other option is to store all data in 
> > utf-8 and have all text code become utf-8 aware.
> > 
> > I have found in practice that the utf-8 option is significantly easier 
> > to implement, 100% Unicode compliant and the best performer (because of 
> > reduced memory requirements).
> > 
> > The POSIX APIs for locales are not very good for modern-day programs. 
> > I'm not sure where the "mbr*" and "wcr*" APIs are in the 
> > standardization process, but if these are not well supported, you're on 
> > your own and will need to implement similar functionality from scratch. 
> > For that matter, the collation functions all operate on a "current" 
> > locale, which is really difficult to work with in multi-locale applications.
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 8: explain analyze is your friend
> > 
> 
> 
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

