Re: [HACKERS] Locale implementation questions
Thread added to TODO.detail.

---

Tatsuo Ishii wrote:
> > 3. Compiled locale files are large. One UTF-8 locale datafile can
> > exceed a megabyte. Do we want the option of disabling it for small
> > systems?
>
> To avoid the problem, you could dynamically load the compiled
> tables. The charset conversion tables are handled in a similar way.
>
> Also I think it's important to allow user-defined collate data. To
> implement the CREATE COLLATE syntax, we need to have that capability
> anyway.
>
> > 4. Do we want the option of running system locale in parallel with the
> > internal ones?
> >
> > 5. I think we're going to have to deal with the very real possibility
> > that our locale database will not be as good as some of the system
> > provided ones. The question is how. This is quite unlike timezones
> > which are quite standardized and rarely change. That database is quite
> > well maintained.
> >
> > Would people object to a configure option that selected:
> > --with-locales=internal (use pg database)
> > --with-locales=system (use system database for win32, glibc or MacOS X)
> > --with-locales=none (what we support now, which is neither)
> >
> > I don't think it will be much of an issue to support this, all the
> > functions take the same parameters and have almost the same names.
>
> To be honest, I don't understand why we have to rely on (often broken)
> system locales. I don't think building our own locale data is too
> hard, and once we have built it, the maintenance cost will be very small
> since it should not be changed regularly. Moreover we could enjoy the
> benefit that PostgreSQL handles collations in a correct manner on any
> platform which PostgreSQL supports.
>
> > 6. Locales for SQL_ASCII. Seems to me you have two options, either
> > reject COLLATE altogether unless they specify a charset, or don't care
> > and let the user shoot themselves in the foot if they wish...
> >
> > BTW, this MacOS locale support seems to be new for 10.4.2 according to
> > the CVS log info, can anyone confirm this?
> >
> > Anyway, I hope this post didn't bore too much. Locale support has been
> > one of those things that has bugged me for a long time and it would be
> > nice if there could be some real movement.
>
> Right. We Japanese (and probably Chinese too) have been bugged by the
> broken multibyte locales for a long time. Using the C locale helps us to a
> certain extent, but for Unicode we need correct locale data, otherwise
> the sorted data will be complete chaos.
> --
> SRA OSS, Inc. Japan
> Tatsuo Ishii

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Locale implementation questions
Greg Stark <[EMAIL PROTECTED]> writes:
> I think it's sheer madness to try to reproduce large swaths of the OS
> inside Postgres because you're unhappy with the quality of the OS
> implementation. You should be asking yourself why OS vendors have such
> a hard time getting this stuff right

In the case of the *BSDs, it's pretty obviously because they don't care.

> and why would Postgres do any better

In the first place, we do care, and in the second place, having to deal
with only one set of locale bugs would in itself be a huge advance over
where we are now.

We went over to maintaining our own timezone code for more or less the
same reasons, and in hindsight that was obviously the right decision.
Locale support is a bigger chunk, no doubt about it, but we also have
a lot of motivation.

regards, tom lane
Re: [HACKERS] Locale implementation questions
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> To be honest, I don't understand why we have to rely on (often broken)
> system locales. I don't think building our own locale data is too
> hard, and once we have built it, the maintenance cost will be very small
> since it should not be changed regularly. Moreover we could enjoy the
> benefit that PostgreSQL handles collations in a correct manner on any
> platform which PostgreSQL supports.

I think it's sheer madness to try to reproduce large swaths of the OS
inside Postgres because you're unhappy with the quality of the OS
implementation. You should be asking yourself why OS vendors have such
a hard time getting this stuff right and why would Postgres do any better.

Wouldn't that work be better spent improving the database functionality of
Postgres? Or at least better spent improving the locale support for the
entire OS? It would be positively awful if every application on my system
had its own locale database, each of which had its own set of bugs and its
own feature set.

--
greg
Re: [HACKERS] Locale implementation questions
On Sun, Sep 04, 2005 at 10:25:36PM +0900, Tatsuo Ishii wrote:
> > 3. Compiled locale files are large. One UTF-8 locale datafile can
> > exceed a megabyte. Do we want the option of disabling it for small
> > systems?
>
> To avoid the problem, you could dynamically load the compiled
> tables. The charset conversion tables are handled in a similar way.

That's not the point, of course they are loaded dynamically. The question
is, when do we create the files in the first place? There are 48*15 = 720
combinations, which would amount to tens of megabytes of essentially
useless data. *When* you create the files is an important question.
Compile time is out.

Charset conversion is completely different, there just aren't that many
combinations.

> Also I think it's important to allow user-defined collate data. To
> implement the CREATE COLLATE syntax, we need to have that capability
> anyway.

Most OSes allow you to create collate data yourself anyway, why do we
need to implement this too?

> To be honest, I don't understand why we have to rely on (often broken)
> system locales. I don't think building our own locale data is too
> hard, and once we have built it, the maintenance cost will be very small
> since it should not be changed regularly. Moreover we could enjoy the
> benefit that PostgreSQL handles collations in a correct manner on any
> platform which PostgreSQL supports.

You say building our own locale data is not hard. I disagree, it's a
waste of time we can do without. Unless you know the language yourself
you cannot check changes made by anybody else. If there's an error in
locale ordering, take it up with your OS distributor.

I also think we open ourselves to questions like:

1. My locale is supported by the system but not by PostgreSQL, why?
2. My locale was supported last release but not this one, why?
3. Why does PostgreSQL sort differently from 'sort' or any other app on
   my system?

> Right. We Japanese (and probably Chinese too) have been bugged by the
> broken multibyte locales for a long time. Using the C locale helps us to a
> certain extent, but for Unicode we need correct locale data, otherwise
> the sorted data will be complete chaos.

OK, is glibc still wrong, or are they just implementing the Unicode
standard and that's what's wrong?

All I'm saying is that we need to allow use of system locales until our
native locale support is mature.

In the end something like ICU (http://icu.sourceforge.net/) will end up
obsoleting us. Nobody (in free software anyway) uses it yet, but
eventually it may be viable to require that to allow system-independent
locales.
--
Martijn van Oosterhout http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
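
Question 3 in the list above (why the database might sort differently
from 'sort' or other applications) can be illustrated with a minimal
Python sketch. It is only an illustration, not anything from PostgreSQL
or glibc: the function names and the "toy locale" rule (compare ignoring
case first, then break ties on the original spelling) are assumptions
made up for the example. The point is only that two components using
different collation rules will order the same strings differently.

```python
from typing import List, Tuple


def codepoint_order(words: List[str]) -> List[str]:
    # Plain code point comparison. For ASCII text this matches the
    # byte-wise comparison that the C locale uses.
    return sorted(words)


def toy_locale_key(word: str) -> Tuple[str, str]:
    # Hypothetical collation rule (an assumption for this sketch):
    # compare case-insensitively first, then fall back to the
    # original spelling to break ties.
    return (word.casefold(), word)


def toy_locale_order(words: List[str]) -> List[str]:
    return sorted(words, key=toy_locale_key)


words = ["banana", "Apple", "cherry", "apple", "Banana"]
print(codepoint_order(words))   # uppercase letters sort before lowercase
print(toy_locale_order(words))  # case variants of a word stay adjacent
```

If the database used one of these orderings and the system 'sort' used
the other, users would see exactly the mismatch that question 3
describes.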
Re: [HACKERS] Locale implementation questions
> 3. Compiled locale files are large. One UTF-8 locale datafile can
> exceed a megabyte. Do we want the option of disabling it for small
> systems?

To avoid the problem, you could dynamically load the compiled
tables. The charset conversion tables are handled in a similar way.

Also I think it's important to allow user-defined collate data. To
implement the CREATE COLLATE syntax, we need to have that capability
anyway.

> 4. Do we want the option of running system locale in parallel with the
> internal ones?
>
> 5. I think we're going to have to deal with the very real possibility
> that our locale database will not be as good as some of the system
> provided ones. The question is how. This is quite unlike timezones
> which are quite standardized and rarely change. That database is quite
> well maintained.
>
> Would people object to a configure option that selected:
> --with-locales=internal (use pg database)
> --with-locales=system (use system database for win32, glibc or MacOS X)
> --with-locales=none (what we support now, which is neither)
>
> I don't think it will be much of an issue to support this, all the
> functions take the same parameters and have almost the same names.

To be honest, I don't understand why we have to rely on (often broken)
system locales. I don't think building our own locale data is too
hard, and once we have built it, the maintenance cost will be very small
since it should not be changed regularly. Moreover we could enjoy the
benefit that PostgreSQL handles collations in a correct manner on any
platform which PostgreSQL supports.

> 6. Locales for SQL_ASCII. Seems to me you have two options, either
> reject COLLATE altogether unless they specify a charset, or don't care
> and let the user shoot themselves in the foot if they wish...
>
> BTW, this MacOS locale support seems to be new for 10.4.2 according to
> the CVS log info, can anyone confirm this?
>
> Anyway, I hope this post didn't bore too much. Locale support has been
> one of those things that has bugged me for a long time and it would be
> nice if there could be some real movement.

Right. We Japanese (and probably Chinese too) have been bugged by the
broken multibyte locales for a long time. Using the C locale helps us to a
certain extent, but for Unicode we need correct locale data, otherwise
the sorted data will be complete chaos.
--
SRA OSS, Inc. Japan
Tatsuo Ishii
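
The two ideas in this message (loading compiled collation tables only
when they are needed, and letting users register their own collate data)
can be sketched roughly as follows. This is a toy Python illustration
under stated assumptions, not PostgreSQL code: CollationRegistry, its
methods, and the "reverse_latin" table are all hypothetical names, and
the table is built in memory where a real system would read a
per-locale file from disk.

```python
from typing import Callable, Dict, List, Tuple

Table = Dict[str, int]  # character -> collation weight


def load_reverse_latin() -> Table:
    # Hypothetical user-defined collation: 'z' sorts first, 'a' last.
    letters = "abcdefghijklmnopqrstuvwxyz"
    return {ch: len(letters) - i for i, ch in enumerate(letters)}


class CollationRegistry:
    """Keeps loaders by name and builds each table only on first use."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Table]] = {}
        self._cache: Dict[str, Table] = {}
        self.load_count = 0  # how many tables were actually loaded

    def register(self, name: str, loader: Callable[[], Table]) -> None:
        # Registering is cheap: nothing is loaded yet.
        self._loaders[name] = loader

    def _table(self, name: str) -> Table:
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
            self.load_count += 1
        return self._cache[name]

    def sort(self, name: str, words: List[str]) -> List[str]:
        table = self._table(name)

        def key(word: str) -> List[Tuple[int, int]]:
            # Characters missing from the table sort after known ones,
            # ordered among themselves by code point.
            return [(0, table[ch]) if ch in table else (1, ord(ch))
                    for ch in word]

        return sorted(words, key=key)


registry = CollationRegistry()
registry.register("reverse_latin", load_reverse_latin)
print(registry.sort("reverse_latin", ["abc", "zed", "mno"]))
```

Only collations that are actually used cost memory, which is the
property the dynamic-loading suggestion is after; the same registration
hook is what a user-defined collation would need.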