Re: [HACKERS] Locale implementation questions

2006-06-14 Thread Bruce Momjian

Thread added to TODO.detail.

---

Tatsuo Ishii wrote:
> > 3. Compiled locale files are large. One UTF-8 locale datafile can
> > exceed a megabyte. Do we want the option of disabling it for small
> > systems?
> 
> To avoid the problem, you could dynamically load the compiled
> tables. The charset conversion tables are handled in a similar way.
> 
> Also I think it's important to allow user defined collate data. To
> implement the CREATE COLLATE syntax, we need to have that capability
> anyway.
> 
> > 4. Do we want the option of running system locale in parallel with the
> > internal ones?
> > 
> > 5. I think we're going to have to deal with the very real possibility
> > that our locale database will not be as good as some of the system
> > provided ones. The question is how. This is quite unlike timezones
> > which are quite standardized and rarely change. That database is quite
> > well maintained.
> > 
> > Would people object to a configure option that selected:
> >   --with-locales=internal (use pg database)
> >   --with-locales=system   (use system database for win32, glibc or MacOS X)
> >   --with-locales=none (what we support now, which is neither)
> > 
> > I don't think it will be much of an issue to support this, all the
> > functions take the same parameters and have almost the same names.
> 
> To be honest, I don't understand why we have to rely on (often broken)
> system locales. I don't think building our own locale data is too
> hard, and once we have built it, the maintenance cost will be very small
> since it should not change regularly. Moreover we could enjoy the
> benefit that PostgreSQL handles collations in a correct manner on any
> platform which PostgreSQL supports.
> 
> > 6. Locales for SQL_ASCII. Seems to me you have two options, either
> > reject COLLATE altogether unless they specify a charset, or don't care
> > and let the user shoot themselves in the foot if they wish...
> > 
> > BTW, this MacOS locale support seems to be new for 10.4.2 according to
> > the CVS log info, can anyone confirm this?
> > 
> > Anyway, I hope this post didn't bore too much. Locale support has been
> > one of those things that has bugged me for a long time and it would be
> > nice if there could be some real movement.
> 
> Right. We Japanese (and probably Chinese too) have been bugged by the
> broken multibyte locales for a long time. Using the C locale helps us to a
> certain extent, but for Unicode we need correct locale data, otherwise
> the sorted data will be complete chaos.
> --
> SRA OSS, Inc. Japan
> Tatsuo Ishii
> 
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
>    subscribe-nomail command to [EMAIL PROTECTED] so that your
>    message can get through to the mailing list cleanly
> 

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Locale implementation questions

2005-09-04 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> I think it's sheer madness to try to reproduce large swaths of the OS
> inside Postgres because you're unhappy with the quality of the OS
> implementation. You should be asking yourself why OS vendors have such
> a hard time getting this stuff right

In the case of the *BSDs, it's pretty obviously because they don't care.

> and why would Postgres do any better

In the first place, we do care, and in the second place, having to deal
with only one set of locale bugs would in itself be a huge advance over
where we are now.

We went over to maintaining our own timezone code for more or less the
same reasons, and in hindsight that was obviously the right decision.
Locale support is a bigger chunk, no doubt about it, but we also have
a lot of motivation.

regards, tom lane



Re: [HACKERS] Locale implementation questions

2005-09-04 Thread Greg Stark

Tatsuo Ishii <[EMAIL PROTECTED]> writes:

> To be honest, I don't understand why we have to rely on (often broken)
> system locales. I don't think building our own locale data is too
> hard, and once we have built it, the maintenance cost will be very small
> since it should not change regularly. Moreover we could enjoy the
> benefit that PostgreSQL handles collations in a correct manner on any
> platform which PostgreSQL supports.

I think it's sheer madness to try to reproduce large swaths of the OS inside
Postgres because you're unhappy with the quality of the OS implementation. You
should be asking yourself why OS vendors have such a hard time getting this
stuff right and why would Postgres do any better. Wouldn't that work be better
spent improving the database functionality of Postgres?

Or at least better spent improving the locale support for the entire OS? It
would be positively awful if every application on my system had its own locale
database each of which had its own set of bugs and its own feature set.

-- 
greg


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Locale implementation questions

2005-09-04 Thread Martijn van Oosterhout
On Sun, Sep 04, 2005 at 10:25:36PM +0900, Tatsuo Ishii wrote:
> > 3. Compiled locale files are large. One UTF-8 locale datafile can
> > exceed a megabyte. Do we want the option of disabling it for small
> > systems?
> 
> To avoid the problem, you could dynamically load the compiled
> tables. The charset conversion tables are handled in a similar way.

That's not the point; of course they are loaded dynamically. The
question is when we create the files in the first place. There are
48*15 = 720 combinations, which would amount to tens of megabytes of
essentially useless data. *When* you create the files is an important
question. Compile time is out.
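
The size argument can be checked with a few lines of arithmetic. This is only a sketch: the two factors are the ones quoted in the message, while ASSUMED_AVG_KB is a hypothetical average file size picked to illustrate the calculation, not a measured value.

```python
# Rough size estimate for pre-building every combination.
# The two factors come from the message; the average size per compiled
# file is an assumption made purely for illustration.
FACTOR_A = 48
FACTOR_B = 15
ASSUMED_AVG_KB = 50  # hypothetical average compiled size per combination

combinations = FACTOR_A * FACTOR_B
total_mb = combinations * ASSUMED_AVG_KB / 1024

print(f"{combinations} combinations, roughly {total_mb:.0f} MB in total")
```

Multiplying the two factors gives 720 combinations, so even a modest average file size lands in the tens of megabytes.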

Charset conversion is completely different; there just aren't that many
combinations.

> Also I think it's important to allow user defined collate data. To
> implement the CREATE COLLATE syntax, we need to have that capability
> anyway.

Most OSes allow you to create collation data yourself anyway, so why do we
need to implement this too?

> To be honest, I don't understand why we have to rely on (often broken)
> system locales. I don't think building our own locale data is too
> hard, and once we have built it, the maintenance cost will be very small
> since it should not change regularly. Moreover we could enjoy the
> benefit that PostgreSQL handles collations in a correct manner on any
> platform which PostgreSQL supports.

You say building our own locale data is not hard. I disagree, it's a
waste of time we can do without. Unless you know the language yourself
you cannot check changes made by anybody else. If there's an error in
locale ordering, take it up with your OS distributor.

I also think we open ourselves to questions like:

1. My locale is supported by the system but not by PostgreSQL, why?
2. My locale was supported last release but not this one, why?
3. Why does PostgreSQL sort differently from 'sort' or any other app on
my system?

> Right. We Japanese (and probably Chinese too) have been bugged by the
> broken multibyte locales for a long time. Using the C locale helps us to a
> certain extent, but for Unicode we need correct locale data, otherwise
> the sorted data will be complete chaos.

OK, is glibc still wrong, or are they just implementing the Unicode
standard and that's what's wrong?

All I'm saying is that we need to allow use of system locales until our
native locale support is mature. In the end something like ICU
(http://icu.sourceforge.net/) will end up obsoleting us. Nobody (in
free software anyway) uses it yet, but eventually it may be viable to
require it in order to allow system-independent locales.
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.




Re: [HACKERS] Locale implementation questions

2005-09-04 Thread Tatsuo Ishii
> 3. Compiled locale files are large. One UTF-8 locale datafile can
> exceed a megabyte. Do we want the option of disabling it for small
> systems?

To avoid the problem, you could dynamically load the compiled
tables. The charset conversion tables are handled in a similar way.
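
As a rough illustration of loading tables on demand, the sketch below caches each table the first time it is requested. The names read_compiled_table and get_table are hypothetical, and the "table" is a placeholder dictionary rather than real collation data; nothing here reflects actual PostgreSQL code.

```python
from functools import lru_cache

load_count = 0  # counts real loads, only to make the caching visible


def read_compiled_table(locale_name: str) -> dict:
    """Hypothetical loader standing in for reading one compiled table from disk."""
    global load_count
    load_count += 1
    # A real table would hold collation weights; this placeholder simply
    # maps each character of the locale name to its code point.
    return {ch: ord(ch) for ch in locale_name}


@lru_cache(maxsize=8)
def get_table(locale_name: str) -> dict:
    """Return the table for a locale, loading it only on first use."""
    return read_compiled_table(locale_name)


get_table("ja_JP")
get_table("ja_JP")  # served from the cache, no second load
get_table("de_DE")
print(load_count)   # 2
```

Only the locales that are actually used cost any memory, which is the point of loading dynamically; it does not settle when the files are generated in the first place.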

Also I think it's important to allow user defined collate data. To
implement the CREATE COLLATE syntax, we need to have that capability
anyway.

> 4. Do we want the option of running system locale in parallel with the
> internal ones?
> 
> 5. I think we're going to have to deal with the very real possibility
> that our locale database will not be as good as some of the system
> provided ones. The question is how. This is quite unlike timezones
> which are quite standardized and rarely change. That database is quite
> well maintained.
> 
> Would people object to a configure option that selected:
>   --with-locales=internal (use pg database)
>   --with-locales=system   (use system database for win32, glibc or MacOS X)
>   --with-locales=none (what we support now, which is neither)
> 
> I don't think it will be much of an issue to support this, all the
> functions take the same parameters and have almost the same names.

To be honest, I don't understand why we have to rely on (often broken)
system locales. I don't think building our own locale data is too
hard, and once we have built it, the maintenance cost will be very small
since it should not change regularly. Moreover we could enjoy the
benefit that PostgreSQL handles collations in a correct manner on any
platform which PostgreSQL supports.
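
The three choices in the quoted proposal (internal, system, none) can be pictured as interchangeable comparison functions with one shared signature. The sketch below is an assumption-laden illustration: the backend names mirror the proposed option values, compare_internal uses a toy case-insensitive rule in place of a real locale database, and compare_system simply defers to whatever locale the process currently has set.

```python
import locale
from functools import cmp_to_key


def compare_none(a: str, b: str) -> int:
    """'none': plain code point comparison, no locale data at all."""
    return (a > b) - (a < b)


def compare_system(a: str, b: str) -> int:
    """'system': defer to the C library via locale.strcoll."""
    return locale.strcoll(a, b)


def compare_internal(a: str, b: str) -> int:
    """'internal': toy stand-in for a built-in table (case-insensitive first)."""
    ka, kb = (a.casefold(), a), (b.casefold(), b)
    return (ka > kb) - (ka < kb)


# Hypothetical dispatch table keyed by the proposed option values.
BACKENDS = {
    "none": compare_none,
    "system": compare_system,
    "internal": compare_internal,
}


def sort_with(backend: str, items: list) -> list:
    """Sort items using the comparison function of the chosen backend."""
    return sorted(items, key=cmp_to_key(BACKENDS[backend]))


print(sort_with("none", ["b", "B", "a"]))      # ['B', 'a', 'b']
print(sort_with("internal", ["b", "B", "a"]))  # ['a', 'B', 'b']
```

Because every backend takes the same two strings and returns the same kind of result, callers do not need to know which one is in use; the disagreement is about which data source to trust, not about the interface.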

> 6. Locales for SQL_ASCII. Seems to me you have two options, either
> reject COLLATE altogether unless they specify a charset, or don't care
> and let the user shoot themselves in the foot if they wish...
> 
> BTW, this MacOS locale support seems to be new for 10.4.2 according to
> the CVS log info, can anyone confirm this?
> 
> Anyway, I hope this post didn't bore too much. Locale support has been
> one of those things that has bugged me for a long time and it would be
> nice if there could be some real movement.

Right. We Japanese (and probably Chinese too) have been bugged by the
broken multibyte locales for a long time. Using the C locale helps us to a
certain extent, but for Unicode we need correct locale data, otherwise
the sorted data will be complete chaos.
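
A small example shows why code point order alone is not enough for Japanese text. In Unicode the hiragana block precedes the katakana block, so a plain code point sort places every katakana word after every hiragana word. The fold_kana helper below is a hypothetical, greatly simplified stand-in for real collation data: it only maps katakana letters onto their hiragana counterparts before comparing.

```python
KATAKANA_START, KATAKANA_END = 0x30A1, 0x30F6
KANA_OFFSET = 0x60  # distance from a katakana letter to its hiragana twin


def fold_kana(text: str) -> str:
    """Map katakana letters to hiragana; leave everything else unchanged."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_START <= ord(ch) <= KATAKANA_END
        else ch
        for ch in text
    )


words = ["いぬ", "アリ", "あめ"]

# Code point order: the katakana word is pushed to the end.
codepoint_order = sorted(words)

# Simplified syllabary order: compare folded forms, break ties by code point.
folded_order = sorted(words, key=lambda w: (fold_kana(w), w))

print(codepoint_order)  # ['あめ', 'いぬ', 'アリ']
print(folded_order)     # ['あめ', 'アリ', 'いぬ']
```

Real locale data has to handle far more than this one mapping (kanji, voiced marks, long vowels and so on), which is why correct tables matter.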
--
SRA OSS, Inc. Japan
Tatsuo Ishii
