Re: [HACKERS] A rough roadmap for internationalization fixes
> OK, I've been spreading rumours about fixing the internationalization problems, so let me make it a bit more clear.
>
> Here are the problems that need to be fixed:
>
> - Only one locale per process possible.
> - Only one gettext-language per process possible.
> - lc_collate and lc_ctype need to be held fixed in the entire cluster.
> - Gettext relies on iconv character set conversion, which relies on lc_ctype, which leads to a complete screw-up in the server because of the previous item.
> - Locale fixed per cluster, but encoding fixed per database, unaware of each other, don't get along.
> - No support for upper/lower with multibyte encoding.
> - Implementation of Unicode horribly incomplete.
>
> These are all dependent on each other and sort of flow into each other. Here is a proposed ordering of steps toward improving the situation:
>
> 1. Take out the character set conversion routines from the backend and make them a library of their own. This could possibly be modelled after iconv, but not necessarily. Or we might conclude that we can just use iconv in the first place.

How do you handle user-defined conversions?

> 2. Reimplement gettext to use 1. and allow switching of language and encoding at run-time.
>
> 3. Implement Unicode collation algorithm and character classification routines that are aware of 1. Use that in place of system locale routines.

I don't see a relationship between Unicode and whatever you are going to use to replace the system locale routines. If you are heading in the direction of a Unicode-centric implementation, I will object.

> 4. Allow choice of locale per database. (This should be fairly easy after 3.)
>
> 5. Allow choice of locale per column and implement collation coercion according to SQL standard.
>
> This could easily take a long time, but I feel that even if we have to stop after 2., 3., or 4. at feature freeze, we'd be a lot farther.
>
> Comments? Anything else that needs fixing?
>
> --
> Peter Eisentraut [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] A rough roadmap for internationalization fixes
On Mon, 24 Nov 2003, Peter Eisentraut wrote:

> 1. Take out the character set conversion routines from the backend and make them a library of their own. This could possibly be modelled after iconv, but not necessarily. Or we might conclude that we can just use iconv in the first place.
>
> 2. Reimplement gettext to use 1. and allow switching of language and encoding at run-time.

Force all translations to be in unicode and convert to other client encodings if needed. There is no need to support translations stored using different encodings.

> 3. Implement Unicode collation algorithm and character classification routines that are aware of 1. Use that in place of system locale routines.

Couldn't we use some library that already has this, like glib (or something else)? If it's not up to what we need, then fix that library instead.

--
/Dennis
Re: [HACKERS] A rough roadmap for internationalization fixes
Tatsuo Ishii writes:

> > 3. Implement Unicode collation algorithm and character classification routines that are aware of 1. Use that in place of system locale routines.
>
> I don't see a relationship between Unicode and whatever you are going to use to replace the system locale routines. If you are heading in the direction of a Unicode-centric implementation, I will object.

The Unicode collation algorithm works for any character set, not only for Unicode. It just happens to be published by the Unicode consortium. So basically this is just a concrete alternative to making up our own out of thin air. Also, the Unicode collation algorithm gives us the flexibility to define customizations of collations that users frequently want, such as ignoring or not ignoring punctuation.

Actually, what will more likely happen is that we'll define a collation as a collection of one or more support functions, the equivalents of strxfrm() and possibly a few more. Then it will be up to those functions to define the collation order. The server will provide utility functions that will facilitate implementing a collation order that follows the Unicode collation algorithm, but you could just as well implement one using memcmp() or whatever you like.

--
Peter Eisentraut [EMAIL PROTECTED]
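A minimal sketch of the "collation as a set of support functions" idea described above, written in Python rather than backend C. Everything here is hypothetical: the registry `COLLATIONS`, the helper `sort_with`, and the two key functions are invented for illustration, and neither of them implements the Unicode collation algorithm. Each key function plays the role of strxfrm(): it maps a string to a value whose plain comparison defines the order.

```python
import string

def bytewise_key(s):
    """memcmp()-style collation: order by the raw encoded bytes."""
    return s.encode("utf-8")

# Translation table that deletes ASCII punctuation and spaces (toy rule).
_STRIP = str.maketrans("", "", string.punctuation + " ")

def ignore_punct_key(s):
    """Toy customization: ignore punctuation and case, break ties bytewise."""
    return (s.translate(_STRIP).casefold(), s.encode("utf-8"))

# Hypothetical registry: collation name -> strxfrm()-like support function.
COLLATIONS = {
    "bytewise": bytewise_key,
    "ignore_punct": ignore_punct_key,
}

def sort_with(collation, values):
    """Sort values using the key function registered for the collation."""
    return sorted(values, key=COLLATIONS[collation])

words = ["co-op", "Zebra", "coop", "apple", "co op"]
print(sort_with("bytewise", words))
print(sort_with("ignore_punct", words))
```

The point of the design is that the sorting code never needs to know how an order is defined; swapping the key function is enough to go from a cheap bytewise order to a customized one.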
Re: [HACKERS] A rough roadmap for internationalization fixes
Dennis Bjorklund writes:

> Force all translations to be in unicode and convert to other client encodings if needed. There is no need to support translations stored using different encodings.

Tell that to the Japanese.

> Couldn't we use some library that already has this, like glib (or something else)? If it's not up to what we need, then fix that library instead.

I wasn't aware that glib had this. I'll look.

--
Peter Eisentraut [EMAIL PROTECTED]
Re: [HACKERS] A rough roadmap for internationalization fixes
Have you looked at what is available from http://oss.software.ibm.com/icu/ ? Seems they have a compatible license, but use some C++.

Andreas
Re: [HACKERS] A rough roadmap for internationalization fixes
On Tue, 25 Nov 2003, Peter Eisentraut wrote:

> > Force all translations to be in unicode and convert to other client encodings if needed. There is no need to support translations stored using different encodings.
>
> Tell that to the Japanese.

I've always thought unicode was enough to represent even Japanese. Then the client encoding can be something else that we can convert to. In any case, the encoding of the message catalog has to be known to the system so it can be converted to the correct encoding for the client.

> > Couldn't we use some library that already has this, like glib (or something else)? If it's not up to what we need, then fix that library instead.
>
> I wasn't aware that glib had this. I'll look.

And I don't really know what demands pg has, but glib has a lot of support functions for utf-8. At least we should take a look at it.

--
/Dennis
Re: [HACKERS] A rough roadmap for internationalization fixes
> On Tue, 25 Nov 2003, Peter Eisentraut wrote:
>
> > > Force all translations to be in unicode and convert to other client encodings if needed. There is no need to support translations stored using different encodings.
> >
> > Tell that to the Japanese.
>
> I've always thought unicode was enough to represent even Japanese. Then the client encoding can be something else that we can convert to. In any case, the encoding of the message catalog has to be known to the system so it can be converted to the correct encoding for the client.

I'm tired of telling people that Unicode is not that perfect.

Another gotcha with Unicode is that the UTF-8 encoding (which we currently use) consumes 3 bytes for each Kanji character, while other encodings consume only 2 bytes. IMO a 3/2 storage ratio cannot be neglected for database use.

--
Tatsuo Ishii
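The 3/2 ratio mentioned above is easy to check outside the database. The following is a small sketch using Python's built-in codecs; the helper name `encoded_sizes` and the choice of sample strings are assumptions made for illustration, and UTF-16 is included only as a point of comparison.

```python
# Codecs shipped with Python; EUC-JP and Shift_JIS are common Japanese encodings.
CODECS = ("utf-8", "euc_jp", "shift_jis", "utf-16-le")

KANJI = "\u6f22\u5b57"  # the two characters of the word "kanji"
ASCII = "abc"

def encoded_sizes(text):
    """Return {codec name: number of bytes} for the given text."""
    return {codec: len(text.encode(codec)) for codec in CODECS}

print(encoded_sizes(KANJI))
print(encoded_sizes(ASCII))
```

For the two Kanji characters this gives 6 bytes in UTF-8 against 4 bytes in EUC-JP, Shift_JIS, and UTF-16; for ASCII text the picture reverses, with UTF-16 needing twice the space of the others.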
Re: [HACKERS] ObjectWeb/Clustered JDBC
Hans,

I don't understand the statement about the missing DECLARE CURSOR. Doesn't the backend support it?

Dave

On Sun, 2003-11-23 at 12:12, Hans-Jürgen Schönig wrote:

> Peter Eisentraut wrote:
>
> > I was at the ObjectWeb Conference today; ObjectWeb (http://www.objectweb.org) being a consortium that has amassed quite an impressive array of open-source, Java-based middleware under their umbrella, including for instance our old friend Enhydra. And they regularly kept mentioning PostgreSQL in their presentations.
> >
> > To those that are interested in distributed transactions/two-phase commit, I recommend taking a look at Clustered JDBC (http://c-jdbc.objectweb.org/). While this is not exactly the same thing, it looks to be a pretty neat solution for a similar class of applications. In particular, it provides redundancy, load balancing, caching, and even database independence.
>
> It is indeed a nice solution but it is far from ready yet. Especially the disaster recovery mechanism and things such as adding new masters need some more work. What I really miss is DECLARE CURSOR. Maybe it will be in there some day :).
>
> However, we have done some real testing with sync replication (4 x pg, 1 x oracle). It performed surprisingly well (the JDBC part, not the Oracle one ;) ). Maybe this will be something really useful within the next few months.
>
> Cheers,
> Hans
Re: [HACKERS] A rough roadmap for internationalization fixes
On Tue, 25 Nov 2003, Tatsuo Ishii wrote:

> I'm tired of telling people that Unicode is not that perfect. Another gotcha with Unicode is that the UTF-8 encoding (which we currently use) consumes 3 bytes for each Kanji character, while other encodings consume only 2 bytes. IMO a 3/2 storage ratio cannot be neglected for database use.

I'm aware of how utf-8 works, and I was talking about the message catalogs. It does not affect what you store in the database in any way.

--
/Dennis
Re: [HACKERS] ObjectWeb/Clustered JDBC
Hans-Jürgen Schönig writes:

> Especially the disaster recovery mechanism and things such as adding new masters need some more work.

Yes, someone is working on automatic recovery (which would extend to adding new masters by starting recovery from zero). In fact, they're just across town from you (together.at).

--
Peter Eisentraut [EMAIL PROTECTED]
Re: [HACKERS] A rough roadmap for internationalization fixes
On Tue, 25 Nov 2003, Tatsuo Ishii wrote:

> I'm tired of telling people that Unicode is not that perfect. Another gotcha with Unicode is that the UTF-8 encoding (which we currently use) consumes 3 bytes for each Kanji character, while other encodings consume only 2 bytes. IMO a 3/2 storage ratio cannot be neglected for database use.

The rest of the world seems to have selected unicode as the way to handle different languages in the UI of programs. For example, gnome supports nothing but unicode. How is that handled in your country? I know that you are tired of people who don't understand how difficult it is for you, but I really would like to know. Is gnome not used over there because of this?

About storing data in the database, I would expect it to work with any encoding, just like I would expect pg to be able to store images in any format.

I'll try not to mention unicode near you in the future :-)

--
/Dennis
Re: [HACKERS] Build farm
Peter Eisentraut wrote:

> Bruce Momjian writes:
>
> > FYI, the HP testdrive farm, http://www.testdrive.hp.com, has shared directories for most of the machines, meaning you can CVS update once and telnet in to compile for each platform.
>
> Except that you can't open connections to the outside from these machines.

Oh, yea. You can connect to the machines with ftp, so I guess you would have to CVS update on your local machine, then push the changes to the farm.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] Updates for RPMS.
On Monday 24 November 2003 03:34 pm, David Fetter wrote:

> I just tried building 0.3PGDG on Redhat 9, and got:
>
> # rpmbuild --rebuild postgresql-7.4-0.3PGDG.src.rpm
> [snip]
> checking krb5.h usability... no
> checking krb5.h presence... no
> checking for krb5.h... no
> configure: error: header file krb5.h is required for Kerberos 5
> error: Bad exit status from /var/tmp/rpm-tmp.97860 (%build)
> [snip]
>
> Is there some way to tell the .spec file where to look for krb5.h?

Yes. Try

    rpmbuild --define 'build89 1' --rebuild postgresql-7.4-0.3PGDG.src.rpm

I'm looking at automating this; but at the moment it is manual. The spec file does check the build89 macro, and, if defined, throws in the right value for kerberos.

--
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC 28772
(828)862-5554
www.pari.edu
Re: [HACKERS] Build farm
Bruce Momjian wrote:

> FYI, the HP testdrive farm, http://www.testdrive.hp.com, has shared directories for most of the machines, meaning you can CVS update once and telnet in to compile for each platform.

As Peter pointed out, these machines are firewalled. But presumably one could upload a snapshot to them.

What I had in mind was a more distributed system, though.

Of course, these things are not mutually exclusive - using the HP testdrive farm looks like it might be nice. But it would be hard to automate, I suspect.

cheers

andrew
[HACKERS] fairly serious bug with pg_autovacuum in pg7.4
Hello,

I've run across a pretty serious problem with pg_autovacuum. pg_autovacuum loses track of any table that's ever been truncated (possibly in other situations too). When I truncate a table it gets a new relfilenode in pg_class. This is a problem because pg_autovacuum assumes pg_class.relfilenode will join to pg_stat_all_tables.relid. pg_stat_all_tables.relid is actually the oid from pg_class, not the relfilenode. These two values start out equal, so pg_autovacuum works initially, but it fails later on because of this incorrect assumption.

Here is one query pg_autovacuum uses (from pg_autovacuum.h) to get tables that breaks:

    select a.relfilenode,a.relname,a.relnamespace,a.relpages,a.relisshared,a.reltuples,b.schemaname,b.n_tup_ins,b.n_tup_upd,b.n_tup_del
      from pg_class a, pg_stat_all_tables b
     where a.relfilenode=b.relid and a.relkind = 'r'

Here's a little test case you can use to see what happens:

    basement=# create table test_table ( id int4 );
    CREATE TABLE
    basement=# select relname, relfilenode from pg_class where relkind = 'r' and relname = 'test_table';
      relname   | relfilenode
    ------------+-------------
     test_table |    28814151
    (1 row)

    basement=# select relid,relname from pg_stat_all_tables where relname = 'test_table';
      relid   |  relname
    ----------+------------
     28814151 | test_table
    (1 row)

    basement=# select a.relfilenode,a.relname,a.relnamespace,a.relpages,a.relisshared,a.reltuples,b.schemaname,
    basement-# b.n_tup_ins,b.n_tup_upd,b.n_tup_del from pg_class a, pg_stat_all_tables b
    basement-# where a.relfilenode=b.relid and a.relkind = 'r' and a.relname = 'test_table';
     relfilenode |  relname   | relnamespace | relpages | relisshared | reltuples | schemaname | n_tup_ins | n_tup_upd | n_tup_del
    -------------+------------+--------------+----------+-------------+-----------+------------+-----------+-----------+-----------
        28814151 | test_table |         2200 |       10 | f           |      1000 | public     |         0 |         0 |         0
    (1 row)

    basement=#
    basement=# truncate table test_table;
    TRUNCATE TABLE
    basement=# select relname, relfilenode from pg_class where relkind = 'r' and relname = 'test_table';
      relname   | relfilenode
    ------------+-------------
     test_table |    28814153
    (1 row)

    basement=# select relid,relname from pg_stat_all_tables where relname = 'test_table';
      relid   |  relname
    ----------+------------
     28814151 | test_table
    (1 row)

    basement=# select a.relfilenode,a.relname,a.relnamespace,a.relpages,a.relisshared,a.reltuples,b.schemaname,
    basement-# b.n_tup_ins,b.n_tup_upd,b.n_tup_del from pg_class a, pg_stat_all_tables b
    basement-# where a.relfilenode=b.relid and a.relkind = 'r' and a.relname = 'test_table';
     relfilenode | relname | relnamespace | relpages | relisshared | reltuples | schemaname | n_tup_ins | n_tup_upd | n_tup_del
    -------------+---------+--------------+----------+-------------+-----------+------------+-----------+-----------+-----------
    (0 rows)

    basement=# drop table test_table;
    DROP TABLE
    basement=#

PS: I'm running pg-7.4 and pg_autovacuum from contrib.
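The failure mode reported above can be reproduced in a toy model without a PostgreSQL server. The sketch below uses Python's sqlite3 module with two stand-in tables that only borrow the names and the relevant columns of pg_class and pg_stat_all_tables; the UPDATE that simulates TRUNCATE and the join on a.oid are assumptions for illustration. Joining on the oid is one plausible correction given that relid is the pg_class oid, but the post itself does not propose a fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in tables: only the columns needed to show the join problem.
conn.executescript("""
    CREATE TABLE pg_class (oid INTEGER, relfilenode INTEGER,
                           relname TEXT, relkind TEXT);
    CREATE TABLE pg_stat_all_tables (relid INTEGER, relname TEXT);
    INSERT INTO pg_class VALUES (28814151, 28814151, 'test_table', 'r');
    INSERT INTO pg_stat_all_tables VALUES (28814151, 'test_table');
""")

# The join condition used in the reported query.
BY_RELFILENODE = """SELECT a.relname FROM pg_class a, pg_stat_all_tables b
                    WHERE a.relfilenode = b.relid AND a.relkind = 'r'"""
# Assumed correction: join on the table's oid instead.
BY_OID = """SELECT a.relname FROM pg_class a, pg_stat_all_tables b
            WHERE a.oid = b.relid AND a.relkind = 'r'"""

def row_count(sql):
    """Number of rows a query returns."""
    return len(conn.execute(sql).fetchall())

before = (row_count(BY_RELFILENODE), row_count(BY_OID))
# Simulate TRUNCATE: the table gets a new relfilenode, the oid is unchanged.
conn.execute("UPDATE pg_class SET relfilenode = 28814153 "
             "WHERE relname = 'test_table'")
after = (row_count(BY_RELFILENODE), row_count(BY_OID))
print("before truncate:", before)
print("after truncate: ", after)
```

Both joins find the table before the simulated truncate; afterwards only the oid-based join still does, which matches the (0 rows) result in the test case.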
[HACKERS] Providing anonymous mmap as an option of sharing memory
Hello All,

I was looking through the source and thought it would be worth seeking opinions on this proposal.

From what I have understood so far, the core shared memory handling is done in pgsql/src/backend/port/sysv_shmem.c. It is linked in by configure according to the runtime environment. So I would need to write another source file which exports the same APIs as above (i.e. all non-static functions in that file) but uses mmap, and that would be enough to use anonymous mmap instead of SysV shared memory.

It might seem unnecessary to provide mmap-based shared memory, but this is just one step I was thinking of.

All the shared memory allocations are done in pgsql/src/backend/storage/ipc/shmem.c. I was thinking of creating a structure holding all the global variables in that file. The global variables would still be in place so that existing code would not break, but the structure would hold database-specific buffering information. Let's call that structure the database context.

That way we can assign different mmaped (anonymous, of course) regions per database. In the backend, we could just switch database contexts, i.e. assign the global variables from the database context and let the backend write to the appropriate shared memory region. Every database would need at least two shared memory regions: one for operating on its own buffers and another for the system, where it could write to shared catalogs etc. It can close the shared memory regions belonging to other databases on startup.

Of course, buffer management alone would not cover database contexts altogether. WAL needs to be lumped in as well (not necessarily, though: if all WAL buffering goes through the system shared region, everything will still work). I don't know if clog and data file handling are affected by this.

If WAL goes into the database context, we can probably provide per-database WAL, which could go well with tablespaces as well.

In the case of per-database WAL, operations done on a shared catalog from a backend would need to flush both the system WAL and the database WAL to ensure that such a transaction commits. Otherwise flushing only the database WAL would do.

This way we can provide a background writer process per database and a common buffer per database, minimising the impact of cross-database load significantly. E.g. vacuum full on one database would not hog another database due to buffer cache pollution. (IO can still saturate, though.) This way we can push the hardware to its limit, which might not be possible right now in some cases.

I was looking for the reason a large number of buffers degrades performance, and the source code browsing spiralled into this thought. So far I haven't figured out any reason why a large number of buffers can degrade performance. Still looking for it.

Comments?

Shridhar
Re: [HACKERS] A rough roadmap for internationalization fixes
Peter Eisentraut [EMAIL PROTECTED] writes:

> Dennis Bjorklund writes:
>
> > Couldn't we use some library that already has this, like glib (or something else)? If it's not up to what we need, then fix that library instead.
>
> I wasn't aware that glib had this. I'll look.

Of course the trouble with relying on glibc is that we'd have no solution for platforms that don't use glibc. It might be okay to rely on glibc for a first-cut implementation, realizing that we couldn't do everything at once anyway.

regards, tom lane
Re: [HACKERS] A rough roadmap for internationalization fixes
Peter Eisentraut [EMAIL PROTECTED] writes:

> Actually, what will more likely happen is that we'll define a collation as a collection of one or more support functions, the equivalents of strxfrm() and possibly a few more. Then it will be up to those functions to define the collation order. The server will provide utility functions that will facilitate implementing a collation order that follows the Unicode collation algorithm, but you could just as well implement one using memcmp() or whatever you like.

That sounds like a good plan to me. Personally I'd want a memcmp()-based collation implementation available, so that people who don't care about sorting anything beyond 7-bit ASCII don't need to pay a lot of overhead.

We have seen over and over that strcoll() is depressingly slow in some locales (at least on some platforms). Do you have any feeling for the real-world performance of the Unicode algorithm?

regards, tom lane
Re: [HACKERS] A rough roadmap for internationalization fixes
Tom Lane [EMAIL PROTECTED] writes:

> Peter Eisentraut [EMAIL PROTECTED] writes:
>
> > I wasn't aware that glib had this. I'll look.
>
> Of course the trouble with relying on glibc is that we'd have no solution for platforms that don't use glibc.

glib != glibc. glib is the low-level library used by GTK and GNOME for basic data structures, character handling etc. It's LGPL AFAIK, which would seem to rule out direct use from a licensing perspective.

-Doug
Re: [HACKERS] A rough roadmap for internationalization fixes
Dennis Bjorklund wrote on Tue, 25.11.2003 at 14:51:

> On Tue, 25 Nov 2003, Tatsuo Ishii wrote:
>
> > I'm tired of telling people that Unicode is not that perfect.

Of course not, but neither is the current multibyte support, with only marginal support for unicode (many people actually need upper()/lower()).

> > Another gotcha with Unicode is that the UTF-8 encoding (which we currently use) consumes 3 bytes for each Kanji character, while other encodings consume only 2 bytes.

I think that for *storage* we should use SCSU (the Standard Compression Scheme for Unicode).

> > IMO a 3/2 storage ratio cannot be neglected for database use.

SCSU should solve that (actually it should use less than 2 bytes per char for encoding any single language).

> The rest of the world seems to have selected unicode as the way to handle different languages in the UI of programs. For example, gnome supports nothing but unicode. How is that handled in your country? I know that you are tired of people who don't understand how difficult it is for you, but I really would like to know. Is gnome not used over there because of this?
>
> About storing data in the database, I would expect it to work with any encoding, just like I would expect pg to be able to store images in any format.
>
> I'll try not to mention unicode near you in the future :-)

---
Hannu
Re: [HACKERS] Providing anonymous mmap as an option of sharing memory
Shridhar Daithankar [EMAIL PROTECTED] writes:

> I was looking through the source and thought it would be worth seeking opinions on this proposal.

This has been discussed and rejected before. See the archives.

regards, tom lane
Re: [HACKERS] Build farm
Andrew Dunstan wrote:

> Bruce Momjian wrote:
>
> > FYI, the HP testdrive farm, http://www.testdrive.hp.com, has shared directories for most of the machines, meaning you can CVS update once and telnet in to compile for each platform.
>
> As Peter pointed out, these machines are firewalled. But presumably one could upload a snapshot to them.
>
> What I had in mind was a more distributed system, though.
>
> Of course, these things are not mutually exclusive - using the HP testdrive farm looks like it might be nice. But it would be hard to automate, I suspect.

I figured you could just upload once and telnet and build on each machine.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] A rough roadmap for internationalization fixes
Peter Eisentraut [EMAIL PROTECTED] writes:

> 2. Reimplement gettext to use 1. and allow switching of language and encoding at run-time.
>
> 3. Implement Unicode collation algorithm and character classification routines that are aware of 1. Use that in place of system locale routines.

This sounds like you want to completely reimplement all of the locale handling provided by the OS? That seems like a dead-end approach to me. There's no way your handling will ever be as complete or as well optimized as some OS's.

Better to find ways to use the OS gettext and locale handling on platforms that provide good interfaces. On platforms that don't provide good interfaces, either don't support the features or use some third-party library to provide a good implementation.

The only thing you really need in the database is a second parameter on all the collation functions, like strxfrm(col,locale) etc. Then functional indexes take care of almost everything.

The only advantage to adding locales per-column and/or per-index is the notational simplicity. Queries could use simple standard expressions and not have to worry about calling strxfrm or other locale-specific functions all the time. I'm not sure it's worth the complexity of having to deal with WHERE x < y where x and y are in different locales, though.

--
greg
Re: [HACKERS] Commercial binary support?
Josh, Hans, et al.

Please take this thread OFF LIST IMMEDIATELY. Its content is no longer appropriate for the Hackers mailing list, and we get enough traffic. Flamewars are not a part of our community.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [HACKERS] Build farm
Bruce Momjian wrote:

> Andrew Dunstan wrote:
>
> > Bruce Momjian wrote:
> >
> > > FYI, the HP testdrive farm, http://www.testdrive.hp.com, has shared directories for most of the machines, meaning you can CVS update once and telnet in to compile for each platform.
> >
> > As Peter pointed out, these machines are firewalled. But presumably one could upload a snapshot to them.
> >
> > What I had in mind was a more distributed system, though.
> >
> > Of course, these things are not mutually exclusive - using the HP testdrive farm looks like it might be nice. But it would be hard to automate, I suspect.
>
> I figured you could just upload once and telnet and build on each machine.

What I'm working on (slowly - I'm quite busy right now, and about to be away from home for 5 days) is a system which would (or could) run from cron on every member of the farm, and upload its results to a central server where they could be displayed, in a somewhat similar way to the way the Samba build farm works - see http://build.samba.org/ - so we'd be able to see at a glance when something is broken and where and why. We could also incorporate email notification of breakage, as a refinement.

I have a few pieces of this working but not a full suite yet - it will essentially be 3 perl scripts - one on the client (to run the update(s), build(s) and upload the results) and two on the central server (one for upload and one for display). When I get a demo page done I'll show it working with a couple of hosts.

Of course, you can automate (almost) anything, including telnet, but right now I'm assuming the farm members will have internet connectivity.

cheers

andrew
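The client side of such a farm member can be pictured roughly as follows. This is only a sketch in Python, not the perl scripts mentioned above: the function name `run_steps`, the report layout, and the placeholder commands are all invented for illustration, and uploading the report to a central server is left out entirely. Real steps would be things like a CVS update, configure, and make.

```python
import subprocess
import sys

def run_steps(steps):
    """Run (name, argv) steps in order and stop at the first failure.

    Returns a list of dicts with the step name, success flag, and
    captured output, which a real client would then upload.
    """
    report = []
    for name, argv in steps:
        proc = subprocess.run(argv, capture_output=True, text=True)
        ok = proc.returncode == 0
        report.append({"step": name, "ok": ok,
                       "log": proc.stdout + proc.stderr})
        if not ok:
            break  # later steps make no sense once one has failed
    return report

# Placeholder commands standing in for update, build, and check.
demo_steps = [
    ("update", [sys.executable, "-c", "print('updated')"]),
    ("build", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("check", [sys.executable, "-c", "print('never runs')"]),
]

for entry in run_steps(demo_steps):
    print(entry["step"], "ok" if entry["ok"] else "FAILED")
```

Recording the name of the failing step together with its log is what would let a central display page show at a glance where and why a member broke.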
splitting WAL (was RE: [HACKERS] Providing anonymous mmap as an option of sharing memory)
> In case of WAL per database, the operations done on a shared catalog from a backend would need flushing system WAL and database WAL to ensure such transaction commit. Otherwise only flushing database WAL would do.

I don't think that is a good idea. If you want databases separated you should install more than one instance. That gives you way more flexibility. Imho per-database WAL is a deficiency, not a feature.

Andreas

PS: the problem with mmap was that it has no attached process count
Re: [HACKERS] Considerations for lib64
On Mon, Nov 24, 2003 at 07:25:56PM +0100, Peter Eisentraut wrote:

> Currently, when you specify --with-openssl, then configure automatically adds -L/usr/local/ssl/lib to LDFLAGS if that directory exists. This would pick up the wrong directory if you are in 64-bit mode. Analogous behavior exists for --with-krb5 with /usr/athena. I think these default installation directories of OpenSSL and Kerberos are mostly obsolete these days, so I'd rather get rid of that behavior altogether and let people specify the necessary directories with --with-libraries and --with-includes like for any of the other optional packages that PostgreSQL supports.

Both default directories are wrong according to my installation. In fact, both libraries are just in /usr/lib. Certainly it will be a mess trying to compile with a different LDFLAGS if things are added randomly.

--
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"How we put our fingers in each other's clay. That is friendship; playing the potter and seeing what shapes can be drawn out of the other"
(C. Halloway in La Feria de las Tinieblas, R. Bradbury)
Re: [HACKERS] A rough roadmap for internationalization fixes
On Tue, Nov 25, 2003 at 08:40:57PM +0900, Tatsuo Ishii wrote:

> > On Tue, 25 Nov 2003, Peter Eisentraut wrote:
> >
> > I've always thought unicode was enough to represent even Japanese. Then the client encoding can be something else that we can convert to. In any case, the encoding of the message catalog has to be known to the system so it can be converted to the correct encoding for the client.
>
> I'm tired of telling people that Unicode is not that perfect.

Maybe it should be explained what the problems really are, instead of saying it isn't perfect?

From what I understand, there is only a problem when converting from the legacy encoding to unicode and the other way around, and no problem if you stop doing the conversion.

The conversion problem arises because what is represented by only 1 character in a legacy encoding can be several characters in unicode. Some examples people might understand are:

- µ: In ISO 8859-1 it's char 0xB5. In unicode it can be U+00B5 (micro sign) or U+03BC (greek letter small mu).
- Å: ISO 8859-1: 0xC5. Unicode: U+00C5 (latin capital letter a with ring above) or U+212B (angstrom sign).
- The ohm sign vs the greek letter omega.
- Quotation marks: You have left double quote, right double quote, and a few others.

> Another gotcha with Unicode is that the UTF-8 encoding (which we currently use) consumes 3 bytes for each Kanji character, while other encodings consume only 2 bytes. IMO a 3/2 storage ratio cannot be neglected for database use.

You can encode unicode in different ways, and UTF-8 is only one of them. Is there a problem with using UCS-2 (except that it would require more storage for ASCII)?

Kurt
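The look-alike pairs listed above can be inspected with Python's standard unicodedata module. This is a small sketch; the helper names `same_after_nfkc` and `latin1_encodable` are invented for the example, and using NFKC normalization to fold the pairs together is one possible way to treat them as equal, not something proposed in the thread.

```python
import unicodedata

# (first code point, look-alike code point)
LOOKALIKES = [
    ("\u00b5", "\u03bc"),  # MICRO SIGN vs GREEK SMALL LETTER MU
    ("\u212b", "\u00c5"),  # ANGSTROM SIGN vs LATIN CAPITAL LETTER A WITH RING ABOVE
    ("\u2126", "\u03a9"),  # OHM SIGN vs GREEK CAPITAL LETTER OMEGA
]

def same_after_nfkc(a, b):
    """True if the two strings are identical after NFKC normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

def latin1_encodable(ch):
    """True if the character exists in ISO 8859-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

for a, b in LOOKALIKES:
    print(unicodedata.name(a), "/", unicodedata.name(b),
          "| equal:", a == b,
          "| equal after NFKC:", same_after_nfkc(a, b),
          "| in Latin-1:", latin1_encodable(a), latin1_encodable(b))
```

Each pair compares unequal as raw code points but equal after normalization, and only one member of each of the first two pairs can be written back to ISO 8859-1, which is the round-trip hazard described in the message.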
Re: [HACKERS] Function parameter names
On Sun, 23 Nov 2003, Tom Lane wrote: Actually I'd suggest text[], as there is no good reason to pad the array entries to a fixed length. I've implemented this part now and it stores the parameter names in the pg_proc table as a text[] field. However, in the parser I use IDENT to get the parameter names and already in the lexer the IDENT tokens are truncated to length NAMEDATALEN. So I've got 3 options:
1) Leave it as it is now, where the system table allows any length but the parser only lets you insert short identifiers.
2) Change the type to name[].
3) Change the parser to accept identifiers of any length and add the length check in a later phase for the identifiers that need to be shorter.
Any opinions or should I just make a choice myself? -- /Dennis
Re: [HACKERS] Function parameter names
Dennis Bjorklund [EMAIL PROTECTED] writes: However, in the parser I use IDENT to get the parameter names and already in the lexer the IDENT tokens are truncated to length NAMEDATALEN. Right. What's the problem? regards, tom lane
Re: [HACKERS] Commercial binary support?
Hans, Josh, Please take this thread OFF LIST IMMEDIATELY. Sorry. Not enough coffee this AM -- should know better than to send e-mail when I'm short beans. Overreacted a bit, there. Apologies. -- -Josh Berkus Aglio Database Solutions San Francisco
Re: [HACKERS] A rough roadmap for internationalization fixes
Greg Stark writes: This sounds like you want to completely reimplement all of the locale handling provided by the OS? That seems like a dead-end approach to me. There's no way your handling will ever be as complete or as well optimized as some OS's. Actually, I'm pretty sure it will be more complete. About the optimization, we'll have to see. Better to find ways to use the OS gettext and locale handling on platforms that provide good interfaces. There are no such platforms to my knowledge. The exception is some version of glibc that provides undocumented interfaces to functionality that is rumoured to do something that may or may not be relevant to what we're doing. On platforms that don't provide good interfaces either don't support the features or use some third party library to provide a good implementation. There are no such libraries. I keep hearing ICU, but that is much too bloated. -- Peter Eisentraut [EMAIL PROTECTED]
Re: [HACKERS] Function parameter names
On Tue, 25 Nov 2003, Tom Lane wrote: Dennis Bjorklund [EMAIL PROTECTED] writes: However, in the parser I use IDENT to get the parameter names and already in the lexer the IDENT tokens are truncated to length NAMEDATALEN. Right. What's the problem? It's strange to allow identifiers to be of any length in the system table when there is no way to create it using normal syntax. The parser accepts this kind of input: CREATE FUNCTION foo (x int) RETURNS int AS ... and the identifier x (as all identifiers) can not be too long. Still, one can create the function and update the system table by hand to change x to a longer name. Doesn't that sound ugly to you? It's not a technical problem, but a matter of style. Everything works as it is now, but "works" is not always enough. -- /Dennis
Re: [HACKERS] A rough roadmap for internationalization fixes
Peter Eisentraut wrote on Tue, 25.11.2003 at 21:13: Greg Stark writes: This sounds like you want to completely reimplement all of the locale handling provided by the OS? That seems like a dead-end approach to me. There's no way your handling will ever be as complete or as well optimized as some OS's. Actually, I'm pretty sure it will be more complete. About the optimization, we'll have to see. Better to find ways to use the OS gettext and locale handling on platforms that provide good interfaces. There are no such platforms to my knowledge. Unless you consider ICU (http://oss.software.ibm.com/icu/) as a platform ;) We will hardly ever be more complete than it. There are no such libraries. I keep hearing ICU, but that is much too bloated. At least it is kind of standard and also something that will be maintained for the foreseeable future; it also has a compatible license and is available on all platforms of interest to PostgreSQL. And I am not sure that this bloat will affect us too much unless we want to start maintaining a parallel copy - glibc is much more bloated than ICU. But if you insist on rolling your own library, you can always use ICU to write regression tests to compare yours with ... - Hannu
Re: [HACKERS] Function parameter names
Dennis Bjorklund [EMAIL PROTECTED] writes: and the identifier x (as all identifiers) can not be too long. Still, one can create the function and update the system table by hand to change x to a longer name. Doesn't that sound ugly to you? It has always been, and likely always will be, possible to use manual updating of the system catalogs to arrive at states that you could not get into otherwise, and which might or might not work correctly, for whatever value of correctly you think is correct. This doesn't particularly bother me, since we have always told people that manual updates are unsupported and are strictly for people who know exactly what they're doing. If it really bugs you, possibly the column could be declared as varchar(NAMEDATALEN-1)[] rather than text[], but I think the amount of effort needed to make that happen within the .bki file would be well out of proportion to the usefulness. (Actually, it'd still not be 100% right, since varchar(N) counts characters not bytes ...) regards, tom lane
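[Editor's sketch, not part of the original message.] The closing remark about varchar(N) counting characters rather than bytes can be made concrete. The sketch assumes NAMEDATALEN is 64 (the usual default) and a UTF-8 database encoding. `truncate_identifier` is a hypothetical helper written for this illustration, not the backend's lexer code.

```python
NAMEDATALEN = 64  # assumption: the default compile-time value


def truncate_identifier(ident: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: clip ident to at most NAMEDATALEN-1 bytes.

    The cut is made on bytes; a partial multibyte character left at the
    end of the cut is dropped so the result is still valid text.
    """
    limit = NAMEDATALEN - 1
    raw = ident.encode(encoding)
    if len(raw) <= limit:
        return ident
    return raw[:limit].decode(encoding, errors="ignore")


ascii_name = "x" * 70      # 70 characters, 70 bytes
accented_name = "é" * 40   # 40 characters, 80 bytes in UTF-8
kanji_name = "漢" * 63      # 63 characters, 189 bytes in UTF-8

clipped_ascii = truncate_identifier(ascii_name)
clipped_accented = truncate_identifier(accented_name)

# A character-counting limit of 63 would accept kanji_name as it stands,
# even though it is three times over the byte limit.
passes_char_limit = len(kanji_name) <= NAMEDATALEN - 1
passes_byte_limit = len(kanji_name.encode("utf-8")) <= NAMEDATALEN - 1

print(len(clipped_ascii), len(clipped_accented))
print(passes_char_limit, passes_byte_limit)
```

With single-byte names the two limits coincide, so the mismatch only shows up once identifiers contain multibyte characters.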
Re: [HACKERS] A rough roadmap for internationalization fixes
Kurt Roeckx [EMAIL PROTECTED] writes: You can encode unicode in different ways, and UTF-8 is only one of them. Is there a problem with using UCS-2 (except that it would require more storage for ASCII)? UCS-2 is impractical without some *extremely* wide-ranging changes in the backend. To take just the most obvious point, doesn't it require allowing embedded zero bytes in text strings? regards, tom lane
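[Editor's sketch, not part of the original message.] The embedded-zero-byte concern is easy to see from the encoded bytes. Below, `c_string_view` is a hypothetical stand-in for any code that treats a buffer as a NUL-terminated C string. Python's `utf-16-le` codec is used as a proxy for little-endian UCS-2, which is byte-identical for the ASCII characters shown.

```python
def c_string_view(raw: bytes) -> bytes:
    """Mimic NUL-terminated string handling: stop at the first zero byte."""
    return raw.split(b"\x00", 1)[0]


text = "abc"
as_utf8 = text.encode("utf-8")      # b'abc'
as_ucs2 = text.encode("utf-16-le")  # b'a\x00b\x00c\x00'

# UTF-8 never produces a zero byte for a non-NUL character, so the
# C-string view sees the whole value.
seen_utf8 = c_string_view(as_utf8)

# In a 16-bit encoding every ASCII character carries a zero byte, so the
# C-string view ends after the first byte.
seen_ucs2 = c_string_view(as_ucs2)

print(seen_utf8, seen_ucs2)
```

Any routine that relies on a terminating zero would therefore have to be revisited, which is the wide-ranging change referred to above.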
[HACKERS] 7.4final regression failure on uw713
Hi, Don't know if it's bad, but make check reports a regression failure on join. Here's the regression.diffs
*** ./expected/join.out Thu Sep 25 08:58:06 2003
--- ./results/join.out Tue Nov 25 23:46:27 2003
***
*** 1732,1739
  | 6 | 6 | six |
  | 7 | 7 | seven |
  | 8 | 8 | eight |
- | | | null |
  | | 0 | zero |
  (13 rows)
  SELECT '' AS xxx, *
--- 1732,1739
  | 6 | 6 | six |
  | 7 | 7 | seven |
  | 8 | 8 | eight |
  | | 0 | zero |
+ | | | null |
  (13 rows)
  SELECT '' AS xxx, *
***
*** 1752,1759
  | 6 | 6 | six |
  | 7 | 7 | seven |
  | 8 | 8 | eight |
- | | | null |
  | | 0 | zero |
  (13 rows)
  SELECT '' AS xxx, *
--- 1752,1759
  | 6 | 6 | six |
  | 7 | 7 | seven |
  | 8 | 8 | eight |
  | | 0 | zero |
+ | | | null |
  (13 rows)
  SELECT '' AS xxx, *
***
*** 1793,1800
  -+---+---+---+
  | 0 | | zero |
  | 1 | 4 | one | -1
- | 2 | 3 | two | 2
  | 2 | 3 | two | 4
  | 3 | 2 | three | -3
  | 4 | 1 | four |
  | 5 | 0 | five | -5
--- 1793,1800
  -+---+---+---+
  | 0 | | zero |
  | 1 | 4 | one | -1
  | 2 | 3 | two | 4
+ | 2 | 3 | two | 2
  | 3 | 2 | three | -3
  | 4 | 1 | four |
  | 5 | 0 | five | -5
***
*** 1815,1822
  -+---+---+---+
  | 0 | | zero |
  | 1 | 4 | one | -1
- | 2 | 3 | two | 2
  | 2 | 3 | two | 4
  | 3 | 2 | three | -3
  | 4 | 1 | four |
  | 5 | 0 | five | -5
--- 1815,1822
  -+---+---+---+
  | 0 | | zero |
  | 1 | 4 | one | -1
  | 2 | 3 | two | 4
+ | 2 | 3 | two | 2
  | 3 | 2 | three | -3
  | 4 | 1 | four |
  | 5 | 0 | five | -5
==
-- Olivier PRENANT Tel: +33-5-61-50-97-00 (Work) 6, Chemin d'Harraud Turrou +33-5-61-50-97-01 (Fax) 31190 AUTERIVE +33-6-07-63-80-64 (GSM) FRANCE Email: [EMAIL PROTECTED] -- Make your life a dream, make your dream a reality. (St Exupery)
Re: [HACKERS] PANIC: rename from /data/pg_xlog/0000002200000009
Yurgis Baykshtis [EMAIL PROTECTED] writes: I just noticed that the rename panic errors like this one: PANIC: rename from /data/pg_xlog/0003001F to /data/pg_xlog/0003002C (initialization of log file 3, segment 44) failed: No such file or directory come shortly AFTER the following messages LOG: recycled transaction log file 0003001B LOG: recycled transaction log file 0003001C LOG: recycled transaction log file 0003001D LOG: recycled transaction log file 0003001E LOG: removing transaction log file 0003001F LOG: removing transaction log file 00030020 LOG: removing transaction log file 00030021 LOG: removing transaction log file 00030022 So, you can see that 0003001F file was previously deleted by the logic in MoveOfflineLogs() function. Interesting ... Now what I can see is that MoveOfflineLogs() does not seem to be synchronized between backends. It's certainly supposed to be, because the only place it is called from holds the CheckPointLock while it's doing it. If more than one backend is able to run MoveOfflineLogs at a time, then the LWLock code is simply broken. That seems unlikely, as just about nothing would work reliably if LWLock failed to lock out concurrent operations. What I suspect at this point is a cygwin bug: somehow, its implementation of readdir() is able to retrieve a stale view of a directory. I'd suggest pinging the cygwin developers to see if that idea strikes a chord or not. [ thinks for a bit... ] It might be that it isn't even a stale-data issue, but that readdir() misbehaves if there are concurrent insert, rename or delete operations carried out in the same directory. (The renames or deletes would be coming from MoveOfflineLogs itself, the inserts, if any, from concurrent backends finding that they need more WAL space.) Again I would call that a cygwin bug, as we've not seen reports of comparable behavior anywhere else. Also, we have a suspicion that the problem happens even with only one client connected to postgres. 
Unless the clients are issuing explicit CHECKPOINT operations, that wouldn't matter, because MoveOfflineLogs is called only from checkpointing, and the postmaster never creates more than one background checkpoint process at a time. (So there are actually two levels of protection in place against concurrent execution of this code.) regards, tom lane
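[Editor's sketch, not part of the original message.] The failure mode suspected here is that the code acts on a directory entry that no longer matches what is on disk. It can be imitated on any platform by taking a listing and then removing a file before the listing is used. This does not reproduce the suspected cygwin readdir() behaviour. The file names and the ".recycled" suffix are made up for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as xlog_dir:
    # Stand-in segment files; the names are invented for this demo.
    for name in ("seg_001", "seg_002", "seg_003"):
        open(os.path.join(xlog_dir, name), "w").close()

    # Take a listing, as a readdir() loop would.
    stale_listing = sorted(os.listdir(xlog_dir))

    # One file disappears after the listing was taken, so the listing
    # is now out of date.
    os.remove(os.path.join(xlog_dir, "seg_002"))

    outcomes = {}
    for name in stale_listing:
        src = os.path.join(xlog_dir, name)
        dst = os.path.join(xlog_dir, name + ".recycled")
        try:
            os.rename(src, dst)
            outcomes[name] = "renamed"
        except FileNotFoundError:
            # ENOENT: the same "No such file or directory" condition
            # that appears in the PANIC message.
            outcomes[name] = "missing"

print(outcomes)
```

In the sketch the missing file is tolerated. In the report, the rename failure is raised as a PANIC.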
Re: [HACKERS] 7.4final regression failure on uw713
[EMAIL PROTECTED] writes: Don't know if it's bad, but make check reports a regression failure on join. I believe we'd determined that this is an acceptable platform-specific behavior. regards, tom lane
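[Editor's sketch, not part of the original message.] In the reported diff every removed row reappears as an added row a line or two away, so the two outputs appear to differ only in row order. A quick way to confirm that for a pair of result sets is to compare them as multisets. The rows below are copied from one hunk of the report, with whitespace as it survives in the archive. The variable names are illustrative.

```python
from collections import Counter

# Rows from the expected output of one hunk.
expected = [
    "| 1 | 4 | one | -1",
    "| 2 | 3 | two | 2",
    "| 2 | 3 | two | 4",
    "| 3 | 2 | three | -3",
]

# The same hunk as produced on the reporting platform.
actual = [
    "| 1 | 4 | one | -1",
    "| 2 | 3 | two | 4",
    "| 2 | 3 | two | 2",
    "| 3 | 2 | three | -3",
]

same_order = expected == actual                    # strict, line-by-line
same_rows = Counter(expected) == Counter(actual)   # ignores ordering

print(same_order, same_rows)
```

A difference that fails the first check but passes the second is an ordering difference only, which fits the assessment above that the behavior is acceptable.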
Re: [HACKERS] A rough roadmap for internationalization fixes
About storing data in the database, I would expect it to work with any encoding, just like I would expect pg to be able to store images in any format. What's stopping us supporting the other Unicode encodings, e.g. UTF-16, which could save Japanese storage space? Chris
Re: [HACKERS] Function parameter names
Dennis Bjorklund [EMAIL PROTECTED] writes: It's strange to allow identifiers to be of any length in the system table when there is no way to create it using normal syntax. I agree with Tom -- that doesn't seem strange to me at all. -Neil
Re: [HACKERS] Release cycle length
On Mon, Nov 24, 2003 at 11:08:44PM -0600, Jim C. Nasby wrote: Has anyone looked at using replication as a migration method? If Looked at? Sure. Heck, I've done it. Yes, it works. Is it painless? Well, that depends on whether you think using erserver is painless. ;-) It's rather less downtime than pg_dump | psql, I'll tell you. A -- Andrew Sullivan 204-4141 Yonge Street Afilias Canada Toronto, Ontario Canada [EMAIL PROTECTED] M2P 2A8 +1 416 646 3304 x110
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
On Fri, Nov 21, 2003 at 07:51:17PM -0500, Greg Stark wrote: The second vacuum waits for the lock to become available. If the situation got really bad there could end up being a growing queue of vacuums waiting. Those of us who have run into this know that "the situation got really bad" happens earlier than one might think. And it can indeed cause some pretty pathological behaviour. A -- Andrew Sullivan 204-4141 Yonge Street Afilias Canada Toronto, Ontario Canada [EMAIL PROTECTED] M2P 2A8 +1 416 646 3304 x110
Re: [HACKERS] A rough roadmap for internationalization fixes
Greg Stark [EMAIL PROTECTED] writes: The only advantage to adding locales per-column and/or per-index is the notational simplicity. Well, actually, the reason we are interested in doing it is the SQL spec demands it. regards, tom lane