[PATCHES] default database creation with initdb
As per discussion on -hackers the attached patch creates the 'default'
database at initdb time as a default target for initial connections, to
keep template1 free from connections and available as a template source.
I consider this DB a system object, so it's created before make_template0
sets the last_system_oid (wondering why template0 isn't considered a
system db too).

Regards,
Andreas

Index: src/bin/initdb/initdb.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/bin/initdb/initdb.c,v
retrieving revision 1.83
diff -u -r1.83 initdb.c
--- src/bin/initdb/initdb.c	30 Apr 2005 08:08:51 -0000	1.83
+++ src/bin/initdb/initdb.c	18 Jun 2005 08:37:16 -0000
@@ -177,6 +177,7 @@
 static void set_info_version(void);
 static void setup_schema(void);
 static void vacuum_db(void);
+static void make_default(void);
 static void make_template0(void);
 static void trapsig(int signum);
 static void check_ok(void);
@@ -1828,6 +1829,38 @@
 }
 
 /*
+ * copy template1 to pg_system
+ */
+static void
+make_default(void)
+{
+	PG_CMD_DECL;
+	char	  **line;
+	static char *pg_system_setup[] = {
+		"CREATE DATABASE \"default\";\n",
+		"REVOKE CREATE,TEMPORARY ON DATABASE \"default\" FROM public;\n",
+		NULL
+	};
+
+	fputs(_("copying template1 to default ... "), stdout);
+	fflush(stdout);
+
+	snprintf(cmd, sizeof(cmd),
+			 "\"%s\" %s template1 >%s",
+			 backend_exec, backend_options,
+			 DEVNULL);
+
+	PG_CMD_OPEN;
+
+	for (line = pg_system_setup; *line; line++)
+		PG_CMD_PUTS(*line);
+
+	PG_CMD_CLOSE;
+
+	check_ok();
+}
+
+/*
  * copy template1 to template0
  */
 static void
@@ -2606,6 +2639,8 @@
 
 	vacuum_db();
 
+	make_default();
+
 	make_template0();
 
 	if (authwarning != NULL)

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PATCHES] default database creation with initdb
Umm. Tiny item, but your comment still refers to the database as
pg_system ;-)

//Magnus

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> Andreas Pflug
> Sent: Saturday, June 18, 2005 10:42 AM
> To: PostgreSQL-patches
> Subject: [PATCHES] default database creation with initdb
>
> As per discussion on -hackers the attached patch creates the 'default'
> database at initdb time as a default target for initial connections to
> keep template1 free from connections and available as template source.
> I consider this DB a system object, so it's created before
> make_template0 sets the last_system_oid (wondering why template0 isn't
> considered a system db too)
>
> Regards,
> Andreas
Re: [PATCHES] default database creation with initdb
Magnus Hagander wrote:
> Umm. Tiny item, but your comment still refers to the database as
> pg_system ;-)

:-)

Regards,
Andreas

Index: src/bin/initdb/initdb.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/bin/initdb/initdb.c,v
retrieving revision 1.83
diff -u -r1.83 initdb.c
--- src/bin/initdb/initdb.c	30 Apr 2005 08:08:51 -0000	1.83
+++ src/bin/initdb/initdb.c	18 Jun 2005 08:54:07 -0000
@@ -177,6 +177,7 @@
 static void set_info_version(void);
 static void setup_schema(void);
 static void vacuum_db(void);
+static void make_default(void);
 static void make_template0(void);
 static void trapsig(int signum);
 static void check_ok(void);
@@ -1828,6 +1829,38 @@
 }
 
 /*
+ * copy template1 to default
+ */
+static void
+make_default(void)
+{
+	PG_CMD_DECL;
+	char	  **line;
+	static char *default_setup[] = {
+		"CREATE DATABASE \"default\";\n",
+		"REVOKE CREATE,TEMPORARY ON DATABASE \"default\" FROM public;\n",
+		NULL
+	};
+
+	fputs(_("copying template1 to default ... "), stdout);
+	fflush(stdout);
+
+	snprintf(cmd, sizeof(cmd),
+			 "\"%s\" %s template1 >%s",
+			 backend_exec, backend_options,
+			 DEVNULL);
+
+	PG_CMD_OPEN;
+
+	for (line = default_setup; *line; line++)
+		PG_CMD_PUTS(*line);
+
+	PG_CMD_CLOSE;
+
+	check_ok();
+}
+
+/*
  * copy template1 to template0
 */
 static void
@@ -2606,6 +2639,8 @@
 
 	vacuum_db();
 
+	make_default();
+
 	make_template0();
 
 	if (authwarning != NULL)
Re: [PATCHES] default database creation with initdb
Andreas Pflug <[EMAIL PROTECTED]> writes:
> +		"CREATE DATABASE \"default\";\n",
> +		"REVOKE CREATE,TEMPORARY ON DATABASE \"default\" FROM public;\n",

Uh, why the rights revocation?  That makes the thing essentially useless
... except to superusers, who will not be affected anyway.  I don't
think this is a sane default.

In any case, fixing initdb for this is trivial.  I count something north
of 160 other references to template1 in the current sources, scattered
across 60+ files.  Every one of those has to be looked at and possibly
changed.

			regards, tom lane
Re: [PATCHES] default database creation with initdb
On Sat, Jun 18, 2005 at 09:27:49 -0400,
  Robert Treat <[EMAIL PROTECTED]> wrote:
> On Saturday 18 June 2005 04:55, Andreas Pflug wrote:
> > Magnus Hagander wrote:
> > > Umm. Tiny item, but your comment still refers to the database as
> > > pg_system ;-)
>
> What is the purpose of this database? A generalized, shared resource for
> tool makers and add-on packages to store information in PostgreSQL, or a
> working database that is usable (and to be used) out of the box for new
> users? I really don't think we want the latter... I can see users
> connecting via psql and then playing around with different add/create
> type statements. It is all too common a question from newbies... "does
> postgresql have a default database to get started with?" They'll see
> this database and begin creating schema and using this as their main
> database, and I think we ought to avoid that. If people don't like
> pg_system, pg_addons seems like a much safer name to go with imho.

I believe the intention is that things that need to connect to some
database to do their work (e.g. psql -l, createuser) will connect to that
database. createdb will still connect to template1, but now will be less
likely to have a conflict with another user being connected to template1
at the same time. I didn't check the patch to see if the behavior of
psql -l and createuser were changed, or if just the initdb behavior was
changed.

I don't think it will be a big deal if people put stuff in that database.
Re: [PATCHES] TODO Item - Return compressed length of TOAST datatypes (WIP)
Mark Kirkwood <[EMAIL PROTECTED]> writes:
> I thought I would have a look at: (Datatypes) Add function to return
> compressed length of TOAST data values.

My recollection of that discussion is that we just wanted something that
would return the actual VARSIZE() of the datum.  You're building
something way too confusing ...

A more interesting point is that the way you've declared the function,
it will only work on text values, which is pretty limiting.  Ideally
we'd define this thing as

	pg_datum_length(any-varlena-type) returns int

but there is no pseudotype corresponding to "any varlena type".

I was thinking about this the other day in connection with my proposal
to make something that could return the TOAST value OID of an
out-of-line datum.  I think the only non-restrictive way to do it would
be to declare the function as taking "any", and then add a runtime check
using the get_fn_expr_argtype() mechanism to test that what you've been
given is in fact a varlena datatype.

			regards, tom lane
Re: [PATCHES] Post-mortem: final 2PC patch
Tom Lane wrote:
> I've attached the 2PC patch as applied in case you want to have a look.
> I did some fairly significant hacking on it, and I thought it'd be a
> good idea to enumerate some of the changes:

FYI: this commit seems to cause problems/crashes during "make check" on
my OpenBSD/Sparc64 buildfarm client:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2005-06-17%2023:50:04

I just checked manually with a fresh checkout - and I got similar
failures.

Stefan
Re: [PATCHES] Post-mortem: final 2PC patch
Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
> FYI: this commit seems to cause problems/crashes during "make check" on
> my OpenBSD/Sparc64 buildfarm client:
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2005-06-17%2023:50:04
> I just checked manually with a fresh checkout - and I got similar
> failures.

Can you get a stack trace from the core dump?  As best I can tell from
the buildfarm report, one of the regression tests is causing a sig11 ---
but it's *not* the prepared_xacts test, so I'm not sure the failure is
related to this patch.

			regards, tom lane
Re: [PATCHES] Post-mortem: final 2PC patch
Tom Lane wrote:
> Can you get a stack trace from the core dump?

(gdb) bt
#0  0x50377ba4 in memcpy () from /usr/lib/libc.so.34.2
#1  0x00326efc in hash_search ()
#2  0x00343430 in pg_tzset ()
#3  0x001fbcf0 in assign_timezone ()
#4  0x00330e88 in set_config_option ()
#5  0x0029b5b0 in PostgresMain ()
#6  0x0026b140 in BackendRun ()
#7  0x0026a9f0 in BackendStartup ()
#8  0x002685bc in ServerLoop ()
#9  0x00267b88 in PostmasterMain ()
#10 0x00220cc8 in main ()

rebuilding with --enable-debug right now ...

> As best I can tell from the buildfarm report, one of the regression
> tests is causing a sig11 --- but it's *not* the prepared_xacts test,
> so I'm not sure the failure is related to this patch.

hmm - maybe one of the timezone patches ?

Stefan
Re: [PATCHES] Post-mortem: final 2PC patch
Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
>> Can you get a stack trace from the core dump?

> (gdb) bt
> #0  0x50377ba4 in memcpy () from /usr/lib/libc.so.34.2
> #1  0x00326efc in hash_search ()
> #2  0x00343430 in pg_tzset ()
> #3  0x001fbcf0 in assign_timezone ()
> #4  0x00330e88 in set_config_option ()
> #5  0x0029b5b0 in PostgresMain ()
> #6  0x0026b140 in BackendRun ()
> #7  0x0026a9f0 in BackendStartup ()
> #8  0x002685bc in ServerLoop ()
> #9  0x00267b88 in PostmasterMain ()
> #10 0x00220cc8 in main ()

> hmm - maybe one of the timezone patches ?

Looks suspicious, doesn't it.  How long since you last tested on that
machine?

			regards, tom lane
Re: [PATCHES] Post-mortem: final 2PC patch
Tom Lane wrote:
> Looks suspicious, doesn't it.  How long since you last tested on that
> machine?

*argl* - it's not 2PC ...

The machine had some issues a week ago or so - but it looks like the
problem occurred first here:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2005-06-15%2023:50:04

and in that changeset we have some timezone patches ...

Stefan
Re: [PATCHES] Post-mortem: final 2PC patch
Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
> the machine had some issues a week ago or so - but it looks like the
> problem occurred first here:

Hmm, what kind of issues, and are you sure they are fixed?

The stack trace looks to me like it is trying to apply the PGTZ setting
that's coming in from the client during startup.  Now I could believe a
platform-specific breakage there from Magnus' recent hacking, but the
problem is that *every* backend launched during the regression tests is
going to be doing this, and doing it in the same way with the same value.
It's pretty tough to see why some backends would fail at this spot and
some not ... unless what it is is an intermittent hardware problem.
Is there a version of memtest86 that works on your machine?

			regards, tom lane
Re: [PATCHES] Post-mortem: final 2PC patch
Tom Lane wrote:
> Hmm, what kind of issues, and are you sure they are fixed?

admin error :-). I had a second postgresql instance running, causing the
buildfarm runs to report a failure because of shared-memory limits. Had
nothing to do with hardware problems though ...

> It's pretty tough to see why some backends would fail at this spot and
> some not ... unless what it is is an intermittent hardware problem.
> Is there a version of memtest86 that works on your machine?

memtest86 works on x86 CPUs only afaik - no?

anyway - here is the promised backtrace:

#0  0x4489fba4 in memcpy () from /usr/lib/libc.so.34.2
#1  0x00326f9c in hash_search (hashp=0xa6e030, keyPtr=0xa5ff90,
    action=HASH_ENTER, foundPtr=0x0) at dynahash.c:653
#2  0x003434f0 in pg_tzset (name=0xa5ff90 "PST8PDT") at pgtz.c:1039
#3  0x001fbcf0 in assign_timezone (value=0xa5ff90 "PST8PDT",
    doit=1 '\001', source=PGC_S_CLIENT) at variable.c:351
#4  0x00330f28 in set_config_option (name=0xa52830 "timezone",
    value=0xa52870 "PST8PDT", context=10645504, source=PGC_S_CLIENT,
    isLocal=0 '\0', changeVal=1 '\001') at guc.c:3748
#5  0x0029b5f0 in PostgresMain (argc=4, argv=0xa52928,
    username=0xa52740 "mastermind") at postgres.c:2759
#6  0x0026b180 in BackendRun (port=0xa77800) at postmaster.c:2800
#7  0x0026aa30 in BackendStartup (port=0xa77800) at postmaster.c:2440
#8  0x002685fc in ServerLoop () at postmaster.c:1221
#9  0x00267bc8 in PostmasterMain (argc=0, argv=0x57d8) at postmaster.c:930
#10 0x00220cc8 in main (argc=6, argv=0x57d8) at main.c:268

Stefan
Re: [PATCHES] Post-mortem: final 2PC patch
Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
> anyway - here is the promised backtrace:
> #0  0x4489fba4 in memcpy () from /usr/lib/libc.so.34.2
> #1  0x00326f9c in hash_search (hashp=0xa6e030, keyPtr=0xa5ff90,
>     action=HASH_ENTER, foundPtr=0x0) at dynahash.c:653
> #2  0x003434f0 in pg_tzset (name=0xa5ff90 "PST8PDT") at pgtz.c:1039
> #3  0x001fbcf0 in assign_timezone (value=0xa5ff90 "PST8PDT",
>     doit=1 '\001', source=PGC_S_CLIENT) at variable.c:351
> #4  0x00330f28 in set_config_option (name=0xa52830 "timezone",
>     value=0xa52870 "PST8PDT", context=10645504, source=PGC_S_CLIENT,
>     isLocal=0 '\0', changeVal=1 '\001') at guc.c:3748
> #5  0x0029b5f0 in PostgresMain (argc=4, argv=0xa52928,
>     username=0xa52740 "mastermind") at postgres.c:2759

Yeah, that's exactly what I thought it was doing: setting timezone
PST8PDT due to the client PGTZ environment variable.  So the question now
is why this is an intermittent failure.  Why doesn't every single
regression test crash in the same place?

			regards, tom lane
[PATCHES] Typo in pl_funcs.c comment
pl_funcs.c has a comment that refers to "functins", which I'm assuming
should be "functions".  The attached patch corrects the spelling.

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Index: src/pl/plpgsql/src/pl_funcs.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/pl/plpgsql/src/pl_funcs.c,v
retrieving revision 1.43
diff -c -r1.43 pl_funcs.c
*** src/pl/plpgsql/src/pl_funcs.c	14 Jun 2005 06:43:14 -0000	1.43
--- src/pl/plpgsql/src/pl_funcs.c	18 Jun 2005 19:54:57 -0000
***************
*** 1,5 ****
  /**********************************************************************
! * pl_funcs.c		- Misc functins for the PL/pgSQL
  *			  procedural language
  *
  * IDENTIFICATION
--- 1,5 ----
  /**********************************************************************
! * pl_funcs.c		- Misc functions for the PL/pgSQL
  *			  procedural language
  *
  * IDENTIFICATION
Re: [PATCHES] Post-mortem: final 2PC patch
On Sat, 18 Jun 2005, Tom Lane wrote:

> I've attached the 2PC patch as applied in case you want to have a look.
> I did some fairly significant hacking on it, and I thought it'd be a
> good idea to enumerate some of the changes:
>
> I modified the in-memory data structure so that there is a separate
> dummy PGPROC for each prepared transaction (GXACT), and we control the
> visibility of the transaction to TransactionIdIsInProgress by inserting
> or removing the PGPROC in the global ProcArray.  This costs a little
> space but it has a lot of benefits: (..list of benefits..)

Yep, that looks much better. In general, a prepared transaction now looks
and behaves more like normal active transactions. That leaves less room
for error.

> Another area that I changed was fsyncing of prepared transaction status
> files.  I really didn't like expecting checkpoint to do this --- for one
> thing there are race conditions associated with fsyncing a status file
> that someone else is busy deleting.  (You could possibly avoid that by
> having the checkpointer hold the TwoPhaseStateLock while it's fsyncing,
> but the performance implications of that are unpleasant.)  The
> checkpoint coding dealt with the race condition by ignoring *all* open()
> failures, which is hardly what I'd call a robust solution.
>
> The way it's done in the committed patch, the prepare sequence is:
> 1. Write the status file, append a deliberately bad CRC, and fsync it.
> 2. Write the WAL record and fsync it.
> 3. Overwrite the bad CRC with a good one and fsync.
>
> It's annoying to have 3 fsyncs here; 2 would be nicer; but from a safety
> perspective I don't think there's much choice.  Steps 2 and 3 form a
> critical section: if we complete 2 but fail at 3 we have to PANIC
> because on-disk state is not consistent with what it says in WAL.  So I
> felt we had to do the full pushup in step 1 to be as certain as we can
> possibly be that step 3 won't fail.
>
> I'm not totally satisfied with this --- it's OK from a correctness
> standpoint but the performance leaves something to be desired.

Ouch, that really hurts performance.

In typical 2PC use, the state files live for a very short period of time,
just long enough for the transaction manager to prepare all the resource
managers participating in the global transaction, and then commit them.
We're talking < 1 s.

If we let checkpoint fsync the state files, we would only have to fsync
those state files that happen to be alive when the checkpoint comes. And
if we fsync the state files at the end of the checkpoint, after all the
usual heap pages etc., it's very likely that even those rare state files
that were alive when the checkpoint began have already been deleted. So
we're really talking about 1 fsync vs. 3 fsyncs.

Can we figure out another way to solve the race condition? Would it in
fact be ok for the checkpointer to hold the TwoPhaseStateLock,
considering that it usually wouldn't be held for long, since usually the
checkpoint would have very little work to do?

Were there other reasons besides the race condition why you did it this
way?

> I also fooled around a bit with locking of GXACTs.  In the original
> patch you prevent two backends from trying to commit/rollback the same
> GXACT by removing it from the pool at the start of the process.  That's
> not workable anymore, so what I did was to add a field locking_xid
> showing the XID of the transaction currently working on the GXACT.  As
> long as this XID is active according to ProcArray (ignoring the prepared
> xact itself of course) the GXACT cannot be touched by anyone else.  This
> provides a convenient way to interlock creation of a GXACT as well as
> its eventual destruction.

Ok.

> I also changed the handling of smgr pending deletions.  The original
> patch made smgr a 2PC resource manager, but that does not work because
> XLOG replay doesn't (and shouldn't I think) run the resource managers;
> thus replay of a PREPARE/COMMIT sequence would fail to delete whatever
> files should have been deleted.  The correct way to do this is that
> COMMIT or ROLLBACK PREPARED must emit a WAL record that looks exactly
> like a normal COMMIT or ABORT WAL record, including listing any files
> to be deleted.  So I pulled the files-to-delete info into the 2PC file
> header rather than keeping it in resource records.

Ok.

- Heikki
Re: [PATCHES] Post-mortem: final 2PC patch
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> On Sat, 18 Jun 2005, Tom Lane wrote:
>> I'm not totally satisfied with this --- it's OK from a correctness
>> standpoint but the performance leaves something to be desired.

> Ouch, that really hurts performance. In typical 2PC use, the state
> files live for a very short period of time, just long enough for the
> transaction manager to prepare all the resource managers participating
> in the global transaction, and then commit them. We're talking < 1 s.
> If we let checkpoint fsync the state files, we would only have to fsync
> those state files that happen to be alive when the checkpoint comes.

That's a good point --- I was thinking this was basically 4 fsyncs per
xact (counting the additional WAL fsync needed for COMMIT PREPARED)
versus 3, but if the average lifetime of a state file is short then it's
4 vs 2, and what's more the 2 are on WAL, which should be way cheaper
than fsyncing random files.

> And if we fsync the state files at the end of the checkpoint, after all
> the usual heap pages etc., it's very likely that even those rare state
> files that were alive when the checkpoint began have already been
> deleted.

That argument is bogus, at least with respect to the way you were doing
it in the original patch, because what you were fsyncing was whatever
existed when CheckPointTwoPhase() started.  It could however be
interesting if we actually implemented something that checked the age
of the prepared xact.

> Can we figure out another way to solve the race condition? Would it in
> fact be ok for the checkpointer to hold the TwoPhaseStateLock,
> considering that it usually wouldn't be held for long, since usually
> the checkpoint would have very little work to do?

If you're concerned about throughput of 2PC xacts then we can't sit on
the TwoPhaseStateLock while doing I/O; that will block both preparation
and committal of all 2PC xacts for a pretty long period in CPU terms.

Here's a sketch of an idea inspired by your comment above:

1. In each gxact in shared memory, store the WAL offset of the PREPARE
record, which we will know before we are ready to mark the gxact valid.

2. When CheckPointTwoPhase runs (which we'll put near the end of the
checkpoint sequence), the only gxacts that need to be fsync'd are those
that are marked valid and have a PREPARE WAL location older than the
checkpoint's redo horizon (anything newer will be replayed from WAL on
crash, so it doesn't need fsync to complete the checkpoint).  If you're
right that the lifespan of a state file is often shorter than the time
needed for a checkpoint, this wins big.  In any case we'll never have to
fsync state files that disappear before the next checkpoint.

3. One way to handle CheckPointTwoPhase is:

   * At start, take TwoPhaseStateLock (can be in shared mode) for just
     long enough to scan the gxact list and make a list of the XIDs of
     things that need fsync per above rule.

   * Without the lock, try to open and fsync each item in the list.
	Success: remove from list
	ENOENT failure on open: add to list of not-there failures
	Any other failure: ereport(ERROR)

   * If the failure list is not empty, again take TwoPhaseStateLock in
     shared mode, and check that each of the failures is now gone (or at
     least marked invalid); if so it's OK, otherwise ereport the ENOENT
     error.

Another possibility is to further extend the locking protocol for gxacts
so that the checkpointer can lock just the item it is fsyncing (which is
not possible at the moment because the checkpointer hasn't got an XID,
but probably we could think of another approach).  But that would
certainly delay attempts to commit the item being fsync'd, whereas the
above approach might not have to do so, depending on the filesystem
implementation.

Now there's a small problem with this approach, which is that we cannot
store the PREPARE WAL record location in the state files, since the
state file has to be completely computed before writing the WAL record.
However, we don't really need to do that: during recovery of a prepared
xact we know the thing has been fsynced (either originally, or when we
rewrote it during the WAL recovery sequence --- we can force an immediate
fsync in that one case).  So we can just put zero, or maybe better the
current end-of-WAL location, into the reconstructed gxact in memory.

Thoughts?

			regards, tom lane
Re: [PATCHES] Post-mortem: final 2PC patch
On Sat, 18 Jun 2005, Tom Lane wrote:

> Here's a sketch of an idea inspired by your comment above:
>
> 1. In each gxact in shared memory, store the WAL offset of the PREPARE
> record, which we will know before we are ready to mark the gxact valid.
>
> 2. When CheckPointTwoPhase runs (which we'll put near the end of the
> checkpoint sequence), the only gxacts that need to be fsync'd are those
> that are marked valid and have a PREPARE WAL location older than the
> checkpoint's redo horizon [...]
>
> 3. One way to handle CheckPointTwoPhase is:
>    * At start, take TwoPhaseStateLock (can be in shared mode) for just
>      long enough to scan the gxact list and make a list of the XIDs of
>      things that need fsync per above rule.
>    * Without the lock, try to open and fsync each item in the list. [...]
>    * If the failure list is not empty, again take TwoPhaseStateLock in
>      shared mode, and check that each of the failures is now gone (or at
>      least marked invalid); if so it's OK, otherwise ereport the ENOENT
>      error.

In step 3.1, is it safe to skip gxacts not marked as valid? The gxact is
marked as valid after the prepare record is written to WAL. If checkpoint
runs after the WAL record is written but before the gxact is marked as
valid, it doesn't get fsynced. Right?

Otherwise, looks good to me.

> Another possibility is to further extend the locking protocol for
> gxacts so that the checkpointer can lock just the item it is fsyncing
> (which is not possible at the moment because the checkpointer hasn't
> got an XID, but probably we could think of another approach). [...]

The above sketch is much better.

> Now there's a small problem with this approach, which is that we cannot
> store the PREPARE WAL record location in the state files, since the
> state file has to be completely computed before writing the WAL record.
> However, we don't really need to do that [...]  So we can just put
> zero, or maybe better the current end-of-WAL location, into the
> reconstructed gxact in memory.

This reminds me of something. What should we do about XID wraparound and
prepared transactions? Should we have some mechanism to freeze prepared
transactions, like heap tuples? At the minimum, I think we should issue a
warning if the xid counter approaches the oldest prepared transaction.

A transaction shouldn't live that long in normal use, but I can imagine
an orphaned transaction sitting there for years if it doesn't hold any
locks etc. that bother other applications.

I don't think we should implement heuristic commit/rollback, though. That
creates a whole new class of problems.

- Heikki
Re: [PATCHES] Post-mortem: final 2PC patch
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> In step 3.1, is it safe to skip gxacts not marked as valid? The gxact
> is marked as valid after the prepare record is written to WAL. If
> checkpoint runs after the WAL record is written but before the gxact is
> marked as valid, it doesn't get fsynced. Right?

It's safe because we have the CheckpointStartLock: no checkpoint
occurring in that interval can be trying to set its redo pointer after
the PREPARE record.  This is essentially the same race condition
previously detected for COMMIT vs. writing clog, and we need the
interlock with either approach to fsyncing 2PC.

> This reminds me of something. What should we do about XID wraparound
> and prepared transactions?

I don't think we need any special protection.  The prepared xacts will
be blocking advancement of GlobalXmin, and the (new in 8.1) XID wrap
detection code will shut us down safely.

> A transaction shouldn't live that long in normal use, but I can imagine
> an orphaned transaction sitting there for years if it doesn't hold any
> locks etc that bother other applications.

No, because people will notice the effects on VACUUM if nothing else.

> I don't think we should implement heuristic commit/rollback, though.
> That creates a whole new class of problems.

Agreed.  I did put in the time-of-prepare display in pg_prepared_xacts,
so anyone who really wants this can program the behavior they want from
outside the database.

			regards, tom lane
Re: [PATCHES] TODO Item - Return compressed length of TOAST datatypes
Tom Lane wrote:
> Mark Kirkwood <[EMAIL PROTECTED]> writes:
>> I thought I would have a look at: (Datatypes) Add function to return
>> compressed length of TOAST data values.
>
> My recollection of that discussion is that we just wanted something
> that would return the actual VARSIZE() of the datum.  You're building
> something way too confusing ...

I was guessing a little about exactly what was wanted - for some reason
I couldn't access the mail archives for the last few days (however, I
can now).

> A more interesting point is that the way you've declared the function,
> it will only work on text values, which is pretty limiting.  Ideally
> we'd define this thing as
>	pg_datum_length(any-varlena-type) returns int
> but there is no pseudotype corresponding to "any varlena type".
>
> I was thinking about this the other day in connection with my proposal
> to make something that could return the TOAST value OID of an
> out-of-line datum.  I think the only non-restrictive way to do it would
> be to declare the function as taking "any", and then add a runtime
> check using the get_fn_expr_argtype() mechanism to test that what
> you've been given is in fact a varlena datatype.

Yes - I was thinking I needed to check whether the Datum was a varlena
(and didn't know how to do it...), so thanks for the pointer!

With these thoughts in mind, I'll have another go :-)

Cheers

Mark
Re: [PATCHES] Typo in pl_funcs.c comment
Thanks, applied.

---

Michael Fuhr wrote:
> pl_funcs.c has a comment that refers to "functins", which I'm assuming should be "functions". The attached patch corrects the spelling.
>
> --
> Michael Fuhr
> http://www.fuhr.org/~mfuhr/
>
> [ Attachment, skipping... ]

--
  Bruce Momjian                    |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us           |  (610) 359-1001
  + If your life is a hard drive,  |  13 Roberts Road
  + Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
[PATCHES] Python setof patch
Hi,

This patch allows the PL/Python module to do set returning (SRF) functions. The patch was taken against the CVS version. I have modified the plpython.c file and have added a test sql script for testing the functionality. It is actually the script that was in the 8.0.3 version but has since been removed.

In order to signal the end of a set, the called python function must simply return plpy.EndOfSet and the set will be returned.

Please feel free to email me if you experience any problems. My next project I am working on is to get the PL/Python module to return records (i.e. structured data).

Regards
Gerrit van Dyk
AgileWorks (Pty) Ltd

diff -c -r -P orig/pgsql/src/pl/plpython/plpython.c new/pgsql/src/pl/plpython/plpython.c
*** orig/pgsql/src/pl/plpython/plpython.c	2005-06-15 10:55:02.000000000 +0200
--- new/pgsql/src/pl/plpython/plpython.c	2005-06-14 14:12:01.000000000 +0200
***************
*** 286,291 ****
--- 286,294 ----
  static PyObject *PLy_exc_fatal = NULL;
  static PyObject *PLy_exc_spi_error = NULL;

+ /* End-of-set Indication */
+ static PyObject *PLy_endofset = NULL;
+
  /* some globals for the python module */
  static char PLy_plan_doc[] = {
***************
*** 770,775 ****
--- 773,788 ----
  		fcinfo->isnull = true;
  		rv = (Datum) NULL;
  	}
+ 	/* test for end-of-set condition */
+ 	else if (fcinfo->flinfo->fn_retset && plrv == PLy_endofset)
+ 	{
+ 		ReturnSetInfo *rsi;
+
+ 		fcinfo->isnull = true;
+ 		rv = (Datum) NULL;
+ 		rsi = (ReturnSetInfo *) fcinfo->resultinfo;
+ 		rsi->isDone = ExprEndResult;
+ 	}
  	else
  	{
  		fcinfo->isnull = false;
***************
*** 2317,2325 ****
--- 2330,2340 ----
  	PLy_exc_error = PyErr_NewException("plpy.Error", NULL, NULL);
  	PLy_exc_fatal = PyErr_NewException("plpy.Fatal", NULL, NULL);
  	PLy_exc_spi_error = PyErr_NewException("plpy.SPIError", NULL, NULL);
+ 	PLy_endofset = PyErr_NewException("plpy.EndOfSet", NULL, NULL);
  	PyDict_SetItemString(plpy_dict, "Error", PLy_exc_error);
  	PyDict_SetItemString(plpy_dict, "Fatal", PLy_exc_fatal);
  	PyDict_SetItemString(plpy_dict, "SPIError", PLy_exc_spi_error);
+ 	PyDict_SetItemString(plpy_dict, "EndOfSet", PLy_endofset);

  	/*
  	 * initialize main module, and add plpy
diff -c -r -P orig/pgsql/src/pl/plpython/sql/plpython_setof.sql new/pgsql/src/pl/plpython/sql/plpython_setof.sql
*** orig/pgsql/src/pl/plpython/sql/plpython_setof.sql	1970-01-01 02:00:00.000000000 +0200
--- new/pgsql/src/pl/plpython/sql/plpython_setof.sql	2005-06-14 14:12:01.000000000 +0200
***************
*** 0 ****
--- 1,12 ----
+
+ CREATE or replace FUNCTION test_setof() returns setof text
+ AS
+ 'if GD.has_key("calls"):
+ 	GD["calls"] = GD["calls"] + 1
+ 	if GD["calls"] > 2:
+ 		del GD["calls"]
+ 		return plpy.EndOfSet
+ else:
+ 	GD["calls"] = 1
+ return str(GD["calls"])'
+ LANGUAGE plpythonu;
Re: [PATCHES] Add PG version number to NLS files
Hi!

Bruce Momjian [2005-06-15 15:26 -0400]:
> Peter Eisentraut wrote:
>> Bruce Momjian wrote:
>>> The 'bind' calls in the binaries are going to look for the proper version. Does that help, or is libpq the only thing we need to handle?
>>
>> Shared libraries have their version number embedded in the file name for the explicit purpose of installing more than one version side by side. Therefore, the PO files also need to make arrangements for side-by-side installation. No such promises are made for non-library files.
>
> OK, seems we need examples of how our current setup fails, both for libpq and binaries.

Debian's and Ubuntu's packages currently look like this (binary: translation domain):

  /usr/lib/libpq3.so: libpq3
  /usr/lib/libpq4.so: libpq4
  /usr/lib/postgresql/7.4/bin/postmaster: postgres-7.4
  /usr/lib/postgresql/8.0/bin/postmaster: postgres-8.0
  /usr/lib/postgresql/7.4/bin/psql: psql-7.4
  /usr/lib/postgresql/8.0/bin/psql: psql-8.0
  [similar for all other client binaries]

Without my domain patches, i.e. with upstream's scheme, it would look like:

  /usr/lib/libpq3.so: libpq
  /usr/lib/libpq4.so: libpq
  /usr/lib/postgresql/7.4/bin/postmaster: postgres
  /usr/lib/postgresql/8.0/bin/postmaster: postgres

As you see, there is a conflict, so to be able to install both packages at the same time, one translation version had to be picked and stuffed into a separate -translations package, which both versions depend on. That's ugly and would cause one version to have some missing translations; therefore I patched the domains to be version-specific.

I do think that adopting my scheme at least for libpq really makes sense. Adopting it for the binaries would not do any harm, but it might not conform to your packaging policy, which I don't really know. The clashing domains are the only thing that prevents the parallel installation of different major versions, so the question of whether you want to adopt it mainly boils down to whether you want to support parallel server installations.
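The effect of versioned domains can be demonstrated with Python's stdlib gettext: catalogs with distinct domain names coexist in one locale tree, whereas a single shared "postgres" domain could only hold one version's translations. The .mo files below are empty placeholders created just for this sketch, so only catalog lookup is shown, not actual translation:

```python
import gettext
import os
import tempfile

localedir = tempfile.mkdtemp()
msgdir = os.path.join(localedir, "de", "LC_MESSAGES")
os.makedirs(msgdir)

# Versioned domains: each major version ships its own catalog file,
# and both can be installed side by side without conflict.
for domain in ("postgres-7.4", "postgres-8.0"):
    open(os.path.join(msgdir, domain + ".mo"), "wb").close()

# gettext.find() resolves each domain to its own file -- no clash.
print(gettext.find("postgres-7.4", localedir, languages=["de"]))
print(gettext.find("postgres-8.0", localedir, languages=["de"]))

# An unversioned "postgres" domain can only ever map to one file,
# which is exactly the packaging conflict described above.
print(gettext.find("postgres", localedir, languages=["de"]))  # None here
```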
I am grateful that you did the SONAME change upstream; that relieved Debian and probably other vendors of much pain. OTOH, the Debian-specific translation domain changes are really simple and don't impose a significant maintenance overhead, so there is no urgency at all.

Thanks and have a nice day!

Martin

--
Martin Pitt           http://www.piware.de
Ubuntu Developer      http://www.ubuntu.com
Debian Developer      http://www.debian.org