Re: [HACKERS] strncpy is not a safe version of strcpy
On Thu, Aug 14, 2014 at 02:50:02AM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Wed, Aug 13, 2014 at 10:21:50AM -0400, Tom Lane wrote: I believe that we deal with this by the expedient of checking the lengths of tablespace paths in advance, when the tablespace is created. The files under scrutiny here are not located in a tablespace. Even if they were, isn't the length of $PGDATA/pg_tblspc the important factor? The length of $PGDATA is of no relevance whatsoever; we chdir into that directory at startup, and subsequently all paths are implicitly relative to there. If there is any backend code that's prepending $PGDATA to something else, it's wrong to start with. Ah; quite right. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers
On Thu, Aug 14, 2014 at 8:34 PM, Fujii Masao masao.fu...@gmail.com wrote: +At any one time there will be at a number of active synchronous standbys +defined by <varname>synchronous_standby_num</>; transactions waiting It's better to use <xref linkend="guc-synchronous-standby-num">, instead. Fixed. +for commit will be allowed to proceed after those standby servers +confirms receipt of their data. The synchronous standbys will be Typo: confirms -> confirm Fixed. + <para> +Specifies the number of standbys that support +<firstterm>synchronous replication</>, as described in +<xref linkend="synchronous-replication">, and listed as the first +elements of <xref linkend="guc-synchronous-standby-names">. + </para> + <para> +Default value is 1. + </para> synchronous_standby_num is defined with PGC_SIGHUP. So the following should be added into the document: "This parameter can only be set in the postgresql.conf file or on the server command line." Fixed. The name of the parameter synchronous_standby_num sounds to me like the transaction must wait for its WAL to be replicated to s_s_num standbys. But that's not true in your patch. If s_s_names is empty, replication works asynchronously whatever the value of s_s_num is. I'm afraid that it's confusing. The description of s_s_num is not sufficient. I'm afraid that users can easily misunderstand that they can use a quorum commit feature by using s_s_names and s_s_num. That is, that the transaction waits for its WAL to be replicated to any s_s_num standbys listed in s_s_names. I reworked the docs to mention all that. Yes, things are a bit different from any quorum commit facility (how to parametrize that simply without a parameter mapping one-to-one to the items of s_s_names?), as this facility relies on the order of the items of s_s_names and the fact that standbys are connected at a given time. When s_s_num is set to a larger value than max_wal_senders, should we warn about that?
Actually I have done a bit more than that by forbidding setting s_s_num to a value higher than max_wal_senders. Thoughts? Now that we discuss the interactions with other parameters, another thing that I am wondering about is: what should we do if we set s_s_num to a number higher than the number of elements in s_s_names? Currently, the patch gives the priority to s_s_num; in short, if we set s_s_num to 100, the server will wait for 100 servers to confirm commit even if there are fewer than 100 elements in s_s_names. I chose this way because it looks saner, particularly if s_s_names = '*'. Thoughts once again? +for (i = 0; i < num_sync; i++) +{ +volatile WalSnd *walsndloc = &WalSndCtl->walsnds[sync_standbys[i]]; + +if (min_write_pos > walsndloc->write) +min_write_pos = walsndloc->write; +if (min_flush_pos > walsndloc->flush) +min_flush_pos = walsndloc->flush; +} I don't think that it's safe to read those shared values without a spinlock. Looking at walsender.c, you are right. I have updated the code to use the mutex lock of the walsender whose values are being read. Regards, -- Michael On Thu, Aug 14, 2014 at 4:34 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Aug 13, 2014 at 4:10 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Wed, Aug 13, 2014 at 2:10 PM, Fujii Masao masao.fu...@gmail.com wrote: I sent the SIGSTOP signal to the walreceiver process in one of the sync standbys, and then ran write transactions again. In this case, they must not be completed because their WAL cannot be replicated to the standby whose walreceiver was stopped. But they were successfully completed. At the end of SyncRepReleaseWaiters, SYNC_REP_WAIT_WRITE and SYNC_REP_WAIT_FLUSH in walsndctl were able to be updated with only one walsender in sync, making backends wake up even if other standbys did not catch up. But we need to scan all the synchronous walsenders, find the minimum write and flush positions, and update walsndctl with those values. Well, that's a code path I forgot to cover.
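The fix described above -- taking each walsender's spinlock while reading its shared positions -- can be sketched outside the backend like this. WalSndStub and the toy spinlock are stand-ins for the real WalSnd slot and slock_t, so this only illustrates the locking discipline, not the actual walsender code:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Stand-ins for PostgreSQL's WalSnd slot and its per-walsender spinlock;
 * in the backend the protected fields are WalSnd.write and WalSnd.flush. */
typedef struct WalSndStub
{
    volatile int locked;        /* toy spinlock, stands in for slock_t */
    XLogRecPtr   write_pos;
    XLogRecPtr   flush_pos;
} WalSndStub;

static void spin_acquire(WalSndStub *s) { while (s->locked) ; s->locked = 1; }
static void spin_release(WalSndStub *s) { s->locked = 0; }

/* Scan the sync standbys and compute the minimum write/flush positions,
 * holding each slot's lock while its values are read. */
static void
min_sync_positions(WalSndStub *slots, const int *sync_standbys, int num_sync,
                   XLogRecPtr *min_write, XLogRecPtr *min_flush)
{
    *min_write = UINT64_MAX;
    *min_flush = UINT64_MAX;
    for (int i = 0; i < num_sync; i++)
    {
        WalSndStub *s = &slots[sync_standbys[i]];

        spin_acquire(s);        /* SpinLockAcquire(&walsnd->mutex) in core */
        if (*min_write > s->write_pos)
            *min_write = s->write_pos;
        if (*min_flush > s->flush_pos)
            *min_flush = s->flush_pos;
        spin_release(s);
    }
}
```

The point of the minimum is exactly the bug above: backends may only be woken once *all* sync standbys have reached the position, not just the first one scanned.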
Attached is an updated patch fixing the problem you reported.
Re: [HACKERS] Minmax indexes
On 08/15/2014 02:02 AM, Alvaro Herrera wrote: Alvaro Herrera wrote: Heikki Linnakangas wrote: I'm sure this still needs some cleanup, but here's the patch, based on your v14. Now that I know what this approach looks like, I still like it much better. The insert and update code is somewhat more complicated, because you have to be careful to lock the old page, new page, and revmap page in the right order. But it's not too bad, and it gets rid of all the complexity in vacuum. It seems there is some issue here, because pageinspect tells me the index is not growing properly for some reason. minmax_revmap_data gives me this array of TIDs after a bunch of insert/vacuum/delete/ etc: I fixed this issue, and did a lot more rework and bugfixing. Here's v15, based on v14-heikki2. Thanks! I think remaining issues are mostly minimal (pageinspect should output block number alongside each tuple, now that we have it, for example.) There's this one issue I left in my patch version that I think we should do something about: + /* +* No luck. Assume that the revmap was updated concurrently. +* +* XXX: it would be nice to add some kind of a sanity check here to +* avoid looping infinitely, if the revmap points to wrong tuple for +* some reason. +*/ This happens when we follow the revmap to a tuple, but find that the tuple points to a different block than what the revmap claimed. Currently, we just assume that it's because the tuple was updated concurrently, but while hacking, I frequently had a broken index where the revmap pointed to bogus tuples or the tuples had a missing/wrong block number on them, and ran into infinite loop here. It's clearly a case of a corrupt index and shouldn't happen, but I would imagine that it's a fairly typical way this would fail in production too because of hardware issues or bugs. So I think we need to work a bit harder to stop the looping and throw an error instead. 
Perhaps something as simple as keeping a loop counter and giving up after 1000 attempts would be good enough. The window between releasing the lock on the revmap, and acquiring the lock on the page containing the MMTuple is very narrow, so the chances of losing that race to a concurrent update more than 1-2 times in a row is vanishingly small. - Heikki
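Heikki's bounded-retry suggestion might look roughly like the following; the lookup callback and the 1000-attempt cap are illustrative stand-ins rather than the actual minmax code:

```c
#include <stdio.h>

#define MAX_REVMAP_RETRIES 1000

/* Hypothetical lookup: follows the revmap and reports whether the tuple it
 * found actually belongs to the requested heap block.  In a corrupt index
 * this keeps returning false forever. */
typedef int (*revmap_lookup_fn) (int heap_blk);

/* Retry the lookup, but give up after a bounded number of attempts instead
 * of looping forever on a corrupt index. */
static int
fetch_mm_tuple(int heap_blk, revmap_lookup_fn lookup)
{
    for (int attempts = 0; attempts < MAX_REVMAP_RETRIES; attempts++)
    {
        if (lookup(heap_blk))
            return 1;           /* found a consistent tuple */
        /* else: assume a concurrent update moved it; loop and retry */
    }
    /* elog(ERROR, ...) in the backend; here just report failure */
    fprintf(stderr, "corrupt index: revmap points to stale tuple\n");
    return 0;
}
```

A healthy index exits the loop after one or two iterations; only corruption ever reaches the cap, which is the property being argued for above.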
Re: [HACKERS] option -T in pg_basebackup doesn't work on windows
Thank you. The code looks correct. I confirmed that pg_basebackup could relocate the tablespace directory on Windows. I marked this patch as ready for committer. Regards MauMau
Re: [HACKERS] Removing dependency to wsock32.lib when compiling code on WIndows
On Mon, Aug 11, 2014 at 10:48 PM, Noah Misch n...@leadboat.com wrote: Consistency with the MSVC build is desirable, either HAVE_GETADDRINFO for both or !HAVE_GETADDRINFO for both. Hm. Looking here, getaddrinfo has been added in ws2_32 and was not present in wsock32: http://msdn.microsoft.com/en-us/library/windows/desktop/ms738520%28v=vs.85%29.aspx And this change in configure.in is the root cause of the regression: -AC_SEARCH_LIBS(socket, [socket wsock32]) +AC_SEARCH_LIBS(socket, [socket ws2_32]) Btw, how do you determine if MSVC is using HAVE_GETADDRINFO? Is it decided by the inclusion of getaddrinfo.c in @pgportfiles of Mkvcbuild.pm? -- Michael
Re: [HACKERS] Minmax indexes
On 08/15/2014 10:26 AM, Heikki Linnakangas wrote: On 08/15/2014 02:02 AM, Alvaro Herrera wrote: Alvaro Herrera wrote: Heikki Linnakangas wrote: I'm sure this still needs some cleanup, but here's the patch, based on your v14. Now that I know what this approach looks like, I still like it much better. The insert and update code is somewhat more complicated, because you have to be careful to lock the old page, new page, and revmap page in the right order. But it's not too bad, and it gets rid of all the complexity in vacuum. It seems there is some issue here, because pageinspect tells me the index is not growing properly for some reason. minmax_revmap_data gives me this array of TIDs after a bunch of insert/vacuum/delete/ etc: I fixed this issue, and did a lot more rework and bugfixing. Here's v15, based on v14-heikki2. Thanks! I think remaining issues are mostly minimal (pageinspect should output block number alongside each tuple, now that we have it, for example.) There's this one issue I left in my patch version that I think we should do something about: + /* +* No luck. Assume that the revmap was updated concurrently. +* +* XXX: it would be nice to add some kind of a sanity check here to +* avoid looping infinitely, if the revmap points to wrong tuple for +* some reason. +*/ This happens when we follow the revmap to a tuple, but find that the tuple points to a different block than what the revmap claimed. Currently, we just assume that it's because the tuple was updated concurrently, but while hacking, I frequently had a broken index where the revmap pointed to bogus tuples or the tuples had a missing/wrong block number on them, and ran into infinite loop here. It's clearly a case of a corrupt index and shouldn't happen, but I would imagine that it's a fairly typical way this would fail in production too because of hardware issues or bugs. So I think we need to work a bit harder to stop the looping and throw an error instead. 
Perhaps something as simple as keeping a loop counter and giving up after 1000 attempts would be good enough. The window between releasing the lock on the revmap, and acquiring the lock on the page containing the MMTuple is very narrow, so the chances of losing that race to a concurrent update more than 1-2 times in a row is vanishingly small. Reading the patch more closely, I see that you added a check that when we loop, we throw an error if the new item pointer in the revmap is the same as before. In theory, it's possible that two concurrent updates happen: one that moves the tuple we're looking for elsewhere, and another that moves it back again. The probability of that is also vanishingly small, so maybe that's OK. Or we could check the LSN; if the revmap has been updated, its LSN must've changed. - Heikki
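The LSN alternative boils down to: remember the revmap page's LSN when the item is read, and on a block mismatch treat an unchanged LSN as corruption (no concurrent update could have happened) rather than retrying. A minimal stand-alone illustration, with stand-in types rather than the real revmap structures:

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Minimal stand-in for a revmap page: its LSN advances on every update. */
typedef struct RevmapPageStub
{
    XLogRecPtr lsn;
    int        item_blk;        /* block the revmap item points to */
} RevmapPageStub;

/* After following the revmap to a tuple and finding a mismatched block,
 * decide whether to retry: if the revmap page's LSN is unchanged since we
 * read the item, no concurrent update happened -- the index is corrupt
 * and the caller should error out instead of looping. */
static int
should_retry(const RevmapPageStub *page, XLogRecPtr lsn_at_read)
{
    return page->lsn != lsn_at_read;    /* changed => concurrent update */
}
```

Unlike the item-pointer comparison, this is immune to the move-and-move-back race, since the second update also bumps the LSN.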
Re: [HACKERS] pg_receivexlog and replication slots
Actually I came up with the same need as Magnus, but a bit later, so... Attached is a patch to support physical slot creation and drop in pg_receivexlog with the addition of new options --create and --drop. It would be nice to have that in 9.4, but I will not be the one deciding that at the end :) Code has been refactored with what is already available in pg_recvlogical for the slot creation and drop. I think that the create/drop options for a slot are unnecessary. It is no different from performing pg_create_physical_replication_slot() via psql. For consistency with pg_recvlogical, what do you think about --start? Initial review: the patch could not be applied to this morning's sources. commit-id: 5ff5bfb5f0d83a538766903275b230499fa9ebe1 [postgres postgresql-5ff5bfb]$ patch -p1 < 20140812_physical_slot_receivexlog.patch patching file doc/src/sgml/ref/pg_receivexlog.sgml patching file src/bin/pg_basebackup/pg_receivexlog.c Hunk #2 FAILED at 80. 1 out of 7 hunks FAILED -- saving rejects to file src/bin/pg_basebackup/pg_receivexlog.c.rej patching file src/bin/pg_basebackup/pg_recvlogical.c patching file src/bin/pg_basebackup/streamutil.c patching file src/bin/pg_basebackup/streamutil.h The patch does apply to the sources from before August 12, but a warning comes out when running make. commit-id: 6aa61580e08d58909b2a8845a4087b7699335ee0 [postgres postgresql-6aa6158]$ make > /dev/null streamutil.c: In function 'CreateReplicationSlot': streamutil.c:244: warning: suggest parentheses around '&&' within '||' Regards, -- Furuya Osamu
Re: [HACKERS] pg_shmem_allocations view
Hi On Thu, May 8, 2014 at 4:28 PM, Andres Freund and...@2ndquadrant.com wrote: Ok. A new version of the patches implementing that are attached. Including a couple of small fixups and docs. The latter aren't extensive, but that doesn't seem to be warranted anyway. Is it really actually useful to expose the segment off(set) to users? Seems to me like unnecessary internal details leaking out. Regards, Marti
Re: [HACKERS] pg_receivexlog and replication slots
Thanks for your review. On Fri, Aug 15, 2014 at 12:56 AM, furu...@pm.nttdata.co.jp wrote: For consistency with pg_recvlogical, what do you think about --start? I did not add that for the sake of backward-compatibility, as in pg_recvlogical an action is mandatory. That is currently not the case for pg_receivexlog. [postgres postgresql-6aa6158]$ make > /dev/null streamutil.c: In function 'CreateReplicationSlot': streamutil.c:244: warning: suggest parentheses around '&&' within '||' I see. Here is a rebased patch. Regards, -- Michael From 18e754569c1fd97f4346be935341b27e228775a4 Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Fri, 15 Aug 2014 17:15:31 +0900 Subject: [PATCH] Add support for physical slot creation/deletion in pg_receivexlog Physical slot creation can be done with --create and drop with --drop. In both cases --slot is needed. Code for replication slot creation and drop is refactored with what was available in pg_recvlogical. --- doc/src/sgml/ref/pg_receivexlog.sgml | 29 ++ src/bin/pg_basebackup/pg_receivexlog.c | 73 + src/bin/pg_basebackup/pg_recvlogical.c | 66 +++ src/bin/pg_basebackup/streamutil.c | 97 ++ src/bin/pg_basebackup/streamutil.h | 7 +++ 5 files changed, 203 insertions(+), 69 deletions(-) diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml index 5916b8f..7e46005 100644 --- a/doc/src/sgml/ref/pg_receivexlog.sgml +++ b/doc/src/sgml/ref/pg_receivexlog.sgml @@ -72,6 +72,35 @@ PostgreSQL documentation titleOptions/title para +applicationpg_receivexlog/application can run in one of two following +modes, which control physical replication slot: + +variablelist + + varlistentry + termoption--create/option/term + listitem + para +Create a new physical replication slot with the name specified in +option--slot/option, then exit.
+ /para + /listitem + /varlistentry + + varlistentry + termoption--drop/option/term + listitem + para +Drop the replication slot with the name specified +in option--slot/option, then exit. + /para + /listitem + /varlistentry +/variablelist + + /para + + para The following command-line options control the location and format of the output. diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c index a8b9ad3..98d034a 100644 --- a/src/bin/pg_basebackup/pg_receivexlog.c +++ b/src/bin/pg_basebackup/pg_receivexlog.c @@ -38,6 +38,8 @@ static int noloop = 0; static int standby_message_timeout = 10 * 1000; /* 10 sec = default */ static int fsync_interval = 0; /* 0 = default */ static volatile bool time_to_abort = false; +static bool do_create_slot = false; +static bool do_drop_slot = false; static void usage(void); @@ -78,6 +80,9 @@ usage(void) printf(_("  -w, --no-password      never prompt for password\n")); printf(_("  -W, --password         force password prompt (should happen automatically)\n")); printf(_("  -S, --slot=SLOTNAME    replication slot to use\n")); + printf(_("\nOptional actions:\n")); + printf(_("      --create           create a new replication slot (for the slot's name see --slot)\n")); + printf(_("      --drop             drop the replication slot (for the slot's name see --slot)\n")); printf(_("\nReport bugs to pgsql-b...@postgresql.org.\n")); } @@ -261,14 +266,6 @@ StreamLog(void) uint32 hi, lo; - /* - * Connect in replication mode to the server - */ - conn = GetConnection(); - if (!conn) - /* Error message already written in GetConnection() */ - return; - if (!CheckServerVersionForStreaming(conn)) { /* @@ -370,6 +367,9 @@ main(int argc, char **argv) {"status-interval", required_argument, NULL, 's'}, {"slot", required_argument, NULL, 'S'}, {"verbose", no_argument, NULL, 'v'}, +/* action */ + {"create", no_argument, NULL, 1}, + {"drop", no_argument, NULL, 2}, {NULL, 0, NULL, 0} }; @@ -453,6 +453,13 @@ main(int argc, char **argv) case 'v': verbose++; break; +/* action */ + case 1:
+do_create_slot = true; +break; + case 2: +do_drop_slot = true; +break; default: /* @@ -477,10 +484,26 @@ main(int argc, char **argv) exit(1); } + if (replication_slot == NULL && (do_drop_slot || do_create_slot)) + { + fprintf(stderr, _("%s: replication slot needed with action --create or --drop\n"), progname); + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), +progname); + exit(1); + } + + if (do_drop_slot && do_create_slot) + { + fprintf(stderr, _("%s: cannot use --create together with --drop\n"), progname); + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), +progname); + exit(1); + } + /* * Required arguments */ - if (basedir == NULL) + if (basedir == NULL && !do_create_slot && !do_drop_slot) { fprintf(stderr, _("%s: no target
Re: [HACKERS] pg_shmem_allocations view
On 2014-08-15 11:12:11 +0300, Marti Raudsepp wrote: Hi On Thu, May 8, 2014 at 4:28 PM, Andres Freund and...@2ndquadrant.com wrote: Ok. A new version of the patches implementing that are attached. Including a couple of small fixups and docs. The latter aren't extensive, but that doesn't seem to be warranted anyway. Is it really actually useful to expose the segment off(set) to users? Seems to me like unnecessary internal details leaking out. Yes. This is clearly developer oriented and I'd more than once wished I could see where some stray pointer is pointing to... That's not really possible without something like this. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pg_shmem_allocations view
On 2014-08-14 22:16:31 -0700, Michael Paquier wrote: And here are some comments about patch 2: - Patch applies with some hunks. - Some typos are present s#memory segments..#memory segments. (double dots) s#NULL#literalNULL/ (in the docs as this refers to a value) Will fix. - Your thoughts about providing separate patches for each view? What this patch does is straight-forward, but pg_shmem_allocations does not actually depend on the first patch adding size and name to the dsm fields. So IMO it makes sense to separate each feature properly. I don't know, seems a bit like busywork to me. Postgres doesn't really do very granular commits... - off should be renamed to offset for pg_get_shmem_allocations. ok. - Is it really worth showing unused shared memory? I'd rather rip out the last portion of pg_get_shmem_allocations. It's actually really helpful. There's a couple of situations where you can possibly run out of that spare memory and into trouble, which currently isn't diagnosable. Similarly we currently can't diagnose whether we superfluously allocate too much 'reserve' shared memory. - For refcnt in pg_get_dynamic_shmem_allocations, could you add a comment mentioning that refcnt = 1 means that the item is moribund and 0 is unused, and that the reference count for active dsm segments only begins from 2? I would imagine that this is enough, instead of using some define's defining the ID from which a dsm item is considered as active. Ok. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Reporting the commit LSN at commit time
On 2014-08-14 12:21:38 -0400, Robert Haas wrote: On Sat, Aug 9, 2014 at 12:54 PM, Andres Freund and...@2ndquadrant.com wrote: I don't think we really need to embed it at that level. And it doesn't have to be always on - only clients that ask for it need to get the answer. Something like COMMIT WITH (report_commit_lsn ON); or similar might do the trick? And what does that actually do? Send back a result-set, or a new protocol message? What I was thinking of was to return COMMIT X/X instead of COMMIT. Since that's only sent when COMMIT WITH (report_commit_lsn ON) was set it won't break clients/libraries that don't need it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
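On the client side, a command tag of the form COMMIT X/X could be picked apart as below; parse_commit_lsn is a hypothetical helper illustrating the proposal, not an existing libpq API:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical client-side helper: parse an LSN out of a command tag of the
 * form "COMMIT X/X" (e.g. "COMMIT 0/16B3A58"), as Andres sketches above.
 * Returns 1 and fills *lsn on success, 0 for a plain "COMMIT" -- so
 * clients that never asked for report_commit_lsn are unaffected. */
static int
parse_commit_lsn(const char *tag, uint64_t *lsn)
{
    unsigned int hi, lo;

    if (sscanf(tag, "COMMIT %X/%X", &hi, &lo) != 2)
        return 0;
    *lsn = ((uint64_t) hi << 32) | lo;
    return 1;
}
```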
Re: [HACKERS] pgcrypto: PGP armor headers
Hi, On 8/8/14 3:18 PM, I wrote: Currently there's no way to generate or extract armor headers from the PGP armored format in pgcrypto. I've written a patch to add the support. Latest version of the patch here, having fixed some small coding issues. .marko *** a/contrib/pgcrypto/Makefile --- b/contrib/pgcrypto/Makefile *** *** 26,32 MODULE_big = pgcrypto OBJS = $(SRCS:.c=.o) $(WIN32RES) EXTENSION = pgcrypto ! DATA = pgcrypto--1.1.sql pgcrypto--1.0--1.1.sql pgcrypto--unpackaged--1.0.sql PGFILEDESC = pgcrypto - cryptographic functions REGRESS = init md5 sha1 hmac-md5 hmac-sha1 blowfish rijndael \ --- 26,32 OBJS = $(SRCS:.c=.o) $(WIN32RES) EXTENSION = pgcrypto ! DATA = pgcrypto--1.2.sql pgcrypto--1.1--1.2.sql pgcrypto--1.0--1.1.sql pgcrypto--unpackaged--1.0.sql PGFILEDESC = pgcrypto - cryptographic functions REGRESS = init md5 sha1 hmac-md5 hmac-sha1 blowfish rijndael \ *** a/contrib/pgcrypto/expected/pgp-armor.out --- b/contrib/pgcrypto/expected/pgp-armor.out *** *** 102,104 em9va2E= --- 102,362 -END PGP MESSAGE- '); ERROR: Corrupt ascii-armor + -- corrupt + select pgp_armor_header(' + -BEGIN PGP MESSAGE- + foo: + + em9va2E= + = + -END PGP MESSAGE- + ', 'foo'); + ERROR: Corrupt ascii-armor + -- empty + select pgp_armor_header(' + -BEGIN PGP MESSAGE- + foo: + + em9va2E= + = + -END PGP MESSAGE- + ', 'foo'); + pgp_armor_header + -- + + (1 row) + + -- simple + select pgp_armor_header(' + -BEGIN PGP MESSAGE- + foo: bar + + em9va2E= + = + -END PGP MESSAGE- + ', 'foo'); + pgp_armor_header + -- + bar + (1 row) + + -- uninteresting keys, part 1 + select pgp_armor_header(' + -BEGIN PGP MESSAGE- + foo: found + bar: ignored + + em9va2E= + = + -END PGP MESSAGE- + ', 'foo'); + pgp_armor_header + -- + found + (1 row) + + -- uninteresting keys, part 2 + select pgp_armor_header(' + -BEGIN PGP MESSAGE- + bar: ignored + foo: found + + em9va2E= + = + -END PGP MESSAGE- + ', 'foo'); + pgp_armor_header + -- + found + (1 row) + + -- uninteresting keys, part 3 + select pgp_armor_header(' + 
-BEGIN PGP MESSAGE- + bar: ignored + foo: found + bar: ignored + + em9va2E= + = + -END PGP MESSAGE- + ', 'foo'); + pgp_armor_header + -- + found + (1 row) + + -- insane keys, part 1 + select pgp_armor_header(' + -BEGIN PGP MESSAGE- + insane:key : + + em9va2E= + = + -END PGP MESSAGE- + ', 'insane:key '); + pgp_armor_header + -- + + (1 row) + + -- insane keys, part 2 + select pgp_armor_header(' + -BEGIN PGP MESSAGE- + insane:key : text value here + + em9va2E= + = + -END PGP MESSAGE- + ', 'insane:key '); + pgp_armor_header + -- + text value here + (1 row) + + -- long value + select pgp_armor_header(' + -BEGIN PGP MESSAGE- + long: this value is more than 76 characters long, but it should still parse correctly as that''s permitted by RFC 4880 + + em9va2E= + = + -END PGP MESSAGE- + ', 'long'); + pgp_armor_header + - + this value is more than 76 characters long, but it should still parse correctly as that's permitted by RFC 4880 + (1 row) + + -- long value, split up + select pgp_armor_header(' + -BEGIN PGP MESSAGE- + long: this value is more than 76 characters long, but it should still + long: parse correctly as that''s permitted by RFC 4880 + + em9va2E= + = + -END PGP MESSAGE- + ', 'long'); + pgp_armor_header + - + this value is more than 76 characters long, but it should still parse correctly as that's permitted by RFC 4880 + (1 row) + + -- long value, split up, part 2 + select pgp_armor_header(' + -BEGIN PGP MESSAGE- + long: this value is more than + long: 76 characters long, but it should still + long: parse correctly as that''s permitted by RFC 4880 + + em9va2E= + = + -END PGP MESSAGE- + ', 'long'); + pgp_armor_header + - + this value is more than 76 characters long, but it should still parse correctly as that's permitted by RFC 4880 + (1 row) + + -- long value, split up, part 3 + select pgp_armor_header(' + -BEGIN PGP
Re: [HACKERS] Unwanted LOG during recovery of DROP TABLESPACE REDO
On 7/16/14 4:33 PM, Tom Lane wrote: Rajeev rastogi rajeev.rast...@huawei.com writes: I found and fixed a bug that causes recovery (crash recovery, PITR) to throw an unwanted LOG message if the tablespace symlink is not found during the processing of DROP TABLESPACE redo. LOG: could not remove symbolic link pg_tblspc/16384: No such file or directory I don't think that's a bug: it's the designed behavior. Why should we complicate the code to not print a log message in a situation where it's unclear if the case is expected or not? I agree with Tom here; this doesn't seem like an improvement. .marko
[HACKERS] pgbench --tuple-size option
After publishing some test results with pgbench on SSD with varying page size, Josh Berkus pointed out that pgbench uses small 100-byte tuples, and that results may be different with other tuple sizes. This patch adds an option to change the default tuple size, so that this can be tested easily. -- Fabien. diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index 2f7d80e..709022c 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -119,6 +119,10 @@ int scale = 1; */ int fillfactor = 100; +/* approximate number of bytes for rows + */ +int tupsize = 100; + /* * create foreign key constraints on the tables? */ @@ -359,6 +363,7 @@ usage(void) "create indexes in the specified tablespace\n" "  --tablespace=TABLESPACE  create tables in the specified tablespace\n" "  --unlogged-tables        create tables as unlogged tables\n" + "  --tuple-size=NUM         target tuple size (default: 100)\n" "\nBenchmarking options:\n" "  -c, --client=NUM         number of concurrent database clients (default: 1)\n" "  -C, --connect            establish new connection for each transaction\n" @@ -1704,32 +1709,37 @@ init(bool is_no_vacuum) const char *table; /* table name */ const char *smcols; /* column decls if accountIDs are 32 bits */ const char *bigcols; /* column decls if accountIDs are 64 bits */ - int declare_fillfactor; + int declare_fillfactor; /* whether to add a fillfactor */ + int tupsize_correction; /* tupsize correction */ }; static const struct ddlinfo DDLs[] = { { "pgbench_history", - "tid int,bid int,aid int,delta int,mtime timestamp,filler char(22)", - "tid int,bid int,aid bigint,delta int,mtime timestamp,filler char(22)", - 0 + "tid int,bid int,aid int,delta int,mtime timestamp,filler char", + "tid int,bid int,aid bigint,delta int,mtime timestamp,filler char", + 0, + 78 }, { "pgbench_tellers", - "tid int not null,bid int,tbalance int,filler char(84)", - "tid int not null,bid int,tbalance int,filler char(84)", - 1 + "tid int not null,bid int,tbalance int,filler char", + "tid int not null,bid int,tbalance int,filler char", + 1, + 16 }, { "pgbench_accounts", - "aid int not null,bid int,abalance int,filler char(84)", - "aid bigint not null,bid int,abalance int,filler char(84)", - 1 + "aid int not null,bid int,abalance int,filler char", + "aid bigint not null,bid int,abalance int,filler char", + 1, + 16 }, { "pgbench_branches", - "bid int not null,bbalance int,filler char(88)", - "bid int not null,bbalance int,filler char(88)", - 1 + "bid int not null,bbalance int,filler char", + "bid int not null,bbalance int,filler char", + 1, + 12 } }; static const char *const DDLINDEXes[] = { @@ -1767,6 +1777,9 @@ init(bool is_no_vacuum) char buffer[256]; const struct ddlinfo *ddl = &DDLs[i]; const char *cols; + int ts = tupsize - ddl->tupsize_correction; + + if (ts < 1) ts = 1; /* Remove old table, if it exists. */ snprintf(buffer, sizeof(buffer), "drop table if exists %s", ddl->table); @@ -1790,9 +1803,9 @@ init(bool is_no_vacuum) cols = (scale >= SCALE_32BIT_THRESHOLD) ? ddl->bigcols : ddl->smcols; - snprintf(buffer, sizeof(buffer), "create%s table %s(%s)%s", + snprintf(buffer, sizeof(buffer), "create%s table %s(%s(%d))%s", unlogged_tables ? " unlogged" : "", - ddl->table, cols, opts); + ddl->table, cols, ts, opts); executeStatement(con, buffer); } @@ -2504,6 +2517,7 @@ main(int argc, char **argv) {"unlogged-tables", no_argument, &unlogged_tables, 1}, {"sampling-rate", required_argument, NULL, 4}, {"aggregate-interval", required_argument, NULL, 5}, + {"tuple-size", required_argument, NULL, 6}, {"rate", required_argument, NULL, 'R'}, {NULL, 0, NULL, 0} }; @@ -2811,6 +2825,15 @@ main(int argc, char **argv) } #endif break; + case 6: + initialization_option_set = true; + tupsize = atoi(optarg); + if (tupsize <= 0) + { + fprintf(stderr, "invalid tuple size: %d\n", tupsize); + exit(1); + } + break; default: fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); exit(1); diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml index 23bfa9e..e6210e7 100644 --- a/doc/src/sgml/pgbench.sgml +++ b/doc/src/sgml/pgbench.sgml @@ -515,6 +515,15 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>dbname</replaceable> </varlistentry> <varlistentry> + <term><option>--tuple-size=</option><replaceable>size</replaceable></term> + <listitem> + <para> + Set target size of tuples. Default is 100 bytes. + </para> + </listitem> + </varlistentry> + + <varlistentry> <term><option>-v</option></term> <term><option>--vacuum-all</option></term> <listitem>
Re: [HACKERS] pgbench --tuple-size option
On 2014-08-15 11:46:52 +0200, Fabien COELHO wrote: After publishing some test results with pgbench on SSD with varying page size, Josh Berkus pointed out that pgbench uses small 100-bytes tuples, and that results may be different with other tuple sizes. This patch adds an option to change the default tuple size, so that this can be tested easily. I don't think it's beneficial to put this into pgbench. There really isn't a relevant benefit over using a custom script here. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench --tuple-size option
Hello Andres, This patch adds an option to change the default tuple size, so that this can be tested easily. I don't think it's beneficial to put this into pgbench. There really isn't a relevant benefit over using a custom script here. The scripts to run are the standard ones. The difference is in the *initialization* phase (-i), namely the filler attribute size. There is no custom script for initialization in pgbench, so ISTM that this argument does not apply here. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench --tuple-size option
On 2014-08-15 11:58:41 +0200, Fabien COELHO wrote: Hello Andres, This patch adds an option to change the default tuple size, so that this can be tested easily. I don't think it's beneficial to put this into pgbench. There really isn't a relevant benefit over using a custom script here. The scripts to run are the standard ones. The difference is in the *initialization* phase (-i), namely the filler attribute size. There is no custom script for initialization in pgbench, so ISTM that this argument does not apply here. The custom initialization is to run a manual ALTER after the initialization. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench --tuple-size option
I don't think it's beneficial to put this into pgbench. There really isn't a relevant benefit over using a custom script here. The scripts to run are the standard ones. The difference is in the *initialization* phase (-i), namely the filler attribute size. There is no custom script for initialization in pgbench, so ISTM that this argument does not apply here. The custom initialization is to run a manual ALTER after the initialization. Sure, it can be done this way. I'm not sure about the implication of ALTER on the table storage, thus I prefer all benchmarks to run exactly the same straightforward way in all cases so as to avoid unwanted effects on what I'm trying to measure, which is already noisy and unstable enough. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench --tuple-size option
On 2014-08-15 12:17:31 +0200, Fabien COELHO wrote: I don't think it's beneficial to put this into pgbench. There really isn't a relevant benefit over using a custom script here. The scripts to run are the standard ones. The difference is in the *initialization* phase (-i), namely the filler attribute size. There is no custom script for initialization in pgbench, so ISTM that this argument does not apply here. The custom initialization is to run a manual ALTER after the initialization. Sure, it can be done this way. I'm not sure about the implication of ALTER on the table storage, Should be fine in this case. But if that's what you're concerned about - understandably - it seems to make more sense to split -i into two. One to create the tables, and another to fill them. That'd allow to do manual stuff inbetween. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Function to know last log write timestamp
On Fri, Aug 15, 2014 at 3:40 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-08-14 14:37:22 -0400, Robert Haas wrote: On Thu, Aug 14, 2014 at 2:21 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-08-14 14:19:13 -0400, Robert Haas wrote: That's about the idea. However, what you've got there is actually unsafe, because shmem->counter++ is not an atomic operation. It reads the counter (possibly even as two separate 4-byte loads if the counter is an 8-byte value), increments it inside the CPU, and then writes the resulting value back to memory. If two backends do this concurrently, one of the updates might be lost. All these are only written by one backend, so it should be safe. Note that that coding pattern, just without memory barriers, is all over pgstat.c Ah, OK. If there's a separate slot for each backend, I agree that it's safe. We should probably add barriers to pgstat.c, too. Yea, definitely. I think this is rather borked on weaker architectures. It's just that the consequences of an out of date/torn value are rather low, so it's unlikely to be noticed. Imo we should encapsulate the changecount modifications/checks somehow instead of repeating the barriers, Asserts, comments et al everywhere. So what about applying the attached patch first, which adds the macros to load and store the changecount with the memory barriers, and changes pgstat.c to use them? Maybe this patch needs to be back-patched to at least 9.4? After applying the patch, I will rebase the pg_last_xact_insert_timestamp patch and post it again. Regards, -- Fujii Masao

*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*************** pgstat_bestart(void)
*** 2563,2569 ****
  	beentry = MyBEEntry;
  	do
  	{
! 		beentry->st_changecount++;
  	} while ((beentry->st_changecount & 1) == 0);
  
  	beentry->st_procpid = MyProcPid;
--- 2563,2569 ----
  	beentry = MyBEEntry;
  	do
  	{
! 		pgstat_increment_changecount_before(beentry);
  	} while ((beentry->st_changecount & 1) == 0);
  
  	beentry->st_procpid = MyProcPid;
*************** pgstat_bestart(void)
*** 2588,2595 ****
  	beentry->st_appname[NAMEDATALEN - 1] = '\0';
  	beentry->st_activity[pgstat_track_activity_query_size - 1] = '\0';
  
! 	beentry->st_changecount++;
! 	Assert((beentry->st_changecount & 1) == 0);
  
  	/* Update app name to current GUC setting */
  	if (application_name)
--- 2588,2594 ----
  	beentry->st_appname[NAMEDATALEN - 1] = '\0';
  	beentry->st_activity[pgstat_track_activity_query_size - 1] = '\0';
  
! 	pgstat_increment_changecount_after(beentry);
  
  	/* Update app name to current GUC setting */
  	if (application_name)
*************** pgstat_beshutdown_hook(int code, Datum arg)
*** 2624,2635 ****
  	 * before and after.  We use a volatile pointer here to ensure the
  	 * compiler doesn't try to get cute.
  	 */
! 	beentry->st_changecount++;
  
  	beentry->st_procpid = 0;	/* mark invalid */
  
! 	beentry->st_changecount++;
! 	Assert((beentry->st_changecount & 1) == 0);
  }
--- 2623,2633 ----
  	 * before and after.  We use a volatile pointer here to ensure the
  	 * compiler doesn't try to get cute.
  	 */
! 	pgstat_increment_changecount_before(beentry);
  
  	beentry->st_procpid = 0;	/* mark invalid */
  
! 	pgstat_increment_changecount_after(beentry);
  }
*************** pgstat_report_activity(BackendState state, const char *cmd_str)
*** 2666,2672 ****
  		 * non-disabled state.  As our final update, change the state and
  		 * clear fields we will not be updating anymore.
  		 */
! 		beentry->st_changecount++;
  		beentry->st_state = STATE_DISABLED;
  		beentry->st_state_start_timestamp = 0;
  		beentry->st_activity[0] = '\0';
--- 2664,2670 ----
  		 * non-disabled state.  As our final update, change the state and
  		 * clear fields we will not be updating anymore.
  		 */
! 		pgstat_increment_changecount_before(beentry);
  		beentry->st_state = STATE_DISABLED;
  		beentry->st_state_start_timestamp = 0;
  		beentry->st_activity[0] = '\0';
*************** pgstat_report_activity(BackendState state, const char *cmd_str)
*** 2674,2681 ****
  		/* st_xact_start_timestamp and st_waiting are also disabled */
  		beentry->st_xact_start_timestamp = 0;
  		beentry->st_waiting = false;
! 		beentry->st_changecount++;
! 		Assert((beentry->st_changecount & 1) == 0);
  	}
  	return;
  }
--- 2672,2678 ----
  		/* st_xact_start_timestamp and st_waiting are also disabled */
  		beentry->st_xact_start_timestamp = 0;
  		beentry->st_waiting = false;
! 		pgstat_increment_changecount_after(beentry);
  	}
  	return;
  }
*************** pgstat_report_activity(BackendState state, const char *cmd_str)
*** 2695,2701 ****
  	/*
  	 * Now update the status entry
  	 */
! 	beentry->st_changecount++;
  	beentry->st_state = state;
  	beentry->st_state_start_timestamp = current_timestamp;
--- 2692,2698 ----
  	/*
  	 * Now update the status entry
  	 */
! 	pgstat_increment_changecount_before(beentry);
Re: [HACKERS] pgbench --tuple-size option
I'm not sure about the implication of ALTER on the table storage, Should be fine in this case. But if that's what you're concerned about - understandably - Indeed, my (long) experience with benchmarks is that it is much more complicated than it looks if you want to really understand what you are getting, and to get anything meaningful. it seems to make more sense to split -i into two. One to create the tables, and another to fill them. That'd allow to do manual stuff inbetween. Hmmm. This would mean many more changes than the pretty trivial patch I submitted: more options (2-part init + compatibility with the previous case), splitting the init function, having a dependency and new error cases to check (you must have the tables to fill them), some options apply to the first part while others apply to the second part, which would lead in any case to significantly more complicated documentation... a lot of trouble for my use case of answering Josh's pertinent comments, and of being able to test the tuple size factor easily. Moreover, I would reject it myself as too much trouble for a small benefit. Feel free to reject the patch if you do not want it. I think that its cost/benefit is reasonable (one small option, small code changes, some benefit for people who want to measure performance in various cases). -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench --tuple-size option
On 2014-08-15 13:33:20 +0200, Fabien COELHO wrote: it seems to make more sense to split -i into two. One to create the tables, and another to fill them. That'd allow to do manual stuff inbetween. Hmmm. This would mean many more changes than the pretty trivial patch I submitted FWIW, I find that patch really ugly. Adding the filler's width in a printf, after the actual DDL declaration. Without so much as a comment. Brr. : more options (2-part init + compatibility with the previous case), splitting the init function, having a dependency and new error cases to check (you must have the tables to fill them), some options apply to the first part while others apply to the second part, which would lead in any case to significantly more complicated documentation... a lot of trouble for my use case of answering Josh's pertinent comments, and of being able to test the tuple size factor easily. Moreover, I would reject it myself as too much trouble for a small benefit. Well, it's something more generic, because it allows you to do more... Feel free to reject the patch if you do not want it. I think that its cost/benefit is reasonable (one small option, small code changes, some benefit for people who want to measure performance in various cases). I personally think this isn't worth the price. But I'm just one guy. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers
On Fri, Aug 15, 2014 at 4:05 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Aug 14, 2014 at 8:34 PM, Fujii Masao masao.fu...@gmail.com wrote: +At any one time there will be at a number of active synchronous standbys +defined by <varname>synchronous_standby_num</>; transactions waiting It's better to use <xref linkend="guc-synchronous-standby-num"> instead. Fixed. +for commit will be allowed to proceed after those standby servers +confirms receipt of their data. The synchronous standbys will be Typo: confirms -> confirm Fixed.

+     <para>
+      Specifies the number of standbys that support
+      <firstterm>synchronous replication</>, as described in
+      <xref linkend="synchronous-replication">, and listed as the first
+      elements of <xref linkend="guc-synchronous-standby-names">.
+     </para>
+     <para>
+      Default value is 1.
+     </para>

synchronous_standby_num is defined with PGC_SIGHUP. So the following should be added into the document: "This parameter can only be set in the postgresql.conf file or on the server command line." Fixed. The name of the parameter synchronous_standby_num sounds to me like the transaction must wait for its WAL to be replicated to s_s_num standbys. But that's not true in your patch. If s_s_names is empty, replication works asynchronously whatever the value of s_s_num is. I'm afraid that it's confusing. The description of s_s_num is not sufficient. I'm afraid that users can easily misunderstand that they can use a quorum commit feature by using s_s_names and s_s_num. That is, the transaction waits for its WAL to be replicated to any s_s_num standbys listed in s_s_names. I reworked the docs to mention all that. Yes, things are a bit different from any quorum commit facility (how to parametrize that simply without a parameter mapping one-to-one to the items of s_s_names?), as this facility relies on the order of the items of s_s_names and the fact that standbys are connected at a given time. When s_s_num is set to a value larger than max_wal_senders, should we warn about that?
Actually I have done a bit more than that by forbidding setting s_s_num to a value higher than max_wal_senders. Thoughts? You added check_synchronous_standby_num() as the GUC check function for synchronous_standby_num, and checked that there. But that seems to be wrong. You can easily see the following error messages even if synchronous_standby_num is smaller than max_wal_senders. The point is that synchronous_standby_num should be located before max_wal_senders in postgresql.conf. LOG: invalid value for parameter synchronous_standby_num: 0 DETAIL: synchronous_standby_num cannot be higher than max_wal_senders. Now that we discuss the interactions with other parameters. Another thing that I am wondering about now is: what should we do if we specify s_s_num to a number higher than the elements in s_s_names? Currently, the patch gives the priority to s_s_num, in short if we set s_s_num to 100, server will wait for 100 servers to confirm commit even if there are less than 100 elements in s_s_names. I chose this way because it looks saner particularly if s_s_names = '*'. Thoughts once again? I'm fine with this. As you gave an example, the number of entries in s_s_names can be smaller than the number of actual active sync standbys. For example, when s_s_names is set to 'hoge', more than one standbys with the name 'hoge' can connect to the server with sync mode. I still think that it's strange that replication can be async even when s_s_num is larger than zero. That is, I think that the transaction must wait for s_s_num sync standbys whether s_s_names is empty or not. OTOH, if s_s_num is zero, replication must be async whether s_s_names is empty or not. At least for me, it's intuitive to use s_s_num primarily to control the sync mode. Of course, other hackers may have different thoughts, so we need to keep our ear open for them. 
In the above design, one problem is that the number of parameters that those who want to set up only one sync standby need to change increases by one. That is, they need to change s_s_num additionally. If we are really concerned about this, we can treat a value of -1 in s_s_num as a special value, which allows us to control sync replication only by s_s_names as we do now. That is, if s_s_names is empty, replication would be async. Otherwise, only one standby with high priority in s_s_names becomes a sync one. Probably the default of s_s_num should be -1. Thoughts? The source code comments at the top of syncrep.c need to be updated. It's worth checking whether there are other comments to be updated. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench --tuple-size option
On Fri, Aug 15, 2014 at 8:36 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-08-15 13:33:20 +0200, Fabien COELHO wrote: it seems to make more sense to split -i into two. One to create the tables, and another to fill them. That'd allow to do manual stuff inbetween. Hmmm. This would mean much more changes than the pretty trivial patch I submitted FWIW, I find that patch really ugly. Adding the filler's with in a printf, after the actual DDL declaration. Without so much as a comment. Brr. : more options (2 parts init + compatibility with the previous case), splitting the init function, having a dependency and new error cases to check (you must have the table to fill them), some options apply to first part while other apply to second part, which would lead in any case to a signicantly more complicated documentation... a lot of trouble for my use case to answer Josh pertinent comments, and to be able to test the tuple size factor easily. Moreover, I would reject it myself as too much trouble for a small benefit. Well, it's something more generic, because it allows you do do more... Feel free to reject the patch if you do not want it. I think that its cost/benefit is reasonable (one small option, small code changes, some benefit for people who want to measure performance in various cases). I personally think this isn't worth the price. But I'm just one guy. I also don't like this feature. The benefit of this option seems too small. If we apply this, we might want to support other options, for example, option to change the data type of each column, option to create new index using minmax, option to change the fillfactor of each table, ...etc. There are countless such options, but I'm afraid that it's really hard to support so many options. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak
On Thu, Aug 14, 2014 at 6:12 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: We really should not be making changes of this type less than a month from our ostensible release date. That is not enough time for us to notice if the changes turn out to be not as good as we think they are. If it weren't for the fact that we'll be wedded forevermore to a bad choice of syntax, I might agree with you. But at this point, the alternatives we have are to fix it now, or fix it never. I don't like #2. Or postpone the release for another month or two. There's still a few other unresolved issues, too, like the problems with psql and expanded mode; and the JSONB toast problems. The latter is relatively new, but we don't have a proposed patch for it yet unless there's on in an email I haven't read yet, and the former has been lingering for many months without getting appreciably closer to a resolution. I like releasing in September as much as anyone, but that contemplates people taking care to get known issues fixed before the second half of August. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Another logical decoding assertion failure
http://www.postgresql.org/message-id/blu436-smtp12682d628f61ab9736099c3dc...@phx.gbl reminds me that I also saw an assertion failure recently. Although I wanted to isolate and report my issue when my vacation is over, this report made me curious whether I saw the same. Eventually it seems to be a different symptom. Following are the steps that trigger the failure in my environment (9.5devel), although it's not always that straightforward - sometimes I need to create and populate one more table. (I use the 'test_decoding' contrib module.)

postgres=# SELECT pg_create_logical_replication_slot('my_slot', 'test_decoding');
 pg_create_logical_replication_slot
------------------------------------
 (my_slot,0/16F3B30)
(1 row)

postgres=# CREATE TABLE a AS SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
SELECT 0
postgres=# INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
INSERT 0 2
postgres=# INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
The connection to the server was lost. Attempting reset: Failed.
!
The stack trace (should I upload / send the whole core file anywhere?):

(gdb) bt
#0  0x7f12d0640849 in raise () from /lib64/libc.so.6
#1  0x7f12d0641cd8 in abort () from /lib64/libc.so.6
#2  0x008c2d51 in ExceptionalCondition (conditionName=0xa926a0 "!(((bool)((relation)->rd_refcnt == 0)))", errorType=0xa92210 "FailedAssertion", fileName=0xa92104 "relcache.c", lineNumber=1892) at assert.c:54
#3  0x008b2776 in RelationDestroyRelation (relation=0x7f12c766d240, remember_tupdesc=0 '\000') at relcache.c:1892
#4  0x008b2c28 in RelationClearRelation (relation=0x7f12c766d240, rebuild=1 '\001') at relcache.c:2106
#5  0x008b2fa3 in RelationFlushRelation (relation=0x7f12c766d240) at relcache.c:2204
#6  0x008b30b5 in RelationCacheInvalidateEntry (relationId=16384) at relcache.c:2256
#7  0x008abc65 in LocalExecuteInvalidationMessage (msg=0x28c1260) at inval.c:567
#8  0x0073e1f5 in ReorderBufferExecuteInvalidations (rb=0x28b0318, txn=0x28b0430) at reorderbuffer.c:1798
#9  0x0073ddbf in ReorderBufferForget (rb=0x28b0318, xid=1279, lsn=24154944) at reorderbuffer.c:1645
#10 0x007383f8 in DecodeCommit (ctx=0x2894658, buf=0x7fff3e658c30, xid=1279, dboid=12736, commit_time=461418357793855, nsubxacts=0, sub_xids=0x28af650, ninval_msgs=69, msgs=0x28af650) at decode.c:539
#11 0x00737c91 in DecodeXactOp (ctx=0x2894658, buf=0x7fff3e658c30) at decode.c:207
#12 0x007379ce in LogicalDecodingProcessRecord (ctx=0x2894658, record=0x28af610) at decode.c:103
#13 0x0073b0d6 in pg_logical_slot_get_changes_guts (fcinfo=0x7fff3e658f50, confirm=1 '\001', binary=0 '\000') at logicalfuncs.c:432
#14 0x0073b221 in pg_logical_slot_get_changes (fcinfo=0x7fff3e658f50) at logicalfuncs.c:478
#15 0x0063e1a3 in ExecMakeTableFunctionResult (funcexpr=0x288be78, econtext=0x288b758, argContext=0x28b5580, expectedDesc=0x288ccd8, randomAccess=0 '\000') at execQual.c:2157
#16 0x0065e302 in FunctionNext (node=0x288ba18) at nodeFunctionscan.c:95
#17 0x0064537e in ExecScanFetch (node=0x288ba18, accessMtd=0x65e24b <FunctionNext>, recheckMtd=0x65e630 <FunctionRecheck>) at execScan.c:82
#18 0x006453ed in ExecScan (node=0x288ba18, accessMtd=0x65e24b <FunctionNext>, recheckMtd=0x65e630 <FunctionRecheck>) at execScan.c:132
#19 0x0065e665 in ExecFunctionScan (node=0x288ba18) at nodeFunctionscan.c:269
#20 0x0063a649 in ExecProcNode (node=0x288ba18) at execProcnode.c:426
#21 0x0065ca53 in ExecModifyTable (node=0x288b8c0) at nodeModifyTable.c:926
#22 0x0063a577 in ExecProcNode (node=0x288b8c0) at execProcnode.c:377
#23 0x00638465 in ExecutePlan (estate=0x288b518, planstate=0x288b8c0, operation=CMD_INSERT, sendTuples=0 '\000', numberTuples=0, direction=ForwardScanDirection, dest=0x2815ac0) at execMain.c:1475
#24 0x0063667a in standard_ExecutorRun (queryDesc=0x288f948, direction=ForwardScanDirection, count=0) at execMain.c:308
#25 0x00636512 in ExecutorRun (queryDesc=0x288f948, direction=ForwardScanDirection, count=0) at execMain.c:256
#26 0x007998ec in ProcessQuery (plan=0x28159c8, sourceText=0x2854d18 "INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);", params=0x0, dest=0x2815ac0, completionTag=0x7fff3e659920 "") at pquery.c:185
#27 0x0079b16e in PortalRunMulti (portal=0x2890638, isTopLevel=1 '\001', dest=0x2815ac0, altdest=0x2815ac0, completionTag=0x7fff3e659920 "") at pquery.c:1279
#28 0x0079a7b1 in PortalRun (portal=0x2890638, count=9223372036854775807, isTopLevel=1 '\001', dest=0x2815ac0, altdest=0x2815ac0, completionTag=0x7fff3e659920 "") at pquery.c:816
#29 0x0079487b in exec_simple_query (query_string=0x2854d18 "INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);") at
Re: [HACKERS] Minmax indexes
On Fri, Aug 15, 2014 at 8:02 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Alvaro Herrera wrote: Heikki Linnakangas wrote: I'm sure this still needs some cleanup, but here's the patch, based on your v14. Now that I know what this approach looks like, I still like it much better. The insert and update code is somewhat more complicated, because you have to be careful to lock the old page, new page, and revmap page in the right order. But it's not too bad, and it gets rid of all the complexity in vacuum. It seems there is some issue here, because pageinspect tells me the index is not growing properly for some reason. minmax_revmap_data gives me this array of TIDs after a bunch of insert/vacuum/delete/ etc: I fixed this issue, and did a lot more rework and bugfixing. Here's v15, based on v14-heikki2. I've not read the patch yet. But while testing the feature, I found that * Brin index cannot be created on CHAR(n) column. Maybe other data types have the same problem. * FILLFACTOR cannot be set in brin index. Are these intentional? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another logical decoding assertion failure
On 2014-08-15 14:53:45 +0200, Antonin Houska wrote: http://www.postgresql.org/message-id/blu436-smtp12682d628f61ab9736099c3dc...@phx.gbl recalls me that I also saw an assertion failure recently. Although I wanted to isolate and report my issue when my vacation is over, this report made me curious whether I saw the same. Eventually it seems to be a different symptom. That's something separate, yes. Following are the steps that trigger the failure in my environment (9.5devel), although it's not always that straightforward - sometimes I need to create and populate one more table. (I use the 'test_decoding' contrib module.) postgres=# SELECT pg_create_logical_replication_slot('my_slot', 'test_decoding'); pg_create_logical_replication_slot (my_slot,0/16F3B30) (1 row) postgres=# CREATE TABLE a AS SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA; SELECT 0 postgres=# INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL); INSERT 0 2 postgres=# INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL); The connection to the server was lost. Attempting reset: Failed. ! Is this just to reproduce the issue, or are you really storing the results of logical decoding in a table again? Why? That'll just result in circularity, no? It's not something I really thought about allowing when writing this. Possibly we can make it work, but I don't really see the point. We obviously shouldn't just crash, but that's not my point. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench --tuple-size option
Hmmm. This would mean many more changes than the pretty trivial patch I submitted FWIW, I find that patch really ugly. Adding the filler's width in a printf, after the actual DDL declaration. Without so much as a comment. Brr. Indeed. I'm not too proud of that very point either:-) You are right that it deserves at the minimum a clear comment. To put the varying size in the DDL string means vsprintf and splitting the query building some more, which I do not find desirable. [...] Well, it's something more generic, because it allows you to do more... Apart from the fact that I do not need it (at least right now), and that it is more work, my opinion is that it would be rejected. Not a strong incentive to spend time in that direction. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
On Thu, Aug 14, 2014 at 12:08 PM, Baker, Keith [OCDUS Non-JJ] kbak...@its.jnj.com wrote: I tried a combination of PIPE lock and file lock (fcntl) as Tom had suggested. Attached experimental patch has this logic...

Postmaster:
- get exclusive fcntl lock (to guard against race condition in PIPE-based lock)
- check PIPE for any existing readers
- open PIPE for read

All other backends:
- get shared fcntl lock
- open PIPE for read

Hmm. This seems like it might almost work. But I don't see why the other backends need to care about fcntl() at all. How about this locking protocol:

Postmaster:
1. Acquire an exclusive lock on some file in the data directory, maybe the control file, using fcntl().
2. Open the named pipe for read.
3. Open the named pipe for write.
4. Close the named pipe for read.
5. Install a signal handler for SIGPIPE which sets a global variable.
6. Try to write to the pipe.
7. Check that the variable is set; if not, FATAL.
8. Revert SIGPIPE handler.
9. Close the named pipe for write.
10. Open the named pipe for read.
11. Release the fcntl() lock acquired in step 1.

Regular backends don't need to do anything special, except that they need to make sure that the file descriptor opened in step 10 gets inherited by the right set of processes. That means that the close-on-exec flag should be turned on in the postmaster; except in EXEC_BACKEND builds, where it should be turned off but then turned on again by child processes before they do anything that might fork. It's impossible for two postmasters to start up at the same time because the fcntl() lock acquired at step 1 will block any newly-arriving postmaster until step 11 is complete. The first-to-close semantics of fcntl() aren't a problem for this purpose because we only execute a very limited amount of code over which we have full control while holding the lock.
By the time the postmaster that gets the lock first completes step 10, any later-arriving postmaster is guaranteed to fall out at step 7 while that postmaster or any children who inherit the pipe descriptor remain alive. No process holds any resource that will survive its exit, so cleanup is fully automatic. This seems solid to me, but watch somebody find a problem with it... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix search_path default value separator.
On Tue, Jul 15, 2014 at 12:20 AM, Christoph Martin christoph.r.mar...@gmail.com wrote: True. Both variants are legal, and most people won't ever notice. I stumbled across this while writing a test case for a transaction helper that sets/restores search_path before committing. The test was to simply compare the string values of `SHOW search_path;` before `BEGIN TRANSACTION;` and after `COMMIT;`. It's a non-issue, really, but since there's a patch and I cannot come up with a more common use case that would depend on the use of just-comma separators in the default value, I'd say it's more of a question of why not instead of why, isn't it? On 14 July 2014 16:58, Robert Haas robertmh...@gmail.com wrote: On Fri, Jul 11, 2014 at 6:09 AM, Christoph Martin christoph.r.mar...@gmail.com wrote: I noticed a minor inconsistency with the search_path separator used in the default configuration. The schemas of any search_path set using `SET search_path TO...` are separated by , (comma, space), while the default value is only separated by , (comma). The attached patch against master changes the separator of the default value to be consistent with the usual comma-space separators, and updates the documentation of `SHOW search_path;` accordingly. This massive three-byte change passes all 144 tests of make check. Heh. I'm not particularly averse to changing this, but I guess I don't see any particular benefit of changing it either. Either comma or comma-space is a legal separator, so why worry about it? This change might cause me to update the existing documents (which I need to maintain in my company) including the output example of default search_path. If the change is for the improvement, I'd be happy to do that, but it seems not. Also there might be some PostgreSQL extensions which their regression test shows the default search_path. This patch would make their developers spend the time to update the test. I'm sure that they are fine with that if it's for an improvement. But not. 
Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 logical decoding assertion
On 2014-08-14 16:03:08 -0400, Steve Singer wrote: I hit the following on 9.4 testing logical decoding. TRAP: FailedAssertion(!(prev_first_lsn < cur_txn->first_lsn), File: "reorderbuffer.c", Line: 618) LOG: server process (PID 3801) was terminated by signal 6: Aborted I saw that recently while hacking around, but I thought it was because of stuff I'd added. But apparently not. Hm. I think I see how that might happen. It might be possible (and harmless) if two subxacts of the same toplevel xact have the same first_lsn. But if it's not just <= vs < it'd be worse. Unfortunately I don't have a core file and I haven't been able to reproduce this. Any information about the workload? Any chance you still have the data directory around? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Reporting the commit LSN at commit time
Andres Freund and...@2ndquadrant.com writes: On 2014-08-14 12:21:38 -0400, Robert Haas wrote: And what does that actually do? Send back a result-set, or a new protocol message? What I was thinking of was to return "COMMIT X/X" instead of "COMMIT". Since that's only sent when COMMIT WITH (report_commit_lsn ON) was set it won't break clients/libraries that don't need it. Au contraire: it will break any piece of code that is expecting a COMMIT command tag to look like exactly "COMMIT" and not "COMMIT something". If you think there isn't any such, a look into libpq should disabuse you of that notion. (Admittedly, libpq's instance is only in the protocol-V2 code paths, but I'm sure that similar code exists elsewhere client-side.) The risk still remains, therefore, that one layer of the client-side software stack might try to enable this feature even though another layer is not prepared for it. Changing the command tag might actually be *more* dangerous than a new protocol message, rather than less so, because command tags are usually exposed at multiple layers of the stack --- libpq for instance happily returns them up to its caller. So it will be somewhere between difficult and impossible to be sure that one has fixed everything that needs fixing. And, again, I think that controlling this via something as widely changeable as a GUC is sheer folly, potentially even reaching the point of being a security bug. (Applications that fail to recognize when their transactions have committed would be broken in very nasty ways.) regards, tom lane
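Tom's point about exact-match command-tag checks can be illustrated with a small sketch (the helper names below are hypothetical, not actual libpq code): an exact comparison stops recognizing commits as soon as the tag carries a payload, while a prefix check keeps working.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical client-side check in the style Tom describes: an exact
 * comparison recognizes only the bare "COMMIT" tag. */
int
commit_recognized_exact(const char *cmdTag)
{
    return strcmp(cmdTag, "COMMIT") == 0;
}

/* A more defensive check matches "COMMIT" followed by end-of-string or a
 * space, so an appended LSN does not break it (but every layer of the
 * stack would need this fix, which is Tom's objection). */
int
commit_recognized_prefix(const char *cmdTag)
{
    return strncmp(cmdTag, "COMMIT", 6) == 0 &&
        (cmdTag[6] == '\0' || cmdTag[6] == ' ');
}
```

The exact check fails on a tag like "COMMIT 0/16F3B30", which is why existing clients that were never taught about the extension would silently stop seeing their commits.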
Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak
Robert Haas robertmh...@gmail.com writes: On Thu, Aug 14, 2014 at 6:12 PM, Tom Lane t...@sss.pgh.pa.us wrote: If it weren't for the fact that we'll be wedded forevermore to a bad choice of syntax, I might agree with you. But at this point, the alternatives we have are to fix it now, or fix it never. I don't like #2. Or postpone the release for another month or two. There's still a few other unresolved issues, too, like the problems with psql and expanded mode; and the JSONB toast problems. The latter is relatively new, but we don't have a proposed patch for it yet unless there's one in an email I haven't read yet, and the former has been lingering for many months without getting appreciably closer to a resolution. Yeah, if we do anything about JSONB we are certainly going to need another beta release, which means final couldn't be before end of Sept. at the very earliest. Still, it's always been project policy that we ship when it's ready, not when the calendar hits some particular point. As for the expanded-mode changes, I thought there was consensus to revert that from 9.4. regards, tom lane
Re: [HACKERS] [PATCH] Fix search_path default value separator.
On Fri, Aug 15, 2014 at 10:40:59PM +0900, Fujii Masao wrote: Heh. I'm not particularly averse to changing this, but I guess I don't see any particular benefit of changing it either. Either comma or comma-space is a legal separator, so why worry about it? This change might cause me to update the existing documents (which I need to maintain in my company), including the output example of the default search_path. If the change were an improvement, I'd be happy to do that, but it seems it is not. Also, there might be some PostgreSQL extensions whose regression tests show the default search_path. This patch would make their developers spend time updating those tests. I'm sure they'd be fine with that if it were an improvement. But it's not. Well, we rename GUCs often for clarity, so I don't see adjusting whitespace as something to avoid either. It is always about short-term adjustments vs. long-term clarity. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Reporting the commit LSN at commit time
On 2014-08-15 09:54:01 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-08-14 12:21:38 -0400, Robert Haas wrote: And what does that actually do? Send back a result-set, or a new protocol message? What I was thinking of was to return COMMIT X/X instead of COMMIT. Since that's only sent when COMMIT WITH (report_commit_lsn ON) was set it won't break clients/libraries that don't need it. Au contraire: it will break any piece of code that is expecting a COMMIT command tag to look like exactly COMMIT and not COMMIT something. Well, if your code doesn't support it, don't use it. The risk still remains, therefore, that one layer of the client-side software stack might try to enable this feature even though another layer is not prepared for it. Well, then the user will have to fix that. It's not like the feature will magically start to be used by itself. One alternative would be to expose a pg_get_last_commit_lsn() function that'd return the last commit's LSN, stored in a static variable, if set. That'll increase the window in which the connection can break, but that window already exists. And, again, I think that controlling this via something as widely changeable as a GUC is sheer folly, potentially even reaching the point of being a security bug. (Applications that fail to recognize when their transactions have committed would be broken in very nasty ways.) *I*'ve never suggested making this depend on a GUC. I think that'd be a major PITA. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
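Andres's alternative, remembering the last commit LSN in backend-local state and exposing it through a function, might look roughly like this (the names and the plain-integer LSN type are illustrative stand-ins, not actual server code):

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for the server's LSN type */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Backend-local memory: survives across statements on one connection. */
static XLogRecPtr last_commit_lsn = InvalidXLogRecPtr;

/* Would be called at transaction commit with the commit record's LSN. */
void
remember_commit_lsn(XLogRecPtr commit_lsn)
{
    last_commit_lsn = commit_lsn;
}

/* A SQL-callable wrapper would return this; InvalidXLogRecPtr means
 * "no commit observed yet on this connection". */
XLogRecPtr
pg_get_last_commit_lsn(void)
{
    return last_commit_lsn;
}
```

The design choice here is that nothing changes on the wire: the client asks for the LSN in a separate query after commit, which (as Andres notes) widens the window in which the connection can break before the client learns the LSN, but avoids touching the command tag or the protocol.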
Re: [HACKERS] 9.4 logical decoding assertion
On 08/15/2014 09:42 AM, Andres Freund wrote: On 2014-08-14 16:03:08 -0400, Steve Singer wrote: I hit the following on 9.4 testing logical decoding. TRAP: FailedAssertion(!(prev_first_lsn < cur_txn->first_lsn), File: "reorderbuffer.c", Line: 618) LOG: server process (PID 3801) was terminated by signal 6: Aborted I saw that recently while hacking around, but I thought it was because of stuff I'd added. But apparently not. Hm. I think I see how that might happen. It might be possible (and harmless) if two subxacts of the same toplevel xact have the same first_lsn. But if it's not just <= vs < it'd be worse. Unfortunately I don't have a core file and I haven't been able to reproduce this. Any information about the workload? Any chance you still have the data directory around? I was running the Slony regression tests, but I ran the same test script a number of times afterward and the problem didn't reproduce itself. The last thing the tests did before the crash was part of the Slony failover process. I am doing my testing running all 5 nodes/databases under the same postmaster (giving something like 20 open replication slots). A few milliseconds before, one of the connections had just done a START_REPLICATION SLOT slon_4_2 LOGICAL 0/32721A58, and then that connection reported the socket being closed; but because so much was going on concurrently I can't say for sure whether that connection experienced the assert or was closed because another backend asserted. I haven't done an initdb since, so I have the data directory, but I've dropped and recreated all of my slots many times since, so the WAL files are long gone.
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
On Thu, Aug 14, 2014 at 1:52 PM, MauMau maumau...@gmail.com wrote: I fixed some minor mistakes. What's the main purpose of this tool? If it's for WAL archiving, the tool name pg_copy sounds too generic. We already have pg_archivecleanup, so maybe pg_archivecopy or something would be better for consistency. pg_copy in the patch copies the file to the destination in a straightforward way, i.e., it copies the file directly to the destination file under its actual name. This can cause the problem which some people have reported: when the server crashes while a WAL file is being archived by the cp command, a partially-filled WAL file remains in the archival area. This half-baked archive file can cause various troubles. To address this, the WAL file needs to be copied to a temporary file first, then renamed to the actual name. I think that pg_copy should copy the WAL file in that way. Currently pg_copy always syncs the archive file, and there is no way to disable that. But I'm sure that not everyone wants to sync the archive file. So I think it's better to add an option to pg_copy specifying whether to sync the file or not. Some users might also want to specify whether to call posix_fadvise, because they might need to re-read the archived files just after archiving; for example, a network copy of the archived files from the archive area to a remote site for disaster recovery. Do you recommend using pg_copy for restore_command? If so, that should also be documented. And in the WAL restore case, the restored WAL files are re-read soon by recovery, so posix_fadvise is not good in that case. Direct I/O and posix_fadvise are used only for the destination file. But why not the source file? That might be useful especially in the restore_command case. Lastly, the big question is: is there really no OS command that provides the same functionality as pg_copy? If there is, I'd basically like to avoid duplicating work. 
Regards, -- Fujii Masao
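The copy-to-a-temporary-name-then-rename scheme described above can be sketched as follows (a simplified illustration of the idea, not the pg_copy patch itself; the function name is made up and error handling is abbreviated):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Copy src to dst by writing a temporary file first, fsync()ing it, and
 * renaming it into place. A crash mid-copy then can never leave a
 * partially filled file under the final archive name. */
int
copy_file_atomically(const char *src, const char *dst)
{
    char        tmp[1024];
    char        buf[8192];
    ssize_t     n;
    int         in, out;
    int         failed = 0;

    snprintf(tmp, sizeof(tmp), "%s.tmp", dst);

    if ((in = open(src, O_RDONLY)) < 0)
        return -1;
    if ((out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0)
    {
        close(in);
        return -1;
    }

    while ((n = read(in, buf, sizeof(buf))) > 0)
    {
        if (write(out, buf, (size_t) n) != n)
        {
            failed = 1;
            break;
        }
    }
    if (n < 0)
        failed = 1;

    /* Push the data to stable storage before exposing the final name. */
    if (!failed && fsync(out) < 0)
        failed = 1;

    close(in);
    close(out);

    if (failed)
    {
        unlink(tmp);
        return -1;
    }

    /* rename() within one filesystem is atomic on POSIX: readers see
     * either no file or the complete one, never a half-written archive. */
    return rename(tmp, dst);
}
```

A restore_command consumer that lists the archive directory will therefore never pick up a file that is still being written, which is exactly the failure mode Fujii describes for plain cp.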
Re: [HACKERS] pg_dump bug in 9.4beta2 and HEAD
Hi 2014-08-14 9:03 GMT+02:00 Heikki Linnakangas hlinnakan...@vmware.com: On 08/14/2014 06:53 AM, Joachim Wieland wrote: I'm seeing an assertion failure with pg_dump -c --if-exists which is not ready to handle BLOBs it seems: pg_dump: pg_backup_archiver.c:472: RestoreArchive: Assertion `mark != ((void *)0)' failed. To reproduce: $ createdb test $ pg_dump -c --if-exists test (works, dumps empty database) $ psql test -c select lo_create(1); $ pg_dump -c --if-exists test (fails, with the above mentioned assertion) The code tries to inject an IF EXISTS into the already-constructed DROP command, but it doesn't work for large objects, because the deletion command looks like SELECT pg_catalog.lo_unlink(xxx). There is no DROP there. I believe we could use SELECT pg_catalog.lo_unlink(loid) FROM pg_catalog.pg_largeobject_metadata WHERE loid = xxx. The pg_largeobject_metadata table didn't exist before version 9.0, but we don't guarantee pg_dump's output to be compatible in that direction anyway, so I think that's fine. The quick fix would be to add an exception for blobs, close to where the Assert is. There are a few exceptions there already. A cleaner solution would be to add a new argument to ArchiveEntry and make the callers responsible for providing a DROP IF EXISTS query, but that's not too appetizing because for most callers it would be boring boilerplate code. Perhaps add an argument, but if it's NULL, ArchiveEntry would form the if-exists query automatically from the DROP query. I am sending two patches: the first is a quick fix; the second implements Heikki's idea. 
- Heikki

commit 68ba9d3b682c6e56b08d623ebc58e351ce1a6c55
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Fri Aug 15 13:41:24 2014 +0200

    heikki-lo-if-exists-fix

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 3aebac8..0049be7 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -429,60 +429,100 @@ RestoreArchive(Archive *AHX)
 			}
 			else
 			{
-				char		buffer[40];
-				char	   *mark;
-				char	   *dropStmt = pg_strdup(te->dropStmt);
-				char	   *dropStmtPtr = dropStmt;
-				PQExpBuffer ftStmt = createPQExpBuffer();
-
 				/*
-				 * Need to inject IF EXISTS clause after ALTER TABLE
-				 * part in ALTER TABLE .. DROP statement
+				 * LO doesn't use DDL stmts, instead it uses lo_unlink
+				 * functions. But it should be injected too.
 				 */
-				if (strncmp(dropStmt, "ALTER TABLE", 11) == 0)
+				if (strncmp(te->desc, "BLOB", 4) == 0)
 				{
-					appendPQExpBuffer(ftStmt,
-									  "ALTER TABLE IF EXISTS");
-					dropStmt = dropStmt + 11;
+					Oid			loid = InvalidOid;
+					char	   *mark;
+
+					/* find and take foid */
+					if (strncmp(te->dropStmt, "SELECT pg_catalog.lo_unlink('", 29) == 0)
+					{
+						unsigned long cvt;
+						char	   *endptr;
+
+						mark = te->dropStmt + 29;
+
+						if (*mark != '\0')
+						{
+							errno = 0;
+							cvt = strtoul(mark, &endptr, 10);
+
+							/* Ensure we have a valid Oid */
+							if (errno == 0 && mark != endptr && *endptr == '\'')
+								loid = (Oid) cvt;
+						}
+					}
+
+					if (loid == InvalidOid)
+						exit_horribly(modulename, "cannot to identify oid of large object\n");
+
+					/*
+					 * Now we have valid foid and we can generate
+					 * fault tolerant (not existency) SQL statement.
+					 */
+					DropBlobIfExists(AH, loid);
 				}
-
-				/*
-				 * ALTER TABLE..ALTER COLUMN..DROP DEFAULT does not
-				 * support the IF EXISTS clause, and therefore we
-				 * simply emit the original command for such objects.
-				 * For other objects, we need to extract the first
-				 * part of the DROP which includes the object type.
-				 * Most of the time this matches te->desc, so search
-				 * for that; however for the different kinds of
-				 * CONSTRAINTs, we know to search for hardcoded "DROP
-				 * CONSTRAINT" instead.
-				 */
-				if (strcmp(te->desc, "DEFAULT") == 0)
-					appendPQExpBuffer(ftStmt, "%s", dropStmt);
 				else
 				{
-					if (strcmp(te->desc, "CONSTRAINT") == 0 ||
-						strcmp(te->desc, "CHECK CONSTRAINT") == 0 ||
-						strcmp(te->desc, "FK CONSTRAINT") == 0)
-						strcpy(buffer, "DROP CONSTRAINT");
+					char		buffer[40];
+					char	   *mark;
+					char	   *dropStmt = pg_strdup(te->dropStmt);
+					char	   *dropStmtPtr = dropStmt;
+					PQExpBuffer ftStmt = createPQExpBuffer();
+
+					/*
+					 * Need to inject IF EXISTS clause after ALTER TABLE
+					 * part in ALTER TABLE .. DROP statement
+					 */
+					if (strncmp(dropStmt, "ALTER TABLE", 11) == 0)
+					{
+						appendPQExpBuffer(ftStmt,
+										  "ALTER TABLE IF
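For reference, the OID extraction that the patch performs on the dropStmt can be restated as a standalone helper (the function name is made up, and Oid/InvalidOid are redefined locally so the sketch is self-contained):

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int Oid;       /* local stand-in for PostgreSQL's Oid */
#define InvalidOid ((Oid) 0)

/* Parse the large-object OID out of a dropStmt of the form
 * "SELECT pg_catalog.lo_unlink('NNNN');", returning InvalidOid if the
 * statement doesn't have that shape. Mirrors the strtoul/endptr
 * validation used in the patch above. */
Oid
blob_oid_from_drop_stmt(const char *dropStmt)
{
    const char *prefix = "SELECT pg_catalog.lo_unlink('";
    size_t      prefixlen = strlen(prefix);
    unsigned long cvt;
    char       *endptr;

    if (strncmp(dropStmt, prefix, prefixlen) != 0)
        return InvalidOid;

    errno = 0;
    cvt = strtoul(dropStmt + prefixlen, &endptr, 10);

    /* must have consumed digits and stopped at the closing quote */
    if (errno != 0 || endptr == dropStmt + prefixlen || *endptr != '\'')
        return InvalidOid;

    return (Oid) cvt;
}
```

With a valid OID in hand, the patch can then emit the fault-tolerant form Heikki suggested (a lo_unlink guarded by a lookup in pg_largeobject_metadata) instead of trying to splice IF EXISTS into a statement that contains no DROP.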
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
Fujii Masao masao.fu...@gmail.com wrote: Direct I/O and posix_fadvise are used only for the destination file. But why not the source file? That might be useful especially in the restore_command case. That would prevent people from piping the file through a compression utility. We should support piped I/O for input (and if we want to use it on the recovery side, for the output, too), at least as an option. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_dump bug in 9.4beta2 and HEAD
On Thu, Aug 14, 2014 at 10:03:57AM +0300, Heikki Linnakangas wrote: On 08/14/2014 06:53 AM, Joachim Wieland wrote: I'm seeing an assertion failure with pg_dump -c --if-exists which is not ready to handle BLOBs it seems: pg_dump: pg_backup_archiver.c:472: RestoreArchive: Assertion `mark != ((void *)0)' failed. To reproduce: $ createdb test $ pg_dump -c --if-exists test (works, dumps empty database) $ psql test -c select lo_create(1); $ pg_dump -c --if-exists test (fails, with the above mentioned assertion) The code tries to inject an IF EXISTS into the already-constructed DROP command, but it doesn't work for large objects, because the deletion command looks like SELECT pg_catalog.lo_unlink(xxx). There is no DROP there. The lo_* functions are probably too entrenched to be deprecated, but maybe we could come up with DML (or DDL, although that seems like a bridge too far) equivalents and use those. Not for 9.4, obviously. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
Robert Haas robertmh...@gmail.com writes: How about this locking protocol: Postmaster: 1. Acquire an exclusive lock on some file in the data directory, maybe the control file, using fcntl(). 2. Open the named pipe for read. 3. Open the named pipe for write. 4. Close the named pipe for read. 5. Install a signal handler for SIGPIPE which sets a global variable. 6. Try to write to the pipe. 7. Check that the variable is set; if not, FATAL. 8. Revert SIGPIPE handler. 9. Close the named pipe for write. 10. Open the named pipe for read. 11. Release the fcntl() lock acquired in step 1. Hm, this seems like it would work. A couple other thoughts: * I think 5..8 are overly complex: we can just set SIGPIPE to SIG_IGN (which is its usual setting in the postmaster already) and check for EPIPE from the write(). * There might be some benefit to swapping steps 9 and 10; at the very least, this would eliminate the need to use O_NONBLOCK while re-opening for read. * We talked about combining this technique with a plain file lock so that we would have belt-and-suspenders protection, in particular something that would have a chance of working across NFS clients. This would suggest leaving the fcntl lock in place, ie, don't do step 11, and also that the file-to-be-locked *not* have any other purpose (which would only increase the risk of losing the lock through careless open/close). Regular backends don't need to do anything special, except that they need to make sure that the file descriptor opened in step 8 gets inherited by the right set of processes. That means that the close-on-exec flag should be turned on in the postmaster; except in EXEC_BACKEND builds, where it should be turned off but then turned on again by child processes before they do anything that might fork. Meh. Do we really want to allow a new postmaster to start if there are any processes remaining that were launched by backends? I'd be inclined to just suppress close-on-exec, period. 
regards, tom lane
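Step 1 of Robert's protocol, the fcntl() lock, might be sketched like this (a minimal illustration assuming POSIX record locks; the function name is invented, and the real interlock would combine this with the named-pipe dance discussed above):

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Take an exclusive (write) fcntl lock over the whole file, without
 * blocking. Returns the open fd on success, or -1 if another process
 * already holds the lock (i.e. a postmaster is presumably alive) or on
 * error. Note the classic fcntl pitfall: the lock is dropped if ANY
 * descriptor for the file is closed by this process, which is one
 * reason the lock file should serve no other purpose. */
int
acquire_data_dir_lock(const char *lockfile)
{
    struct flock fl;
    int         fd;

    if ((fd = open(lockfile, O_RDWR | O_CREAT, 0600)) < 0)
        return -1;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 = lock the entire file */

    if (fcntl(fd, F_SETLK, &fl) < 0)
    {
        close(fd);
        return -1;              /* someone else holds the lock */
    }
    return fd;                  /* keep fd open to retain the lock */
}
```

Because fcntl locks are released on process exit, a crashed postmaster never leaves a stale lock behind, which is what makes this attractive compared to pid-file heuristics; the named-pipe steps then cover the orphaned-child case that fcntl alone cannot.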
Re: [HACKERS] Another logical decoding assertion failure
On 08/15/2014 03:16 PM, Andres Freund wrote: On 2014-08-15 14:53:45 +0200, Antonin Houska wrote: postgres=# SELECT pg_create_logical_replication_slot('my_slot', 'test_decoding'); pg_create_logical_replication_slot (my_slot,0/16F3B30) (1 row) postgres=# CREATE TABLE a AS SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA; SELECT 0 postgres=# INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL); INSERT 0 2 postgres=# INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL); The connection to the server was lost. Attempting reset: Failed. ! Is this just to reproduce the issue, or are you really storing the results of logical decoding in a table again? Why? That'll just result in circularity, no? It's not something I really thought about allowing when writing this. Possibly we can make it work, but I don't really see the point. We obviously shouldn't just crash, but that's not my point. It was basically an experiment. I tried to capture changes into a table. I don't know whether it's legal (i.e. whether the current call of pg_logical_slot_get_changes() should include changes that the current transaction did). An unlogged / temporary table may be necessary for a case like this. The reason I report it is that, if such a use case is not legal, it should (I think) be detected somehow and handled using ereport / elog. // Tony
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On Tue, Aug 12, 2014 at 10:52 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 08/06/2014 08:37 PM, Jeff Janes wrote: But now it looks like 0002 needs a rebase I've committed the refactoring patch, and here's a rebased and improved version of the Windows SChannel implementation over that. On MinGW, I get the following error when compiling with options --host=x86_64-w64-mingw32 --without-zlib: be-secure.c: In function 'secure_open_server': be-secure.c:106:2: error: 'Port' has no member named 'peer_cn' be-secure.c:106:2: error: 'Port' has no member named 'peer_cn' make[3]: *** [be-secure.o] Error 1 make[2]: *** [libpq-recursive] Error 2 make[1]: *** [all-backend-recurse] Error 2 make: *** [all-src-recurse] Error 2 Should the ereport DEBUG2 be inside the #ifdef USE_SSL? Thanks, Jeff
Re: [HACKERS] Minmax indexes
On 08/15/2014 02:02 AM, Alvaro Herrera wrote: Alvaro Herrera wrote: Heikki Linnakangas wrote: I'm sure this still needs some cleanup, but here's the patch, based on your v14. Now that I know what this approach looks like, I still like it much better. The insert and update code is somewhat more complicated, because you have to be careful to lock the old page, new page, and revmap page in the right order. But it's not too bad, and it gets rid of all the complexity in vacuum. It seems there is some issue here, because pageinspect tells me the index is not growing properly for some reason. minmax_revmap_data gives me this array of TIDs after a bunch of insert/vacuum/delete/ etc: I fixed this issue, and did a lot more rework and bugfixing. Here's v15, based on v14-heikki2. So, the other design change I've been advocating is to store the revmap in the first N blocks, instead of having the two-level structure with array pages and revmap pages. Attached is a patch for that, to be applied after v15. When the revmap needs to be expanded, all the tuples on it are moved elsewhere one-by-one. That adds some latency to the unfortunate guy who needs to do that, but as the patch stands, the revmap is only ever extended by VACUUM or CREATE INDEX, so I think that's fine. Like with my previous patch, the point is to demonstrate how much simpler the code becomes this way; I'm sure there are bugs and cleanup still necessary. PS. Spotted one oversight in patch v15: callers of mm_doupdate must check the return value, and retry the operation if it returns false. - Heikki

commit ce4df0e9dbd43f7e3d4fdf3f7920301f81f17d63
Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Date:   Fri Aug 15 18:32:19 2014 +0300

    Get rid of array pages. Instead, move all tuples out of the way

diff --git a/contrib/pageinspect/mmfuncs.c b/contrib/pageinspect/mmfuncs.c
index 6cd559a..51cc9e2 100644
--- a/contrib/pageinspect/mmfuncs.c
+++ b/contrib/pageinspect/mmfuncs.c
@@ -74,9 +74,6 @@ minmax_page_type(PG_FUNCTION_ARGS)
 		case MINMAX_PAGETYPE_META:
 			type = "meta";
 			break;
-		case MINMAX_PAGETYPE_REVMAP_ARRAY:
-			type = "revmap array";
-			break;
 		case MINMAX_PAGETYPE_REVMAP:
 			type = "revmap";
 			break;
@@ -343,11 +340,9 @@ minmax_metapage_info(PG_FUNCTION_ARGS)
 	Page		page;
 	MinmaxMetaPageData *meta;
 	TupleDesc	tupdesc;
-	Datum		values[3];
-	bool		nulls[3];
-	ArrayBuildState *astate = NULL;
+	Datum		values[4];
+	bool		nulls[4];
 	HeapTuple	htup;
-	int			i;
 
 	page = verify_minmax_page(raw_page, MINMAX_PAGETYPE_META, "metapage");
@@ -361,22 +356,8 @@
 	MemSet(nulls, 0, sizeof(nulls));
 	values[0] = CStringGetTextDatum(psprintf("0x%08X", meta->minmaxMagic));
 	values[1] = Int32GetDatum(meta->minmaxVersion);
-
-	/* Extract (possibly empty) list of revmap array page numbers. */
-	for (i = 0; i < MAX_REVMAP_ARRAYPAGES; i++)
-	{
-		BlockNumber blkno;
-
-		blkno = meta->revmapArrayPages[i];
-		if (blkno == InvalidBlockNumber)
-			break;				/* XXX or continue? */
-		astate = accumArrayResult(astate, Int64GetDatum((int64) blkno),
-								  false, INT8OID, CurrentMemoryContext);
-	}
-	if (astate == NULL)
-		nulls[2] = true;
-	else
-		values[2] = makeArrayResult(astate, CurrentMemoryContext);
+	values[2] = Int32GetDatum(meta->pagesPerRange);
+	values[3] = Int64GetDatum(meta->lastRevmapPage);
 
 	htup = heap_form_tuple(tupdesc, values, nulls);
 
@@ -384,34 +365,6 @@
 }
 
 /*
- * Return the BlockNumber array stored in a revmap array page
- */
-Datum
-minmax_revmap_array_data(PG_FUNCTION_ARGS)
-{
-	bytea	   *raw_page = PG_GETARG_BYTEA_P(0);
-	Page		page;
-	ArrayBuildState *astate = NULL;
-	RevmapArrayContents *contents;
-	Datum		blkarr;
-	int			i;
-
-	page = verify_minmax_page(raw_page, MINMAX_PAGETYPE_REVMAP_ARRAY,
-							  "revmap array");
-
-	contents = (RevmapArrayContents *) PageGetContents(page);
-
-	for (i = 0; i < contents->rma_nblocks; i++)
-		astate = accumArrayResult(astate,
-								  Int64GetDatum((int64) contents->rma_blocks[i]),
-								  false, INT8OID, CurrentMemoryContext);
-	Assert(astate != NULL);
-
-	blkarr = makeArrayResult(astate, CurrentMemoryContext);
-	PG_RETURN_DATUM(blkarr);
-}
-
-/*
  * Return the TID array stored in a minmax revmap page
  */
 Datum
@@ -437,7 +390,7 @@
 	/* Extract values from the revmap page */
 	contents = (RevmapContents *) PageGetContents(page);
 	MemSet(nulls, 0, sizeof(nulls));
-	values[0] = Int64GetDatum((uint64) contents->rmr_logblk);
+	values[0] = Int64GetDatum((uint64) 0);
 
 	/* Extract (possibly empty) list of TIDs in this page. */
 	for (i = 0; i < REGULAR_REVMAP_PAGE_MAXITEMS; i++)
diff --git a/contrib/pageinspect/pageinspect--1.2.sql b/contrib/pageinspect/pageinspect--1.2.sql
index 56c9ba8..cba90ca 100644
--- a/contrib/pageinspect/pageinspect--1.2.sql
+++ b/contrib/pageinspect/pageinspect--1.2.sql
@@ -110,7 +110,7 @@
LANGUAGE C STRICT; --
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On 08/15/2014 08:16 PM, Jeff Janes wrote: On Tue, Aug 12, 2014 at 10:52 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 08/06/2014 08:37 PM, Jeff Janes wrote: But now it looks like 0002 needs a rebase I've committed the refactoring patch, and here's a rebased and improved version of the Windows SChannel implementation over that. On MinGW, I get the following error when compiling with options --host=x86_64-w64-mingw32 --without-zlib: be-secure.c: In function 'secure_open_server': be-secure.c:106:2: error: 'Port' has no member named 'peer_cn' be-secure.c:106:2: error: 'Port' has no member named 'peer_cn' make[3]: *** [be-secure.o] Error 1 make[2]: *** [libpq-recursive] Error 2 make[1]: *** [all-backend-recurse] Error 2 make: *** [all-src-recurse] Error 2 Should the ereport DEBUG2 be inside the #ifdef USE_SSL? Yeah. I've been thinking though, perhaps we should always have the ssl_in_use, peer_cn and peer_cert_valid members in the Port struct. If not compiled with USE_SSL, they would just always be false/NULL. Then we wouldn't need #ifdefs around all the places that check those fields either. - Heikki
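Heikki's suggestion can be sketched as follows (a simplified stand-in for the real Port struct, showing only the fields under discussion):

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the backend's Port struct: the SSL-state
 * fields exist whether or not the server was built with SSL support,
 * so call sites don't need #ifdef USE_SSL around every check. */
typedef struct Port
{
    int         sock;
    bool        ssl_in_use;        /* always false in non-SSL builds */
    char       *peer_cn;           /* always NULL in non-SSL builds */
    bool        peer_cert_valid;   /* always false in non-SSL builds */
} Port;

void
port_init(Port *port, int sock)
{
    port->sock = sock;
    port->ssl_in_use = false;
    port->peer_cn = NULL;
    port->peer_cert_valid = false;
#ifdef USE_SSL
    /* an SSL build would fill these in during the handshake */
#endif
}

/* Example call site: compiles identically with or without USE_SSL,
 * which is exactly the MinGW breakage this would have avoided. */
bool
connection_authenticated_via_cert(const Port *port)
{
    return port->ssl_in_use && port->peer_cert_valid &&
        port->peer_cn != NULL;
}
```

The cost is a few always-present bytes per connection; the benefit is that a --without-openssl build can no longer fail to compile merely because a check of peer_cn escaped its #ifdef.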
Re: [HACKERS] 9.5: Memory-bounded HashAgg
On Thu, Aug 14, 2014 at 2:21 PM, Jeff Davis pg...@j-davis.com wrote: On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote: Oh? So if we have aggregates like array_agg whose memory footprint increases over time, the patch completely fails to avoid bloat? Yes, in its current form. I might think a patch with such a limitation was useful, if it weren't for the fact that aggregates of that nature are a large part of the cases where the planner misestimates the table size in the first place. Any complication that we add to nodeAgg should be directed towards dealing with cases that the planner is likely to get wrong. In my experience, the planner has a lot of difficulty estimating the cardinality unless it's coming from a base table without any operators above it (other than maybe a simple predicate). This is probably a lot more common than array_agg problems, simply because array_agg is relatively rare compared with GROUP BY in general. I think that's right, and I rather like your (Jeff's) approach. It's definitely true that we could do better if we have a mechanism for serializing and deserializing group states, but (1) I think an awful lot of cases would get an awful lot better even just with the approach proposed here and (2) I doubt we would make the serialization/deserialization interfaces mandatory, so even if we had that we'd probably want a fallback strategy anyway. Furthermore, I don't really see that we're backing ourselves into a corner here. If prohibiting creation of additional groups isn't sufficient to control memory usage, but we have serialization/deserialization functions, we can just pick an arbitrary subset of the groups that we're storing in memory and spool their transition states off to disk, thus reducing memory even further. I understand Tomas' point to be that this is quite different from what we do for hash joins, but I think it's a different problem. 
In the case of a hash join, there are two streams of input tuples, and we've got to batch them in compatible ways. If we were to, say, exclude an arbitrary subset of tuples from the hash table instead of basing it on the hash code, we'd have to test *every* outer tuple against the hash table for *every* batch. That would incur a huge amount of additional cost vs. being able to discard outer tuples once the batch to which they pertain has been processed. But the situation here isn't comparable, because there's only one input stream. I'm pretty sure we'll want to keep track of which transition states we've spilled due to lack of memory as opposed to those which were never present in the table at all, so that we can segregate the unprocessed tuples that pertain to spilled transition states from the ones that pertain to a group we haven't begun yet. And it might be that if we know (or learn as we go along) that we're going to vastly blow out work_mem it makes sense to use batching to segregate the tuples that we decide not to process onto N tapes binned by hash code, so that we have a better chance that future batches will be the right size to fit in memory. But I'm not convinced that there's a compelling reason why the *first* batch has to be chosen by hash code; we're actually best off picking any arbitrary set of groups that does the best job reducing the amount of data remaining to be processed, at least if the transition states are fixed size and maybe even if they aren't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
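The scheme under discussion, freezing the set of groups once memory fills up, continuing to aggregate tuples that match an existing group, and spilling the rest for later passes, can be sketched for a toy sum() aggregate (illustrative only; tiny fixed arrays stand in for the hash table, the spill tape, and work_mem):

```c
#include <stddef.h>

#define MAX_GROUPS 2            /* artificially tiny "work_mem" */
#define MAX_SPILL  64

typedef struct { int key; long sum; } Group;

static Group groups[MAX_GROUPS];
static int  ngroups;
static int  spill_keys[MAX_SPILL], spill_vals[MAX_SPILL], nspilled;

static Group *
lookup_or_create(int key)
{
    int i;

    for (i = 0; i < ngroups; i++)
        if (groups[i].key == key)
            return &groups[i];
    if (ngroups == MAX_GROUPS)
        return NULL;            /* table full: no new groups allowed */
    groups[ngroups].key = key;
    groups[ngroups].sum = 0;
    return &groups[ngroups++];
}

/* One pass: tuples matching an existing group keep aggregating;
 * tuples belonging to unseen groups are spilled for a later pass. */
void
agg_tuple(int key, int val)
{
    Group *g = lookup_or_create(key);

    if (g != NULL)
        g->sum += val;
    else
    {
        spill_keys[nspilled] = key;
        spill_vals[nspilled] = val;
        nspilled++;
    }
}

/* Record finished groups, then reprocess spilled tuples with a fresh
 * table; repeat until nothing remains spilled. Results accumulate into
 * out[key] (keys are assumed to be small non-negative ints here). */
void
agg_finish(long *out)
{
    for (;;)
    {
        int i, n;

        for (i = 0; i < ngroups; i++)
            out[groups[i].key] = groups[i].sum;
        if (nspilled == 0)
            break;
        ngroups = 0;            /* start a fresh pass over the spill */
        n = nspilled;
        nspilled = 0;
        for (i = 0; i < n; i++)
            agg_tuple(spill_keys[i], spill_vals[i]);
    }
}
```

Note what the sketch does not need: no re-probing of already-finished tuples, and the first "batch" is simply whichever groups arrived first, which is the point Robert makes about not having to choose the first batch by hash code when there is only one input stream.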
Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak
On Fri, Aug 15, 2014 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Aug 14, 2014 at 6:12 PM, Tom Lane t...@sss.pgh.pa.us wrote: If it weren't for the fact that we'll be wedded forevermore to a bad choice of syntax, I might agree with you. But at this point, the alternatives we have are to fix it now, or fix it never. I don't like #2. Or postpone the release for another month or two. There are still a few other unresolved issues, too, like the problems with psql and expanded mode; and the JSONB toast problems. The latter is relatively new, but we don't have a proposed patch for it yet unless there's one in an email I haven't read yet, and the former has been lingering for many months without getting appreciably closer to a resolution. Yeah, if we do anything about JSONB we are certainly going to need another beta release, which means final couldn't be before end of Sept. at the very earliest. Still, it's always been project policy that we ship when it's ready, not when the calendar hits some particular point. Absolutely. As for the expanded-mode changes, I thought there was consensus to revert that from 9.4. Me too. In fact, I think that's been the consensus for many months, but unless I'm mistaken it ain't done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Minmax indexes
Fujii Masao wrote: I've not read the patch yet. But while testing the feature, I found that * Brin index cannot be created on CHAR(n) column. Maybe other data types have the same problem. Yeah, it's just a matter of adding an opclass for it -- pretty simple stuff really, because you don't need to write any code, just add a bunch of catalog entries and an OPCINFO line in mmsortable.c. Right now there are opclasses for the following types: int4 numeric text date timestamp with time zone timestamp time with time zone time char We can eventually extend to cover all types that have btree opclasses, but we can do that in a separate commit. I'm also considering removing the opclass for time with time zone, as it's a pretty useless type. I mostly added the ones that are there as a way to test that it behaved reasonably in the various cases (pass by val vs. not, variable width vs. fixed, different alignment requirements). Of course, the real interesting part is adding a completely different opclass, such as one that stores bounding boxes. * FILLFACTOR cannot be set in brin index. I hadn't added this one because I didn't think there was much point previously, but I think it might now be useful to allow same-page updates. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
On Fri, Aug 15, 2014 at 12:02 PM, Tom Lane t...@sss.pgh.pa.us wrote: * I think 5..8 are overly complex: we can just set SIGPIPE to SIG_IGN (which is its usual setting in the postmaster already) and check for EPIPE from the write(). wfm. * There might be some benefit to swapping steps 9 and 10; at the very least, this would eliminate the need to use O_NONBLOCK while re-opening for read. Also wfm. * We talked about combining this technique with a plain file lock so that we would have belt-and-suspenders protection, in particular something that would have a chance of working across NFS clients. This would suggest leaving the fcntl lock in place, ie, don't do step 11, and also that the file-to-be-locked *not* have any other purpose (which would only increase the risk of losing the lock through careless open/close). I'd be afraid that a secondary mechanism that mostly-but-not-really works could do more harm by allowing us to miss bugs in the primary, pipe-based locking mechanism than the good it would accomplish. Regular backends don't need to do anything special, except that they need to make sure that the file descriptor opened in step 8 gets inherited by the right set of processes. That means that the close-on-exec flag should be turned on in the postmaster; except in EXEC_BACKEND builds, where it should be turned off but then turned on again by child processes before they do anything that might fork. Meh. Do we really want to allow a new postmaster to start if there are any processes remaining that were launched by backends? I'd be inclined to just suppress close-on-exec, period. Seems like a pretty weird and artificial restriction. Anything that has done exec() will not be connected to shared memory, so it really doesn't matter whether it's still alive or not. People can and do write extensions that launch processes from PostgreSQL backends via fork()+exec(), and we've taken pains in the past not to break such cases. 
I don't see a reason to impose now (for no data-integrity-related reason) the rule that any such processes must not be daemons. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
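The close-on-exec behavior under discussion can be demonstrated with Python's fcntl module (illustrative only; the real patch would manipulate the flag in C via fcntl(2)). With FD_CLOEXEC clear, a descriptor survives both fork() and exec(), so a daemon launched by a backend would keep the lock file descriptor alive; with it set, the descriptor is closed across exec():

```python
import fcntl
import os

# A pipe's read end plays the role of the lock fd that backends
# inherit from the postmaster.
r, w = os.pipe()

# Clear FD_CLOEXEC: the fd would be inherited across fork()+exec(),
# i.e. by any daemon an extension launches.
flags = fcntl.fcntl(r, fcntl.F_GETFD)
fcntl.fcntl(r, fcntl.F_SETFD, flags & ~fcntl.FD_CLOEXEC)
assert not (fcntl.fcntl(r, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)

# Set FD_CLOEXEC: the fd still survives fork() (so child backends
# keep the lock) but is closed automatically across exec().
fcntl.fcntl(r, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
assert fcntl.fcntl(r, fcntl.F_GETFD) & fcntl.FD_CLOEXEC
```

This is the crux of the disagreement above: whether exec'd grandchildren (which are not attached to shared memory) should or should not keep the lock descriptor open.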
Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?
On Sun, Jun 29, 2014 at 3:19 AM, Matheus de Oliveira matioli.math...@gmail.com wrote: Hi Hackers, I have worked on that patch a little more. So now I have a functional patch (although still WIP) attached. The feature works as follows:
- Added a boolean parameter only_temp_files to pg_tablespace.spcoptions;
- This parameter can be set to true only during CREATE TABLESPACE, not on ALTER TABLESPACE (I have thought of ways of implementing the latter, and I'd like to discuss it more later);
- On the creation of relations, it is checked if it is a temporary-tablespace, and an error occurs when it is and the relation is not temporary (temp table or index on a temp table);
- When a temporary file (either relation file or sort/agg file) is created inside a temporary-tablespace, the entire directory structure is created on-demand (e.g. if pg_tblspc/oid/TABLESPACE_VERSION_DIRECTORY is missing, it is created on demand); it is done on OpenTemporaryFileInTablespace, at fd.c (I wonder if we shouldn't do that for any tablespace) and on TablespaceCreateDbspace, at tablespace.c.
I still haven't changed the documentation, as I think I need some insights about the changes. I have some more thoughts about the syntax and I still think that TEMP LOCATION syntax is better suited for this patch. First, because of the nature of the changes I made, it seems more suitable to a column on pg_tablespace rather than an option. Second, because no ALTER is available (so far) and I think it is odd to have an option that can't be changed. Third, I think the TEMP keyword is clearer and users can be more used to it. Thoughts? I'm going to add the CF app entry next. Could I get some review now, or after discussion about how things are going (remember I'm a newbie on this, so I'm a little lost)? I got the following warning messages during compilation.
dbcommands.c:420: warning: implicit declaration of function 'is_tablespace_temp_only'
indexcmds.c:437: warning: implicit declaration of function 'is_tablespace_temp_only'
tablecmds.c:527: warning: implicit declaration of function 'is_tablespace_temp_only'
When I created a temporary tablespace and executed pg_dumpall, it generated the SQL ALTER TABLESPACE hoge SET (only_temp_files=on);. This is clearly a bug because that SQL is not allowed to be executed. ERROR: this tablespace only allows temporary files I think that this ERROR message can be improved; for example, for the database creation case, what about something like the following?
ERROR: database cannot be created on temporary tablespace
HINT: temporary tablespace is allowed to store only temporary files.
Regards, -- Fujii Masao
Re: [HACKERS] jsonb format is pessimal for toast compression
On 08/14/2014 07:24 PM, Tom Lane wrote: We can certainly reduce that. The question was whether it would be worth the effort to try. At this point, with three different test data sets having shown clear space savings, I think it is worth the effort. I'll poke into it tomorrow or over the weekend, unless somebody beats me to it. Note that I specifically created that data set to be a worst case: many top-level keys, no nesting, and small values. However, I don't think it's an unrealistic worst case. Interestingly, even on the unpatched, 1GB table case, the *index* on the JSONB is only 60MB. Which shows just how terrific the improvement in GIN index size/performance is. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] jsonb format is pessimal for toast compression
Josh Berkus j...@agliodbs.com writes: On 08/14/2014 07:24 PM, Tom Lane wrote: We can certainly reduce that. The question was whether it would be worth the effort to try. At this point, with three different test data sets having shown clear space savings, I think it is worth the effort. I'll poke into it tomorrow or over the weekend, unless somebody beats me to it. Note that I specifically created that data set to be a worst case: many top-level keys, no nesting, and small values. However, I don't think it's an unrealistic worst case. Interestingly, even on the unpatched, 1GB table case, the *index* on the JSONB is only 60MB. Which shows just how terrific the improvement in GIN index size/performance is. I've been poking at this, and I think the main explanation for your result is that with more JSONB documents being subject to compression, we're spending more time in pglz_decompress. There's no free lunch in that department: if you want compressed storage it's gonna cost ya to decompress. The only way I can get decompression and TOAST access to not dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE PLAIN. However, when I do that, I do see my test patch running about 25% slower overall than HEAD on an explain analyze select jfield -> 'key' from table type of query with 200-key documents with narrow fields (see attached perl script that generates the test data). It seems difficult to improve much on that for this test case. I put some logic into findJsonbValueFromContainer to calculate the offset sums just once, not once per binary-search iteration, but that only improved matters 5% at best. I still think it'd be worth modifying the JsonbIterator code to avoid repetitive offset calculations, but that's not too relevant to this test case. Having said all that, I think this test is something of a contrived worst case.
More realistic cases are likely to have many fewer keys (so that speed of the binary search loop is less of an issue) or else to have total document sizes large enough that inline PLAIN storage isn't an option, meaning that detoast+decompression costs will dominate. regards, tom lane

#! /usr/bin/perl
for (my $i = 0; $i < 10; $i++) {
	print "{";
	for (my $k = 1; $k <= 200; $k++) {
		print ", " if $k > 1;
		printf "\"k%d\": %d", $k, int(rand(10));
	}
	print "}\n";
}
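The lengths-versus-offsets tradeoff at issue in this thread is easy to see in miniature: stored lengths are repetitive across similar entries (hence compressible), while stored offsets are all distinct; but with lengths, locating entry i requires summing the lengths before it, which is why computing the offset sums once rather than per binary-search probe matters. A toy model (this is not the actual JEntry layout, just the arithmetic):

```python
from itertools import accumulate

values = ["a", "bb", "ccc", "dddd"]
lengths = [len(v) for v in values]

# Offset-based layout: O(1) random access, but every stored offset
# differs from its neighbors, defeating pglz's repetition detection.
offsets = [0] + list(accumulate(lengths))

# Length-based layout: locating entry i means summing preceding
# lengths -- O(n) per probe if done naively inside a binary search.
def offset_of(i):
    return sum(lengths[:i])

# Precomputing the prefix sums once restores O(1) probes.
assert all(offset_of(i) == offsets[i] for i in range(len(values) + 1))

payload = "".join(values)
assert payload[offsets[2]:offsets[3]] == "ccc"
```

Heikki's hybrid patch downthread stores a true offset only every JBE_STORE_LEN_STRIDE entries, bounding the summing work while keeping most entries as compressible lengths.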
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
Heikki Linnakangas hlinnakan...@vmware.com writes: On 08/15/2014 08:16 PM, Jeff Janes wrote: Should the ereport DEBUG2 be inside the #ifdef USE_SSL? Yeah. I've been thinking though, perhaps we should always have the ssl_in_use, peer_cn and peer_cert_valid members in the Port struct. If not compiled with USE_SSL, they would just always be false/NULL. Then we wouldn't need #ifdefs around all the places that check those fields either. +1. This would also make it less risky for add-on code to touch the Port struct. regards, tom lane
Re: [HACKERS] jsonb format is pessimal for toast compression
I'm still getting up to speed on postgres development, but I'd like to leave an opinion. We should add some sort of versioning to the jsonb format. This can be explored in the future in many ways. As for the current problem, we should explore the directory-at-the-end option. It should improve compression and keep good access performance. A 4-byte header is sufficient to store the directory offset and some versioning bits. On 15/08/2014 17:39, Tom Lane t...@sss.pgh.pa.us wrote: Josh Berkus j...@agliodbs.com writes: On 08/14/2014 07:24 PM, Tom Lane wrote: We can certainly reduce that. The question was whether it would be worth the effort to try. At this point, with three different test data sets having shown clear space savings, I think it is worth the effort. I'll poke into it tomorrow or over the weekend, unless somebody beats me to it. Note that I specifically created that data set to be a worst case: many top-level keys, no nesting, and small values. However, I don't think it's an unrealistic worst case. Interestingly, even on the unpatched, 1GB table case, the *index* on the JSONB is only 60MB. Which shows just how terrific the improvement in GIN index size/performance is. I've been poking at this, and I think the main explanation for your result is that with more JSONB documents being subject to compression, we're spending more time in pglz_decompress. There's no free lunch in that department: if you want compressed storage it's gonna cost ya to decompress. The only way I can get decompression and TOAST access to not dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE PLAIN. However, when I do that, I do see my test patch running about 25% slower overall than HEAD on an explain analyze select jfield -> 'key' from table type of query with 200-key documents with narrow fields (see attached perl script that generates the test data). It seems difficult to improve much on that for this test case.
I put some logic into findJsonbValueFromContainer to calculate the offset sums just once, not once per binary-search iteration, but that only improved matters 5% at best. I still think it'd be worth modifying the JsonbIterator code to avoid repetitive offset calculations, but that's not too relevant to this test case. Having said all that, I think this test is something of a contrived worst case. More realistic cases are likely to have many fewer keys (so that speed of the binary search loop is less of an issue) or else to have total document sizes large enough that inline PLAIN storage isn't an option, meaning that detoast+decompression costs will dominate. regards, tom lane
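Arthur's 4-byte header idea — a few version bits plus a directory offset — can be sketched with Python's struct module. The bit split below (3 version bits, 29 offset bits) is an assumption for illustration; jsonb's actual header layout is different:

```python
import struct

VERSION_BITS = 3   # assumed: room for 8 format versions
OFFSET_BITS = 29   # assumed: directory offset within the datum

def pack_header(version, dir_offset):
    """Pack version and directory offset into one 32-bit word."""
    assert version < (1 << VERSION_BITS)
    assert dir_offset < (1 << OFFSET_BITS)
    return struct.pack("<I", (version << OFFSET_BITS) | dir_offset)

def unpack_header(raw):
    """Inverse of pack_header: recover (version, dir_offset)."""
    word, = struct.unpack("<I", raw)
    return word >> OFFSET_BITS, word & ((1 << OFFSET_BITS) - 1)

assert unpack_header(pack_header(1, 123456)) == (1, 123456)
```

The point of reserving version bits now is that a future incompatible layout change could be detected per-datum instead of forcing a global dump/reload — which is exactly the tension Tom raises in his reply.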
[HACKERS] GIST create index very very slow
CREATE INDEX USING GIST(timerange); On 1.3 million rows this took only 30 seconds. On 70 million it's already taken over a day. I swear it didn't take this long on version 9.3. Is there some kind of known bug with GIST? CPU is at 4% or less and RAM is at 150MB. IO usage is at 100%, but most of it is writes (like 3.5MB/s!), which looks good -- but the size on disk is only increasing by like 8 BYTES per second. This is really odd and I don't want to wait an indefinite amount of time.
[HACKERS] New Minmax index for geometry data type?
Hi Alvaro 2014-08-15 20:16 GMT+02:00 you answered: (...) Yeah, it's just a matter of adding an opclass for it -- pretty simple (...) Right now there are opclasses for the following types: (...) Of course, the real interesting part is adding a completely different opclass, such as one that stores bounding boxes. That was exactly what I was going to ask you regarding support of minmax (block range) indexes for GEOMETRY types (as defined in PostGIS): 1. What would be the advantage of such a minmax index over GiST, besides size? 2. Are there plans to implement this? 3. If no, how large would you estimate the effort to implement this, in days, for an experienced programmer like you? Yours, Stefan
Re: [Bad Attachment] Re: [HACKERS] jsonb format is pessimal for toast compression
On 08/15/2014 01:38 PM, Tom Lane wrote: I've been poking at this, and I think the main explanation for your result is that with more JSONB documents being subject to compression, we're spending more time in pglz_decompress. There's no free lunch in that department: if you want compressed storage it's gonna cost ya to decompress. The only way I can get decompression and TOAST access to not dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE PLAIN. However, when I do that, I do see my test patch running about 25% slower overall than HEAD on an explain analyze select jfield -> 'key' from table type of query with 200-key documents with narrow fields (see attached perl script that generates the test data). Ok, that probably falls under the heading of acceptable tradeoffs then. Having said all that, I think this test is something of a contrived worst case. More realistic cases are likely to have many fewer keys (so that speed of the binary search loop is less of an issue) or else to have total document sizes large enough that inline PLAIN storage isn't an option, meaning that detoast+decompression costs will dominate. This was intended to be a worst case. However, I don't think that it's the last time we'll see the case of having 100 to 200 keys each with short values. That case was actually from some XML data which I'd already converted into a regular table (hence every row having 183 keys), but if JSONB had been available when I started the project, I might have chosen to store it as JSONB instead. It occurs to me that the matching data from a personals website would very much fit the pattern of having between 50 and 200 keys, each of which has a short value. So we don't need to *optimize* for that case, but it also shouldn't be disastrously slow or 300% of the size of comparable TEXT. Mind you, I don't find +80% to be disastrously slow (especially not with a space savings of 60%), so maybe that's good enough. -- Josh Berkus PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] strncpy is not a safe version of strcpy
On Thu, Aug 14, 2014 at 4:13 PM, Noah Misch n...@leadboat.com wrote: I share your (Kevin's) discomfort with our use of strlcpy(). I wouldn't mind someone replacing most strlcpy()/snprintf() calls with calls to wrappers that ereport(ERROR) on truncation. Though as reliability problems go, this one has been minor. Or maybe it would be better to remove the restriction and just palloc something of the correct size? Although, that sounds like a much larger patch. I'd vote that strlcpy should be used in the meantime. Regards David Rowley
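Noah's wrapper idea — copy into a fixed-size buffer, but fail loudly instead of silently truncating — looks roughly like this in Python (the real thing would be a C wrapper around strlcpy() that calls ereport(ERROR); this sketch and its names are invented for illustration):

```python
def copy_checked(src: str, size: int) -> str:
    """Copy src into a buffer of `size` bytes (counting the NUL
    terminator, as strlcpy does); raise on truncation instead of
    returning a silently shortened string."""
    if len(src) + 1 > size:
        raise ValueError(
            f"string too long: needs {len(src) + 1} bytes, limit {size}")
    return src

assert copy_checked("pg_tblspc", 64) == "pg_tblspc"
try:
    copy_checked("x" * 64, 64)       # 64 chars + NUL won't fit in 64
except ValueError:
    pass
else:
    raise AssertionError("expected a truncation error")
```

This mirrors how strlcpy itself signals truncation: its return value (the length it tried to create) exceeds the buffer size, so a wrapper only has to compare the two and report.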
Re: [HACKERS] jsonb format is pessimal for toast compression
Arthur Silva arthur...@gmail.com writes: We should add some sort of versionning to the jsonb format. This can be explored in the future in many ways. If we end up making an incompatible change to the jsonb format, I would support taking the opportunity to stick a version ID in there. But I don't want to force a dump/reload cycle *only* to do that. As for the current problem, we should explore the directory at the end option. It should improve compression and keep good access performance. Meh. Pushing the directory to the end is just a band-aid, and since it would still force a dump/reload, it's not a very enticing band-aid. The only thing it'd really fix is the first_success_by issue, which we could fix *without* a dump/reload by using different compression parameters for jsonb. Moving the directory to the end, by itself, does nothing to fix the problem that the directory contents aren't compressible --- and we now have pretty clear evidence that that is a significant issue. (See for instance Josh's results that increasing first_success_by did very little for the size of his dataset.) I think the realistic alternatives at this point are either to switch to all-lengths as in my test patch, or to use the hybrid approach of Heikki's test patch. IMO the major attraction of Heikki's patch is that it'd be upward compatible with existing beta installations, ie no initdb required (but thus, no opportunity to squeeze in a version identifier either). It's not showing up terribly well in the performance tests I've been doing --- it's about halfway between HEAD and my patch on that extract-a-key-from-a-PLAIN-stored-column test. But, just as with my patch, there are things that could be done to micro-optimize it by touching a bit more code. 
I did some quick stats comparing compressed sizes for the delicio.us data, printing quartiles as per Josh's lead:

all-lengths    {440,569,609,655,1257}
Heikki's patch {456,582,624,671,1274}
HEAD           {493,636,684,744,1485}

(As before, this is pg_column_size of the jsonb within a table whose rows are wide enough to force tuptoaster.c to try to compress the jsonb; otherwise many of these values wouldn't get compressed.) These documents don't have enough keys to trigger the first_success_by issue, so that HEAD doesn't look too awful, but still there's about an 11% gain from switching from offsets to lengths. Heikki's method captures much of that but not all. Personally I'd prefer to go to the all-lengths approach, but a large part of that comes from a subjective assessment that the hybrid approach is too messy. Others might well disagree. In case anyone else wants to do measurements on some more data sets, attached is a copy of Heikki's patch updated to apply against git tip. regards, tom lane

diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..47b2998 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
*** convertJsonbArray(StringInfo buffer, JEn
*** 1378,1385 ****
  				errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
  					   JENTRY_POSMASK)));
! 		if (i > 0)
  			meta = (meta & ~JENTRY_POSMASK) | totallen;
  		copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
  		metaoffset += sizeof(JEntry);
  	}
--- 1378,1387 ----
  				errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
  					   JENTRY_POSMASK)));
! 		if (i % JBE_STORE_LEN_STRIDE == 0)
  			meta = (meta & ~JENTRY_POSMASK) | totallen;
+ 		else
+ 			meta |= JENTRY_HAS_LEN;
  		copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
  		metaoffset += sizeof(JEntry);
  	}
*** convertJsonbObject(StringInfo buffer, JE
*** 1430,1440 ****
  				errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
  					   JENTRY_POSMASK)));
! 		if (i > 0)
  			meta = (meta & ~JENTRY_POSMASK) | totallen;
  		copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
  		metaoffset += sizeof(JEntry);
  		convertJsonbValue(buffer, &meta, &pair->value, level);
  		len = meta & JENTRY_POSMASK;
  		totallen += len;
--- 1432,1445 ----
  				errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
  					   JENTRY_POSMASK)));
! 		if (i % JBE_STORE_LEN_STRIDE == 0)
  			meta = (meta & ~JENTRY_POSMASK) | totallen;
+ 		else
+ 			meta |= JENTRY_HAS_LEN;
  		copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
  		metaoffset += sizeof(JEntry);
+ 		/* put value */
  		convertJsonbValue(buffer, &meta, &pair->value, level);
  		len = meta & JENTRY_POSMASK;
  		totallen += len;
*** convertJsonbObject(StringInfo buffer, JE
*** 1445,1451 ****
  				errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
  					   JENTRY_POSMASK)));
! 		meta = (meta & ~JENTRY_POSMASK) | totallen;
  		copyToBuffer(buffer, metaoffset, (char *) &meta,
Re: [HACKERS] Removing dependency to wsock32.lib when compiling code on WIndows
On Fri, Aug 15, 2014 at 12:49:36AM -0700, Michael Paquier wrote: Btw, how do you determine if MSVC is using HAVE_GETADDRINFO? Is it decided by the inclusion of getaddrinfo.c in @pgportfiles of Mkvdbuild.pm? src/include/pg_config.h.win32 dictates it, and we must keep @pgportfiles synchronized with the former's verdict. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] strncpy is not a safe version of strcpy
On Sat, Aug 16, 2014 at 10:38:39AM +1200, David Rowley wrote: On Thu, Aug 14, 2014 at 4:13 PM, Noah Misch n...@leadboat.com wrote: I share your (Kevin's) discomfort with our use of strlcpy(). I wouldn't mind someone replacing most strlcpy()/snprintf() calls with calls to wrappers that ereport(ERROR) on truncation. Though as reliability problems go, this one has been minor. Or maybe it would be better to just remove the restriction and just palloc something of the correct size? Although, that sounds like a much larger patch. I'd vote that the strlcpy should be used in the meantime. I agree that, in principle, dynamic allocation might be better still. I also agree that it would impose more code churn, for an awfully-narrow benefit. Barring objections, I will commit your latest patch with some comments about why truncation is harmless for those two particular calls. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
[HACKERS] Sample LDIF for pg_service.conf no longer works
When using pg_service.conf with LDAP, we document[1] the following sample LDIF for populating the LDAP server:

version:1
dn:cn=mydatabase,dc=mycompany,dc=com
changetype:add
objectclass:top
objectclass:groupOfUniqueNames
cn:mydatabase
uniqueMember:host=dbserver.mycompany.com
uniqueMember:port=5439
uniqueMember:dbname=mydb
uniqueMember:user=mydb_user
uniqueMember:sslmode=require

That presumably worked at one point, but OpenLDAP 2.4.23 and OpenLDAP 2.4.39 both reject it cryptically:

ldap_add: Invalid syntax (21)
	additional info: uniqueMember: value #0 invalid per syntax

uniqueMember is specified to bear a distinguished name. While OpenLDAP does not verify that uniqueMember values correspond to known DNs, it does verify that the value syntactically could be a DN. To give examples, o=foobar is always accepted, but xyz=foobar is always rejected: xyz is not an LDAP DN attribute type. Amid the LDAP core schema, device is the best-fitting objectClass having the generality required. Let's convert to that, as attached. I have verified that this works end-to-end. Thanks, nm

[1] http://www.postgresql.org/docs/devel/static/libpq-ldap.html

-- Noah Misch EnterpriseDB http://www.enterprisedb.com

commit 957b58f (HEAD)
Author: Noah Misch n...@leadboat.com
AuthorDate: Mon Aug 4 00:03:21 2014 -0400
Commit: Noah Misch n...@leadboat.com
CommitDate: Mon Aug 4 00:03:21 2014 -0400

    Make pg_service.conf sample LDIF more portable.

    The aboriginal sample placed connection parameters in
    groupOfUniqueNames/uniqueMember. OpenLDAP, at least as early as version
    2.4.23, rejects uniqueMember entries that do not conform to the syntax
    for a distinguished name. Use device/description, which is free-form.

    Back-patch to 9.4 for web site visibility.
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index db634a8..ef45fbf 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7043,17 +7043,17 @@
 version:1
 dn:cn=mydatabase,dc=mycompany,dc=com
 changetype:add
 objectclass:top
-objectclass:groupOfUniqueNames
+objectclass:device
 cn:mydatabase
-uniqueMember:host=dbserver.mycompany.com
-uniqueMember:port=5439
-uniqueMember:dbname=mydb
-uniqueMember:user=mydb_user
-uniqueMember:sslmode=require
+description:host=dbserver.mycompany.com
+description:port=5439
+description:dbname=mydb
+description:user=mydb_user
+description:sslmode=require
 </programlisting>
    might be queried with the following LDAP URL:
 <programlisting>
-ldap://ldap.mycompany.com/dc=mycompany,dc=com?uniqueMember?one?(cn=mydatabase)
+ldap://ldap.mycompany.com/dc=mycompany,dc=com?description?one?(cn=mydatabase)
 </programlisting>
 </para>
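The syntax check OpenLDAP applies can be approximated in a few lines: each RDN in the value must use a registered attribute type, which is why o=foobar passes while host=dbserver.mycompany.com fails. This is a simplified sketch with an assumed, hard-coded attribute set; a real server consults its full schema:

```python
# Assumed minimal set of standard DN attribute types; a real LDAP
# server derives this from its loaded schema.
DN_ATTRIBUTE_TYPES = {"cn", "o", "ou", "dc", "l", "st", "c", "uid"}

def looks_like_dn(value: str) -> bool:
    """Rough check that a value could syntactically be a DN:
    comma-separated RDNs, each attr=value with a known attribute."""
    for rdn in value.split(","):
        attr, sep, _ = rdn.strip().partition("=")
        if not sep or attr.lower() not in DN_ATTRIBUTE_TYPES:
            return False
    return True

assert looks_like_dn("o=foobar")
assert looks_like_dn("cn=mydatabase,dc=mycompany,dc=com")
assert not looks_like_dn("host=dbserver.mycompany.com")  # what OpenLDAP rejects
assert not looks_like_dn("xyz=foobar")
```

The description attribute used in the patch sidesteps this entirely: its syntax is Directory String, so arbitrary key=value connection parameters are accepted.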