Re: [HACKERS] pg_archivecleanup, and backup filename to specify as an argument
On Thu, Jul 2, 2015 at 8:02 PM, Fujii Masao masao.fu...@gmail.com wrote:

Hi,

While implementing a patch around pg_archivecleanup, I found the following warning about pg_archivecleanup in the documentation:

Note that the .backup file name passed to the program should not include the extension.

ISTM that pg_archivecleanup works as expected even if the .backup file name including the extension is specified, as follows. So isn't this warning already obsolete? Or am I missing something?

$ pg_archivecleanup -d -x .zip myarchive 00010009.0028.backup.zip
pg_archivecleanup: keep WAL file myarchive/00010009 and later
pg_archivecleanup: removing file myarchive/00010005.zip
pg_archivecleanup: removing file myarchive/00010003.zip
pg_archivecleanup: removing file myarchive/00010001.zip
pg_archivecleanup: removing file myarchive/00010007.zip
pg_archivecleanup: removing file myarchive/00010006.zip
pg_archivecleanup: removing file myarchive/00010004.zip
pg_archivecleanup: removing file myarchive/00010002.zip
pg_archivecleanup: removing file myarchive/00010008.zip

Even in 9.2, where the -x option was first added, I confirmed that I could pass a .backup file name including the extension to pg_archivecleanup. So I'm wondering whether the warning in question was incorrect from the beginning... or am I missing something?

In the past thread for the -x option patch, I found the following comment from Robert. This makes me think that we unfortunately failed to notice his comment and ended up adding the unnecessary warning to the documentation.

- http://www.postgresql.org/message-id/CA+TgmoZDYD_W7K_S1ZuEnqVNOaRWYCX=eetx+r27vb7akrr...@mail.gmail.com

Also, I'm wondering about this warning in the documentation:

+extension added by the compression program. Note that the
+filename.backup/ file name passed to the program should not
+include the extension.

IIUC, the latest change you made makes that warning obsolete, no?

[rhaas pgsql]$ contrib/pg_archivecleanup/pg_archivecleanup -d -x .gz . 00010010.0020.backup.gz
pg_archivecleanup: keep WAL file ./00010010 and later

Regards,
-- Fujii Masao

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #13126: table constraint loses its comment
On Thu, Jul 2, 2015 at 11:16 PM, Petr Jelinek p...@2ndquadrant.com wrote:

I was going through the code and have a few comments:

- Why do you change the return value of TryReuseIndex? Can't we reuse the same OidIsValid(stmt->oldNode) check that ATExecAddIndex is doing instead?

As pointed out by Heikki previously, that is actually unnecessary: comments are still lost even if the index is reused for constraints. So perhaps for simplicity we could just unconditionally recreate the comments all the time if they are available.

- I think the changes to ATPostAlterTypeParse should follow more closely the coding of transformTableLikeClause - namely use the idxcomment

I am not sure I follow here. Could you elaborate?

- Also the changes to ATPostAlterTypeParse move the code somewhat uncomfortably over the 80 char width, I don't really see a way to fix that except for moving some of the nesting out to another function.

Yes, I did some refactoring here by adding a set of new routines dedicated to attaching the generated commands to the correct queue. This way the code is free of large switch/case blocks.

An updated patch is attached, with the issue reported by Heikki upthread fixed as well.

Regards,
-- Michael

From 2244608d3d06b89202b506f31b00a7d58a57f9c5 Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Fri, 3 Jul 2015 22:47:22 +0900
Subject: [PATCH] Ensure COMMENT persistency of indexes and constraints with ALTER TABLE

When rewriting a table, in some cases the indexes and constraints present on it need to be recreated from scratch, making any existing comment entry (known as a description in pg_description) disappear after ALTER TABLE. This commit fixes the issue by tracking the existing constraints, indexes, and combinations of both when running ALTER TABLE, and recreating COMMENT entries when appropriate. A set of regression tests is added to cover all the new code paths.
---
 src/backend/commands/tablecmds.c          | 287 ++
 src/include/nodes/parsenodes.h            |   1 +
 src/test/regress/expected/alter_table.out |  95 ++
 src/test/regress/sql/alter_table.sql      |  37
 4 files changed, 346 insertions(+), 74 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d394713..78e6b5c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -316,6 +316,13 @@ static void ATRewriteTables(AlterTableStmt *parsetree, List **wqueue,
 				LOCKMODE lockmode);
 static void ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode);
 static AlteredTableInfo *ATGetQueueEntry(List **wqueue, Relation rel);
+static void ATAttachQueueCommand(Oid oldId, Oid refRelId, List **wqueue,
+				Node *stm, Relation rel, bool rewrite);
+static void ATAttachQueueIndexStmt(Oid oldId, List **wqueue,
+				IndexStmt *stmt, Relation rel, bool rewrite);
+static void ATAttachQueueAlterTableStmt(Oid oldId, Oid refRelId,
+				List **wqueue, AlterTableStmt *stmt,
+				Relation rel, bool rewrite);
 static void ATSimplePermissions(Relation rel, int allowed_targets);
 static void ATWrongRelkindError(Relation rel, int allowed_targets);
 static void ATSimpleRecursion(List **wqueue, Relation rel,
@@ -386,6 +393,12 @@ static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab,
 static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
 				char *cmd, List **wqueue, LOCKMODE lockmode, bool rewrite);
+static void RebuildObjectComment(AlteredTableInfo *tab,
+				int cmdidx,
+				ObjectType objtype,
+				Oid objid,
+				Oid classoid,
+				List *objname);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
 static void TryReuseForeignKey(Oid oldId, Constraint *con);
 static void change_owner_fix_column_acls(Oid relationOid,
@@ -3498,6 +3511,9 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			address = ATExecAddIndex(tab, rel, (IndexStmt *) cmd->def, true,
 						 lockmode);
 			break;
+		case AT_ReAddComment:	/* Re-add existing comment */
+			CommentObject((CommentStmt *) cmd->def);
+			break;
 		case AT_AddConstraint:	/* ADD CONSTRAINT */
 			address = ATExecAddConstraint(wqueue, tab, rel,
 						      (Constraint *) cmd->def,
@@ -4251,6 +4267,162 @@ ATGetQueueEntry(List **wqueue, Relation rel)
 	return tab;
 }

+/*
+ * ATAttachQueueCommand
+ *
+ * Attach each generated command to the proper place in the work queue.
+ * Note this could result in creation of entirely new work-queue entries.
+ *
+ * Also note that the command subtypes have to be tweaked, because it
+ * turns out that re-creation of indexes and constraints has to act a bit
+ * differently from initial creation.
+ */
+static void
+ATAttachQueueCommand(Oid oldId, Oid refRelId, List **wqueue,
+		     Node *stm, Relation rel, bool rewrite)
+{
+	switch (nodeTag(stm))
+	{
+		case T_IndexStmt:
+			ATAttachQueueIndexStmt(oldId, wqueue,
+					       (IndexStmt *) stm, rel,
Re: [HACKERS] [PATCH] Generalized JSON output functions
On 07/03/2015 06:27 AM, Heikki Linnakangas wrote:

On 05/27/2015 09:51 PM, Andrew Dunstan wrote:

On 05/27/2015 02:37 PM, Robert Haas wrote:

On Tue, May 26, 2015 at 2:50 AM, Shulgin, Oleksandr oleksandr.shul...@zalando.de wrote:

Is it reasonable to add this patch to CommitFest now?

It's always reasonable to add a patch to the CommitFest if you would like for it to be reviewed and avoid having it get forgotten about. There seems to be some disagreement about whether we want this, but don't let that stop you from adding it to the next CommitFest.

I'm not dead set against it either. When I have time I will take a closer look.

Andrew, will you have the time to review this? Please add yourself as reviewer in the commitfest app if you do. My 2 cents is that I agree with your initial reaction: this is a lot of infrastructure and generalizing things, for little benefit. Let's change the current code where we generate JSON to be consistent with whitespace, and call it a day.

- Heikki

I'm somewhat on vacation for the next week or so, so I won't claim it, but I'll try to make time to look at it. Other people (Merlin?) could also provide reviews.

cheers
andrew
[HACKERS] Re: [HACKERS] GSoC 2015 proposal: Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement
On Fri, Jul 3, 2015 at 4:48 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote:

On Thu, Apr 2, 2015 at 3:24 PM, Robert Haas robertmh...@gmail.com wrote:

On Wed, Mar 25, 2015 at 9:46 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote:
http://www.postgresql.org/message-id/ca+tgmozm+-0r7h0edpzzjbokvvq+gavkchmno4fypveccw-...@mail.gmail.com

I like the idea of the feature a lot, but the proposal to which you refer here mentions some problems, which I'm curious how you think you might solve. (I don't have any good ideas myself, beyond what I mentioned there.)

You're right, and we have another design (steps 1 and 2 were implemented last year):

*** ALTER TABLE changes

1) ATController
1.1) Acquire an AccessExclusiveLock (src/backend/commands/tablecmds.c - AlterTableGetLockLevel:3023)

2) Prepare to change relpersistence (src/backend/commands/tablecmds.c - ATPrepCmd:3249-3270)
• check temp table (src/backend/commands/tablecmds.c - ATPrepChangePersistence:11074)
• check foreign key constraints (src/backend/commands/tablecmds.c - ATPrepChangePersistence:11102)

3) FlushRelationBuffers, DropRelFileNodeBuffers and smgrimmedsync (MAIN_FORKNUM, FSM_FORKNUM, VISIBILITYMAP_FORKNUM and INIT_FORKNUM if it exists)

4) Create a new fork called TRANSIENT INIT FORK:
• from Unlogged to Logged (create _initl fork) (INIT_TO_LOGGED_FORKNUM)
∘ new forkName (src/common/relpath.c) called _initl
∘ insert XLog record to drop it if the transaction aborts
• from Logged to Unlogged (create _initu fork) (INIT_TO_UNLOGGED_FORKNUM)
∘ new forkName (src/common/relpath.c) called _initu
∘ insert XLog record to drop it if the transaction aborts

AFAIU, while reading WAL, the server doesn't know whether the transaction that produced a WAL record aborted or committed. It's only when it sees a COMMIT/ABORT record down the line that it can confirm whether the transaction committed or aborted. So one can only redo the things that the WAL says have been done; we cannot undo things based on the WAL. We might record this fact somewhere while reading the WAL and act on it once we know the status of the transaction, but I do not see that as part of this idea. This comment applies to all the steps inserting WAL records for undoing things.

5) Change the relpersistence in the catalog (pg_class->relpersistence) (heap, toast, indexes)

6) Remove/Create INIT_FORK
• from Unlogged to Logged
∘ remove the INIT_FORK and INIT_TO_LOGGED_FORK, adding them to the pendingDeletes queue
• from Logged to Unlogged
∘ remove the INIT_TO_UNLOGGED_FORK, adding it to the pendingDeletes queue
∘ create the INIT_FORK using heap_create_init_fork
∘ insert XLog record to drop the init fork if the transaction aborts

*** CRASH RECOVERY changes

1) During crash recovery (src/backend/access/transam/xlog.c:6507:ResetUnloggedRelations)

This operation is carried out in two phases: one before replaying WAL records and a second one after that. Are you sure that the WAL records generated for the unlogged or logged forks, as described above, have been taken care of?

• if the transient fork _initl exists then
∘ drop the transient fork _initl
∘ if the init fork doesn't exist then create it
∘ reset the relation
• if the transient fork _initu exists then
∘ drop the transient fork _initl
∘ if the init fork exists then drop it
∘ don't reset the relation

Consider the case of converting unlogged to logged. The server crashes after the 6th step, when both forks have been removed. During crash recovery, it will not see either fork and won't reset the unlogged table. Remember the table is still unlogged, since the transaction was aborted by the crash, so it has to have an init fork to be reset during crash recovery. Similarly, while converting from logged to unlogged: if the server crashes in the 6th step after creating the INIT_FORK, during crash recovery the init fork will be seen and a supposedly logged table will be trashed.

The ideas in 1 and 2 might be better than having yet another init fork.

1.
http://www.postgresql.org/message-id/533d457a.4030...@vmware.com Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] [BUGS] BUG #13126: table constraint loses its comment
On 2015-07-03 15:50, Michael Paquier wrote:

On Thu, Jul 2, 2015 at 11:16 PM, Petr Jelinek p...@2ndquadrant.com wrote:

I was going through the code and have a few comments:

- Why do you change the return value of TryReuseIndex? Can't we reuse the same OidIsValid(stmt->oldNode) check that ATExecAddIndex is doing instead?

As pointed out by Heikki previously, that is actually unnecessary: comments are still lost even if the index is reused for constraints. So perhaps for simplicity we could just unconditionally recreate the comments all the time if they are available.

Ah ok, I missed Heikki's email.

- I think the changes to ATPostAlterTypeParse should follow more closely the coding of transformTableLikeClause - namely use the idxcomment

I am not sure I follow here. Could you elaborate?

Well, for indexes you don't really need to add the new AT command, as IndexStmt has char *idxcomment, which it will automatically use as the comment if not NULL. While I am not a huge fan of idxcomment, it doesn't seem to be easy to remove it in the future, and it's what transformTableLikeClause uses, so it might be good to be consistent with that.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] WAL logging problem in 9.4.3?
On Fri, Jul 3, 2015 at 3:01 PM, Martijn van Oosterhout klep...@svana.org wrote:

On Fri, Jul 03, 2015 at 02:34:44PM +0900, Fujii Masao wrote:

Hmm, for me it is 100% reproducible. Are you familiar with Docker? I can probably construct a Dockerfile that reproduces it pretty reliably.

I could reproduce the problem in the master branch by doing the following steps.

Thank you, I wasn't sure if you could kill the server fast enough without containers, but it looks like immediate mode is enough.

1. start the PostgreSQL server with wal_level = minimal
2. execute the following SQL statements:
   begin;
   create table test(id serial primary key);
   truncate table test;
   commit;
3. shut down the server with immediate mode
4. restart the server (crash recovery occurs)
5. execute the following SQL statement:
   select * from test;

The optimization of the TRUNCATE operation that we can use when CREATE TABLE and TRUNCATE are executed in the same transaction block seems to cause the problem. In this case, only the index file truncation is logged, and the index creation in btbuild() is not logged because wal_level is minimal. Then, at the subsequent crash recovery, the index file is truncated to 0 bytes... A very simple fix is to log the index creation in that case, but I'm not sure whether that's OK to do...

In 9.2 or before, this problem doesn't occur because no such error is thrown even if an index file's size is zero. But in 9.3 or later, since the planner tries to read the metapage of an index to get the height of the btree, an empty index file causes such an error. The planner was changed that way by commit 31f38f28, and the problem seems to be an oversight of that commit. I'm not familiar with that change to the planner, but ISTM that we can simply change _bt_getrootheight() so that 0 is returned if the index file is empty, i.e., the metapage cannot be read, in order to work around the problem. Thoughts?

Regards,
-- Fujii Masao
Re: [HACKERS] WAL logging problem in 9.4.3?
Fujii Masao masao.fu...@gmail.com writes:

The optimization of the TRUNCATE operation that we can use when CREATE TABLE and TRUNCATE are executed in the same transaction block seems to cause the problem. In this case, only the index file truncation is logged, and the index creation in btbuild() is not logged because wal_level is minimal. Then, at the subsequent crash recovery, the index file is truncated to 0 bytes... A very simple fix is to log the index creation in that case, but I'm not sure whether that's OK to do...

In 9.2 or before, this problem doesn't occur because no such error is thrown even if an index file's size is zero. But in 9.3 or later, since the planner tries to read the metapage of an index to get the height of the btree, an empty index file causes such an error. The planner was changed that way by commit 31f38f28, and the problem seems to be an oversight of that commit.

What? You want to blame the planner for failing because the index was left corrupt by broken WAL replay? A failure would occur anyway at execution.

regards, tom lane
Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22
On 7/3/2015 8:19 AM, Michael Paquier wrote:

On Fri, Jul 3, 2015 at 2:47 PM, Marco Atzeri marco.atz...@gmail.com wrote:

On 7/2/2015 5:16 PM, Dave Page wrote:

-lldap
hstore_plperl.o: In function `hstore_to_plperl':
/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:16: undefined reference to `hstoreUpgrade'

So... dangomushi is able to build it at least. Here are the logs of the last build, for example:
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=dangomushi&dt=2015-07-02%2019%3A19%3A39&stg=make-contrib
What is the name of the library generated in hstore? Perhaps there is a mismatch here.

OK, thanks for the feedback. It may be a cygwin-perl-specific issue. I will investigate.

Regards
Marco
Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts
Fabien COELHO coe...@cri.ensmp.fr writes:

I'm pretty clearly in favor of doing correct lexing. I think we should generalize that and make it reusable. psql has its own hacked-up version already; there seems little point in having variedly good copies around.

I must admit that I do not know how to share lexer rules but have different actions on them (psql vs. sql parser vs. ...), as the action code is intrinsically intertwined with the expressions.

Obviously this is scope creep of the first magnitude, but ISTM that it would be possible to share a lexer between psql and pgbench, since in both of them the basic requirement is to break SQL commands apart and identify newline-terminated backslash commands. If we're gonna break pgbench's backwards compatibility anyway, there would be a whole lot to be said for just going over to psql's input parsing rules, lock, stock 'n barrel; and this would be a good way to achieve that.

As it stands, psqlscan.l has some external dependencies on the rest of psql, but we could perhaps refactor some of those away, and provide dummy implementations to satisfy others (e.g. pgbench could provide a dummy GetVariable() that just always returns NULL). So I'm imagining symlinking psqlscan.l into src/bin/pgbench and using it as-is (possibly after refactoring in psql). A possible issue is avoiding unnecessary invocations of flex, though. Maybe symlinking the .c file would work better.

regards, tom lane
Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22
On 07/03/2015 01:47 AM, Marco Atzeri wrote:

On 7/2/2015 5:16 PM, Dave Page wrote:

The PostgreSQL Global Development Group announces that the alpha release of PostgreSQL 9.5, the latest version of the world's leading open source database, is available today. This release contains previews of all of the features which will be available in the final release of version 9.5, although some details will change before then. Please download, test, and report what you find. Help Test for Bugs

Building on cygwin with

$ perl --version
This is perl 5, version 22, subversion 0 (v5.22.0) built for cygwin-thread-multi-64int

the build fails here; is anyone seeing the same on other platforms?

make -C hstore_plperl all
make[1]: Entering directory '/cygdrive/e/cyg_pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/build/contrib/hstore_plperl'
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -ggdb -O2 -pipe -Wimplicit-function-declaration -I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/src/pl/plperl -I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore -I. -I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl -I../../src/include -I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/src/include -I/usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE -c -o hstore_plperl.o /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c
In file included from /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:6:0:
/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore/hstore.h:79:0: warning: HS_KEY redefined
 #define HS_KEY(arr_,str_,i_) ((str_) + HSE_OFF((arr_)[2*(i_)]))
In file included from /usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE/perl.h:3730:0,
                 from /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/src/pl/plperl/plperl.h:48,
                 from /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:4:
/usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE/util.h:221:0: note: this is the location of the previous definition
 #  define HS_KEY(setxsubfn, popmark, apiver, xsver) \
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -ggdb -O2 -pipe -Wimplicit-function-declaration -shared -o hstore_plperl.dll -Wl,--out-implib=libhstore_plperl.a hstore_plperl.o -L../../src/port -L../../src/common -Wl,-no-undefined -Wl,--allow-multiple-definition -Wl,--enable-auto-import -L/usr/local/lib -Wl,--as-needed -L../../src/backend -lpostgres -lpgcommon -lpgport -lintl -lssl -lcrypto -lz -lreadline -lcrypt -lldap
hstore_plperl.o: In function `hstore_to_plperl':
/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:16: undefined reference to `hstoreUpgrade'

That #define clash is annoying; we should probably #undef HS_KEY if it's defined before including hstore.h.

cheers
andrew
Re: [HACKERS] WAL logging problem in 9.4.3?
On Fri, Jul 03, 2015 at 12:53:56PM -0400, Tom Lane wrote:

Fujii Masao masao.fu...@gmail.com writes:

Okay, so probably we need to change WAL replay of TRUNCATE so that the index file is truncated to one containing only the metapage instead of an empty one. That is, the WAL replay of TRUNCATE would need to call index_build() after smgrtruncate(), maybe.

That seems completely unworkable. For one thing, index_build would expect to be able to do catalog lookups, but we can't assume that the catalogs are in a good state yet. I think the responsibility has to be on the WAL-writing end to emit WAL instructions that lead to a correct on-disk state. Putting complex behavior into the reading side is fundamentally misguided.

Am I missing something? ISTM that if the truncate record were simply not logged at all, everything would work fine. The whole point is that the table was created in this transaction, so if it exists, the table on disk must be the correct representation. The broken index is just one symptom: the heap also shouldn't be truncated at all. If you insert a row before commit, then after replay the tuple should still be there.

Have a nice day,
-- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2
On 2015-07-03 10:27:05 -0700, Josh Berkus wrote:

On 07/03/2015 03:12 AM, Sawada Masahiko wrote:

Thanks. So we can choose the next master server by checking the progress of each server, if hot standby is enabled. And such a procedure is needed even with today's replication. I think that the #2 problem which Josh pointed out seems to be solved;

1. I need to ensure that data is replicated to X places.
2. I need to *know* which places data was synchronously replicated to when the master goes down.

And we can address the #1 problem using quorum commit.

It's not solved. I still have zero ways of knowing if a replica was in sync or not at the time the master went down.

What? You pick the standby that's furthest ahead. And you use a high enough quorum so that, given your tolerance for failures, you'll always be able to reach at least one of the synchronous replicas. Then you promote the one with the highest LSN. Done. This is something that gets *easier* with quorum, not harder.

I forked the subject line because I think that the inability to identify sync replicas under failover conditions is a serious problem with sync rep *today*, and pretending that it doesn't exist doesn't help us even if we don't fix it in 9.6.

That's just not how failovers can sanely work. And again, you *have* the information you can have on the standbys already. You *know* what/from when the last replayed xact is.

Let me give you three cases where our lack of information on the replica side about whether it thinks it's in sync or not causes sync rep to fail to protect data. The first case is one I've actually seen in production, and the other two are hypothetical but entirely plausible.

Case #1: two synchronous replica servers have the application name synchreplica. An admin uses the wrong Chef template and deploys a server which was supposed to be an async replica with the same recovery.conf template, and it ends up in the synchreplica group as well. Due to restarts (pushing out an update release), the new server ends up seizing and keeping sync. Then the master dies. Because the new server wasn't supposed to be a sync replica in the first place, it is not checked; they just fail over to the furthest ahead of the two original sync replicas, neither of which was actually in sync.

Nobody can protect you against such configuration errors. We can make it harder to misconfigure, sure, but it doesn't have anything to do with the topic at hand.

Case #2: 2 { local, london, nyc } setup. At 2am, the links between data centers become unreliable, such that the on-call sysadmin disables sync rep because commits on the master are intolerably slow. Then, at 10am, the links between data centers fail entirely. The day shift, not knowing that the night shift disabled sync, fail over to London thinking that they can do so with zero data loss.

As I said earlier, you can check against that today by checking the last replayed timestamp:

SELECT pg_last_xact_replay_timestamp();

You don't have to pick the one that used to be a sync replica. You pick the one with the most data received. If the day shift doesn't bother to check the standbys now, they'd not check either if they had some way to check whether a node was the chosen sync replica.

Case #3: 1 { london, frankfurt }, 1 { sydney, tokyo } multi-group priority setup. We lose communication with everything but Europe. How can we decide whether to wait to get sydney back, or to promote London immediately?

You normally don't continue automatically at all in that situation. To avoid/minimize data loss you want a majority election system to select the new primary. That requires reaching a majority of the nodes. This isn't something specific to postgres; if you look at any solution out there, they're also doing it that way.

Statically choosing which of the replicas in a group is the current sync one is a *bad* idea.
You want to ensure that at least one node in a group has received the data, and stop waiting as soon as that's the case.

It's an issue *now* that the only data we have about the state of sync rep is on the master, and dies with the master. And it severely limits the actual utility of our sync rep. People implement sync rep in the first place because the best effort of async rep isn't good enough for them, and yet when it comes to failover we're just telling them to give it their best effort.

We don't tell them that, but apparently you do. This subthread is getting absurd; stopping here.
Re: [HACKERS] Fix broken Install.bat when target directory contains a space
On 04/21/2015 04:02 PM, Michael Paquier wrote:

On Tue, Apr 21, 2015 at 4:33 PM, Asif Naeem anaeem...@gmail.com wrote:

The v2 patch looks good to me, just a minor concern about the usage message, i.e.

C:\PG\postgresql\src\tools\msvc>install
Invalid command line options.
Usage: install.bat targetdir [installtype]
installtype: client

It seems that there are two install options, i.e. client and all (any string other than client is treated as all), so the following install command works:

install C:\PG\postgresql\inst option_does_not_exist

As your patch affects this area of the code, I thought to share these findings with you. BTW, it is a minor thing that can be handled in another patch.

Well, that's the same behavior that this script has had for ages. Let's just update the usage message to mention both all and client. I see no point in breaking a behavior that has been like that for ages, and the main point of this patch is to fix the install path issue.

Hmm. Why is install.bat not like build.bat, i.e. just a thin wrapper that calls install.pl, passing along all its arguments?

- Heikki
Re: [HACKERS] WAL logging problem in 9.4.3?
Fujii Masao masao.fu...@gmail.com writes:

Okay, so probably we need to change WAL replay of TRUNCATE so that the index file is truncated to one containing only the metapage instead of an empty one. That is, the WAL replay of TRUNCATE would need to call index_build() after smgrtruncate(), maybe.

That seems completely unworkable. For one thing, index_build would expect to be able to do catalog lookups, but we can't assume that the catalogs are in a good state yet. I think the responsibility has to be on the WAL-writing end to emit WAL instructions that lead to a correct on-disk state. Putting complex behavior into the reading side is fundamentally misguided.

regards, tom lane
Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts
I wrote: As it stands, psqlscan.l has some external dependencies on the rest of psql, but we could perhaps refactor some of those away, and provide dummy implementations to satisfy others (eg pgbench could provide a dummy GetVariable() that just always returns NULL). So I'm imagining symlinking psqlscan.l into src/bin/pgbench and using it as-is (possibly after refactoring in psql). A possible issue is avoiding unnecessary invocations of flex, though. Maybe symlinking the .c file would work better. A quick experiment with compiling psqlscan inside pgbench yields the following failures: pgbench.o: In function `psql_scan_setup': /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1239: undefined reference to `pset' pgbench.o: In function `escape_variable': /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1950: undefined reference to `pset' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1950: undefined reference to `GetVariable' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1956: undefined reference to `pset' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1957: undefined reference to `psql_error' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1971: undefined reference to `pset' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1973: undefined reference to `psql_error' pgbench.o: In function `evaluate_backtick': /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1701: undefined reference to `psql_error' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1712: undefined reference to `psql_error' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1722: undefined reference to `psql_error' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1728: undefined reference to `psql_error' pgbench.o: In function `yylex': /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:511: undefined reference to `standard_strings' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:743: undefined reference to `pset' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:743: undefined reference to `GetVariable' 
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:751: undefined reference to `psql_error' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1037: undefined reference to `pset' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1037: undefined reference to `GetVariable' pgbench.o: In function `psql_scan_slash_option': /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1619: undefined reference to `pset' /home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1628: undefined reference to `psql_error' The pset references are to pset.encoding, pset.db, or pset.vars. I'd think the best way to deal with the encoding and connection are to pass them as parameters to psql_scan_setup() which'd store them in the PsqlScanState. pset.vars is only passed to GetVariable. We could refactor that away somehow (although actually, why wouldn't we want to just implement variable substitution exactly like it is in psql? Maybe the right answer is to import psql/variables.c lock stock n barrel too...) psql_error() and standard_strings() wouldn't be hard to provide. So this is looking *eminently* doable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
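As a sketch of the parameter-passing idea above (purely illustrative; the names and struct layout here are hypothetical, not psql's actual declarations):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, pared-down scanner state: the connection and encoding
 * become explicit fields filled in by psql_scan_setup(), instead of the
 * lexer reaching into psql's global "pset". */
typedef struct PsqlScanState
{
    void       *db;        /* stand-in for PGconn *; formerly pset.db */
    int         encoding;  /* client encoding; formerly pset.encoding */
    const char *scanbuf;   /* text being lexed */
} PsqlScanState;

static void
psql_scan_setup(PsqlScanState *state, const char *line,
                void *db, int encoding)
{
    state->scanbuf = line;
    state->db = db;        /* pgbench could simply pass NULL here */
    state->encoding = encoding;
}
```

With that shape, pgbench never needs a pset at all; it just hands the scanner whatever connection and encoding it has (or NULL).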
Re: [HACKERS] psql :: support for \ev viewname and \sv viewname
Jeevan Chalke jeevan.cha...@enterprisedb.com writes: Patch looks excellent now. No issues. Found a typo which I have fixed in the attached patch. Starting to look at this ... The business with numbering lines from SELECT seems to me to be completely nonsensical. In the first place, it fails to allow for views containing WITH clauses. But really it looks like it was cargo-culted over from \ef/\sf without understanding why those commands number lines the way they do. The reason they do that is that for errors occurring inside a function definition, the PL will typically report a line number relative to the function body text, and so we're trying to be helpful about interpreting line numbers of that kind. But there's no comparable behavior in the case of a view. If you fat-finger a view, you'll get a line number relative to the text of the whole CREATE command, eg regression=# create or replace view z as regression-# select 1/col regression-# from bar; ERROR: relation bar does not exist LINE 3: from bar; ^ So AFAICS, \ev and \sv should just number lines straightforwardly, with 1 being the first line of the CREATE command text. Am I missing something? regards, tom lane
Re: [HACKERS] WAL logging problem in 9.4.3?
On 2015-07-03 18:49:31 +0200, Andres Freund wrote: But the more interesting question is why that's not happening today. RelationTruncateIndexes() does call the index_build() which should end up WAL logging the index creation. So that's because there's an XLogIsNeeded() preventing it. Maybe I'm just daft right now (35C outside, 32 inside, so ...), but I'm right now missing how the whole skip wal logging if relation has just been truncated optimization can ever actually be crashsafe unless we use a new relfilenode (which we don't!). Sure, we do a heap_sync() at the end of the transaction. That's nice and all. But it doesn't help if we crash and re-start WAL apply from a checkpoint before the table was created. Because that'll replay the truncation. That's much worse than just the indexes - the rows added by a COPY without WAL logging will also be truncated away, no?
Re: [HACKERS] more RLS oversights
On Wed, Feb 25, 2015 at 11:37:24PM -0500, Stephen Frost wrote: * Stephen Frost (sfr...@snowman.net) wrote: I agree that it's great that we're catching issues prior to when the feature is released and look forward to anything else you (or anyone else!) finds. I've pushed a fix for this. Please let me know if you see any other issues or run into any problems. (1) CreatePolicy() and AlterPolicy() omit to call assign_expr_collations() on the node trees. Test case: begin; set row_security = force; create table t (c) as values ('bar'::text); create policy p on t using (c ('foo'::text COLLATE C)); alter table t enable row level security; table pg_policy; -- note :inputcollid 0 select * from t; -- ERROR: could not determine which collation ... rollback; (2) CreatePolicy() and AlterPolicy() omit to create a pg_shdepend entry for each role in the TO clause. Test case: begin; create role alice; create table t (c) as values ('bar'::text); grant select on table t to alice; create policy p on t to alice using (true); select refclassid::regclass, refobjid, refobjsubid, deptype from pg_depend where classid = 'pg_policy'::regclass; select refclassid::regclass, refobjid, deptype from pg_shdepend where classid = 'pg_policy'::regclass; savepoint q; drop role alice; rollback to q; revoke all on table t from alice; \d t drop role alice; \d t rollback; +static void +dumpPolicy(Archive *fout, PolicyInfo *polinfo) ... + if (polinfo->polqual != NULL) + appendPQExpBuffer(query, " USING %s", polinfo->polqual); (3) The USING clause needs parentheses; a dump+reload failed like so: pg_restore: [archiver (db)] could not execute query: ERROR: syntax error at or near CASE LINE 2: CASE ^ Command was: CREATE POLICY p2 ON category FOR ALL TO PUBLIC USING CASE WHEN (current_user() = 'rls_regress_user1'::name) THE... Add the same parentheses to psql \d output also, keeping that output similar to the SQL syntax.
(3a) I found this by hacking the rowsecurity.sql regression test to not drop its objects, then running the pg_upgrade test suite. New features that affect pg_dump should leave objects in the regression database to test the pg_dump support via that suite. (4) When DefineQueryRewrite() is about to convert a table to a view, it checks the table for features unavailable to views. For example, it rejects tables having triggers. It omits to reject tables having relrowsecurity or a pg_policy record. Test case: begin; set row_security = force; create table t (c) as select * from generate_series(1,5); create policy p on t using (c % 2 = 1); alter table t enable row level security; table t; truncate t; create rule _RETURN as on select to t do instead select * from generate_series(1,5) t0(c); table t; select polrelid::regclass from pg_policy; select relrowsecurity from pg_class where oid = 't'::regclass; rollback; + para + Referential integrity checks, such as unique or primary key constraints + and foreign key references, will bypass row security to ensure that + data integrity is maintained. Care must be taken when developing + schemas and row level policies to avoid a covert channel leak of + information through these referential integrity checks. ... + /* + * Row-level security should be disabled in the case where a foreign-key + * relation is queried to check existence of tuples that references the + * primary-key being modified. + */ + temp_sec_context = save_sec_context | SECURITY_LOCAL_USERID_CHANGE; + if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK + || qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK + || qkey->constr_queryno == RI_PLAN_RESTRICT_DEL_CHECKREF + || qkey->constr_queryno == RI_PLAN_RESTRICT_UPD_CHECKREF) + temp_sec_context |= SECURITY_ROW_LEVEL_DISABLED; (5) This code does not use SECURITY_ROW_LEVEL_DISABLED for foreign keys with CASCADE, SET NULL or SET DEFAULT actions.
The associated documentation says nothing about this presumably-important distinction. Is SECURITY_ROW_LEVEL_DISABLED mode even needed? We switch users to the owner of the FROM-clause table before running an RI query. That means use of this mode can only matter under row_security=force, right? I would rest easier if this mode went away, because it is a security landmine. If user code managed to run in this mode, it would bypass every policy in the database. (I find no such vulnerability today, because we use the mode only for parse analysis of ri_triggers.c queries.) (6) AlterPolicy() calls InvokeObjectPostAlterHook(PolicyRelationId, ...), but CreatePolicy() and DropPolicy() lack their respective hook invocations. (7) Using an aggregate function in a policy predicate elicits an inapposite error message due to use of EXPR_KIND_WHERE for parse analysis. Need a new ParseExprKind. Test case: begin; set row_security = force; create table t (c) as values
Re: [HACKERS] psql :: support for \ev viewname and \sv viewname
Fri, 3 Jul 2015 at 19:30, Tom Lane t...@sss.pgh.pa.us: So AFAICS, \ev and \sv should just number lines straightforwardly, with 1 being the first line of the CREATE command text. Am I missing something? Fixed. Now both \ev and \sv number lines starting with 1. New version attached. As I already noticed, pg_get_viewdef() does not support the full syntax for creating or replacing views. In my opinion, psql source code isn't the place where some formatting hacks should be. So, can you give me an idea how to produce already formatted output supporting the WITH statement without breaking backward compatibility of pg_get_viewdef() internals? psql-ev-sv-support-v6.diff Description: Binary data
Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts
(although actually, why wouldn't we want to just implement variable substitution exactly like it is in psql? Pgbench variable substitution is performed when the script is run, not while the file is being processed for being split, which is when a lexer would be used. The situation is not the same with psql. The most it could do would be to keep track of what substitutions are done in queries. So this is looking *eminently* doable. Possibly. How much more effort would be involved compared to the quick patch I did, I wonder:-) -- Fabien.
Re: [HACKERS] WAL logging problem in 9.4.3?
On 2015-07-03 19:14:26 +0200, Martijn van Oosterhout wrote: Am I missing something? ISTM that if the truncate record was simply not logged at all everything would work fine. The whole point is that the table was created in this transaction and so if it exists the table on disk must be the correct representation. That'd not work either. Consider: BEGIN; CREATE TABLE ... INSERT; TRUNCATE; INSERT; COMMIT; If you replay that without a truncation wal record the second INSERT will try to add stuff to already occupied space. And they can have different lengths and stuff, so you cannot just ignore that fact. The broken index is just one symptom. Agreed. I think the problem is something else though. Namely that we reuse the relfilenode for heap_truncate_one_rel(). That's just entirely broken afaics. We need to allocate a new relfilenode and write stuff into that. Then we can forgo WAL logging the truncation record. If you insert a row before commit then after replay the tuple should be there still. The insert would be WAL logged. COPY skips wal logging tho.
Re: [HACKERS] Reducing the size of BufferTag remodeling forks
On 2015-07-03 13:59:07 -0300, Alvaro Herrera wrote: Andres Freund wrote: 2) Replace relation forks, with the exception of the init fork which is special anyway, with separate relfilenodes. Stored in seperate columns in pg_class. Different AMs have different fork needs; for heaps you want one main fork, one VM, one fsm. But for indexes, the VM fork is not necessary, and some AMs might want different ones. For instance, GIN would benefit from having separate forks to store the internal indexes, and BRIN would benefit from a separate fork for the revmap. What I'm saying is that I'm not sure it's okay to store the forks in pg_class columns Right. Part of the point of this design is that you could easily add further forks without system wide knowledge and that it is *not* required anymore that all relfilenodes are in one column. I think it'd probably make sense to have at least _vm and _fsm in pg_class, but we could easily add further ones elsewhere. instead perhaps we should have a separate catalog on which each relation can have many forks, or perhaps have the pg_class entry store an array (ick). Or perhaps rather than relvmfork (the pg_class column for the relfilenode for the VM fork) we could store relfilenode1, relfilenode2 where the value for each N fork is AM-specific. (so for heaps 1 is main, 2 is FSM, 3 is VM; for BRIN 1 is main, 2 is revmap; and so forth). None of these sound particularly pretty to me. An array of relfilenodes would probably be the least ugly one. FWIW the whole idea seems reasonable to me. I worry about concurrent traffic into the pg_relfilenode shared table -- if temp table creation is common across many databases, is it going to become a contention point? I don't think it'll be. It's essentially just inserts into a tiny table with a single index, right? We can do a bootload of inserts into one of these. 
In an *assert enabled* build: CREATE TABLE pg_relfilenode() WITH OIDS; ALTER TABLE pg_relfilenode ADD CONSTRAINT pg_relfilenode_pkey PRIMARY KEY(oid); which is pretty much how pg_relfilenode would look like. Although we'd not go through the whole executor for the inserts. pgbench -h localhost -p 5440 postgres -n -f /tmp/pg_relfilenode.sql -j 16 -c 16 -T 20 -P 1 cat /tmp/pg_relfilenode.sql: INSERT INTO pg_relfilenode DEFAULT VALUES; progress: 5.0 s, 32168.4 tps, lat 0.495 ms stddev 1.728 progress: 6.0 s, 33719.6 tps, lat 0.473 ms stddev 0.773 andres@awork2:~/src/postgresql$ pgbench -h localhost -p 5440 postgres -n -f /tmp/temptable.sql -j 16 -c 16 -T 20 -P 1 CREATE TEMPORARY TABLE blarg() ON COMMIT DROP; progress: 6.0 s, 5018.2 tps, lat 3.185 ms stddev 3.671 progress: 7.0 s, 4890.9 tps, lat 3.272 ms stddev 4.346 and that's with zero actual columns. If you instead add some: CREATE TEMPORARY TABLE blarg(id serial primary key, data text, blarg int) ON COMMIT DROP; progress: 7.0 s, 974.1 tps, lat 16.462 ms stddev 9.058 progress: 8.0 s, 999.9 tps, lat 16.045 ms stddev 7.011 So no, I don't think this'll be a relevant problem. We do so many inserts for a single temp table that a single insert into another one completely vanishes in comparison. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2
On 07/03/2015 03:12 AM, Sawada Masahiko wrote: Thanks. So we can choose the next master server by checking the progress of each server, if hot standby is enabled. And such a procedure is needed even with today's replication. I think that the #2 problem which Josh pointed out seems to be solved; 1. I need to ensure that data is replicated to X places. 2. I need to *know* which places data was synchronously replicated to when the master goes down. And we can address the #1 problem using quorum commit. It's not solved. I still have zero ways of knowing if a replica was in sync or not at the time the master went down. Now, you and others have argued persuasively that there are valuable use cases for quorum commit even without solving that particular issue, but there's a big difference between "we can work around this problem" and "the problem is solved". I forked the subject line because I think that the inability to identify synch replicas under failover conditions is a serious problem with synch rep *today*, and pretending that it doesn't exist doesn't help us even if we don't fix it in 9.6. Let me give you three cases where our lack of information on the replica side about whether it thinks it's in sync or not causes synch rep to fail to protect data. The first case is one I've actually seen in production, and the other two are hypothetical but entirely plausible. Case #1: two synchronous replica servers have the application name synchreplica. An admin uses the wrong Chef template, and deploys a server which was supposed to be an async replica with the same recovery.conf template, and it ends up in the synchreplica group as well. Due to restarts (pushing out an update release), the new server ends up seizing and keeping sync. Then the master dies. Because the new server wasn't supposed to be a sync replica in the first place, it is not checked; they just fail over to the furthest ahead of the two original synch replicas, neither of which was actually in synch.
Case #2: 2 { local, london, nyc } setup. At 2am, the links between data centers become unreliable, such that the on-call sysadmin disables synch rep because commits on the master are intolerably slow. Then, at 10am, the links between data centers fail entirely. The day shift, not knowing that the night shift disabled sync, fail over to London thinking that they can do so with zero data loss. Case #3 1 { london, frankfurt }, 1 { sydney, tokyo } multi-group priority setup. We lose communication with everything but Europe. How can we decide whether to wait to get sydney back, or to promote London immediately? I could come up with numerous other situations, but all of the three above completely reasonable cases show how having the knowledge of what time a replica thought it was last in sync is vital to preventing bad failovers and data loss, and to knowing the quantity of data loss when it can't be prevented. It's an issue *now* that the only data we have about the state of sync rep is on the master, and dies with the master. And it severely limits the actual utility of our synch rep. People implement synch rep in the first place because the best effort of asynch rep isn't good enough for them, and yet when it comes to failover we're just telling them give it your best effort. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] WAL logging problem in 9.4.3?
On 2015-07-03 19:02:29 +0200, Andres Freund wrote: Maybe I'm just daft right now (35C outside, 32 inside, so ...), but I'm right now missing how the whole skip wal logging if relation has just been truncated optimization can ever actually be crashsafe unless we use a new relfilenode (which we don't!). We actually used to use a different relfilenode, but optimized that away: cab9a0656c36739f59277b34fea8ab9438395869 commit cab9a0656c36739f59277b34fea8ab9438395869 Author: Tom Lane t...@sss.pgh.pa.us Date: Sun Aug 23 19:23:41 2009 + Make TRUNCATE do truncate-in-place when processing a relation that was created or previously truncated in the current (sub)transaction. This is safe since if the (sub)transaction later rolls back, we'd just discard the rel's current physical file anyway. This avoids unreasonable growth in the number of transient files when a relation is repeatedly truncated. Per a performance gripe a couple weeks ago from Todd Cook. to me the reasoning here looks flawed.
Re: [HACKERS] WAL logging problem in 9.4.3?
On Fri, Jul 3, 2015 at 11:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: The optimization of TRUNCATE operation that we can use when CREATE TABLE and TRUNCATE are executed in the same transaction block seems to cause the problem. In this case, only index file truncation is logged, and index creation in btbuild() is not logged because wal_level is minimal. Then at the subsequent crash recovery, index file is truncated to 0 byte... Very simple fix is to log an index creation in that case, but not sure if that's ok to do.. In 9.2 or before, this problem doesn't occur because no such error is thrown even if an index file size is zero. But in 9.3 or later, since the planner tries to read a meta page of an index to get the height of the btree tree, an empty index file causes such error. The planner was changed that way by commit 31f38f28, and the problem seems to be an oversight of that commit. What? You want to blame the planner for failing because the index was left corrupt by broken WAL replay? A failure would occur anyway at execution. Yep, right. I was not thinking that such index with file size 0 is corrupted because the reported problem didn't happen before that commit was added. But that's my fault. Such index can cause an error even in other code paths. Okay, so probably we need to change WAL replay of TRUNCATE so that the index file is truncated to one containing only meta page instead of empty one. That is, the WAL replay of TRUNCATE would need to call index_build() after smgrtruncate() maybe. Then how should we implement that? Invent new WAL record type that calls smgrtruncate() and index_build() during WAL replay? Or add the special flag to XLOG_SMGR_TRUNCATE record, and make WAL replay call index_build() only if the flag is found? Any other good idea? Anyway ISTM that we might need to add or modify WAL record.
Regards, -- Fujii Masao
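For reference, a minimal sketch of the failure sequence being discussed (assuming wal_level = minimal; the exact error text and commands here are illustrative):

```sql
-- With wal_level = minimal:
BEGIN;
CREATE TABLE t (id int PRIMARY KEY);
TRUNCATE t;          -- only the file truncation is WAL-logged;
                     -- the index rebuild in btbuild() is not
COMMIT;
-- crash and restart here; WAL replay truncates the index file to 0 bytes
SELECT * FROM t;     -- planner tries to read the btree metapage and errors out
```

The key point is that the truncation record is replayed but the unlogged index rebuild is not, leaving the empty index file that 9.3+'s planner trips over.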
Re: [HACKERS] WAL logging problem in 9.4.3?
On 2015-07-04 01:39:42 +0900, Fujii Masao wrote: Okay, so probably we need to change WAL replay of TRUNCATE so that the index file is truncated to one containing only meta page instead of empty one. That is, the WAL replay of TRUNCATE would need to call index_build() after smgrtruncate() maybe. Then how should we implement that? Invent new WAL record type that calls smgrtruncate() and index_build() during WAL replay? Or add the special flag to XLOG_SMGR_TRUNCATE record, and make WAL replay call index_build() only if the flag is found? Any other good idea? Anyway ISTM that we might need to add or modify WAL record. It's easy enough to log something like a metapage with log_newpage(). But the more interesting question is why that's not happening today. RelationTruncateIndexes() does call the index_build() which should end up WAL logging the index creation.
Re: [HACKERS] Reducing the size of BufferTag remodeling forks
Andres Freund wrote: 2) Replace relation forks, with the exception of the init fork which is special anyway, with separate relfilenodes. Stored in seperate columns in pg_class. Different AMs have different fork needs; for heaps you want one main fork, one VM, one fsm. But for indexes, the VM fork is not necessary, and some AMs might want different ones. For instance, GIN would benefit from having separate forks to store the internal indexes, and BRIN would benefit from a separate fork for the revmap. What I'm saying is that I'm not sure it's okay to store the forks in pg_class columns; instead perhaps we should have a separate catalog on which each relation can have many forks, or perhaps have the pg_class entry store an array (ick). Or perhaps rather than relvmfork (the pg_class column for the relfilenode for the VM fork) we could store relfilenode1, relfilenode2 where the value for each N fork is AM-specific. (so for heaps 1 is main, 2 is FSM, 3 is VM; for BRIN 1 is main, 2 is revmap; and so forth). FWIW the whole idea seems reasonable to me. I worry about concurrent traffic into the pg_relfilenode shared table -- if temp table creation is common across many databases, is it going to become a contention point? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More logging for autovacuum
On Thu, Jul 2, 2015 at 4:41 AM, Gurjeet Singh gurj...@singh.im wrote: On Fri, Aug 17, 2007 at 3:14 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Gregory Stark wrote: I'm having trouble following what's going on with autovacuum and I'm finding the existing logging insufficient. In particular that it's only logging vacuum runs *after* the vacuum finishes makes it hard to see what vacuums are running at any given time. Also, I want to see what is making autovacuum decide to forgo vacuuming a table and the log with that information is at DEBUG2. So did this idea go anywhere? Assuming the thread stopped here, I'd like to rekindle the proposal. log_min_messages acts as a single gate for everything headed for the server logs; controls for per-background process logging do not exist. If one wants to see DEBUG/INFO messages for just the Autovacuum (or checkpointer, bgwriter, etc.), they have to set log_min_messages to that level, but the result would be a lot of clutter from other processes to grovel through, to see the messages of interest. I think that will be quite helpful. During the patch development of parallel sequential scan, it was quite helpful to see the LOG messages of bgworkers, however one of the recent commits (91118f1a) have changed those to DEBUG1, now if I have to enable all DEBUG1 messages, then what I need will be difficult to find in all the log messages. Having control of separate logging for background tasks will serve such a purpose. The facilities don't yet exist, but it'd be nice if such parameters when unset (ie NULL) pick up the value of log_min_messages. So by default, the users will get the same behaviour as today, but can choose to tweak per background-process logging when needed. Will this proposal allow user to see all the messages by all the background workers or will it be per background task. 
Example in some cases user might want to see the messages by all bgworkers and in some cases one might want to see just messages by AutoVacuum and it's workers. I think here designing User Interface is an important work if others also agree that such an option is useful, could you please elaborate your proposal? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
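To make the proposal concrete, the interface being discussed might look like the following postgresql.conf fragment (these per-process GUCs are hypothetical and do not exist today):

```
# Hypothetical per-process overrides; unset values would fall back to
# log_min_messages, preserving today's behavior.
log_min_messages = warning
autovacuum_log_min_messages = debug2   # verbose autovacuum activity only
bgworker_log_min_messages = debug1     # e.g. parallel-scan worker messages
```

Whether bgworkers get one shared knob or per-task knobs is exactly the user-interface question raised above.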
Re: [HACKERS] Determine operator from it's function
Jim Nasby jim.na...@bluetreble.com writes: On 7/3/15 2:33 AM, Heikki Linnakangas wrote: On 07/03/2015 01:20 AM, Jim Nasby wrote: Is there a way to determine the operator that resulted in calling the operator function? I thought fcinfo->flinfo->fn_expr might get set to the OpExpr, but seems it can be a FuncExpr even when called via an operator... Don't think there is. Why do you need to know? I'd like to support arbitrary operators in variant. Why would you expect there to be multiple operators pointing at the same function? If there were multiple operators pointing at the same function, why would you need to distinguish them? ISTM that such a situation would necessarily mean that there was no distinction worthy of notice. (The particular situation you are bitching about comes from the fact that eval_const_expressions's simplify_functions code deliberately ignores any distinction between operators and functions. But for its purposes, that is *correct*, and I will strongly resist any claim that it isn't. If you are unhappy then you defined your operators wrongly.) regards, tom lane
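For context, the kind of check being attempted looks roughly like this backend-style fragment (pseudocode, not runnable outside the server; the point of the thread is that after constant-simplification fn_expr may already be a FuncExpr):

```
/* Inside a C-language function: try to recover the calling operator.
 * This only works when the planner kept the OpExpr around. */
Node *expr = fcinfo->flinfo->fn_expr;

if (expr != NULL && IsA(expr, OpExpr))
{
    Oid opno = ((OpExpr *) expr)->opno;  /* the operator's OID */
    /* ... look up pg_operator by opno ... */
}
else
{
    /* FuncExpr (or NULL): the operator identity is not available,
     * e.g. after eval_const_expressions has flattened it away. */
}
```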
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
Jan de Visser j...@de-visser.net writes: Attached a new patch, rebased against the current head. Errors in pg_hba.conf and pg_ident.conf are now also noticed. I checked the documentation for pg_ctl reload, and the only place where it's explained seems to be runtime.sgml and that description is so high-level that adding this new bit of functionality wouldn't make much sense. BTW, it's probably worth pointing out that the recent work on the pg_file_settings view has taken away a large part of the use-case for this, in that you can find out more with less risk by inspecting pg_file_settings before issuing SIGHUP, instead of SIGHUP'ing and hoping you didn't break anything too nastily. Also, you can use pg_file_settings remotely, unlike pg_ctl (though admittedly you still need contrib/adminpack or something to allow uploading a new config file if you're doing remote admin). I wonder whether we should consider inventing similar views for pg_hba.conf and pg_ident.conf. While that's not necessarily a reason not to adopt this patch anyway, it does mean that we should think twice about whether we need to add a couple of hundred lines for a facility that's less useful than what we already have. BTW, this version of this patch neglects to update the comments in miscadmin.h, and it makes the return convention for ProcessConfigFileInternal completely unintelligible IMO; the inaccuracy and inconsistency in the comments is a symptom of that. I didn't read it in enough detail to say whether there are other problems. regards, tom lane
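For instance, the pre-flight check described above can be as simple as the query below (using the pg_file_settings view added in 9.5):

```sql
-- After editing postgresql.conf, but before sending SIGHUP:
SELECT sourcefile, sourceline, name, setting, applied, error
FROM pg_file_settings
WHERE error IS NOT NULL;   -- any rows here mean the reload would complain
```

An empty result means the config files parse cleanly, without having to SIGHUP and then grep the server log.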
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
On July 3, 2015 06:21:09 PM Tom Lane wrote: Jan de Visser j...@de-visser.net writes: Attached a new patch, rebased against the current head. Errors in pg_hba.conf and pg_ident.conf are now also noticed. I checked the documentation for pg_ctl reload, and the only place where it's explained seems to be runtime.sgml and that description is so high-level that adding this new bit of functionality wouldn't make much sense. BTW, it's probably worth pointing out that the recent work on the pg_file_settings view has taken away a large part of the use-case for this, in that you can find out more with less risk by inspecting pg_file_settings before issuing SIGHUP, instead of SIGHUP'ing and hoping you didn't break anything too nastily. Also, you can use pg_file_settings remotely, unlike pg_ctl (though admittedly you still need contrib/adminpack or something to allow uploading a new config file if you're doing remote admin). I wonder whether we should consider inventing similar views for pg_hba.conf and pg_ident.conf. While that's not necessarily a reason not to adopt this patch anyway, it does mean that we should think twice about whether we need to add a couple of hundred lines for a facility that's less useful than what we already have. Since you were the one proposing the feature, I'm going to leave it to you whether or not I should continue with this. I have no use for this feature; for me it just seemed like a nice bite-sized feature to get myself acquainted with the code base and the development process. As far as I'm concerned that goal has already been achieved, even though finalizing a patch towards commit (and having my name in the release notes ha ha) would be the icing on the cake. BTW, this version of this patch neglects to update the comments in miscadmin.h, and it makes the return convention for ProcessConfigFileInternal completely unintelligible IMO; the inaccuracy and inconsistency in the comments is a symptom of that. 
I didn't read it in enough detail to say whether there are other problems. OK, miscadmin.h. I'll go and look what that's all about. And would it make sense to find a better solution for the ProcessConfigFileInternal return value (which is convoluted, I agree - I went for the solution with the least impact on existing code), or should I improve documentation? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL logging problem in 9.4.3?
On 2015-07-03 18:38:37 -0400, Tom Lane wrote: Why exactly? The first truncation in the (sub)xact would have assigned a new relfilenode, why do we need another one? The file in question will go away on crash/rollback in any case, and no other transaction can see it yet.

Consider: BEGIN; CREATE TABLE; INSERT largeval; TRUNCATE; INSERT 1; COPY; INSERT 2; COMMIT;

INSERT 1 is going to be WAL logged. For that to work correctly TRUNCATE has to be WAL logged, as otherwise there'll be conflicting/overlapping tuples on the target page. But: The truncation itself is not fully WAL logged, and neither is the COPY. Both rely on heap_sync()/immedsync(). For that to be correct the current relfilenode's truncation may *not* be WAL-logged, because the contents of the COPY or the truncation itself will only be on-disk, not in the WAL. Only being on-disk but not in the WAL is a problem if we crash and replay the truncate record.

I'm prepared to believe that some bit of logic is doing the wrong thing in this state, but I do not agree that truncate-in-place is unworkable.

Unless we're prepared to make everything that potentially WAL logs something do the rel->rd_createSubid == mySubid dance, I can't see that working.
Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2
On Sat, Jul 4, 2015 at 2:44 AM, Andres Freund wrote: This subthread is getting absurd, stopping here. Yeah, I agree with Andres here, we are making a mountain of nothing (Frenglish?). I'll send to the other thread some additional ideas soon using a JSON structure. -- Michael
Re: [HACKERS] WAL logging problem in 9.4.3?
Martijn van Oosterhout klep...@svana.org writes: With inserts the WAL records look as follows (relfilenodes changed): ... And amazingly, the database cluster successfully recovers and there's no error now. So the problem is *only* because there is no data in the table at commit time. Which indicates that it's the 'newroot' record that saves the day normally. And it's apparently generated by the first insert.

Yeah, because the correct empty state of a btree index is to have a metapage but no root page, so the first insert forces creation of a root page. And, by chance, btree_xlog_newroot restores the metapage from scratch, so this works even if the metapage had been missing or corrupt. However, things would still break if the first access to the index was a read attempt rather than an insert.

regards, tom lane
Re: [HACKERS] Asynchronous execution on FDW
On 07/02/2015 08:48 AM, Kyotaro HORIGUCHI wrote: - It was a problem to decide when to give the first kick for async exec. It is not in the ExecInit phase, and the ExecProc phase does not fit, either. An extra phase ExecPreProc or something is too invasive. So I tried a pre-exec callback. Any init-node can register callbacks on its turn, then the registered callbacks are called just before the ExecProc phase in the executor. The first patch adds functions and structs to enable this.

At a quick glance, I think this has all the same problems as starting the execution at the ExecInit phase. The correct way to do this is to kick off the queries in the first IterateForeignScan() call. You said that the ExecProc phase does not fit - why not?

- Heikki
Re: [HACKERS] WAL logging problem in 9.4.3?
On 2015-07-03 19:26:05 +0200, Andres Freund wrote: On 2015-07-03 19:02:29 +0200, Andres Freund wrote: Maybe I'm just daft right now (35C outside, 32 inside, so ...), but I'm right now missing how the whole skip wal logging if relation has just been truncated optimization can ever actually be crashsafe unless we use a new relfilenode (which we don't!). We actually used to use a different relfilenode, but optimized that away: cab9a0656c36739f59277b34fea8ab9438395869 commit cab9a0656c36739f59277b34fea8ab9438395869 Author: Tom Lane t...@sss.pgh.pa.us Date: Sun Aug 23 19:23:41 2009 + Make TRUNCATE do truncate-in-place when processing a relation that was created or previously truncated in the current (sub)transaction. This is safe since if the (sub)transaction later rolls back, we'd just discard the rel's current physical file anyway. This avoids unreasonable growth in the number of transient files when a relation is repeatedly truncated. Per a performance gripe a couple weeks ago from Todd Cook. to me the reasoning here looks flawed. It looks to me we need to re-neg on this a bit. I think we can still be more efficient than the general codepath: We can drop the old relfilenode immediately. But pg_class.relfilenode has to differ from the old after the truncation.
Re: [HACKERS] WAL logging problem in 9.4.3?
Andres Freund and...@anarazel.de writes: On 2015-07-03 19:26:05 +0200, Andres Freund wrote: commit cab9a0656c36739f59277b34fea8ab9438395869 Author: Tom Lane t...@sss.pgh.pa.us Date: Sun Aug 23 19:23:41 2009 + Make TRUNCATE do truncate-in-place when processing a relation that was created or previously truncated in the current (sub)transaction. This is safe since if the (sub)transaction later rolls back, we'd just discard the rel's current physical file anyway. This avoids unreasonable growth in the number of transient files when a relation is repeatedly truncated. Per a performance gripe a couple weeks ago from Todd Cook. to me the reasoning here looks flawed. It looks to me we need to re-neg on this a bit. I think we can still be more efficient than the general codepath: We can drop the old relfilenode immediately. But pg_class.relfilenode has to differ from the old after the truncation. Why exactly? The first truncation in the (sub)xact would have assigned a new relfilenode, why do we need another one? The file in question will go away on crash/rollback in any case, and no other transaction can see it yet. I'm prepared to believe that some bit of logic is doing the wrong thing in this state, but I do not agree that truncate-in-place is unworkable. regards, tom lane
Re: [HACKERS] Determine operator from it's function
On 7/3/15 2:33 AM, Heikki Linnakangas wrote: On 07/03/2015 01:20 AM, Jim Nasby wrote: Is there a way to determine the operator that resulted in calling the operator function? I thought fcinfo->flinfo->fn_expr might get set to the OpExpr, but seems it can be a FuncExpr even when called via an operator... Don't think there is. Why do you need to know? I'd like to support arbitrary operators in variant. I did initial testing and it looked like I was getting an OpExpr in fn_expr, but I think that's because I was using a real table to test with. When I do something like 'a' 'b' it looks like the operator gets written out of the plan. If that's indeed what's happening, is there a hook I could use to change that behavior? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
[HACKERS] More work on SortSupport for text - strcoll() and strxfrm() caching
Since apparently we're back to development work, I thought it was time to share a patch implementing a few additional simple tricks to make sorting text under a non-C locale even faster than in 9.5. These techniques are mostly effective when values are physically clustered together. This might be because there is a physical/logical correlation, but cases involving any kind of clustering of values are helped significantly.

Caching
=======

The basic idea is that we cache strxfrm() blobs. Separately, we exploit temporal locality and clustering of values by caching the result of the most recent strcoll()-resolved comparison performed. The strxfrm() technique helps a lot with low cardinality single attribute sorts if we can avoid most strxfrm() work. On the other hand, strcoll() comparison caching particularly helps with multi-attribute sorts where there is a low to moderate cardinality secondary attribute and low cardinality leading attribute. The master branch will still opportunistically take the equality memcmp() fastpath plenty of times for that second attribute, but there are no abbreviated keys to help when that doesn't work out (because it isn't the leading attribute).

Regressions
===========

The patch still helps with strcoll() avoidance when the ordering of a moderate cardinality attribute is totally random, but it helps much less there. I have not seen a regression for any case. I'm expecting someone to ask me to do something with the program I wrote last year, to prove the opportunistic memcmp() equality fastpath for text is free [1]. This patch has exactly the same tension as last year's memcmp() equality one [2]: I add something opportunistic, that in general might consistently not work out at all in some cases, and on the face of it implies extra costs for those cases -- costs which must be paid every single time.
So as with the opportunistic memcmp() equality thing, the *actual* overhead for cases that do not benefit must be virtually zero for the patch to be worthwhile. That is the standard that I expect that this patch will be held to, too.

Benchmark
=========

The query that I've been trying this out with is a typical rollup query, using my cities sample data [3] (this is somewhat, although not perfectly correlated on (country, province) before sorting):

postgres=# select country, province, count(*) from cities group by rollup (country, province);
   country   |      province       | count
-------------+---------------------+--------
 Afghanistan | Badaẖšan            |      5
 Afghanistan | Bādgīs              |      2
 Afghanistan | Baġlān              |      5
 Afghanistan | Balẖ                |      6
 Afghanistan | Bāmiyān             |      3
 Afghanistan | Farāh               |      3
 Afghanistan | Fāryāb              |      4
 Afghanistan | Ġawr                |      3
 *** SNIP ***
 ...
 Zimbabwe    | Manicaland          |     22
 Zimbabwe    | Mashonaland Central |     13
 Zimbabwe    | Mashonaland East    |      9
 Zimbabwe    | Mashonaland West    |     21
 Zimbabwe    | Masvingo            |     11
 Zimbabwe    | Matabeleland North  |      8
 Zimbabwe    | Matabeleland South  |     14
 Zimbabwe    | Midlands            |     14
 Zimbabwe    | [null]              |    116
 [null]      | [null]              | 317102
(3529 rows)

With master, this takes about 525ms when it stabilizes after a few runs on my laptop. With the patch, it takes about 405ms. That's almost a 25% reduction in total run time. If I perform a more direct test of sort performance against this data with minimal non-sorting overhead, I see a reduction of as much as 30% in total query runtime (I chose this rollup query because it is obviously representative of the real world). If this data is *perfectly* correlated (e.g. because someone ran CLUSTER) and some sort can use the dubious bubble sort best case path [4] that we added to qsort back in 2006, the improvement still holds up at ~20%, I've found.

Performance of the C locale
---------------------------

For this particular rollup query, my 25% improvement leaves the collated text sort perhaps marginally faster than an equivalent query that uses the C locale (with or without the patch applied).
It's hard to be sure that that effect is real -- many trials are needed -- but it's reasonable to speculate that it's possible to sometimes beat the C locale because of factors like final abbreviated key cardinality. It's easy to *contrive* a case where the C locale is beaten even with 9.5 -- just sort a bunch
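The second technique described above -- remembering the outcome of the most recent strcoll()-resolved comparison -- can be sketched in a few lines of standalone C. This is only an illustration of the idea, not the patch itself; the names and the fixed-size cache are mine:

```c
/* Sketch of a last-comparison cache for collation-aware comparisons.
 * When adjacent input values repeat (clustered data), the cached answer
 * is returned and the strcoll() call is skipped entirely. */
#include <string.h>

#define CACHE_LEN 64

static char last_a[CACHE_LEN];
static char last_b[CACHE_LEN];
static int  last_result;
static int  cache_valid = 0;

int
cached_cmp(const char *a, const char *b)
{
    if (cache_valid &&
        strcmp(a, last_a) == 0 && strcmp(b, last_b) == 0)
        return last_result;         /* cache hit: no strcoll() needed */

    last_result = strcoll(a, b);    /* expensive, locale-aware compare */

    if (strlen(a) < CACHE_LEN && strlen(b) < CACHE_LEN)
    {
        strcpy(last_a, a);
        strcpy(last_b, b);
        cache_valid = 1;
    }
    else
        cache_valid = 0;            /* too long to cache cheaply */

    return last_result;
}
```

The key property the thread demands of such a trick is that the miss path costs almost nothing extra: on a miss it is one failed strcmp() (which usually bails on the first byte) before falling through to the strcoll() that would have been done anyway.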
Re: [HACKERS] Solaris testers wanted for strxfrm() behavior
On Wed, Jul 01, 2015 at 03:22:33AM -0400, Noah Misch wrote: On Tue, Jun 30, 2015 at 09:45:08AM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Sun, Jun 28, 2015 at 07:00:14PM -0400, Tom Lane wrote: Another idea would be to make a test during postmaster start to see if this bug exists, and fail if so. I'm generally on board with the thought that we don't need to work on systems with such a bad bug, but it would be a good thing if the failure was clean and produced a helpful error message, rather than looking like a Postgres bug. Failing cleanly on unpatched Solaris is adequate, agreed. A check at postmaster start isn't enough, because the postmaster might run in the C locale while individual databases or collations use problem locales. The safest thing is to test after every setlocale(LC_COLLATE) and newlocale(LC_COLLATE). That's once at backend start and once per backend per collation used, more frequent than I would like.

Hmm. Solaris does not have locale_t or strxfrm_l(), so per-collation checks aren't relevant after all. Checking at postmaster start and backend start seems cheap enough, hence this patch.

commit 17d0abf (HEAD)
Author:     Noah Misch n...@leadboat.com
AuthorDate: Fri Jul 3 19:59:50 2015 -0400
Commit:     Noah Misch n...@leadboat.com
CommitDate: Fri Jul 3 19:59:50 2015 -0400

    Revoke support for strxfrm() that write past the specified array length.

    This formalizes a decision implicit in commit
    4ea51cdfe85ceef8afabceb03c446574daa0ac23 and adds clean detection of
    affected systems. Vendor updates are available for each such known bug.
    Back-patch to 9.5, where the aforementioned commit first appeared.

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 2ecadd6..4fad6f3 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -149,6 +149,8 @@ main(int argc, char *argv[])
 	 */
 	unsetenv("LC_ALL");
 
+	check_strxfrm_bug();
+
 	/*
 	 * Catch standard options before doing much else, in particular before we
 	 * insist on not being root.
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 84215e0..d91959e 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -855,6 +855,64 @@ IsoLocaleName(const char *winlocname)
 
 /*
+ * Detect aging strxfrm() implementations that, in a subset of locales, write
+ * past the specified buffer length. Affected users must update OS packages
+ * before using PostgreSQL 9.5 or later.
+ *
+ * Assume that the bug can come and go from one postmaster startup to another
+ * due to physical replication among diverse machines. Assume that the bug's
+ * presence will not change during the life of a particular postmaster. Given
+ * those assumptions, call this no less than once per postmaster startup per
+ * LC_COLLATE setting used. No known-affected system offers strxfrm_l(), so
+ * there is no need to consider pg_collation locales.
+ */
+void
+check_strxfrm_bug(void)
+{
+	char		buf[32];
+	const int	canary = 0x7F;
+	bool		ok = true;
+
+	/*
+	 * Given a two-byte ASCII string and length limit 7, 8 or 9, Solaris 10
+	 * 05/08 returns 18 and modifies 10 bytes. It respects limits above or
+	 * below that range.
+	 *
+	 * The bug is present in Solaris 8 as well; it is absent in Solaris 10
+	 * 01/13 and Solaris 11.2. Affected locales include is_IS.ISO8859-1,
+	 * en_US.UTF-8, en_US.ISO8859-1, and ru_RU.KOI8-R. Unaffected locales
+	 * include de_DE.UTF-8, de_DE.ISO8859-1, zh_TW.UTF-8, and C.
+	 */
+	buf[7] = canary;
+	(void) strxfrm(buf, "ab", 7);
+	if (buf[7] != canary)
+		ok = false;
+
+	/*
+	 * illumos bug #1594 was present in the source tree from 2010-10-11 to
+	 * 2012-02-01. Given an ASCII string of any length and length limit 1,
+	 * affected systems ignore the length limit and modify a number of bytes
+	 * one less than the return value. The problem inputs for this bug do not
+	 * overlap those for the Solaris bug, hence a distinct test.
+	 *
+	 * Affected systems include smartos-20110926T021612Z. Affected locales
+	 * include en_US.ISO8859-1 and en_US.UTF-8. Unaffected locales include C.
+	 */
+	buf[1] = canary;
+	(void) strxfrm(buf, "a", 1);
+	if (buf[1] != canary)
+		ok = false;
+
+	if (!ok)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYSTEM_ERROR),
+				 errmsg_internal("strxfrm(), in locale \"%s\", writes past the specified array length",
+								 setlocale(LC_COLLATE, NULL)),
+				 errhint("Apply system library package updates.")));
+}
+
+
+/*
+ * Cache
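The canary technique used in the patch above can be tried standalone. Below is a hedged, illustrative version (the real check lives in pg_locale.c and raises ereport(ERROR) instead of returning a flag): plant a sentinel byte just past the length limit, call strxfrm() with that limit, and see whether the sentinel survived. On a conforming libc it always does:

```c
/* Standalone sketch of the Solaris strxfrm() canary test: returns 1 if
 * strxfrm() respected the length limit (the correct behavior), 0 if it
 * scribbled past it (the bug the patch detects). */
#include <string.h>

int
strxfrm_respects_limit(void)
{
    char       buf[32];
    const char canary = 0x7F;

    buf[7] = canary;                 /* sentinel just past the limit */
    (void) strxfrm(buf, "ab", 7);    /* limit 7: only buf[0..6] may be written */
    return buf[7] == canary;
}
```

In the default C locale strxfrm("ab") needs only 3 bytes, comfortably under the limit of 7, so the result is well-defined and the sentinel check is deterministic on an unaffected system.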
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
On July 3, 2015 06:21:09 PM Tom Lane wrote: I wonder whether we should consider inventing similar views for pg_hba.conf and pg_ident.conf. (Apologies for the flurry of emails). Was there not an attempt at a view for pg_hba.conf which ended in a lot of bikeshedding and no decisions?
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
On July 3, 2015 09:24:36 PM Jan de Visser wrote: On July 3, 2015 06:21:09 PM Tom Lane wrote: BTW, this version of this patch neglects to update the comments in miscadmin.h, and it makes the return convention for ProcessConfigFileInternal completely unintelligible IMO; the inaccuracy and inconsistency in the comments is a symptom of that. I didn't read it in enough detail to say whether there are other problems. OK, miscadmin.h. I'll go and look what that's all about. And would it make sense to find a better solution for the ProcessConfigFileInternal return value (which is convoluted, I agree - I went for the solution with the least impact on existing code), or should I improve documentation? Heh. I actually touched that file. I completely missed those comments (or saw them, thought that I should update them, and then forgot about them - just as likely). I'll obviously fix this if we carry this to completion.
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
Attached a new patch, rebased against the current head. Errors in pg_hba.conf and pg_ident.conf are now also noticed. I checked the documentation for pg_ctl reload, and the only place where it's explained seems to be runtime.sgml, and that description is so high-level that adding this new bit of functionality wouldn't make much sense. On Sat, Apr 25, 2015 at 10:03 AM, Jan de Visser j...@de-visser.net wrote: On April 22, 2015 06:04:42 PM Payal Singh wrote: The following review has been posted through the commitfest application:

make installcheck-world: not tested
Implements feature:      tested, failed
Spec compliant:          not tested
Documentation:           tested, failed

An error in postgresql.conf gives the expected result on pg_ctl reload, although errors in the pg_hba.conf file don't. Like before, the reload completes fine without any indication that pg_hba.conf failed to load; that information is only present in the log file. I'm assuming pg_ctl reload should prompt the user if a file fails to load, irrespective of which file it is - postgresql.conf or pg_hba.conf. Will fix. Not hard, just move the code that writes the .pid file to after the pg_hba.conf reload. There is no documentation change so far, but I guess that's not yet necessary. I will update the page for pg_ctl. At least the behaviour of -w/-W in relation to the reload command needs to be explained. gmake check passed all tests.
The new status of this patch is: Waiting on Author

jan

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index df8037b..909a078 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1235,6 +1235,15 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
+	 * Update postmaster.pid with startup time as the last reload time:
+	 */
+	{
+		char		last_reload_info[32];
+
+		snprintf(last_reload_info, 32, "%ld %d", (long) MyStartTime, 1);
+		AddToDataDirLockFile(LOCK_FILE_LINE_LAST_RELOAD, last_reload_info);
+	}
+
+	/*
 	 * Remember postmaster startup time
 	 */
 	PgStartTime = GetCurrentTimestamp();
@@ -2349,6 +2358,8 @@ static void
 SIGHUP_handler(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
+	bool		reload_success;
+	char		last_reload_info[32];
 
 	PG_SETMASK(BlockSig);
 
@@ -2356,7 +2367,8 @@ SIGHUP_handler(SIGNAL_ARGS)
 	{
 		ereport(LOG,
 				(errmsg("received SIGHUP, reloading configuration files")));
-		ProcessConfigFile(PGC_SIGHUP);
+		reload_success = ProcessConfigFile(PGC_SIGHUP);
+
 		SignalChildren(SIGHUP);
 		SignalUnconnectedWorkers(SIGHUP);
 		if (StartupPID != 0)
@@ -2379,13 +2391,25 @@
 			signal_child(PgStatPID, SIGHUP);
 
 		/* Reload authentication config files too */
-		if (!load_hba())
+		if (!load_hba()) {
+			reload_success = 0;
 			ereport(WARNING,
 					(errmsg("pg_hba.conf not reloaded")));
+		}
 
-		if (!load_ident())
+		if (!load_ident()) {
+			reload_success = 0;
 			ereport(WARNING,
 					(errmsg("pg_ident.conf not reloaded")));
+		}
+
+		/*
+		 * Write the current time and the result of the reload to the
+		 * postmaster.pid file.
+		 */
+		snprintf(last_reload_info, 32, "%ld %d",
+				 (long) time(NULL), reload_success);
+		AddToDataDirLockFile(LOCK_FILE_LINE_LAST_RELOAD, last_reload_info);
 
 #ifdef EXEC_BACKEND
 		/* Update the starting-point file for future children */

diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index 5b5846c..2f5537d 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -117,12 +117,13 @@ STRING		\'([^'\\\n]|\\.|\'\')*\'
  * If a hard error occurs, no values will be changed. (There can also be
  * errors that prevent just one value from being changed.)
  */
-void
+bool
 ProcessConfigFile(GucContext context)
 {
 	int			elevel;
 	MemoryContext config_cxt;
 	MemoryContext caller_cxt;
+	bool		ok;
 
 	/*
 	 * Config files are processed on startup (by the postmaster only) and on
@@ -153,16 +154,19 @@ ProcessConfigFile(GucContext context)
 	/*
 	 * Read and apply the config file. We don't need to examine the result.
 	 */
-	(void) ProcessConfigFileInternal(context, true, elevel);
+	ok = ProcessConfigFileInternal(context, true, elevel) != NULL;
 
 	/* Clean up */
 	MemoryContextSwitchTo(caller_cxt);
 	MemoryContextDelete(config_cxt);
+
 	return ok;
 }
 
 /*
  * This function handles both actual config file (re)loads and execution of
- * show_all_file_settings() (i.e., the pg_file_settings view). In the latter
+ * show_all_file_settings() (i.e., the pg_file_settings view). In the former
+ * case, the settings are applied and this function returns the ConfigVariable
+ * list when this is successful, or NULL when it is not. In the latter
  * case we don't apply any of the settings, but we make all the usual validity
  * checks, and we return the ConfigVariable list so that it can be printed out
  * by show_all_file_settings().
@@ -505,9 +509,13 @@ bail_out:
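The bookkeeping format the patch uses -- a timestamp and a success flag packed into one postmaster.pid line via snprintf("%ld %d", ...) -- is simple enough to exercise in isolation. A hedged sketch (the helper names below are mine, not the patch's; pg_ctl would do the parsing side when it polls the file):

```c
/* Round-trip the "last reload" line format: "<epoch-seconds> <success>".
 * Returns 1 if formatting then parsing reproduces the input values. */
#include <stdio.h>

int
roundtrip_ok(long when, int success)
{
    char line[32];
    long parsed_when;
    int  parsed_success;

    /* what the postmaster would write into postmaster.pid */
    snprintf(line, sizeof line, "%ld %d", when, success);

    /* what a waiting pg_ctl would read back */
    if (sscanf(line, "%ld %d", &parsed_when, &parsed_success) != 2)
        return 0;

    return parsed_when == when && parsed_success == success;
}
```

The fixed 32-byte buffer matches the patch; a 64-bit time_t in decimal plus the flag fits comfortably within it.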
Re: [HACKERS] Rounding to even for numeric data type
Michael Paquier michael.paqu...@gmail.com writes: On Sat, May 2, 2015 at 9:53 PM, Fabien COELHO wrote: Quick review: patch applies, make check is fine, all is well. Thanks for the feedback, Fabien! All the casting tests could be put in numeric.sql, as they are all related to numeric and that would avoid duplicating the values lists. Not sure about that, the tests are placed here to be consistent with what is done for float8. For the documentation, I would also add 3.5 so that rounding to even is even clearer:-) Good idea. I reworked the example in the docs. Pushed with minor adjustments --- you missed updating int8-exp-three-digits.out, and I thought the documentation wording could be better. regards, tom lane
Re: [HACKERS] psql :: support for \ev viewname and \sv viewname
Petr Korobeinikov pkorobeini...@gmail.com writes: Fixed. Now both \ev and \sv number lines starting with 1. New version attached. Applied with a fair amount of mostly-cosmetic adjustment. As I've already noticed, pg_get_viewdef() does not support the full syntax for creating or replacing views. Oh? If that were true, pg_dump wouldn't work on such views. It is kind of a PITA for this purpose that it doesn't include the CREATE text for you, but we're surely not changing that behavior now. regards, tom lane
Re: [HACKERS] WAL logging problem in 9.4.3?
On Fri, Jul 03, 2015 at 07:21:21PM +0200, Andres Freund wrote: On 2015-07-03 19:14:26 +0200, Martijn van Oosterhout wrote: Am I missing something? ISTM that if the truncate record was simply not logged at all everything would work fine. The whole point is that the table was created in this transaction and so if it exists the table on disk must be the correct representation. That'd not work either. Consider: BEGIN; CREATE TABLE ... INSERT; TRUNCATE; INSERT; COMMIT; If you replay that without a truncation WAL record the second INSERT will try to add stuff to already occupied space. And they can have different lengths and stuff, so you cannot just ignore that fact.

I was about to disagree with you by suggesting that if the table was created in this transaction then WAL logging is skipped. But testing shows that inserts are indeed logged, as you point out. With inserts the WAL records look as follows (relfilenodes changed):

martijn@martijn-jessie:~/git/ctm/docker$ sudo /usr/lib/postgresql/9.4/bin/pg_xlogdump -p /tmp/pgtest/postgres/pg_xlog/ 00010001 | grep -wE '16386|16384|16390'
rmgr: Storage  len (rec/tot): 16/48,   tx:   0, lsn: 0/016A79C8, prev 0/016A79A0, bkp: , desc: file create: base/12139/16384
rmgr: Sequence len (rec/tot): 158/190, tx: 683, lsn: 0/016B4258, prev 0/016B2508, bkp: , desc: log: rel 1663/12139/16384
rmgr: Storage  len (rec/tot): 16/48,   tx: 683, lsn: 0/016B4318, prev 0/016B4258, bkp: , desc: file create: base/12139/16386
rmgr: Storage  len (rec/tot): 16/48,   tx: 683, lsn: 0/016B9468, prev 0/016B9418, bkp: , desc: file create: base/12139/16390
rmgr: Sequence len (rec/tot): 158/190, tx: 683, lsn: 0/016BC938, prev 0/016BC880, bkp: , desc: log: rel 1663/12139/16384
rmgr: Sequence len (rec/tot): 158/190, tx: 683, lsn: 0/016BCAF0, prev 0/016BCAA0, bkp: , desc: log: rel 1663/12139/16384
rmgr: Heap     len (rec/tot): 35/67,   tx: 683, lsn: 0/016BCBB0, prev 0/016BCAF0, bkp: , desc: insert(init): rel 1663/12139/16386; tid 0/1
rmgr: Btree    len (rec/tot): 20/52,   tx: 683, lsn: 0/016BCBF8, prev 0/016BCBB0, bkp: , desc: newroot: rel 1663/12139/16390; root 1 lev 0
rmgr: Btree    len (rec/tot): 34/66,   tx: 683, lsn: 0/016BCC30, prev 0/016BCBF8, bkp: , desc: insert: rel 1663/12139/16390; tid 1/1
rmgr: Storage  len (rec/tot): 16/48,   tx: 683, lsn: 0/016BCC78, prev 0/016BCC30, bkp: , desc: file truncate: base/12139/16386 to 0 blocks
rmgr: Storage  len (rec/tot): 16/48,   tx: 683, lsn: 0/016BCCA8, prev 0/016BCC78, bkp: , desc: file truncate: base/12139/16390 to 0 blocks
rmgr: Heap     len (rec/tot): 35/67,   tx: 683, lsn: 0/016BCCD8, prev 0/016BCCA8, bkp: , desc: insert(init): rel 1663/12139/16386; tid 0/1
rmgr: Btree    len (rec/tot): 20/52,   tx: 683, lsn: 0/016BCD20, prev 0/016BCCD8, bkp: , desc: newroot: rel 1663/12139/16390; root 1 lev 0
rmgr: Btree    len (rec/tot): 34/66,   tx: 683, lsn: 0/016BCD58, prev 0/016BCD20, bkp: , desc: insert: rel 1663/12139/16390; tid 1/1

   relname   | relfilenode
-------------+-------------
 test        |       16386
 test_id_seq |       16384
 test_pkey   |       16390
(3 rows)

And amazingly, the database cluster successfully recovers and there's no error now. So the problem is *only* because there is no data in the table at commit time. Which indicates that it's the 'newroot' record that saves the day normally. And it's apparently generated by the first insert.

Agreed. I think the problem is something else though. Namely that we reuse the relfilenode for heap_truncate_one_rel(). That's just entirely broken afaics. We need to allocate a new relfilenode and write stuff into that. Then we can forgo WAL logging the truncation record.

Would that properly initialise the index though? Anyway, this is way outside my expertise, so I'll bow out now. Let me know if I can be of more assistance.

Have a nice day,
-- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace
Andreas Karlsson andr...@proxel.se writes: On 03/21/2015 01:19 PM, Julien Tachoires wrote: I am confused by your fix. Wouldn't a cleaner fix be to use tbinfo->reltablespace rather than tbinfo->reltoasttablespace when calling ArchiveEntry()? Yes, doing this that way is cleaner. Here is a new version including your fix. Thanks. I am now satisfied with how the patch looks. Thanks for your work. I will mark this as ready for committer now but not expect any committer to look at it until the commitfest starts.

I have just looked through this thread, and TBH I think we should reject this patch altogether --- not RWF, but no we don't want this. The use-case remains hypothetical: no performance numbers showing a real-world benefit have been exhibited AFAICS. And allowing a toast table to be in a different tablespace from its parent opens up a host of corner cases that, despite the many revisions of the patch so far, probably haven't all been addressed yet. (For instance, I wonder whether pg_upgrade copes with toast tables in non-parent tablespaces.) I regret the idea of wasting all the work that's been poured into this, but I think pushing this patch forward will just waste even more time, now and in the future, for benefit that will be at most marginal.

If any committers nonetheless want to take this up, be advised that it's far from committable as-is. Here are some notes just from looking at the pg_dump part of the patch:

* Addition in pg_backup_archiver.c seems pretty dubious; we don't handle --no-tablespace that way for other cases.

* Why is getTables() collecting toast tablespace only from latest-model servers? This will likely lead to doing the wrong thing (ie, dumping incorrect ALTER SET TOAST TABLESPACE pg_default commands) when dumping from an older server.

* dumpTOASTTablespace (man, that's an ugly choice of name ... camel case combined with all-upper-case-words is a bad idea) neglects the possibility that it needs to quote the tablespace name.
* Assorted random whitespace inconsistencies, only some of which would be cleaned up by pgindent. I've not studied the rest of the patch, but it would clearly need to be gone over very carefully to get to a committable state. regards, tom lane
Re: [HACKERS] Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?
power 1.5 is almost certainly not right for all cases, but it is simple and better. It is better in some cases, as I've been told on my patch. If you have a separate disk for WAL writes the power formula may just degrade performance, or maybe not, or not too much, or it really should be a guc. Well, I just think that it needs more performance testing with various loads and sizes, really. I'm not against this patch at all. And easy to remove if something even better arrives. I don't see the two patches being in conflict. They are not in conflict from a git point of view, or even so it would be trivial to solve. They are in conflict as the patch changes the checkpoint load significantly, which would mean that my X00 hours of performance testing on the checkpoint scheduler should more or less be run again. Ok, it is somehow egoistic, but I'm trying to avoid wasting people's time. Another point is that I'm not sure I understand the decision process: for some patch in some area extensive performance tests are required, and for other patches in the same area they would not be. -- Fabien.
Re: [HACKERS] PATCH: remove nclients/nthreads constraint from pgbench
On 07/03/2015 07:50 AM, Fabien COELHO wrote: This doesn't behave correctly if you set -j to greater than -c, and also use the rate limit option: Indeed. v3 attached fixed the case when nthreads > nclients. Ok, committed. Thanks! - Heikki
Re: [HACKERS] Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?
Hello Andres, In conclusion, and very egoistically, I would prefer if this patch could wait for the checkpoint scheduling patch to be considered, as it would basically invalidate the X00 hours of performance tests I ran:-) These two patches target pretty independent mechanics. If your patch were significantly influenced by this, something would be wrong. It might decrease the benefit of your patch a mite, but that's not really a problem. That is not the issue I see. On the principle of performance testing it really means that I should rerun the tests, even if I expect that the overall influence would be pretty small in this case. This is my egoistic argument. Well, probably I would just rerun a few cases to check that the impact is a mite, as you said, not all cases. Another point is that I'm not sure that this patch is ripe; in particular I'm skeptical about the hardcoded 1.5 without further testing. Maybe it is good, maybe 1.3 or 1.6 is better, maybe it depends and it should just be a GUC with some advice about how to set it. So I really think that it needs more performance figures than a positive effect on one load. Well, this is just my opinion, no need to care too much about it:-) -- Fabien.
Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012
On Fri, Jul 3, 2015 at 5:56 AM, Heikki Linnakangas wrote: On 04/10/2015 01:46 PM, chenhj wrote: I was about to commit the attached, but when I tested this between my Windows 8.1 virtual machine and Linux host, I was not able to see any performance difference. It may be because the case is hobbled by other inefficiencies, in the virtualization or somewhere else, but I wonder if others can reproduce the speedup? I just gave this a shot in an 8.1 VM, but I could not reproduce a speedup of more than a couple of percent (sorry, no servers available); still, this seemed to be just noise. The approach taken by this patch sounds safe enough to me that it should be applied. I mean, we know that on Win8/2k12 the default socket buffer size is 64k, so reducing it to 32k would hurt performance. By the way, perhaps the link mentioned in this code should be updated to this one; apparently Microsoft has changed the URL shape a bit: https://support.microsoft.com/en-us/kb/823764 A refresh would not hurt. -- Michael
Re: [HACKERS] Fix pgbench --progress report under (very) low rate
On 06/15/2015 09:12 PM, Fabien COELHO wrote: v3 rebase (after pgbench moved to src/bin) and minor style tweaking. v4 adds a fix to another progress timing issue: currently, if pgbench/postgres get stuck somewhere, the report catches up by repeating progress reports several times in a row, which looks like this:

progress: 10.0 s ...
progress: 11.0 s ...
stuck...
progress: 14.2 s catchup for 11.0 - 14.2
progress: 14.2 s stupid data
progress: 14.2 s stupid data
progress: 15.0 s ...
progress: 16.0 s ...

The correction removes the stupid-data lines, which compute reports over a very short time, including absurd tps figures. Yet again, shame on me in the first place for this behavior. Thanks, applied. I chose to also backpatch this, although this is arguably a change in behaviour, which would not normally be appropriate in a minor version. However, progress reports are a very user-facing feature, and it's not going to break anyone's scripts. - Heikki
Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012
On 3 July 2015 at 20:06, Andres Freund and...@anarazel.de wrote: On 2015-07-02 23:56:16 +0300, Heikki Linnakangas wrote: On 04/10/2015 01:46 PM, chenhj wrote: Result(execute time): default(8K), 7.370s set SO_SNDBUF to 32K, 4.159s(the current implement) set SO_SNDBUF to 64K, 2.875s set SO_SNDBUF to 128K, 1.593s set SO_SNDBUF to 256K, 1.324s I was about to commit the attached, but when I tested this between my Windows 8.1 virtual machine and Linux host, I was not able to see any performance difference. It may be because the case is hobbled by other inefficiencies, in the virtualization or somewhere else, but I wonder if others can reproduce the speedup? Given that too-small sockets incur significantly smaller latency bumps in a virtualized environment than in a real network, and hit another set of buffers inside the virtualization technology, I'm not particularly surprised by that. I'm wondering what the original test setup was. I'm assuming psql and postgres both running on separate Windows machines? I've tested the patch just connecting to a database running on localhost and I'm not getting much of a speedup. Perhaps 1%, if that's not noise. I don't have enough hardware here to have client and server on separate machines, at least not with a stable network that goes through copper. Here's the results.
Unpatched:

-- 100MB
Measure-Command { .\psql.exe -d postgres -t -A -c select '1'::char(1000),generate_series(1,10) $null }
TotalMilliseconds : 1997.3908
TotalMilliseconds : 2111.4119
TotalMilliseconds : 2040.4415
TotalMilliseconds : 2167.5532
TotalMilliseconds : 2087.6444
TotalMilliseconds : 2117.3759
TotalMilliseconds : 2100.3229
TotalMilliseconds : 2132.3522
TotalMilliseconds : 2129.9487
TotalMilliseconds : 2101.675
Median: 2106.54345
Average: 2098.61165

-- 500MB
Measure-Command { .\psql.exe -d postgres -t -A -c select '1'::char(1000),generate_series(1,50) $null }
TotalMilliseconds : 10344.4251
TotalMilliseconds : 10248.3671
TotalMilliseconds : 10370.3856
TotalMilliseconds : 10412.507
TotalMilliseconds : 10469.173
TotalMilliseconds : 10248.8889
TotalMilliseconds : 10331.9476
TotalMilliseconds : 10320.7841
TotalMilliseconds : 10470.3022
TotalMilliseconds : 10333.4203
Median: 10338.9227
Average: 10355.02009

Patched:

-- 100MB
Measure-Command { .\psql.exe -d postgres -t -A -c select '1'::char(1000),generate_series(1,10) $null }
TotalMilliseconds : 2066.3701
TotalMilliseconds : 2106.6628
TotalMilliseconds : 2110.2459
TotalMilliseconds : 2047.8337
TotalMilliseconds : 2081.9166
TotalMilliseconds : 2034.7086
TotalMilliseconds : 2082.9072
TotalMilliseconds : 2146.6878
TotalMilliseconds : 2133.351
TotalMilliseconds : 2076.6862
Median: 2082.4119
Average: 2088.73699

-- 500MB
Measure-Command { .\psql.exe -d postgres -t -A -c select '1'::char(1000),generate_series(1,50) $null }
TotalMilliseconds : 10217.4794
TotalMilliseconds : 10244.8074
TotalMilliseconds : 10451.7265
TotalMilliseconds : 10162.9862
TotalMilliseconds : 10304.1866
TotalMilliseconds : 10374.7922
TotalMilliseconds : 10227.9632
TotalMilliseconds : 10145.5825
TotalMilliseconds : 10298.7048
TotalMilliseconds : 10170.3754
Median: 10236.3853
Average: 10259.86042

Comparison (Unpatched / Patched)
         100MB     500MB
Median   101.16%   101.00%
Average  100.47%   100.93%

Regards David Rowley -- David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] copy.c handling for RLS is insecure
On Tue, Dec 02, 2014 at 11:32:27AM -0500, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Thu, Nov 27, 2014 at 2:03 AM, Stephen Frost sfr...@snowman.net wrote: Alright, I've done the change to use the RangeVar from CopyStmt, but also added a check wherein we verify that the relation's OID returned from the planned query is the same as the relation's OID that we did the RLS check on; if they're different, we throw an error. Please let me know if there are any remaining concerns. Here is the check in question (added in commit 143b39c):

plan = planner(query, 0, NULL);

/*
 * If we were passed in a relid, make sure we got the same one back
 * after planning out the query. It's possible that it changed
 * between when we checked the policies on the table and decided to
 * use a query and now.
 */
if (queryRelId != InvalidOid)
{
    Oid relid = linitial_oid(plan->relationOids);

    /*
     * There should only be one relationOid in this case, since we
     * will only get here when we have changed the command for the
     * user from a "COPY relation TO" to "COPY (SELECT * FROM
     * relation) TO", to allow row level security policies to be
     * applied.
     */
    Assert(list_length(plan->relationOids) == 1);

    if (relid != queryRelId)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("relation referenced by COPY statement has changed")));
}

That's clearly an improvement, but I'm not sure it's water-tight. What if the name that originally referenced a table ended up referencing a view? Then you could get list_length(plan->relationOids) != 1. I'll test it out and see what happens. Certainly a good question and if there's an issue there then I'll get it addressed. Yes, it can be made to reference a view and trip the assertion. (And, in that case, I also wonder if you could get eval_const_expressions() to do evil things on your behalf while planning.)
If it can be made to reference a view then there's an issue, as the view might include a function call itself which is provided by the attacker.. Indeed. As the parenthetical remark supposed, the check happens too late to prevent a security breach. planner() has run eval_const_expressions(), executing code of the view owner's choosing. Clearly, if we found a relation originally then we need that same relation with the same OID after the conversion to a query. That is necessary but not sufficient. CREATE RULE can convert a table to a view without changing the OID, thereby fooling the check. Test procedure:

-- as superuser (or createrole)
create user blackhat;
create user alice;

-- as blackhat
begin;
create table exploit_rls_copy (c int);
alter table exploit_rls_copy enable row level security;
grant select on exploit_rls_copy to public;
commit;

-- as alice
-- first, set breakpoint on BeginCopy
copy exploit_rls_copy to stdout;

-- as blackhat
begin;
create or replace function leak() returns int immutable as $$begin raise notice 'in leak()'; return 7; end$$ language plpgsql;
create rule _RETURN as on select to exploit_rls_copy do instead select leak() as c from (values (0)) dummy;
commit;

-- Release breakpoint. leak() function call happens. After that, assertion
-- fires if enabled. ERROR does not fire in any case.
Re: [HACKERS] Support for N synchronous standby servers - take 2
Hello, This has been registered in the next 2015-09 CF since the majority are in favor of adding this multiple sync replication feature (with quorum/priority). A new patch will be submitted once we have reached a consensus on the design. -- Beena Emerson
Re: [HACKERS] Determine operator from it's function
On 07/03/2015 01:20 AM, Jim Nasby wrote: Is there a way to determine the operator that resulted in calling the operator function? I thought fcinfo->flinfo->fn_expr might get set to the OpExpr, but it seems it can be a FuncExpr even when called via an operator... Don't think there is. Why do you need to know? - Heikki
Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012
On 2015-07-02 23:56:16 +0300, Heikki Linnakangas wrote: On 04/10/2015 01:46 PM, chenhj wrote: Result(execute time): default(8K), 7.370s set SO_SNDBUF to 32K, 4.159s(the current implement) set SO_SNDBUF to 64K, 2.875s set SO_SNDBUF to 128K, 1.593s set SO_SNDBUF to 256K, 1.324s I was about to commit the attached, but when I tested this between my Windows 8.1 virtual machine and Linux host, I was not able to see any performance difference. It may be because the case is hobbled by other inefficiencies, in the virtualization or somewhere else, but I wonder if others can reproduce the speedup? Given that too-small sockets incur significantly smaller latency bumps in a virtualized environment than in a real network, and hit another set of buffers inside the virtualization technology, I'm not particularly surprised by that. Greetings, Andres Freund
Re: [HACKERS] Resource Owner reassign Locks
On 2015-06-07 13:44:08 -0700, Jeff Janes wrote: I'd like to advocate for back-patching this to 9.0, 9.1, and 9.2. It has run without problems for a while now, and it can be considered a bug that systems with a very large number of objects cannot be upgraded in a reasonable time. In that case, how about working on a version for <= 9.2 (a single one should suffice)? This will likely include a bunch of wrapper functions to avoid changing the API in the back branches. Greetings, Andres Freund
Re: [HACKERS] PATCH: remove nclients/nthreads constraint from pgbench
Indeed. v3 attached fixed the case when nthreads > nclients. Ok, committed. Thanks! Thanks! -- Fabien.
Re: [HACKERS] Freeze avoidance of very large table.
On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs si...@2ndquadrant.com wrote: On 2 July 2015 at 16:30, Sawada Masahiko sawada.m...@gmail.com wrote: Also, the flags of each heap page header might be set to PD_ALL_FROZEN, as well as all-visible. Is it possible to have VM bits set to frozen but not visible? The description makes those two states sound independent of each other. Are they? Or not? Do we test for an impossible state? It's impossible to have VM bits set to frozen but not visible. These bits are controlled independently, but eventually, when the all-frozen bit is set, all-visible is also set. Regards, -- Sawada Masahiko
Re: [HACKERS] PATCH: pgbench - remove thread fork-emulation
On 04/28/2015 02:18 AM, Fabien COELHO wrote: This patch removes the pgbench thread fork-emulation code and simplifies things where possible, especially around pthread_create and pthread_join. The stats collection for the report is done directly instead of using an intermediate structure. As a result, if no thread implementation is available, pgbench is restricted to work with only the main thread (ie pgbench -j 1 ...). == Rationale == Pgbench currently provides a thread emulation through process forks. This feature was developed way back when, when it may have been common that some platforms did not support threads. This is now very rare (can you name one such platform?). However, the thread fork-emulation feature has drawbacks: namely, processes are not threads, the memory is not shared (sure), so it hinders simple implementation of some features, or results in not providing these features with fork-emulation, or having a different behavior under fork-emulation: latency collection (-r) is not supported with fork emulation; progress (-P) is reported differently with fork emulation. For a new feature under discussion, which consists in allowing one log instead of per-thread logs, supporting fork-emulation requires a (heavy) post-processing external sort phase, whereas with actual threads all threads can share and append to the same log file with limited overhead, which is significantly simpler. I agree with all that; it's time to let the fork-emulation mode go. Committed. - Heikki
Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22
On Fri, Jul 3, 2015 at 2:47 PM, Marco Atzeri marco.atz...@gmail.com wrote: On 7/2/2015 5:16 PM, Dave Page wrote: The PostgreSQL Global Development Group announces that the alpha release of PostgreSQL 9.5, the latest version of the world's leading open source database, is available today. This release contains previews of all of the features which will be available in the final release of version 9.5, although some details will change before then. Please download, test, and report what you find. Help Test for Bugs -- building on cygwin and $ perl --version This is perl 5, version 22, subversion 0 (v5.22.0) built for cygwin-thread-multi-64int build fail here, anyone seeing the same on other platforms ? make -C hstore_plperl all make[1]: Entering directory '/cygdrive/e/cyg_pub/devel/postgresql/prova/postgres ql-9.5alpha1-1.i686/build/contrib/hstore_plperl' gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -We ndif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -ggdb -O2 -pipe -Wimplicit-function-declaratio n -I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5a lpha1/src/pl/plperl -I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr c/postgresql-9.5alpha1/contrib/hstore -I. 
-I/pub/devel/postgresql/prova/postgres ql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl -I../../src/i nclude -I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql- 9.5alpha1/src/include -I/usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE -c -o hstore_plperl.o /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/p ostgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c In file included from /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr c/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:6:0: /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1 /contrib/hstore/hstore.h:79:0: warning: HS_KEY redefined #define HS_KEY(arr_,str_,i_) ((str_) + HSE_OFF((arr_)[2*(i_)])) ^ In file included from /usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE/perl.h: 3730:0, from /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr c/postgresql-9.5alpha1/src/pl/plperl/plperl.h:48, from /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/sr c/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:4: /usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE/util.h:221:0: note: this is t he location of the previous definition # define HS_KEY(setxsubfn, popmark, apiver, xsver) \ ^ gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -We ndif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -ggdb -O2 -pipe -Wimplicit-function-declaratio n -shared -o hstore_plperl.dll -Wl,--out-implib=libhstore_plperl.a hstore_plpe rl.o -L../../src/port -L../../src/common -Wl,-no-undefined -Wl,--allow-multiple -definition -Wl,--enable-auto-import -L/usr/local/lib -Wl,--as-needed -L../.. 
/src/backend -lpostgres -lpgcommon -lpgport -lintl -lssl -lcrypto -lz -lreadline -lcrypt -lldap hstore_plperl.o: In function `hstore_to_plperl': /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:16: undefined reference to `hstoreUpgrade' Hm. dangomushi is using perl 5.22 and the build does not fail: -- Michael
Re: [HACKERS] WAL logging problem in 9.4.3?
On Fri, Jul 03, 2015 at 02:34:44PM +0900, Fujii Masao wrote: Hmm, for me it is 100% reproducible. Are you familiar with Docker? I can probably construct a Dockerfile that reproduces it pretty reliably. I could reproduce the problem in the master branch by doing the following steps. Thank you, I wasn't sure if you could kill the server fast enough without containers, but it looks like immediate mode is enough.

1. start the PostgreSQL server with wal_level = minimal
2. execute the following SQL statements:
   begin;
   create table test(id serial primary key);
   truncate table test;
   commit;
3. shutdown the server with immediate mode
4. restart the server (crash recovery occurs)
5. execute the following SQL statement:
   select * from test;

The optimization of the TRUNCATE operation that we can use when CREATE TABLE and TRUNCATE are executed in the same transaction block seems to cause the problem. In this case, only the index file truncation is logged, and the index creation in btbuild() is not logged because wal_level is minimal. Then at the subsequent crash recovery, the index file is truncated to 0 bytes... A very simple fix is to log an index creation in that case, but not sure if that's OK to do.. Looks plausible to me. For reference I attach a small tarball for reproduction with docker.

1. Unpack tarball into empty dir (it has three small files)
2. docker build -t test .
3. docker run -v /tmp/pgtest:/data test
4. docker run -v /tmp/pgtest:/data test

Data dir is in /tmp/pgtest. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer postgresql-test.tgz Description: GNU Unix tar archive signature.asc Description: Digital signature
Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22
On Fri, Jul 3, 2015 at 2:47 PM, Marco Atzeri marco.atz...@gmail.com wrote: build fail here, anyone seeing the same on other platforms ? hstore_plperl.o: In function `hstore_to_plperl': /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:16: undefined reference to `hstoreUpgrade' So... dangomushi is able to build it at least. Here are the logs to the last build for example: http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=dangomushidt=2015-07-02%2019%3A19%3A39stg=make-contrib What is the name of the library generated in hstore? Perhaps there is a mismatch here. -- Michael
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
On 2015/07/02 23:13, Kouhei Kaigai wrote: To be honest, ISTM that it's difficult to do that simply and efficiently for the foreign/custom-join-pushdown API that we have for 9.5. It's a little late, but what I started thinking is to redesign that API so that it is called at standard_join_search, as discussed in [2]; (1) to place that API call *after* the set_cheapest call and (2) to place another set_cheapest call after that API call for each joinrel. By the first set_cheapest call, I think we could probably save an alternative path that we need in cheapest_builtin_path. By the second set_cheapest call following that API call, we could consider foreign/custom-join-pushdown paths also. What do you think about this idea? Disadvantage is larger than advantage, sorry. The reason why we put the foreign/custom-join hook on add_paths_to_joinrel() is that the source relations (inner/outer) were not obvious otherwise, thus we could not reproduce which relations are the source of this join. So, I had to give up when I tried this approach before. Maybe I'm missing something, but my image of this approach is that if the base relations for a given joinrel are all foreign tables and belong to the same foreign server, then by calling that API there, we consider the remote join over all the foreign tables, and that if not, we give up on considering the remote join. Your understanding is correct, but missing a point. Once the foreign tables to be joined are informed as a bitmap (joinrel->relids), it is not obvious for extensions which relations are joined with INNER JOIN, and which ones are joined with OUTER JOIN. I tried to implement a common utility function during the v9.5 cycle; however, it was doubtful whether we could make a reliable logic. Also, I no longer want to stick to the assumption that relations involved in a remote join are all managed by the same foreign server.
The following two ideas introduce possible enhancements of the remote join feature that involve local relations; replicated tables, or transformation to a VALUES() clause. http://www.postgresql.org/message-id/CA+Tgmoai_VUF5h6qVLNLU+FKp0aeBCbnnMT3SCvL-HvOpBR=x...@mail.gmail.com http://www.postgresql.org/message-id/9a28c8860f777e439aa12e8aea7694f8010f2...@bpxm15gp.gisp.nec.co.jp Once we have to pay attention to the case of mixed local/foreign relations, we have to care about the paths of underlying local relations, or foreign relations managed by other foreign servers. I think add_paths_to_joinrel() is the best location for foreign-join, not only custom-join. Relocation to standard_join_search() has a larger disadvantage than its advantage. My idea is that we save the cheapest_total_path of RelOptInfo onto the new cheapest_builtin_path just before the GetForeignJoinPaths() hook. Why? It should be a built-in join logic, never a foreign/custom-join, because of the hook location; only built-in logic shall be added here. My concern about your idea is that since (a) add_paths_to_joinrel is called multiple times per joinrel and (b) repetitive add_path calls through GetForeignJoinPaths in add_paths_to_joinrel might remove old paths that are built-in, it's possible to save a path that is not built-in onto the cheapest_total_path and thus to save that path wrongly onto the cheapest_builtin_path. There might be a good way to cope with that, though. For concern (a), the FDW driver can reference RelOptInfo->fdw_private, which shall be initialized to NULL; the FDW driver will then set valid data if it preliminarily adds something. IIRC, postgres_fdw also skips adding the same path multiple times. For concern (b), yep, we may enhance add_path() to retain the built-in path, instead of add_paths_to_joinrel(). Let's adjust the logic a bit. add_path() can know whether the given path is a usual or exceptional (ForeignPath/CustomPath towards no base relation) one.
If the path is exceptional, the cheapest_builtin_path shall be retained unconditionally. Otherwise, the cheapest one replaces it here, so the cheapest built-in path will survive. Is it still problematic? Regarding the development timeline, I prefer to put in a workaround so as not to trip the Assert() in ExecScanFetch(). We may add a warning in the documentation not to replace a built-in join if either or both of the sub-trees are targets of UPDATE/DELETE or FOR SHARE/UPDATE. I'm not sure that that is a good idea, but anyway, I think we need to hurry fixing this issue. My approach is not a fix, but avoidance. :-) It may be an idea to implement the above fixup even though it may be too large/late to apply as a v9.5 feature, but we can understand how many changes are needed to fix this problem. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] Memory leak fixes for pg_dump, pg_dumpall, initdb and pg_upgrade
On Fri, Jul 3, 2015 at 3:14 AM, Heikki Linnakangas hlinn...@iki.fi wrote: I committed some of these that seemed like improvements on readability grounds, but please just mark the rest as ignore in coverity. Done. Thanks. -- Michael
Re: [HACKERS] Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?
On 3 July 2015 at 06:38, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello Simon, We could do better, but that is not a reason not to commit this, as is. Commit, please. My 0,02€: Please do not commit without further testing... I've submitted a patch to improve checkpoint write scheduling, including X00 hours of performance tests on various cases. This patch changes significantly the load distribution over the whole checkpoint, and AFAICS has been tested on rather small cases. I'm not sure that the power 1.5 is the right one for all cases. For a big checkpoint over 30 minutes, it may have, or not, very large and possibly unwanted effects. Maybe the 1.5 factor should really be a GUC. Well, what I really think is that it needs performance measures. power 1.5 is almost certainly not right for all cases, but it is simple and better. And easy to remove if something even better arrives. I don't see the two patches being in conflict. In conclusion, and very egoistically, I would prefer if this patch could wait for the checkpoint scheduling patch to be considered, as it would basically invalidate the X00 hours of performance tests I ran:-) I recommend making peace with yourself that probably 50% of development time is wasted. But we try to keep the best half. Thank you for your time spent contributing. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?
On 2015-07-03 07:38:15 +0200, Fabien COELHO wrote: I've submitted a patch to improve checkpoint write scheduling, including X00 hours of performance test on various cases. This patch changes significantly the load distribution over the whole checkpoint, and AFAICS has been tested on rather small cases. I'm not sure that the power 1.5 is the right one for all cases. For a big checkpoint over 30 minutes, it may have, or not, very large and possibly unwanted effects. Maybe the 1.5 factor should really be a guc. Well, what I really think is that it needs performance measures. In conclusion, and very egoistically, I would prefer if this patch could wait for the checkpoint scheduling patch to be considered, as it would basically invalidate the X00 hours of performance tests I ran:-) These two patches target pretty independent mechanics. If you patch were significantly influenced by this something would be wrong. It might decrease the benefit of your patch a mite, but that's not really a problem. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2
On Fri, Jul 3, 2015 at 12:18 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Jul 3, 2015 at 6:54 AM, Josh Berkus j...@agliodbs.com wrote: On 07/02/2015 12:44 PM, Andres Freund wrote: On 2015-07-02 11:50:44 -0700, Josh Berkus wrote: So there's two parts to this: 1. I need to ensure that data is replicated to X places. 2. I need to *know* which places data was synchronously replicated to when the master goes down. My entire point is that (1) alone is useless unless you also have (2). I think there's a good set of usecases where that's really not the case. Please share! My plea for usecases was sincere. I can't think of any. And do note that I'm talking about information on the replica, not on the master, since in any failure situation we don't have the old master around to check. How would you, even theoretically, synchronize that knowledge to all the replicas? Even when they're temporarily disconnected? You can't, which is why what we need to know is when the replica thinks it was last synced from the replica side. That is, a sync timestamp and lsn from the last time the replica ack'd a sync commit back to the master successfully. Based on that information, I can make an informed decision, even if I'm down to one replica. ... because we would know definitively which servers were in sync. So maybe that's the use case we should be supporting? If you want automated failover you need a leader election amongst the surviving nodes. The replay position is all they need to elect the node that's furthest ahead, and that information exists today. I can do that already. If quorum synch commit doesn't help us minimize data loss any better than async replication or the current 1-redundant, why would we want it? If it does help us minimize data loss, how? In your example of 2 : { local_replica, london_server, nyc_server }, if there is not something like quorum commit, only local_replica is synch and the other two are async. 
In this case, if the local data center gets destroyed, you need to promote either london_server or nyc_server. But since they are async, they might not have the data which has already been committed in the master. So data loss! Of course, as I said yesterday, they might have all the data and no data loss happens at the promotion. But the point is that there is no guarantee that no data loss happens. OTOH, if we use quorum commit, we can guarantee that either london_server or nyc_server has all the data which has been committed in the master. So I think that quorum commit is helpful for minimizing the data loss. Yeah, quorum commit is helpful for minimizing data loss in comparison with today's replication. But in your case, how can we know which server we should use as the next master after the local data center goes down? If we choose a wrong one, we would still lose data. Regards, -- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
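The guarantee Fujii describes can be checked mechanically. A hypothetical sketch (Python purely for illustration; the node names and the 2-of-3 quorum come from the example above):

```python
from itertools import combinations

STANDBYS = {'local_replica', 'london_server', 'nyc_server'}

def data_survives(acked, lost):
    """A committed transaction survives iff at least one standby that
    acknowledged (synchronously flushed) it is not among the lost nodes."""
    return bool(set(acked) - set(lost))

# Quorum commit "2 : { local_replica, london_server, nyc_server }":
# every commit was acked by some 2-subset of the standbys, so destroying
# the local data center (master + local_replica) can never lose it --
# at most one of the two ackers is local.
for acked in combinations(STANDBYS, 2):
    assert data_survives(acked, {'local_replica'})

# Without quorum commit only local_replica is sync, and the same failure
# may lose data: london/nyc might not have flushed the commit at all.
assert not data_survives({'local_replica'}, {'local_replica'})
```

This captures Sawada's remaining question too: the quorum tells you *some* surviving standby has the data, but not *which one*, hence the follow-up about choosing the new master.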
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
Hello, TBH, I think that designing this as a hook-based solution is adding a whole lot of complexity for no value. The hard parts of the problem are collecting the raw data and making the results visible to users, and both of those require involvement of the core code. Where is the benefit from pushing some trivial intermediate arithmetic into an external module? If there's any at all, it's certainly not enough to justify problems such as you mention here. So I'd just create a pgstat_report_percent_done() type of interface in pgstat.c and then teach VACUUM to call it directly. Thank you for the suggestion. I agree that adding code in core will reduce code complexity with no additional overhead. Going by the consensus, I will update the patch with code to collect and store progress information from vacuum in pgstat.c, and a UI using the pg_stat_activity view. Thank you, Rahila Syed __ Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
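Tom's point is that the arithmetic being pushed out to a module is trivial. A hypothetical sketch of the kind of computation VACUUM would hand to a pgstat_report_percent_done()-style call (the counter names here are made up for illustration, not actual PostgreSQL fields):

```python
def vacuum_percent_done(heap_blks_scanned, heap_blks_total):
    """Trivial intermediate arithmetic: raw per-phase counters in,
    percentage out. Hardly worth an external module."""
    if heap_blks_total <= 0:
        return 100.0   # empty relation: nothing left to scan
    # Clamp in case the scanned counter briefly overshoots the estimate.
    return 100.0 * min(heap_blks_scanned, heap_blks_total) / heap_blks_total
```

The hard parts, as the message says, are elsewhere: collecting the raw counters inside VACUUM and exposing the result through pg_stat_activity.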
Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2
On Fri, Jul 3, 2015 at 5:59 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Jul 3, 2015 at 12:18 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Jul 3, 2015 at 6:54 AM, Josh Berkus j...@agliodbs.com wrote: On 07/02/2015 12:44 PM, Andres Freund wrote: On 2015-07-02 11:50:44 -0700, Josh Berkus wrote: So there's two parts to this: 1. I need to ensure that data is replicated to X places. 2. I need to *know* which places data was synchronously replicated to when the master goes down. My entire point is that (1) alone is useless unless you also have (2). I think there's a good set of usecases where that's really not the case. Please share! My plea for usecases was sincere. I can't think of any. And do note that I'm talking about information on the replica, not on the master, since in any failure situation we don't have the old master around to check. How would you, even theoretically, synchronize that knowledge to all the replicas? Even when they're temporarily disconnected? You can't, which is why what we need to know is when the replica thinks it was last synced from the replica side. That is, a sync timestamp and lsn from the last time the replica ack'd a sync commit back to the master successfully. Based on that information, I can make an informed decision, even if I'm down to one replica. ... because we would know definitively which servers were in sync. So maybe that's the use case we should be supporting? If you want automated failover you need a leader election amongst the surviving nodes. The replay position is all they need to elect the node that's furthest ahead, and that information exists today. I can do that already. If quorum synch commit doesn't help us minimize data loss any better than async replication or the current 1-redundant, why would we want it? If it does help us minimize data loss, how? 
In your example of 2 : { local_replica, london_server, nyc_server }, if there is nothing like quorum commit, only local_replica is synch and the other two are async. In this case, if the local data center gets destroyed, you need to promote either london_server or nyc_server. But since they are async, they might not have the data which has already been committed in the master. So data loss! Of course, as I said yesterday, they might have all the data and no data loss happens at the promotion. But the point is that there is no guarantee that no data loss happens. OTOH, if we use quorum commit, we can guarantee that either london_server or nyc_server has all the data which has been committed in the master. So I think that quorum commit is helpful for minimizing the data loss. Yeah, quorum commit is helpful for minimizing data loss in comparison with today's replication. But in your case, how can we know which server we should use as the next master after the local data center goes down? If we choose a wrong one, we would still lose data. Check the progress of each server, e.g., by using pg_last_xlog_replay_location(), and choose the server which is furthest ahead as the new master. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
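Fujii's suggestion reduces to comparing pg_lsn values across the surviving standbys. A hypothetical sketch (in practice you would run SELECT pg_last_xlog_replay_location() on each standby and compare the results; Python here only to show the comparison):

```python
def parse_lsn(lsn):
    """Turn a pg_lsn string like '0/3000060' (high/low words in hex)
    into a single comparable integer."""
    hi, lo = lsn.split('/')
    return (int(hi, 16) << 32) | int(lo, 16)

def elect_new_master(replay_positions):
    """replay_positions: {node: pg_last_xlog_replay_location() result}.
    Promote the node whose replay position is furthest ahead."""
    return max(replay_positions, key=lambda n: parse_lsn(replay_positions[n]))
```

For example, elect_new_master({'london_server': '0/5000060', 'nyc_server': '0/5000148'}) would pick nyc_server. Note the split: hot-standby access is needed to ask each node its position, and comparing raw strings would go wrong across the 32-bit boundary, hence the numeric conversion.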
Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2
On Fri, Jul 3, 2015 at 6:23 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Jul 3, 2015 at 5:59 PM, Sawada Masahiko sawada.m...@gmail.com wrote: Yeah, quorum commit is helpful for minimizing data loss in comparison with today's replication. But in your case, how can we know which server we should use as the next master after the local data center goes down? If we choose a wrong one, we would still lose data. Check the progress of each server, e.g., by using pg_last_xlog_replay_location(), and choose the server which is furthest ahead as the new master. Thanks. So we can choose the next master by checking the progress of each server, if hot standby is enabled. And such a procedure is needed even with today's replication. I think that the #2 problem which Josh pointed out seems to be solved; 1. I need to ensure that data is replicated to X places. 2. I need to *know* which places data was synchronously replicated to when the master goes down. And we can address the #1 problem using quorum commit. Thoughts? Regards, -- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Generalized JSON output functions
On 05/27/2015 09:51 PM, Andrew Dunstan wrote: On 05/27/2015 02:37 PM, Robert Haas wrote: On Tue, May 26, 2015 at 2:50 AM, Shulgin, Oleksandr oleksandr.shul...@zalando.de wrote: Is it reasonable to add this patch to CommitFest now? It's always reasonable to add a patch to the CommitFest if you would like for it to be reviewed and avoid having it get forgotten about. There seems to be some disagreement about whether we want this, but don't let that stop you from adding it to the next CommitFest. I'm not dead set against it either. When I have time I will take a closer look. Andrew, will you have the time to review this? Please add yourself as reviewer in the commitfest app if you do. My 2 cents is that I agree with your initial reaction: This is a lot of infrastructure and generalizing things, for little benefit. Let's change the current code where we generate JSON to be consistent with whitespace, and call it a day. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts
On 2015-07-03 13:50:02 +0300, Heikki Linnakangas wrote: As Tom pointed out, you need the full lexer to do this correctly. You can argue that something that handles the most common cases is enough, but realistically, by the time you've handled all the common cases correctly, you've just re-invented the lexer. Yes. I think we should either bite the bullet and include the full SQL lexer in pgbench, or come up with some new syntax for marking the beginning and end of a statement. I'm pretty clearly in favor of doing correct lexing. I think we should generalize that and make it reusable. psql has its own hacked-up version already; there seems little point in having copies of varying quality around. We could do something like bash here-documents or Postgres dollar-quoting, for example: \set ... select 1234; -- A statement on a single line, no change here -- Begin a multi-line statement \multi-line-statement END_TOKEN select * from complicated; END_TOKEN Not pretty imo. I could see including something simpler, in addition to the lexer, to allow sending multiple statements in one go. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Synch failover WAS: Support for N synchronous standby servers - take 2
Sawada Masahiko wrote: I think that the #2 problem which Josh pointed out seems to be solved; 1. I need to ensure that data is replicated to X places. 2. I need to *know* which places data was synchronously replicated to when the master goes down. And we can address the #1 problem using quorum commit. Thoughts? I agree. The knowledge of which servers were in sync (#2) would not actually help us determine the new master, and quorum solves #1. - Beena Emerson -- View this message in context: http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5856459.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
On 2015/07/03 15:32, Kouhei Kaigai wrote: On 2015/07/02 23:13, Kouhei Kaigai wrote: To be honest, ISTM that it's difficult to do that simply and efficiently for the foreign/custom-join-pushdown API that we have for 9.5. It's a little late, but what I started thinking is to redesign that API so that it is called at standard_join_search, as discussed in [2]; The disadvantage is larger than the advantage, sorry. The reason why we put the foreign/custom-join hook on add_paths_to_joinrel() is that otherwise the source relations (inner/outer) were not obvious, thus we could not reproduce which relations are the source of this join. So, I had to give up when I tried this approach before. Maybe I'm missing something, but my image of this approach is that if the base relations for a given joinrel are all foreign tables and belong to the same foreign server, then by calling that API there, we consider the remote join over all the foreign tables, and that if not, we give up considering the remote join. Your understanding is correct, but missing a point. Once the foreign tables to be joined are informed as a bitmap (joinrel->relids), it is not obvious for extensions which relations are joined with INNER JOIN, and which ones are joined with OUTER JOIN. Can't FDWs get the join information through the root, which I think we would pass to the API as an argument? Also, I no longer want to stick to the assumption that the relations involved in a remote join are all managed by the same foreign server. The following two ideas introduce possible enhancements of the remote join feature involving local relations; a replicated table, or one transformed to a VALUES() clause. http://www.postgresql.org/message-id/CA+Tgmoai_VUF5h6qVLNLU+FKp0aeBCbnnMT3SCvL-HvOpBR=x...@mail.gmail.com http://www.postgresql.org/message-id/9a28c8860f777e439aa12e8aea7694f8010f2...@bpxm15gp.gisp.nec.co.jp Interesting! I think add_paths_to_joinrel() is the best location for foreign-join, not only custom-join.
Relocation to standard_join_search() has a larger disadvantage than advantage. I agree with you that it's important to ensure extensibility, and my question is: is it possible that the API call from standard_join_search also realizes those ideas, if FDWs can get the join information through the root or something like that? Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts
On 06/21/2015 11:12 AM, Fabien COELHO wrote: Hello Josh, Add backslash continuations to pgbench custom scripts. [...] IMHO this approach is the best compromise. I don't personally agree. I believe that it is worth breaking backwards compatibility to support line breaks in pgbench statements, and that if we're not going to do that, supporting \ continuations is of little value. As someone who actively uses pgbench to write custom benchmarks, I need to write queries which I can test. \ continuation does NOT work on the psql command line, so that's useless for testing my queries; I still have to reformat and troubleshoot. If we added \ continuation, I wouldn't use it. I think we should support line breaks, and require semicolons for end-of-statement. Backwards-compatibility in custom pgbench scripts is not critical; pgbench scripts are neither used in production, nor used in automated systems much that I know of. I'm not clear on why we'd need a full SQL lexer. Attached is a version without a lexer which does ;-terminated SQL commands and \-continued meta commands (may be useful for \shell and long \set expressions). As Tom pointed out, you need the full lexer to do this correctly. You can argue that something that handles the most common cases is enough, but realistically, by the time you've handled all the common cases correctly, you've just re-invented the lexer. The home-grown lexer is missing e.g. dollar-quoting support, so this would not be parsed correctly: do $$ begin ... end; $$; That would be very nice to handle correctly; I've used DO-blocks in pgbench scripts many times, and it's a pain to have to write them on a single line. Also worth noting that you can currently test so-called multi-statements with pgbench, by putting multiple statements on a single line. Your patch seems to still do that, but if we went with a full-blown SQL lexer, they would probably be split into two statements.
I think we should either bite the bullet and include the full SQL lexer in pgbench, or come up with some new syntax for marking the beginning and end of a statement. We could do something like bash here-documents or Postgres dollar-quoting, for example: \set ... select 1234; -- A statement on a single line, no change here -- Begin a multi-line statement \multi-line-statement END_TOKEN select * from complicated; END_TOKEN - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] GSoC 2015 proposal: Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement
On Thu, Apr 2, 2015 at 3:24 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Mar 25, 2015 at 9:46 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: http://www.postgresql.org/message-id/ca+tgmozm+-0r7h0edpzzjbokvvq+gavkchmno4fypveccw-...@mail.gmail.com I like the idea of the feature a lot, but the proposal to which you refer here mentions some problems, which I'm curious how you think you might solve. (I don't have any good ideas myself, beyond what I mentioned there.) You're right, and we have another design (steps 1 and 2 was implemented last year): *** ALTER TABLE changes 1) ATController 1.1) Acquire an AccessExclusiveLock (src/backend/commands/tablecmds.c - AlterTableGetLockLevel:3023) 2) Prepare to change relpersistence (src/backend/commands/tablecmds.c - ATPrepCmd:3249-3270) • check temp table (src/backend/commands/tablecmds.c - ATPrepChangePersistence:11074) • check foreign key constraints (src/backend/commands/tablecmds.c - ATPrepChangePersistence:11102) 3) FlushRelationBuffers, DropRelFileNodeBuffers and smgrimmedsync (MAIN_FORKNUM, FSM_FORKNUM, VISIBILITYMAP_FORKNUM and INIT_FORKNUM if exists) 4) Create a new fork called TRANSIENT INIT FORK: • from Unlogged to Logged (create _initl fork) (INIT_TO_LOGGED_FORKNUM) ∘ new forkName (src/common/relpath.c) called _initl ∘ insert XLog record to drop it if transaction abort • from Logged to Unlogged (create _initu fork) (INIT_TO_UNLOGGED_FORKUM) ∘ new forkName (src/common/relpath.c) called _initu ∘ insert XLog record to drop it if transaction abort 5) Change the relpersistence in catalog (pg_class-relpersistence) (heap, toast, indexes) 6) Remove/Create INIT_FORK • from Unlogged to Logged ∘ remove the INIT_FORK and INIT_TO_LOGGED_FORK adding to the pendingDeletes queue • from Logged to Unlogged ∘ remove the INIT_TO_UNLOGGED_FORK adding to the pendingDeletes queue ∘ create the INIT_FORK using heap_create_init_fork ∘ insert XLog record to drop init fork if the transaction abort *** CRASH RECOVERY changes 
1) During crash recovery (src/backend/access/transam/xlog.c:6507:ResetUnloggedRelations) • if the transient fork _initl exists then ∘ drop the transient fork _initl ∘ if the init fork doesn't exist then create it ∘ reset the relation • if the transient fork _initu exists then ∘ drop the transient fork _initu ∘ if the init fork exists then drop it ∘ don't reset the relation Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
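The crash-recovery rules in the proposed design form a small decision table, restated here as a sketch for readability. Everything below is the *proposed* design from this thread, not committed PostgreSQL behavior, and the fork suffixes (`_initl`, `_initu`, `init`) are the names from the proposal:

```python
def recovery_actions(forks):
    """forks: set of fork suffixes found on disk for one relation at
    crash recovery. Returns the recovery steps per the proposed design."""
    actions = []
    if '_initl' in forks:            # crashed mid UNLOGGED -> LOGGED
        actions.append('drop _initl')
        if 'init' not in forks:
            actions.append('create init fork')
        actions.append('reset relation')
    elif '_initu' in forks:          # crashed mid LOGGED -> UNLOGGED
        actions.append('drop _initu')
        if 'init' in forks:
            actions.append('drop init fork')
        # the relation is NOT reset in this direction
    return actions                   # no transient fork: nothing to do
```

Reading it this way makes the asymmetry visible: only the interrupted unlogged-to-logged conversion resets the relation, because in that direction the data was never WAL-protected.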
Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2
On 2015-07-02 14:54:19 -0700, Josh Berkus wrote: On 07/02/2015 12:44 PM, Andres Freund wrote: On 2015-07-02 11:50:44 -0700, Josh Berkus wrote: So there's two parts to this: 1. I need to ensure that data is replicated to X places. 2. I need to *know* which places data was synchronously replicated to when the master goes down. My entire point is that (1) alone is useless unless you also have (2). I think there's a good set of usecases where that's really not the case. Please share! My plea for usecases was sincere. I can't think of any. I have important data. I want to survive both a local hardware failure (it's faster to continue using the local standby) and I want to protect myself against actual disaster striking the primary datacenter. Pretty common. And do note that I'm talking about information on the replica, not on the master, since in any failure situation we don't have the old master around to check. How would you, even theoretically, synchronize that knowledge to all the replicas? Even when they're temporarily disconnected? You can't, which is why what we need to know is when the replica thinks it was last synced from the replica side. That is, a sync timestamp and lsn from the last time the replica ack'd a sync commit back to the master successfully. Based on that information, I can make an informed decision, even if I'm down to one replica. I think you're mashing together nearly unrelated topics. Note that we already have the last replayed lsn, and we have the timestamp of the last replayed transaction. If you want automated failover you need a leader election amongst the surviving nodes. The replay position is all they need to elect the node that's furthest ahead, and that information exists today. I can do that already. If quorum synch commit doesn't help us minimize data loss any better than async replication or the current 1-redundant, why would we want it? If it does help us minimize data loss, how? But it does make us safer against data loss? 
If your app gets back the commit, you know that the data has made it both to the local replica and to one other datacenter. And you're now safe against both the loss of the master's hardware (the most likely scenario) and the loss of the entire primary datacenter. That you need additional logic to know which other datacenter to fail over to is just yet another piece (which you *can* build today). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts
Hello Heikki, I'm not clear on why we'd need a full SQL lexer. Attached is a version without a lexer which does ;-terminated SQL commands and \-continued meta commands (may be useful for \shell and long \set expressions). As Tom pointed out, you need the full lexer to do this correctly. You can argue that something that handles the most common cases is enough, but realistically, by the time you've handled all the common cases correctly, you've just re-invented the lexer. Sure. I understand that part of Josh's argument is that we are discussing pgbench test scripts here, not real full-blown applications, and these are expected to be quite basic, mostly plain SQL things. The home-grown lexer is missing e.g. dollar-quoting support, so this would not be parsed correctly: do $$ begin ... end; $$; Hmmm, good one, if indeed you want to use PL/pgSQL or even any arbitrary language in a pgbench script... I would rather have created functions (once, outside of pgbench) and would call them from the script, so that would be a simple SELECT. That would be very nice to handle correctly, I've used DO-blocks in pgbench scripts many times, and it's a pain to have to write them on a single line. Yep. With some languages I'm not sure that it is even possible. Also worth noting that you can currently test so-called multi-statements with pgbench, by putting multiple statements on a single line. Yes indeed, under the hood pgbench expects just one line, or you have to change significantly the way statements are handled, which is way beyond my initial intentions on this one, and this would mean quite a lot of changes for more or less corner cases. Your patch seems to still do that, but if we went with a full-blown SQL lexer, they would probably be split into two statements. I think we should either bite the bullet and include the full SQL lexer in pgbench, or come up with some new syntax for marking the beginning and end of a statement.
We could do something like bash here-documents or Postgres dollar-quoting, for example: \set ... select 1234; -- A statement on a single line, no change here -- Begin a multi-line statement \multi-line-statement END_TOKEN select * from complicated; END_TOKEN I do not like the aesthetic of the above much. I really liked the idea of simply writing SQL queries just as in psql. So maybe just handling $$-quoting would be enough to handle reasonable use-cases without troubling pgbench internal working too much? That would be a simple local changes in the line reader. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
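To make Fabien's "just handle $$-quoting in the line reader" concrete, here is a hedged sketch of a statement splitter that respects single quotes and dollar quotes ($$...$$ or $tag$...$tag$). It is deliberately *not* a full SQL lexer — no comments, no E'' strings, no double-quoted identifiers — which is exactly the trade-off Heikki and Tom warn about:

```python
import re

def split_statements(script):
    """Split a script into ;-terminated statements, ignoring semicolons
    inside single-quoted strings and dollar-quoted blocks."""
    stmts, buf = [], []
    i, n = 0, len(script)
    dollar = None          # active dollar-quote tag, e.g. "$$" or "$fn$"
    in_squote = False
    while i < n:
        c = script[i]
        if dollar is not None:             # inside $tag$ ... $tag$
            if script.startswith(dollar, i):
                buf.append(dollar); i += len(dollar); dollar = None
            else:
                buf.append(c); i += 1
            continue
        if in_squote:                       # inside '...'
            buf.append(c)
            in_squote = (c != "'")
            i += 1
            continue
        if c == "'":
            in_squote = True; buf.append(c); i += 1; continue
        if c == '$':                        # possible dollar-quote start
            m = re.match(r'\$[A-Za-z_][A-Za-z_0-9]*\$|\$\$', script[i:])
            if m:
                dollar = m.group(0); buf.append(dollar); i += len(dollar)
                continue
        if c == ';':                        # statement terminator
            stmt = ''.join(buf).strip()
            if stmt:
                stmts.append(stmt)
            buf = []; i += 1; continue
        buf.append(c); i += 1
    tail = ''.join(buf).strip()
    if tail:
        stmts.append(tail)
    return stmts
```

With this, a DO-block such as `do $$ begin raise notice ';'; end $$; select 1;` splits into two statements instead of breaking at the first semicolon, covering the main case raised in the thread while keeping the reader far simpler than the backend lexer.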
Re: [HACKERS] assessing parallel-safety
On Thu, Jul 2, 2015 at 8:49 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, May 21, 2015 at 10:19 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Mar 21, 2015 at 2:30 PM, Thom Brown t...@linux.com wrote: Looks like one of the patches I applied is newer than the one in your list: HEAD Commit-id: 13a10c0ccd984643ef88997ac177da7c4b7e46a6 parallel-mode-v9.patch assess-parallel-safety-v4.patch parallel-heap-scan.patch parallel_seqscan_v11.patch This patch hasn't been updated for a while, so here is a rebased version now that we're hopefully mostly done making changes to pg_proc.h for 9.5. parallel-mode-v10 was mostly committed, and parallel-heap-scan has now been incorporated into the parallel_seqscan patch. So you should now be able to test this feature by applying just this patch, and the new version of the parallel seqscan patch which I am given to understand that Amit will be posting pretty soon. This patch again needs rebase. Attached, find the rebased patch which can be used to review/test latest version of parallel_seqscan patch which I am going to post in the parallel seq. scan thread soonish. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com assess-parallel-safety-v6.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts
I'm pretty clearly in favor of doing correct lexing. I think we should generalize that and make it reusable. psql has its own hacked-up version already; there seems little point in having copies of varying quality around. I must admit that I do not know how to share lexer rules but have different actions on them (psql vs sql parser vs ...), as the action code is intrinsically intertwined with expressions. Maybe some hack is possible. Having yet another SQL-lexer to maintain seems highly undesirable, especially just for pgbench. I could see including something simpler, in addition to the lexer, to allow sending multiple statements in one go. Currently SELECT 1; SELECT 1; probably does two statements in one go, but it is on one line. Maybe by allowing both continuations and ; at the same time: -- two statements in one go SELECT 1; \ SELECT 1; -- next statement on its own SELECT 1; Which could be reasonably neat, and easy to implement. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
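Fabien's "both continuations and ; at the same time" amounts to a line-joining pass run before any statement splitting. A hypothetical sketch of just that pass (pgbench itself is C; this only illustrates the proposed behavior):

```python
def join_continuations(lines):
    """Join lines ending in a backslash into one logical line, so a
    script may both continue a long \\set meta-command across lines and
    put several ;-terminated statements in one logical line."""
    logical, buf = [], ''
    for line in lines:
        line = line.rstrip('\n')
        if line.endswith('\\'):
            buf += line[:-1]       # drop the backslash, keep accumulating
        else:
            logical.append(buf + line)
            buf = ''
    if buf:                        # trailing continuation with no final line
        logical.append(buf)
    return logical
```

Each resulting logical line can then be handed to the existing one-line statement handling, which is what makes this variant "easy to implement" compared with teaching pgbench a full SQL lexer.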