Re: [HACKERS] Commit fest 2017-01 will begin soon!
On Mon, Feb 13, 2017 at 4:25 AM, Jim Nasby wrote:
> On 2/1/17 10:39 PM, Michael Paquier wrote:
>> On Thu, Feb 2, 2017 at 1:23 PM, Amit Kapila wrote:
>>> Many thanks to you for running the show. I think it might be okay if
>>> one consolidated mail is sent for all the patches that are marked
>>> "Returned with Feedback" or "Rejected" or "Moved to next CF". OTOH,
>>> there is some value in sending a separate mail for each patch so that
>>> people can argue on the decision by the CFM for one particular patch,
>>> but I'm not sure if it's worth it.
>>
>> People tend to look at emails they are directly on in CC, that's why I
>> prefer sending individual emails.
>
> BTW, the app also sends email notifications to anyone named on a patch (or
> at least I got notifications). Having a human spend time doing that by hand
> seems pretty silly.
>
> Since the app also knows about the email threads, presumably it could send
> notifications to relevant threads as well.

Yes, it would be trivial to have the app post a "status changed from x to y" message when that is done. But IIRC we did discuss this around the time the app was deployed and decided we did not want this. But such decisions can of course always be revisited :)

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint
On Mon, Feb 13, 2017 at 3:29 AM, Jim Nasby wrote: > On 2/11/17 4:36 AM, Michael Banck wrote: > >> I guess you're right, I've moved it further down. There is in fact a >> message about the xlog location (unless you switch off wal entirely), >> but having another one right before that mentioning the completed >> checkpoint sounds ok to me. >> > > 1) I don't think this should be verbose output. Having a program sit there > "doing nothing" for no apparent reason is just horrible UI design. > That would include much of Unix then.. For example if I run "cp" on a large file it sits around "doing nothing". Same if I do "tar". No? > 2) I think it'd be useful to have a way to get the status of a running > checkpoint. The checkpointer already has that info, and I think it might > even be in shared memory already. If there was a function that reported > checkpoint status pg_basebackup could poll that to provide users with live > status. That should be a separate patch though. I agree that this would definitely be useful. But it might be something that's better exposed as a server-side view? (and if pg_basebackup could poll it it would probably still not be included by default -- only if -P was given). -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] removing tsearch2
On Mon, Feb 13, 2017 at 3:09 AM, Jim Nasby wrote:
> On 2/10/17 2:24 PM, Andrew Dunstan wrote:
>>> There's a bunch of these things in /contrib which really ought to be
>>> PGXN extensions (also CUBE, earthdistance, etc.). However, one of the
>>> steps in that would be getting the mainstream platforms to package them
>>> so that users have a reasonable upgrade path, so I would not propose
>>> doing it for 10.
>>
>> Part of the reason for keeping a number of extensions is that it helps
>> test our extension infrastructure. Also they serve as good pieces of
>> example code. So I don't want to get rid of them all, or even any of
>> them that have any degree of significant use. I think these days
>> tsearch2 is very largely redundant, so that means there's a good reason
>> not to keep it. But that's not true of cube, isn etc.
>
> That's based on an assumption that PGXN shouldn't be treated as part of
> the community effort, which I think is a mistake. Having a robust,
> community run extension/package/module framework has proven to be extremely
> valuable for other programming environments, and IMHO we should be striving
> to improve in that area.

Until pgxn has a way of helping users on for example Windows (or other platforms where they don't have a pgxs system and a compiler around), it's always going to be a "second class citizen". It's certainly part of the community efforts in many ways, but it's a significant loss of usability compared to things that are included.

And from the perspective of testing the infrastructure, you'd lose a lot of platform coverage (unless you can find a way to integrate pgxn building with the buildfarm).

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] WIP: About CMake v2
2017-02-12 20:55 GMT+03:00 Vladimir Rusinov:
> Overall, when things go wrong debugging cmake requires cmake knowledge,
> while autotools mostly require shell knowledge which is much more common
> (again, for sysadmins/packagers).

That's not really true, because CMake scripts are much easier than the tons of crap bash (configure) and m4 scripts in Autotools; also please don't forget about usage with Windows MSVC, Xcode, etc.

> PS: I personally like Google's internal version of https://bazel.build/ a
> lot. I've never used the open-source version but I presume it's similar.
> While it has many problems (Java, lack of popular IDE support, lack of
> popularity and, again, Java) the good parts are that rules are both
> machine- and human-readable and writable and generally easy to debug.
> I'm not bold enough to propose PostgreSQL to use it, but I'd be happy to
> see ideas from it used elsewhere.

We have many build systems (for example, another one is http://mesonbuild.com/), but CMake is the best meta build system today.
Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw
Sorry for the delay in the review. I started reviewing the patch again. The patch applied cleanly on the latest source, and the regression tests pass with the patch. I also performed some manual testing and haven't found any regression. The patch looks much cleaner than the earlier version, and I don't have any major concerns as such. Here are a few comments:

1)

@@ -211,6 +211,12 @@ typedef struct PgFdwDirectModifyState
 	PGresult   *result;		/* result for query */
 	int			num_tuples;	/* # of result tuples */
 	int			next_tuple;	/* index of next one to return */
+	Relation	resultRel;	/* relcache entry for the target table */

Why do we need resultRel? Can't we directly use dmstate->rel?

2) In some places in the patch the scanrelid condition is written as fscan->scan.scanrelid == 0, whereas in other places it is written as fsplan->scan.scanrelid > 0. In fact, within the same function it is used differently, for example in postgresBeginDirectModify. Can we make this consistent?

3)

+ * If UPDATE/DELETE on a join, create a RETURINING list used in the remote
+ * query.
+ */
+if (fscan->scan.scanrelid == 0)
+	returningList = make_explicit_returning_list(resultRelation, rel,
+												 returningList);
+

The above block can be moved inside the if (plan->returningLists) condition just above it, like this:

/*
 * Extract the relevant RETURNING list if any.
 */
if (plan->returningLists)
{
	returningList = (List *) list_nth(plan->returningLists, subplan_index);

	/*
	 * If UPDATE/DELETE on a join, create a RETURINING list used in the
	 * remote query.
	 */
	if (fscan->scan.scanrelid == 0)
		returningList = make_explicit_returning_list(resultRelation, rel,
													 returningList);
}

I am still doing a few more tests with the patch; if I find anything apart from this, I will raise it in another mail.

Thanks,

On Wed, Feb 1, 2017 at 11:50 AM, Michael Paquier wrote:
> On Wed, Jan 25, 2017 at 7:20 PM, Etsuro Fujita wrote:
>> Attached is the new version of the patch. I also addressed other comments
>> from you: moved rewriting the fdw_scan_tlist to postgres_fdw.c,
>> added/revised comments, and added regression tests for the case where a
>> pushed down UPDATE/DELETE on a join has RETURNING.
>>
>> My apologies for having been late to work on this.
>
> Moved to CF 2017-03.
> --
> Michael

-- 
Rushabh Lathia
Re: [HACKERS] Bold itemized list entries
On 2017/02/13 16:39, Ashutosh Bapat wrote:
> Hi
> Bulleted lists in the devel branch are all showing up bold, as seen in
> the attached screenshot of [1]. But the same page for the 9.6 branch
> shows normal bulleted lists. If I try to build the docs from the latest
> sources (at commit ae0e550ce1703621efb3b75f4558df253cef5bca), the html
> renders bulleted items normally. Does anybody else notice this?
>
> Sorry if there was some previous report of this and the issue has been
> fixed but the HTML pages on postgresql.org haven't been refreshed.

I've seen some other formatting differences as well when viewing the HEAD versions of the docs on www.postgresql.org (couldn't really tell from the locally built ones viewed in Chrome on Windows, because I'm not sure if the same styling rules are being applied). Screenshots comparing snippets of the create-table refpage are attached (one is for 9.6 and the other for HEAD, as the filenames indicate).

Thanks,
Amit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint
Hi,

On Monday, 13.02.2017, at 09:31 +0100, Magnus Hagander wrote:
> On Mon, Feb 13, 2017 at 3:29 AM, Jim Nasby wrote:
>> On 2/11/17 4:36 AM, Michael Banck wrote:
>>> I guess you're right, I've moved it further down. There is in fact a
>>> message about the xlog location (unless you switch off wal entirely),
>>> but having another one right before that mentioning the completed
>>> checkpoint sounds ok to me.
>>
>> 1) I don't think this should be verbose output. Having a program sit
>> there "doing nothing" for no apparent reason is just horrible UI design.
>
> That would include much of Unix then.. For example if I run "cp" on a
> large file it sits around "doing nothing". Same if I do "tar". No?

The expectation for all three commands is that, even if there is no output on stdout, they will write data to the local machine. So you can easily monitor the progress of cp and tar by running du or something in a different terminal.

With pg_basebackup, nothing is happening on the local machine until the checkpoint on the remote is finished; while this is obvious to somebody familiar with how basebackups work internally, it appears to be not clear at all to some users.

So I think notifying the user that something is happening remotely while the local process waits would be useful, but on the other hand, pg_basebackup does not print anything unless (i) --verbose is set or (ii) there is an error, so I think having it mention the checkpoint in --verbose mode only is acceptable.

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach

Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
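If we go that route, a minimal sketch of what pg_basebackup could print in verbose mode before it starts waiting for the checkpoint might look like the following (purely illustrative; the actual message text and its placement in pg_basebackup.c are up to the proposed patch):

	if (verbose)
		fprintf(stderr,
				_("%s: initiating base backup, waiting for checkpoint to complete\n"),
				progname);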
Re: [HACKERS] Logical replication existing data copy - sgml fixes
On 2017-02-09 02:25, Erik Rijkers wrote: On 2017-02-08 23:25, Petr Jelinek wrote: 0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch 0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch 0003-Fix-after-trigger-execution-in-logical-replication-v2.patch 0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch 0001-Logical-replication-support-for-initial-data-copy-v4.patch fixes in create_subscription.sgml --- doc/src/sgml/ref/create_subscription.sgml.orig2 2017-02-11 11:58:10.788502999 +0100 +++ doc/src/sgml/ref/create_subscription.sgml 2017-02-11 12:17:50.069635493 +0100 @@ -55,7 +55,7 @@ Additional info about subscriptions and logical replication as a whole - can is available at and + is available at and . @@ -122,14 +122,14 @@ Name of the replication slot to use. The default behavior is to use - subscription_name for slot name. + subscription_name as the slot name. -COPY DATA -NOCOPY DATA +COPY DATA +NOCOPY DATA Specifies if the existing data in the publication that are being @@ -140,11 +140,11 @@ -SKIP CONNECT +SKIP CONNECT - Instructs the CREATE SUBSCRIPTION to skip initial - connection to the provider. This will change default values of other + Instructs CREATE SUBSCRIPTION to skip initial + connection to the provider. This will change the default values of other options to DISABLED, NOCREATE SLOT and NOCOPY DATA. @@ -181,8 +181,8 @@ Create a subscription to a remote server that replicates tables in - the publications mypubclication and - insert_only and starts replicating immediately on + the publications mypublication and + insert_only and start replicating immediately on commit: CREATE SUBSCRIPTION mysub @@ -193,7 +193,7 @@ Create a subscription to a remote server that replicates tables in - the insert_only publication and does not start replicating + the insert_only publication and do not start replicating until enabled at a later time. CREATE SUBSCRIPTION mysub -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] About pg_dump and recover
Hi all,

I have a database named edb. In this db, I have lots of schemas. Here I use two schemas: a and b. The two schemas belong to different roles. Now, I want to export all objects of schema a, and then use that export file to recover them into schema b, so that the user of schema b can use schema b normally. What should I do?

Best regards,
Rey

--
View this message in context: http://postgresql.nabble.com/About-pg-dump-and-recover-tp5943880.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: [[Parallel] Shared] Hash
On Thu, Feb 2, 2017 at 4:57 PM, Rafia Sabih wrote: > On Thu, Feb 2, 2017 at 1:19 AM, Thomas Munro > wrote: >> On Thu, Feb 2, 2017 at 3:34 AM, Rafia Sabih >> wrote: >>> [ regressions ] >> >> Thanks Rafia. At first glance this plan is using the Parallel Shared >> Hash in one place where it should pay off, that is loading the orders >> table, but the numbers are terrible. I noticed that it uses batch >> files and then has to increase the number of batch files, generating a >> bunch of extra work, even though it apparently overestimated the >> number of rows, though that's only ~9 seconds of ~60. I am >> investigating. > > Hi Thomas, > Apart from the previously reported regression, there appear one more > issue in this set of patches. At times, running a query using parallel > hash it hangs up and all the workers including the master shows the > following backtrace, Here's a new version to fix the problems reported by Rafia above. The patch descriptions are as before but it starts from 0002 because 0001 was committed as 7c5d8c16 (thanks, Andres). First, some quick master-vs-patch numbers from the queries listed with regressions, using TPCH dbgen scale 10, work_mem = 64MB, max_parallel_workers_per_gather = 4, shared_buffers = 8GB (the numbers themselves not comparable as different scale and different hardware). Better except for Q5 and Q8, which for some mysterious reason plans only one worker and then loses. I'm looking into that. Q3 19917.682 -> 8649.822 Q5 4149.983 -> 4192.551 Q7 14453.721 -> 10303.911 Q8 1981.540 -> 8030.264 Q9 26928.102 -> 17384.607 Q10 16955.240 -> 14563.787 I plan to explore the performance space with a range of worker numbers and work_mem sizes and do some analysis; more soon. Changes: 1. Fixed two bugs that resulted in ExecHashShrink sometimes hanging, as reported by Rafia. (1) When splitting the large v3 patch up into smaller patches for v4, I'd managed to lose the line that initialises shared->shrink_barrier, causing some occasional strange behaviour. (2) I found a bug[1] in condition_variable.c that could cause hangs and fixed that via a separate patch and the fix was committed as 3f3d60d3 (thanks, Robert). 2. Simplified barrier.c by removing BarrierWaitSet(), because that turned out to be unnecessary to implement rescan as I'd originally thought, and was incompatible with the way BarrierDetach() works. The latter assumes that the phase only ever increments, so that combination of features was broken. 3. Sorted out the hash table sizing logic that was previously leading to some strange decisions about batches. This involved putting the total estimated number of inner rows into the path and plan when there is a partial inner plan, because plan_rows only has the partial number. I need to size the hash table correctly at execution time. It seems a bit strange to do that specifically and only for Hash (see rows_total in the 0008 patch)... should there be some more generic way? Should total rows go into Plan rather than HashPlan, or perhaps the parallel divisor should go somewhere? 4. Comments fixed and added based on Ashutosh's feedback on patch 0003. 5. Various small bug fixes. I've also attached a small set of test queries that hit the four "modes" (for want of a better word) of our hash join algorithm for dealing with different memory conditions, which I've nicknamed thus: 1. "Good": We estimate that the hash table will fit in work_mem, and at execution time it does. 
This patch makes that more likely because [Parallel] Shared Hash gets to use more work_mem as discussed. 2. "Bad": We estimate that the hash table won't fit in work_mem, but that if we partition it into N batches using some bits from the hash value then each batch will fit in work_mem. At execution time, each batch does indeed fit into work_mem. This is not ideal, because we have to write out and read back in N - (1 / N) inner and outer tuples (ie all batches except the first one, although actually costsize.c always charges for all of them). But it may still be better than other plans, and the IO is sequential. Currently Shared Hash shouldn't be selected over (private) Hash if it would require batching anyway due to the cpu_shared_tuple_cost tie-breaker: on the one had it avoids a bunch of copies of the batch files being written out, but on the other it introduces a bunch of synchronisation overhead. Parallel Shared Hash is fairly likely to be chosen if possible be due to division of the inner relation's cost outweighing cpu_shared_tuple_cost. 3. "Ugly": We planned for "good" or "bad" mode, but we ran out of work_mem at some point during execution: this could be during the initial hash table load, or while loading a subsequent batch. So now we double the number of batches, splitting the current batch and all batches that haven't been processed yet into two in the hope of shrinking the hash table, while generating extra reading and writing of all as-yet unprocessed tuples. Thi
Re: [HACKERS] Documentation improvements for partitioning
On 2017/02/13 14:21, Amit Langote wrote: > On 2017/02/10 19:19, Simon Riggs wrote: >> * The OID inheritance needs work - you shouldn't need to specify a >> partition needs OIDS if the parent has it already. > > That sounds right. It's better to keep the behavior same as for regular > inheritance. Will post a patch to get rid of the unnecessary check. > > FWIW, the check was added to prevent the command from succeeding in the > case where WITHOUT OIDS has been specified for a partition and the parent > partitioned table has the OID column. Regular inheritance simply > *overrides* the WITHOUT OIDS specification, which might be seen as surprising. 0001 of the attached patches takes care of this. >> * There's no tab completion, which prevents people from testing this >> (maybe better now with some docs) > > Will post a patch as well. ...and 0002 for this. >> * ERROR: no partition of relation "measurement_year_month" found for row >> DETAIL: Failing row contains (2016-12-02, 1, 1). >> should not return the whole row, just the partition keys > > I think that makes sense. Something along the lines of > BuildIndexValueDescription() for partition keys will be necessary. Will > post a patch. Let me spend a bit more time on this one. Thanks, Amit >From 42682394d3d40aeaa5b2565a4f0ca23828a0dda0 Mon Sep 17 00:00:00 2001 From: amit Date: Mon, 13 Feb 2017 13:32:18 +0900 Subject: [PATCH 1/3] Inherit OID system column automatically for partitions Currently, WITH OIDS must be explicitly specified when creating a partition if the parent table has the OID system column. Instead, inherit it automatically, possibly overriding any explicit WITHOUT OIDS specification. Per review comment from Simon Riggs --- src/backend/commands/tablecmds.c | 21 - src/test/regress/expected/create_table.out | 18 +- src/test/regress/sql/create_table.sql | 10 ++ 3 files changed, 27 insertions(+), 22 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 37a4c4a3d6..96650e69df 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -634,19 +634,14 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, relkind == RELKIND_PARTITIONED_TABLE)); descriptor->tdhasoid = (localHasOids || parentOidCount > 0); - if (stmt->partbound) - { - /* If the parent has OIDs, partitions must have them too. */ - if (parentOidCount > 0 && !localHasOids) - ereport(ERROR, - (errcode(ERRCODE_WRONG_OBJECT_TYPE), - errmsg("cannot create table without OIDs as partition of table with OIDs"))); - /* If the parent doesn't, partitions must not have them. */ - if (parentOidCount == 0 && localHasOids) - ereport(ERROR, - (errcode(ERRCODE_WRONG_OBJECT_TYPE), - errmsg("cannot create table with OIDs as partition of table without OIDs"))); - } + /* + * If a partitioned table doesn't have the system OID column, then none + * of its partitions should have it. 
+ */ + if (stmt->partbound && parentOidCount == 0 && localHasOids) + ereport(ERROR, +(errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("cannot create table with OIDs as partition of table without OIDs"))); /* * Find columns with default values and prepare for insertion of the diff --git a/src/test/regress/expected/create_table.out b/src/test/regress/expected/create_table.out index fc92cd92dd..20eb3d35f9 100644 --- a/src/test/regress/expected/create_table.out +++ b/src/test/regress/expected/create_table.out @@ -524,16 +524,24 @@ DROP TABLE temp_parted; CREATE TABLE no_oids_parted ( a int ) PARTITION BY RANGE (a) WITHOUT OIDS; -CREATE TABLE fail_part PARTITION OF no_oids_parted FOR VALUES FROM (1) TO (10 )WITH OIDS; +CREATE TABLE fail_part PARTITION OF no_oids_parted FOR VALUES FROM (1) TO (10) WITH OIDS; ERROR: cannot create table with OIDs as partition of table without OIDs DROP TABLE no_oids_parted; --- likewise, the reverse if also true +-- If the partitioned table has oids, then the partition must have them. +-- If the WITHOUT OIDS option is specified for partition, it is overridden. CREATE TABLE oids_parted ( a int ) PARTITION BY RANGE (a) WITH OIDS; -CREATE TABLE fail_part PARTITION OF oids_parted FOR VALUES FROM (1) TO (10 ) WITHOUT OIDS; -ERROR: cannot create table without OIDs as partition of table with OIDs -DROP TABLE oids_parted; +CREATE TABLE part_forced_oids PARTITION OF oids_parted FOR VALUES FROM (1) TO (10) WITHOUT OIDS; +\d+ part_forced_oids + Table "public.part_forced_oids" + Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ++-+---+--+-+-+--+- + a | integer | | not null | | plain | | +Partition of: oids_parted FOR VALUES FROM (1) TO (10) +Has OIDs: yes + +DROP TABLE oids_parted, part_forced_oids; -- check for
[HACKERS] Sum aggregate calculation for single precision real
Hi hackers,

I wonder why the SUM aggregate for the real (float4) type is calculated using a single-precision floating point accumulator. It causes very confusing and unexpected behavior:

postgres=# select sum(l_quantity) from lineitem where l_shipdate <= '1998-12-01';
     sum
-------------
 1.52688e+09
(1 row)

postgres=# select sum(l_quantity+0.0) from lineitem where l_shipdate <= '1998-12-01';
    sum
------------
 1529738036
(1 row)

Is it specified in any SQL standard how aggregates should be calculated? At least Oracle and MS-SQL calculate SUM for the single precision type in a different (and more natural) way.

Are there any reasons for using the float4pl function for the SUM aggregate instead of float4_accum?

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
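To illustrate the effect being described, here is a minimal stand-alone C sketch (my own illustration, not PostgreSQL source) of the difference between accumulating float4 values in a float4 state, which is effectively what SUM(real) does via float4pl, and accumulating them in a double: the low-order digits of each addend are lost once the running total gets large.

#include <stdio.h>

int
main(void)
{
	float	fsum = 0.0f;		/* analogous to a float4 transition state */
	double	dsum = 0.0;			/* analogous to a float8 transition state */
	int		i;

	for (i = 0; i < 100000000; i++)
	{
		fsum += 25.0f;			/* 25.0f stands in for a typical l_quantity */
		dsum += 25.0f;
	}

	/*
	 * dsum prints the exact total 2500000000; fsum stalls far below it,
	 * because once the accumulator is large enough, adding 25 rounds away.
	 */
	printf("float accumulator:  %.0f\n", fsum);
	printf("double accumulator: %.0f\n", dsum);
	return 0;
}

The drift in the quoted TPC-H query is much smaller than in this deliberately extreme loop, but the mechanism is the same: float4pl keeps the running sum in float4, while float4_accum keeps its state in float8.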
[HACKERS] Walsender crash
Hello.

While playing with logical publications/subscriptions I've noted that the walsender crashes when I'm trying to CREATE SUBSCRIPTION pointing to the current server. So, having one running postgres instance on a standard port:

stas=# CREATE SUBSCRIPTION mysub CONNECTION 'dbname=stas host=127.0.0.1' PUBLICATION mypub;
^CCancel request sent
ERROR: canceling statement due to user request
stas=# CREATE SUBSCRIPTION mysub CONNECTION 'dbname=stas host=127.0.0.1' PUBLICATION mypub;
WARNING: terminating connection because of crash of another server process

The crash happens due to the uninitialised StringInfo reply_message in walsender.c:

(lldb) bt
* thread #1: tid = 0x13e3f, 0x00010e74ebaf postgres`resetStringInfo(str=0x00010ecd12a0) + 15 at stringinfo.c:96, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00010e74ebaf postgres`resetStringInfo(str=0x00010ecd12a0) + 15 at stringinfo.c:96
    frame #1: 0x00010e8778c9 postgres`ProcessRepliesIfAny + 169 at walsender.c:1347
    frame #2: 0x00010e8770fd postgres`WalSndWaitForWal(loc=35929928) + 221 at walsender.c:1142
    frame #3: 0x00010e876cf2 postgres`logical_read_xlog_page(state=0x7fbcac0446f8, targetPagePtr=35921920, reqLen=8008, targetRecPtr=35929904, cur_page="\x95?", pageTLI=0x7fbcac044fa4) + 50 at walsender.c:717
    frame #4: 0x00010e571e5e postgres`ReadPageInternal(state=0x7fbcac0446f8, pageptr=35921920, reqLen=8008) + 542 at xlogreader.c:556
    frame #5: 0x00010e571336 postgres`XLogReadRecord(state=0x7fbcac0446f8, RecPtr=35929904, errormsg=0x7fff5178f1e0) + 326 at xlogreader.c:255
    frame #6: 0x00010e85ea9b postgres`DecodingContextFindStartpoint(ctx=0x7fbcac044438) + 139 at logical.c:432
    frame #7: 0x00010e874dfd postgres`CreateReplicationSlot(cmd=0x7fbcad8004d0) + 333 at walsender.c:796
    frame #8: 0x00010e8747a4 postgres`exec_replication_command(cmd_string="CREATE_REPLICATION_SLOT \"mysub\" LOGICAL pgoutput") + 532 at walsender.c:1272
    frame #9: 0x00010e8e6adc postgres`PostgresMain(argc=1, argv=0x7fbcad005f50, dbname="stas", username="stas") + 2332 at postgres.c:4064
    frame #10: 0x00010e839abe postgres`BackendRun(port=0x7fbcabc033a0) + 654 at postmaster.c:4310
    frame #11: 0x00010e838eb3 postgres`BackendStartup(port=0x7fbcabc033a0) + 483 at postmaster.c:3982
    frame #12: 0x00010e837ea5 postgres`ServerLoop + 597 at postmaster.c:1722
    frame #13: 0x00010e8356b5 postgres`PostmasterMain(argc=3, argv=0x7fbcabc02b90) + 5397 at postmaster.c:1330
    frame #14: 0x00010e76371f postgres`main(argc=3, argv=0x7fbcabc02b90) + 751 at main.c:228
    frame #15: 0x7fffa951c255 libdyld.dylib`start + 1
    frame #16: 0x7fffa951c255 libdyld.dylib`start + 1

A patch adding the missing initStringInfo() is attached.

init_reply_message.diff
Description: Binary data

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
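For readers without the attachment, the shape of the bug and of the fix is roughly the following (an illustrative sketch only, not the attached diff; the exact spot where the initialization belongs in walsender.c may differ):

static StringInfoData reply_message;	/* file-scope, so data starts out NULL */

/*
 * ProcessRepliesIfAny() eventually calls this, which writes through
 * reply_message.data and crashes (stringinfo.c:96) while it is still NULL:
 */
resetStringInfo(&reply_message);

/* fix: initialize the buffer once before it can ever be reset, e.g. */
initStringInfo(&reply_message);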
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
Maybe this can be discussed in a follow-up patch and Corey should proceed to finalize the if patch?

> In the event that we can leave prompting to a later patch, here are the
> v12 highlights:

My 0.02€ about v12: Patch applies, make check ok, psql make check ok.

> - created conditional.h and conditional.c which contain the functions
>   with stack-ish push/pop/peek/poke names

Why not.

> - now all non-test, non-doc changes are in src/bin/psql

Hmmm, see below.

> - moved conditional stack out of scan_state, stack state maintained by
>   mainloop.c/startup.c, passed to HandleSlashCommands

ISTM that it is kind of a regression, because logically this is about the scan state, so it should be in the corresponding structure, and having two structures to pass the scan state is not an improvement...

> - documentation encourages the user to employ ON_ERROR_STOP when using
>   conditionals

Indeed. The paragraph explanations are clear enough to me. I would suggest also applying the advice to the example shown, including a comment about why the variable is set to on.

Also, the last element of the tap tests should be distinct: I suggest using 'if syntax error' and 'elif syntax error' in place of 'syntax error' for the first two tests.

-- 
Fabien.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
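For readers trying to follow the push/pop/peek/poke vocabulary, a minimal sketch of the kind of API conditional.h/conditional.c could expose is below. This is my own illustration with hypothetical names and a plain malloc, not the code from the v12 patch:

#include <stdbool.h>
#include <stdlib.h>

typedef enum ifState
{
	IFSTATE_NONE,		/* not inside any \if */
	IFSTATE_TRUE,		/* current branch is active */
	IFSTATE_FALSE,		/* branch inactive, a later \elif/\else may fire */
	IFSTATE_IGNORED		/* an enclosing branch was false: skip everything */
} ifState;

typedef struct IfStackElem
{
	ifState				if_state;
	struct IfStackElem *next;
} IfStackElem, *ConditionalStack;	/* top of a linked-list stack */

static void
conditional_stack_push(ConditionalStack *stack, ifState s)
{
	IfStackElem *elem = malloc(sizeof(IfStackElem));	/* psql would use pg_malloc */

	elem->if_state = s;
	elem->next = *stack;
	*stack = elem;
}

static bool
conditional_stack_pop(ConditionalStack *stack)
{
	IfStackElem *elem = *stack;

	if (elem == NULL)
		return false;			/* \endif without a matching \if */
	*stack = elem->next;
	free(elem);
	return true;
}

static ifState
conditional_stack_peek(ConditionalStack stack)
{
	return stack ? stack->if_state : IFSTATE_NONE;
}

static void
conditional_stack_poke(ConditionalStack stack, ifState s)
{
	if (stack)
		stack->if_state = s;	/* e.g. flip TRUE to FALSE at \else */
}

The point of keeping it this small is that whoever owns the stack (scan state or mainloop.c) only needs these four entry points to implement \if, \elif, \else and \endif.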
[HACKERS] UPDATE of partition key
Currently, an update of a partition key of a partition is not allowed, since it requires to move the row(s) into the applicable partition. Attached is a WIP patch (update-partition-key.patch) that removes this restriction. When an UPDATE causes the row of a partition to violate its partition constraint, then a partition is searched in that subtree that can accommodate this row, and if found, the row is deleted from the old partition and inserted in the new partition. If not found, an error is reported. There are a few things that can be discussed about : 1. We can run an UPDATE using a child partition at any level in a nested partition tree. In such case, we should move the row only within that child subtree. For e.g. , in a tree such as : tab -> t1 -> t1_1 t1_2 t2 -> t2_1 t2_2 For "UPDATE t2 set col1 = 'AAA' " , if the modified tuple does not fit in t2_1 but can fit in t1_1, it should not be moved to t1_1, because the UPDATE is fired using t2. 2. In the patch, as part of the row movement, ExecDelete() is called followed by ExecInsert(). This is done that way, because we want to have the ROW triggers on that (sub)partition executed. If a user has explicitly created DELETE and INSERT BR triggers for this partition, I think we should run those. While at the same time, another question is, what about UPDATE trigger on the same table ? Here again, one can argue that because this UPDATE has been transformed into a DELETE-INSERT, we should not run UPDATE trigger for row-movement. But there can be a counter-argument. For e.g. if a user needs to make sure about logging updates of particular columns of a row, he will expect the logging to happen even when that row was transparently moved. In the patch, I have retained the firing of UPDATE BR trigger. 3. In case of a concurrent update/delete, suppose session A has locked the row for deleting it. Now a session B has decided to update this row and that is going to cause row movement, which means it will delete it first. But when session A is finished deleting it, session B finds that it is already deleted. In such case, it should not go ahead with inserting a new row as part of the row movement. For that, I have added a new parameter 'already_delete' for ExecDelete(). Of course, this still won't completely solve the concurrency anomaly. In the above case, the UPDATE of Session B gets lost. May be, for a user that does not tolerate this, we can have a table-level option that disallows row movement, or will cause an error to be thrown for one of the concurrent session. 4. The ExecSetupPartitionTupleRouting() is re-used for routing the row that is to be moved. So in ExecInitModifyTable(), we call ExecSetupPartitionTupleRouting() even for UPDATE. We can also do this only during execution time for the very first time we find that we need to do a row movement. I will think over that, but I am thinking it might complicate things, as compared to always doing the setup for UPDATE. WIll check on that. 5. Regarding performance testing, I have compared the results of row-movement with partition versus row-movement with inheritance tree using triggers. Below are the details : Schema : CREATE TABLE ptab (a date, b int, c int); CREATE TABLE ptab (a date, b int, c int) PARTITION BY RANGE (a, b); CREATE TABLE ptab_1_1 PARTITION OF ptab for values from ('1900-01-01', 1) to ('1900-01-01', 101) PARTITION BY range (c); CREATE TABLE ptab_1_1_1 PARTITION OF ptab_1_1 for values from (1) to (51); CREATE TABLE ptab_1_1_2 PARTITION OF ptab_1_1 for values from (51) to (101); . . 
CREATE TABLE ptab_1_1_n PARTITION OF ptab_1_1 for values from (n) to (n+m); .. .. CREATE TABLE ptab_5_n PARTITION OF ptab for values from ('1905-01-01', 101) to ('1905-01-01', 201) PARTITION BY range (c); CREATE TABLE ptab_1_2_1 PARTITION OF ptab_1_2 for values from (1) to (51); CREATE TABLE ptab_1_2_2 PARTITION OF ptab_1_2 for values from (51) to (101); . . CREATE TABLE ptab_1_2_n PARTITION OF ptab_1_2 for values from (n) to (n+m); . . Similarly for inheritance : CREATE TABLE ptab_1_1 (constraint check_ptab_1_1 check (a = '1900-01-01' and b >= 1 and b < 8)) inherits (ptab); create trigger brutrig_ptab_1_1 before update on ptab_1_1 for each row execute procedure ptab_upd_trig(); CREATE TABLE ptab_1_1_1 (constraint check_ptab_1_1_1 check (c >= 1 and c < 51)) inherits (ptab_1_1); create trigger brutrig_ptab_1_1_1 before update on ptab_1_1_1 for each row execute procedure ptab_upd_trig(); CREATE TABLE ptab_1_1_2 (constraint check_ptab_1_1_2 check (c >= 51 and c < 101)) inherits (ptab_1_1); create trigger brutrig_ptab_1_1_2 before update on ptab_1_1_2 for each row execute procedure ptab_upd_trig(); I had to have a BR UPDATE trigger on each of the leaf tables. Attached is the BR trigger function update_trigger.sql. There it generates the table na
Re: [HACKERS] Parallel Index Scans
On Sat, Feb 11, 2017 at 6:35 PM, Amit Kapila wrote: Why can't we rely on _bt_walk_left? >>> The reason is mentioned in comments, but let me try to explain with >>> some example. When you reach that point of code, it means that either >>> the current page (assume page number is 10) doesn't contain any >>> matching items or it is a half-dead page, both of which indicates that >>> we have to move to the previous page. Now, before checking if the >>> current page contains matching items, we signal parallel machinery >>> (via _bt_parallel_release) to allow workers to read the previous page >>> (assume previous page number is 9). So it is quite possible that >>> after deciding that current page (page number 10) doesn't contain any >>> matching tuples if we directly move to the previous page (in this case >>> it will be 9) by using _bt_walk_left, some other worker would have >>> read page 9. In short, if we directly use _bt_walk_left(), then we >>> are prone to returning some of the values twice as multiple workers >>> can read the same page. >> But ... the entire point of the seize-and-release stuff is to avoid >> this problem. You're suppose to seize the scan, read the current >> page, walk left, store the page you find in the scan, and then release >> the scan. > Exactly and that is what is done in the patch. Basically, if we found > that the current page is half-dead or it doesn't contain any matching > items, then release the current buffer, seize the scan, read the > current page, walk left and so on. I am slightly confused here > because it seems both of us agree on what is the right thing to do and > according to me that is how it is implemented. Are you just ensuring > about whether I have implemented as discussed or do you see a problem > with the way it is implemented? Well, before, I thought you said that relying entirely on _bt_walk_left couldn't work because then two people might end up running it at the same time, and that would cause problems. But if you can only run _bt_walk_left while you've got the scan seized, then that can't happen. Evidently I'm missing something here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should we cacheline align PGXACT?
Am Samstag, den 11.02.2017, 00:28 +0100 schrieb Tomas Vondra: > Comparing averages of tps, measured on 5 runs (each 5 minutes long), > the > difference between master and patched master is usually within 2%, > which > is pretty much within noise. > > I'm attaching spreadsheets with summary of the results, so that we > have > it in the archives. As usual, the scripts and much more detailed > results > are available here: I've done some benchmarking of this patch against the E850/ppc64el Ubuntu LPAR we currently have access to and got the attached results. pg_prewarm as recommended by Alexander was used, the tests run 300s secs, scale 1000, each with a testrun before. The SELECT-only pgbench was run twice each, the write tests only once. Looks like the influence of this patch isn't that big, at least on this machine. We're going to reassign the resources to an AIX LPAR soon, which doesn't give me enough time to test with Tomas' test scripts again. >> clients 10 < transaction type: scaling factor: 1000 query mode: prepared number of clients: 10 number of threads: 10 duration: 300 s number of transactions actually processed: 39931683 latency average = 0.075 ms tps = 133105.478637 (including connections establishing) tps = 133109.428803 (excluding connections establishing) >> clients 20 < transaction type: scaling factor: 1000 query mode: prepared number of clients: 20 number of threads: 20 duration: 300 s number of transactions actually processed: 78852558 latency average = 0.076 ms tps = 262841.340316 (including connections establishing) tps = 262855.469408 (excluding connections establishing) >> clients 40 < transaction type: scaling factor: 1000 query mode: prepared number of clients: 40 number of threads: 40 duration: 300 s number of transactions actually processed: 118612968 latency average = 0.101 ms tps = 395375.595262 (including connections establishing) tps = 395432.166071 (excluding connections establishing) >> clients 80 < transaction type: scaling factor: 1000 query mode: prepared number of clients: 80 number of threads: 80 duration: 300 s number of transactions actually processed: 170171627 latency average = 0.141 ms tps = 567235.428660 (including connections establishing) tps = 567322.181229 (excluding connections establishing) >> clients 100 < transaction type: scaling factor: 1000 query mode: prepared number of clients: 100 number of threads: 100 duration: 300 s number of transactions actually processed: 168267642 latency average = 0.178 ms tps = 560888.338862 (including connections establishing) tps = 561076.995776 (excluding connections establishing) >> clients 160 < transaction type: scaling factor: 1000 query mode: prepared number of clients: 160 number of threads: 160 duration: 300 s number of transactions actually processed: 146675068 latency average = 0.327 ms tps = 488911.857866 (including connections establishing) tps = 489155.251547 (excluding connections establishing) >> clients 240 < transaction type: scaling factor: 1000 query mode: prepared number of clients: 240 number of threads: 240 duration: 300 s number of transactions actually processed: 139558941 latency average = 0.516 ms tps = 465184.240782 (including connections establishing) tps = 466046.696168 (excluding connections establishing) >> clients 280 < transaction type: scaling factor: 1000 query mode: prepared number of clients: 280 number of threads: 280 duration: 300 s number of transactions actually processed: 137419242 latency average = 0.611 ms tps = 458052.555736 (including connections establishing) 
tps = 459795.543770 (excluding connections establishing) >> clients 10 < transaction type: scaling factor: 1000 query mode: prepared number of clients: 10 number of threads: 10 duration: 300 s number of transactions actually processed: 40994263 latency average = 0.073 ms tps = 136647.383534 (including connections establishing) tps = 136649.850380 (excluding connections establishing) >> clients 20 < transaction type: scaling factor: 1000 query mode: prepared number of clients: 20 number of threads: 20 duration: 300 s number of transactions actually processed: 71023846 latency average = 0.084 ms tps = 236745.708077 (including connections establishing) tps = 236752.743230 (excluding connections establishing) >> clients 40 < transaction type: scaling factor: 1000 query mode: prepared number of clients: 40 number of threads: 40 duration: 300 s number of transactions actually processed: 120060357 latency average = 0.100 ms tps = 400200.095159 (including connections establishing) tps = 400238.563079 (excluding connections establishing) >> clients 80 < transaction type: scaling factor: 1000 query mode: prepared number of clients: 80 number of threads: 80 duration: 300 s number of transactions actually processed: 170353955 latency average = 0.141 ms tps = 567843.510912 (including connections establishing) tps = 567913
Re: [HACKERS] Parallel bitmap heap scan
On Sat, Feb 11, 2017 at 1:41 AM, Dilip Kumar wrote: > I tried my best, please check the latest patch (0001). I don't think it's acceptable (or necessary) to move the DSA definitions into postgres.h. Why do you think you need to do that, vs. just including dsa.h in a few more places? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 'text' instead of 'unknown' in Postgres 10
On Mon, Feb 13, 2017 at 3:09 AM, Jim Nasby wrote: > On 2/7/17 9:16 AM, Daniele Varrazzo wrote: >> >> Thank you for the clarification: I will assume the behaviour cannot be >> maintained on PG 10 and think whether the treatment of '{}' is too >> magical and drop it instead. > > > BTW, I would hope that passing '{}' into a defined array field still works, > since an empty array isn't treated the same as NULL, which means you need > some way to create an empty array. Yes, that didn't change. The issue was only reading data from postgres to python. There is a beta of next psycopg version available on testpypi which implements the new behaviour, if you want to take a look at it. You can use pip install -i https://testpypi.python.org/pypi psycopg2==2.7b1 to install it. -- Daniele -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Mon, Feb 6, 2017 at 11:06 AM, Amit Khandekar wrote: > Ashutosh Bapat wrote: >>> We may want to think about a third goal: preventing too many workers >>> from executing the same plan. As per comment in get_parallel_divisor() >>> we do not see any benefit if more than 4 workers execute the same >>> node. So, an append node can distribute more than 4 worker nodes >>> equally among the available subplans. It might be better to do that as >>> a separate patch. >> >> I think that comment is for calculating leader contribution. It does >> not say that 4 workers is too many workers in general. >> >> But yes, I agree, and I have it in mind as the next improvement. >> Basically, it does not make sense to give more than 3 workers to a >> subplan when parallel_workers for that subplan are 3. For e.g., if >> gather max workers is 10, and we have 2 Append subplans s1 and s2 with >> parallel workers 3 and 5 respectively. Then, with the current patch, >> it will distribute 4 workers to each of these workers. What we should >> do is : once both of the subplans get 3 workers each, we should give >> the 7th and 8th worker to s2. >> >> Now that I think of that, I think for implementing above, we need to >> keep track of per-subplan max_workers in the Append path; and with >> that, the bitmap will be redundant. Instead, it can be replaced with >> max_workers. Let me check if it is easy to do that. We don't want to >> have the bitmap if we are sure it would be replaced by some other data >> structure. > > Attached is v2 patch, which implements above. Now Append plan node > stores a list of per-subplan max worker count, rather than the > Bitmapset. But still Bitmapset turned out to be necessary for > AppendPath. More details are in the subsequent comments. > > >>> Goal A : Allow non-partial subpaths in Partial Append. >>> Goal B : Distribute workers across the Append subplans. >>> Both of these require some kind of synchronization while choosing the >>> next subplan. So, goal B is achieved by doing all the synchronization >>> stuff. And implementation of goal A requires that goal B is >>> implemented. So there is a dependency between these two goals. While >>> implementing goal B, we should keep in mind that it should also work >>> for goal A; it does not make sense later changing the synchronization >>> logic in goal A. >>> >>> I am ok with splitting the patch into 2 patches : >>> a) changes required for goal A >>> b) changes required for goal B. >>> But I think we should split it only when we are ready to commit them >>> (commit for B, immediately followed by commit for A). Until then, we >>> should consider both of these together because they are interconnected >>> as explained above. >> >> For B, we need to know, how much gain that brings and in which cases. >> If that gain is not worth the complexity added, we may have to defer >> Goal B. Goal A would certainly be useful since it will improve >> performance of the targetted cases. The synchronization required for >> Goal A is simpler than that of B and thus if we choose to implement >> only A, we can live with a simpler synchronization. > > For Goal A , the logic for a worker synchronously choosing a subplan will be : > Go the next subplan. If that subplan has not already assigned max > workers, choose this subplan, otherwise, go the next subplan, and so > on. Right, at a given time, we have to remember only the next plan to assign worker to. That's simpler than remembering the number of workers for each subplan and updating those concurrently. 
That's why I am saying synchronization for A is simpler than that of B. > For Goal B , the logic will be : > Among the subplans which are yet to achieve max workers, choose the > subplan with the minimum number of workers currently assigned. > > I don't think there is a significant difference between the complexity > of the above two algorithms. So I think here the complexity does not > look like a factor based on which we can choose the particular logic. > We should choose the logic which has more potential for benefits. The > logic for goal B will work for goal A as well. And secondly, if the > subplans are using their own different system resources, the resource > contention might be less. One case is : all subplans using different > disks. Second case is : some of the subplans may be using a foreign > scan, so it would start using foreign server resources sooner. These > benefits apply when the Gather max workers count is not sufficient for > running all the subplans in their full capacity. If they are > sufficient, then the workers will be distributed over the subplans > using both the logics. Just the order of assignments of workers to > subplans will be different. > > Also, I don't see a disadvantage if we follow the logic of Goal B. Do we have any performance measurements where we see that Goal B performs better than Goal A, in such a situation? Do we have any performance measurement comparing these two approaches in
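To make the comparison concrete, here is a small illustrative sketch (my own, not code from the patch) of the "Goal B" selection rule being discussed: among the subplans that have not yet reached their own worker limit, pick the one with the fewest workers currently assigned. The "Goal A" rule is the degenerate case that simply takes the first non-saturated subplan.

/*
 * Illustrative sketch only (not patch code): choose the next subplan for a
 * worker under the "Goal B" rule.  nworkers[i] is the number of workers
 * currently executing subplan i, max_workers[i] is that subplan's own limit.
 */
static int
choose_next_subplan(int nplans, const int *nworkers, const int *max_workers)
{
	int		best = -1;
	int		i;

	for (i = 0; i < nplans; i++)
	{
		if (nworkers[i] >= max_workers[i])
			continue;			/* this subplan is already saturated */
		if (best < 0 || nworkers[i] < nworkers[best])
			best = i;			/* least-loaded non-saturated subplan so far */
	}
	return best;				/* -1: every subplan already has its maximum */
}

The caller would run this under whatever lock protects the shared per-subplan counters, which is the synchronization both goals need anyway.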
Re: [HACKERS] Parallel bitmap heap scan
On Mon, Feb 13, 2017 at 6:24 PM, Robert Haas wrote:
> I don't think it's acceptable (or necessary) to move the DSA
> definitions into postgres.h. Why do you think you need to do that,
> vs. just including dsa.h in a few more places?

I need to access dsa_pointer in tidbitmap.h, which is included from FRONTEND code as well. Now, the problem is that dsa.h includes "port/atomics.h", but atomics.h cannot be included if FRONTEND is defined:

#ifndef ATOMICS_H
#define ATOMICS_H

#ifdef FRONTEND
#error "atomics.h may not be included from frontend code"
#endif

Is there any other solution to this?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
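One possible direction (just a sketch of an idea, not something that exists in the tree): split the dsa_pointer typedef out into a tiny header that does not pull in port/atomics.h, so that frontend code which only needs the pointer type never sees the atomics dependency. A hypothetical dsa_pointer.h might look like:

#ifndef DSA_POINTER_H
#define DSA_POINTER_H

#include <stdint.h>				/* in backend headers this would come from c.h */

/*
 * Assuming 64-bit DSA pointers here for simplicity; the real definition in
 * dsa.h depends on whether 64-bit atomics are available.
 */
typedef uint64_t dsa_pointer;

#define InvalidDsaPointer	((dsa_pointer) 0)

#endif							/* DSA_POINTER_H */

dsa.h itself would then include this header, and tidbitmap.h could include only the small header without tripping the FRONTEND check in atomics.h.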
Re: [HACKERS] Logical replication existing data copy
On 2017-02-11 11:16, Erik Rijkers wrote:
> On 2017-02-08 23:25, Petr Jelinek wrote:
>> 0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
>> 0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
>> 0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
>> 0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
>> 0001-Logical-replication-support-for-initial-data-copy-v4.patch
>
> This often works but it also fails far too often (in my hands). I test
> whether the tables are identical by comparing an md5 from an ordered
> resultset, from both replica and master. I estimate that 1 in 5 tries
> fails; 'fail' being a somewhat different table on the replica (compared
> to master), most often pgbench_accounts (typically there are 10-30
> differing rows). No errors or warnings in either logfile. I'm not sure,
> but I think testing on faster machines seems to be doing somewhat better
> ('better' being less replication error).

I have noticed that when I insert a few seconds' wait-state after the create subscription (or actually: the 'enable'ing of the subscription) the problem does not occur.

Apparently, (I assume) the initial snapshot occurs somewhere when the subsequent pgbench-run has already started, so that the logical replication also starts somewhere 'into' that pgbench-run. Does that make sense? I don't know what to make of it.

Now that I think that I understand what happens, I hesitate to call it a bug. But I'd say it's still a usability problem that the subscription is only 'valid' after some time, even if it's only a few seconds.

(The other problem I mentioned (drop subscription hangs) still happens every now and then.)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should we cacheline align PGXACT?
On Mon, Feb 13, 2017 at 3:34 PM, Bernd Helmle wrote:
> On Saturday, 11.02.2017, at 00:28 +0100, Tomas Vondra wrote:
>> Comparing averages of tps, measured on 5 runs (each 5 minutes long),
>> the difference between master and patched master is usually within 2%,
>> which is pretty much within noise.
>>
>> I'm attaching spreadsheets with summary of the results, so that we have
>> it in the archives. As usual, the scripts and much more detailed results
>> are available here:
>
> I've done some benchmarking of this patch against the E850/ppc64el
> Ubuntu LPAR we currently have access to and got the attached results.
> pg_prewarm as recommended by Alexander was used, the tests ran 300 secs,
> scale 1000, each with a testrun before. The SELECT-only pgbench was run
> twice each, the write tests only once.
>
> Looks like the influence of this patch isn't that big, at least on this
> machine.

Thank you for testing. Yes, the influence seems to be low. But nevertheless it's important to ensure that there is no regression here.

Despite pg_prewarm'ing and running tests for 300 secs, there is still significant variation. For instance, with client count = 80:
* pgxact-result-2.txt – 474704
* pgxact-results.txt – 574844
Could some background processes influence the tests? Or could it be another virtual machine?

Also, I wonder why I can't see this variation on the graphs. Another issue with the graphs is that we can't see details of read and write TPS variation on the same scale, because the write TPS values are too low. I think you should draw the write benchmark on a separate graph.

-- 
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
[HACKERS] VOPS: vectorized executor for Postgres: how to speedup OLAP queries more than 10 times without changing anything in Postgres executor
Hello hackers,

There were many discussions concerning possible ways of speeding up Postgres. Different approaches were suggested:
- JIT (now we have three different prototype implementations based on LLVM)
- Chunked (vectorized) executor
- Replacing pull with push
- Columnar store (cstore_fdw, IMCS)
- Optimizing and improving the current executor (reducing tuple deform overhead, function call overhead, ...)

Obviously the best result can be achieved by combining all of these approaches. But actually they are more or less interchangeable: vectorized execution does not eliminate interpretation overhead, but that overhead is divided by the vector size and becomes less critical.

I decided to write a small prototype to estimate the possible speed improvement of a vectorized executor. I created special types representing "tiles" and implemented standard SQL operators for them. So neither the Postgres planner, nor the Postgres executor, nor the Postgres heap manager is changed. But I was able to reach a more than 10 times speed improvement on TPC-H Q1/Q6 queries!

Please find more information here: https://cdn.rawgit.com/postgrespro/vops/ddcbfbe6/vops.html
The sources of the project can be found here: https://github.com/postgrespro/vops.git

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LWLock optimization for multicore Power machines
Am Samstag, den 11.02.2017, 15:42 +0300 schrieb Alexander Korotkov: > Thus, I see reasons why in your tests absolute results are lower than > in my > previous tests. > 1. You use 28 physical cores while I was using 32 physical cores. > 2. You run tests in PowerVM while I was running test on bare metal. > PowerVM could have some overhead. > 3. I guess you run pgbench on the same machine. While in my tests > pgbench > was running on another node of IBM E880. > Yeah, pgbench was running locally. Maybe we can get some resources to run them remotely. Interesting side note: If you run a second postgres instance with the same pgbench in parallel, you get nearly the same transaction throughput as a single instance. Short side note: If you run two Postgres instances concurrently with the same pgbench parameters, you get nearly the same transaction throughput for both instances each as when running against a single instance, e.g. - single transaction type: scaling factor: 1000 query mode: prepared number of clients: 112 number of threads: 112 duration: 300 s number of transactions actually processed: 121523797 latency average = 0.276 ms latency stddev = 0.096 ms tps = 405075.282309 (including connections establishing) tps = 405114.299174 (excluding connections establishing) instance-1/instance-2 concurrently run: transaction type: scaling factor: 1000 query mode: prepared number of clients: 112 number of threads: 56 duration: 300 s number of transactions actually processed: 120645351 latency average = 0.278 ms latency stddev = 0.158 ms tps = 402148.536087 (including connections establishing) tps = 402199.952824 (excluding connections establishing) transaction type: scaling factor: 1000 query mode: prepared number of clients: 112 number of threads: 56 duration: 300 s number of transactions actually processed: 121959772 latency average = 0.275 ms latency stddev = 0.110 ms tps = 406530.139080 (including connections establishing) tps = 406556.658638 (excluding connections establishing) So it looks like the machine has plenty of power, but PostgreSQL is limiting somewhere. > Therefore, having lower absolute numbers in your tests, win of LWLock > optimization is also lower. That is understandable. But win of > LWLock > optimization is clearly visible definitely exceeds variation. > > I think it would make sense to run more kinds of tests. Could you > try set > of tests provided by Tomas Vondra? > If even we wouldn't see win some of the tests, it would be still > valuable > to see that there is no regression there. Unfortunately there are some test for AIX scheduled, which will assign resources to that LPAR...i've just talked to the people responsible for the machine and we can get more time for the Linux tests ;) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq Alternate Row Processor
On 2/9/17 7:15 PM, Jim Nasby wrote:
> Can you run a trace to see where all the time is going in the single row
> case? I don't see an obvious time-suck with a quick look through the code.
> It'd be interesting to see how things change if you eliminate the filler
> column from the SELECT.

Traces are attached, these are with callgrind.
profile_nofiller.txt: single row without filler column
profile_filler.txt: single row with filler column
profile_filler_callback.txt: callback with filler column

pqResultAlloc looks to hit malloc pretty hard. The callback reduces all of that to a single malloc for each row.

Without the filler, here is the average over 11 runs:

             Real   user   sys
Callback     .133   .033   .035
Single Row   .170   .112   .029

For the callback case, it's slightly higher than the prior results with the filler column.

Profile data file 'callgrind.out.14930' (creator: callgrind-3.11.0)
I1 cache: D1 cache: LL cache:
Timerange: Basic block 0 - 74120972
Trigger: Program termination
Profiled target: ./test -m row (PID 14930, part 1)
Events recorded: Ir
Events shown: Ir
Event sort order: Ir
Thresholds: 99
Include dirs: User annotated: Auto-annotation: off

          Ir
 313,455,690  PROGRAM TOTALS

          Ir  file:function
  61,410,828  ???:_int_malloc [/usr/lib64/libc-2.17.so]
  38,321,887  ???:_int_free [/usr/lib64/libc-2.17.so]
  25,800,115  ???:pqResultAlloc [/usr/local/pgsql/lib/libpq.so.5.10]
  20,611,330  ???:pqParseInput3 [/usr/local/pgsql/lib/libpq.so.5.10]
  16,002,817  ???:malloc [/usr/lib64/libc-2.17.so]
  14,800,004  ???:pqRowProcessor [/usr/local/pgsql/lib/libpq.so.5.10]
  12,604,893  ???:pqGetInt [/usr/local/pgsql/lib/libpq.so.5.10]
  10,400,004  ???:PQsetResultAttrs [/usr/local/pgsql/lib/libpq.so.5.10]
  10,200,316  main.c:main [/usr/local/src/postgresql-perf/test]
   9,600,000  ???:check_tuple_field_number [/usr/local/pgsql/lib/libpq.so.5.10]
   8,300,631  ???:__strcpy_sse2_unaligned [/usr/lib64/libc-2.17.so]
   7,500,075  ???:pqResultStrdup [/usr/local/pgsql/lib/libpq.so.5.10]
   7,500,000  ???:pqSkipnchar [/usr/local/pgsql/lib/libpq.so.5.10]
   7,017,368  ???:__memcpy_ssse3_back [/usr/lib64/libc-2.17.so]
   6,900,000  ???:PQgetisnull [/usr/local/pgsql/lib/libpq.so.5.10]
   6,401,100  ???:free [/usr/lib64/libc-2.17.so]
   6,200,004  ???:PQcopyResult [/usr/local/pgsql/lib/libpq.so.5.10]
   6,100,959  ???:__strlen_sse2_pminub [/usr/lib64/libc-2.17.so]
   5,700,000  ???:PQgetvalue [/usr/local/pgsql/lib/libpq.so.5.10]
   4,700,045  ???:PQclear [/usr/local/pgsql/lib/libpq.so.5.10]
   4,200,057  ???:PQmakeEmptyPGresult [/usr/local/pgsql/lib/libpq.so.5.10]
   4,103,903  ???:PQgetResult [/usr/local/pgsql/lib/libpq.so.5.10]
   3,400,000  ???:pqAddTuple [/usr/local/pgsql/lib/libpq.so.5.10]
   3,203,437  ???:pqGetc [/usr/local/pgsql/lib/libpq.so.5.10]
   2,600,034  ???:pqPrepareAsyncResult [/usr/local/pgsql/lib/libpq.so.5.10]
   2,500,679  ???:appendBinaryPQExpBuffer [/usr/local/pgsql/lib/libpq.so.5.10]
   2,300,621  ???:enlargePQExpBuffer [/usr/local/pgsql/lib/libpq.so.5.10]
   1,600,016  ???:appendPQExpBufferStr [/usr/local/pgsql/lib/libpq.so.5.10]
     900,270  ???:resetPQExpBuffer [/usr/local/pgsql/lib/libpq.so.5.10]

Profile data file 'callgrind.out.15062' (creator: callgrind-3.11.0)
I1 cache: D1 cache: LL cache:
Timerange: Basic block 0 - 84068364
Trigger: Program termination
Profiled target: ./test -m row (PID 15062, part 1)
Events recorded: Ir
Events shown: Ir
Event sort order: Ir
Thresholds: 99
Include dirs: User annotated: Auto-annotation: off

          Ir
 358,525,458  PROGRAM TOTALS

          Ir  file:function
  61,410,901  ???:_int_malloc [/usr/lib64/libc-2.17.so]
  38,321,887  ???:_int_free [/usr/lib64/libc-2.17.so]
  31,400,139  ???:pqResultAlloc [/usr/local/pgsql/lib/libpq.so.5.10]
  22,839,505  ???:pqParseInput3 [/usr/local/pgsql/lib/libpq.so.5.10]
  17,600,004  ???:pqRowProcessor [/usr/local/pgsql/lib/libpq.so.5.10]
  16,002,817  ???:malloc [/usr/lib64/libc-2.17.so]
  14,716,359  ???:pqGetInt [/usr/local/pgsql/
Re: [HACKERS] [ patch ] pg_dump: new --custom-fetch-table and --custom-fetch-value parameters
On Sat, Feb 11, 2017 at 9:56 AM, Andrea Urbani wrote: > I'm a beginner here... anyway I try to share my ideas. > > My situation is changed in a worst state: I'm no more able to make a pg_dump > neither with my custom fetch value (I have tried "1" as value = one row at > the time) neither without the "--column-inserts": > > pg_dump: Dumping the contents of table "tDocumentsFiles" failed: > PQgetResult() failed. > pg_dump: Error message from server: ERROR: out of memory > DETAIL: Failed on request of size 1073741823. > pg_dump: The command was: COPY public."tDocumentsFiles" ("ID_Document", > "ID_File", "Name", "FileName", "Link", "Note", "Picture", "Content", > "FileSize", "FileDateTime", "DrugBox", "DrugPicture", "DrugInstructions") TO > stdout; > > I don't know if the Kyotaro Horiguchi patch will solve this, because, again, > I'm not able to get neither one single row. Yeah, if you can't fetch even one row, limiting the fetch size won't help. But why is that failing? A single 1GB allocation should be fine on most modern servers. I guess the fact that you're using a 32-bit build of PostgreSQL is probably a big part of the problem; there is probably only 2GB of available address space and you're trying to find a single, contiguous 1GB chunk. If you switch to using a 64-bit PostgreSQL things will probably get a lot better for you, unless the server's actual memory is also very small. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
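As a first diagnostic step, something along these lines (using the column names from the failing COPY) can show the stored size of the suspect values and which rows they live in, before attempting another dump:

SELECT "ID_Document", "ID_File",
       pg_column_size("Content") AS content_bytes,
       pg_column_size("Picture") AS picture_bytes
FROM public."tDocumentsFiles"
ORDER BY pg_column_size("Content") DESC
LIMIT 10;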
Re: [HACKERS] WIP: About CMake v2
On 2/8/17 4:44 PM, Andres Freund wrote: >> Well, I find it a bit scary. If you do the big switch all at once, then >> you will have to dedicate the following 3 months to fixing complaints >> from developers and build farmers. > > That'll be the case just as well if we spread it out over two cycles, > except that we'll have it in multiple phases, and we run into the danger > of a half-done conversion. The key term is "dedicated". I don't think anyone wants to spend 3 months doing nothing but fixing the build system. If we spread it over several releases and bring more people up to speed along the way, that looks more manageable. >> That would work if there were a single entry point into the build >> system. But in practice there are many, and every one of them is >> someone's favorite. It's unlikely that we will be able to enumerate all >> of them during patch review. > > Not sure what you mean with "entry point"? People have all kinds of expectations on how the build system behaves. It's not just ./configure; make; make install. All kinds of niche and edge cases have evolved over the years. Variables you can set, overrides, targets in some subdirectories, building in subdirectories and having it build another subdirectory first, testing this way and testing that way, testing with a custom locale or a custom database name, building before testing or not, testing without reinstalling, and so on and so on. You'd have to make sure at least some of that continues to work or be able to explain it away. And most of it is not really documented. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should we cacheline align PGXACT?
Alexander Korotkov wrote: > Yes, influence seems to be low. But nevertheless it's important to insure > that there is no regression here. > Despite pg_prewarm'ing and running tests 300s, there is still significant > variation. > For instance, with clients count = 80: > * pgxact-result-2.txt – 474704 > * pgxact-results.txt – 574844 > Could some background processes influence the tests? Or could it be > another virtual machine? > Also, I wonder why I can't see this variation on the graphs. > Another issue with graphs is that we can't see details of read and write > TPS variation on the same scale, because write TPS values are too low. I > think you should draw write benchmark on the separate graph. So, I'm reading that on PPC64 there is no effect, and on the "lesser" machine Tomas tested on, there is no effect either; this patch only seems to benefit Alexander's 72 core x86_64 machine. It seems to me that Andres' comments here were largely ignored: https://www.postgresql.org/message-id/20160822021747.u5bqx2xwwjzac...@alap3.anarazel.de He was suggesting to increase the struct size to 16 bytes rather than going all the way up to 128. Did anybody test this? Re the coding of the padding computation, seems it'd be better to use our standard "offsetof(last-struct-member) + sizeof(last-struct-member)" rather than adding each of the members' sizes individually. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] removing tsearch2
On Fri, Feb 10, 2017 at 1:41 PM, Robert Haas wrote: > Anyway, it seems like the consensus here is unanimous. Unless there > are a LARGE number of contrary votes in the meanwhile, I'll go make > this happen sometime next week. Done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)
Hi, I wonder if it is possible to somehow benchmark your clustered index implementation. I tried to create a VCI index for the lineitem table from TPC-H and run the Q6 query. After index creation Postgres is no longer using a parallel execution plan, but the speed of the sequential plan is unchanged and nothing in the query execution plan indicates that the VCI index is used:

postgres=# explain select sum(l_extendedprice*l_discount) as revenue from lineitem_projection where l_shipdate between '1996-01-01' and '1997-01-01' and l_discount between 0.08 and 0.1 and l_quantity < 24;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=608333.85..608333.86 rows=1 width=4)
   ->  Gather  (cost=608333.23..608333.84 rows=6 width=4)
         Workers Planned: 6
         ->  Partial Aggregate  (cost=607333.23..607333.24 rows=1 width=4)
               ->  Parallel Seq Scan on lineitem_projection  (cost=0.00..607024.83 rows=61680 width=8)
                     Filter: ((l_shipdate >= '1996-01-01'::date) AND (l_shipdate <= '1997-01-01'::date) AND (l_discount >= '0.08'::double precision) AND (l_discount <= '0.1'::double precision) AND (l_quantity < '24'::double precision))
(6 rows)

postgres=# select sum(l_extendedprice*l_discount) as revenue from lineitem_projection where l_shipdate between '1996-01-01' and '1997-01-01' and l_discount between 0.08 and 0.1 and l_quantity < 24;
  revenue
-----------
 6.2e+08
(1 row)

Time: 1171.324 ms (00:01.171)

postgres=# create index vci_idx on lineitem_projection using vci(l_shipdate,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus);
CREATE INDEX
Time: 4.705 ms

postgres=# explain select * from lineitem_projection where l_shipdate between '1996-01-01' and '1997-01-01' and l_discount between 0.08 and 0.1 and l_quantity < 24;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Seq Scan on lineitem_projection  (cost=0.00..382077.00 rows=1 width=22)
   Filter: ((l_shipdate >= '1996-01-01'::date) AND (l_shipdate <= '1997-01-01'::date) AND (l_discount >= '0.08'::double precision) AND (l_discount <= '0.1'::double precision) AND (l_quantity < '24'::double precision))
(2 rows)

postgres=# select sum(l_extendedprice*l_discount) as revenue from lineitem_projection where l_shipdate between '1996-01-01' and '1997-01-01' and l_discount between 0.08 and 0.1 and l_quantity < 24;
   revenue
------------
 6.2112e+08
(1 row)

Time: 4304.355 ms (00:04.304)

I wonder if there is any query which can demonstrate the advantages of using a VCI index?

On 06.02.2017 04:26, Haribabu Kommi wrote: On Fri, Feb 3, 2017 at 8:28 PM, Konstantin Knizhnik wrote: On 30.12.2016 06:55, Haribabu Kommi wrote: Hi All, Fujitsu was interested in developing a columnar storage extension with minimal changes to the server backend. We in PostgresPRO are also very interested in developing vertical storage (VS) for Postgres. And after considering many alternatives, we came to the conclusion that the approach based on representing a columnar store as an access method (index) is the most promising one. It allows us to:
1. Implement VS as an extension without affecting the Postgres core.
2. Have both ROS and WOS.
3. Create multiple projections (as in Vertica).
4. Optimize insert speed by supporting batch inserts and using a flexible recovery model for VS.
So it is very similar to your approach. But there are a few differences: 1. Our intention is to completely eliminate changes in the Postgres core. You wrote: Yes, it is a mix of both index and table access methods. The current design of Vertical clustered index needs both access methods, because of this reason we used both access methods. 
But I still do not completely understand why it is not possible to use VS in index-only scans without any changes, with the standard Postgres executor. Why is it not possible to rely on the standard rules for applying indexes in the Postgres optimizer, based on costs provided by our AM implementation? In our storage design, we used a TID-CRID map to map a record in the heap to the columnar storage. Because of HOT updates, the new data will not be inserted into indexes, but this would be a problem for the columnar storage, so we added a hook to insert index data even if the update is HOT. And also we added another hook for initializing the parameters during the execution. Most of the other added hooks can be replaced with existing
Re: [HACKERS] DROP SUBSCRIPTION and ROLLBACK
On Mon, Feb 13, 2017 at 4:05 PM, Masahiko Sawada wrote: > On Sat, Feb 11, 2017 at 8:18 PM, Petr Jelinek > wrote: >> On 10/02/17 19:55, Masahiko Sawada wrote: >>> On Thu, Feb 9, 2017 at 12:44 AM, Petr Jelinek >>> wrote: On 08/02/17 07:40, Masahiko Sawada wrote: > On Wed, Feb 8, 2017 at 9:01 AM, Michael Paquier > wrote: >> On Wed, Feb 8, 2017 at 1:30 AM, Fujii Masao >> wrote: >>> On Wed, Feb 8, 2017 at 12:26 AM, Petr Jelinek >>> wrote: For example what happens if apply crashes during the DROP SUBSCRIPTION/COMMIT and is not started because the delete from catalog is now visible so the subscription is no longer there? >>> >>> Another idea is to treat DROP SUBSCRIPTION in the same way as VACUUM, >>> i.e., >>> make it emit an error if it's executed within user's transaction block. >> >> It seems to me that this is exactly Petr's point: using >> PreventTransactionChain() to prevent things to happen. > > Agreed. It's better to prevent to be executed inside user transaction > block. And I understood there is too many failure scenarios we need to > handle. > >>> Also DROP SUBSCRIPTION should call CommitTransactionCommand() just >>> after removing the entry from pg_subscription, then connect to the >>> publisher >>> and remove the replication slot. >> >> For consistency that may be important. > > Agreed. > > Attached patch, please give me feedback. > This looks good (and similar to what initial patch had btw). Works fine for me as well. Remaining issue is, what to do about CREATE SUBSCRIPTION then, there are similar failure scenarios there, should we prevent it from running inside transaction as well? >>> >>> Hmm, after thought I suspect current discussing approach. For >>> example, please image the case where CRAETE SUBSCRIPTION creates >>> subscription successfully but fails to create replication slot for >>> whatever reason, and then DROP SUBSCRIPTION drops the subscription but >>> dropping replication slot is failed. In such case, CREAET SUBSCRIPTION >>> and DROP SUBSCRIPTION return ERROR but the subscription is created and >>> dropped successfully. I think that this behaviour confuse the user. >>> >>> I think we should just prevent calling DROP SUBSCRIPTION in user's >>> transaction block. Or I guess that it could be better to separate the >>> starting/stopping logical replication from subscription management. >>> >> >> We need to stop the replication worker(s) in order to be able to drop >> the slot. There is no such issue with startup of the worker as that one >> is launched by launcher after the transaction has committed. >> >> IMO best option is to just don't allow DROP/CREATE SUBSCRIPTION inside a >> transaction block and don't do any commits inside of those (so that >> there are no rollbacks, which solves your initial issue I believe). That >> way failure to create/drop slot will result in subscription not being >> created/dropped which is what we want. On second thought, +1. > I basically agree this option, but why do we need to change CREATE > SUBSCRIPTION as well? Because the window between the creation of replication slot and the transaction commit of CREATE SUBSCRIPTION should be short. Otherwise, if any error happens during that window, the replication slot unexpectedly remains while there is no corresponding subscription. Of course, even If we prevent CREATE SUBSCRIPTION from being executed within user's transaction block, there is still such window. But we can reduce the possibility of that problem. 
Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
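With that approach the user-visible behaviour would be roughly as follows (subscription name invented for the example, exact error wording to be settled by the patch):

BEGIN;
DROP SUBSCRIPTION mysub;
ERROR:  DROP SUBSCRIPTION cannot run inside a transaction block
ROLLBACK;

-- outside a transaction block the catalog entry and the remote slot are
-- dropped in one step, with no internal commit in between, so a failure to
-- drop the slot simply leaves the subscription in place
DROP SUBSCRIPTION mysub;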
Re: [HACKERS] Should we cacheline align PGXACT?
On Monday, 13.02.2017 at 16:55 +0300, Alexander Korotkov wrote:
> Thank you for testing.
>
> Yes, influence seems to be low. But nevertheless it's important to insure
> that there is no regression here.
> Despite pg_prewarm'ing and running tests 300s, there is still significant
> variation.
> For instance, with clients count = 80:
> * pgxact-result-2.txt – 474704
> * pgxact-results.txt – 574844
> Could some background processes influence the tests? Or could it be
> another virtual machine?
> Also, I wonder why I can't see this variation on the graphs.
> Another issue with graphs is that we can't see details of read and write

Whoops, good catch. I've mistakenly copied the wrong y-axis for these results in the gnuplot script, shame on me. New plots attached.

You're right, the 2nd run with the pgxact alignment patch is notable. I've realized that there was a pgbouncer instance running from a previous test, but not sure if that could explain the difference.

> TPS variation on the same scale, because write TPS values are too low. I
> think you should draw write benchmark on the separate graph.

The Linux LPAR is the only one used atm. We got some more time for Linux now and I'm going to prepare Tomas' script to run. Not sure I can get to it today, though. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sum aggregate calculation for single precsion real
Konstantin Knizhnik writes: > I wonder why SUM aggregate is calculated for real (float4) type using > floating point accumulator? If you can't deal with the vagaries of floating-point arithmetic, you shouldn't be storing your data in float format. Use numeric. > Are there are reasons of using float4pl function for SUM aggregate instead of > float4_accum? The latter is probably a good two orders of magnitude slower, and it wouldn't really do much to solve the inherent accuracy problems of adding float4 values that have a wide dynamic range. The expectation for SUM(float4) is that you want speed and are prepared to cope with the consequences. It's easy enough to cast your input to float8 if you want a wider accumulator, or to numeric if you'd like more stable (not necessarily more accurate :-() results. I do not think it's the database's job to make those choices for you. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
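For instance, for a float4 column such as the lineitem.l_quantity discussed later in this thread, the cast is all it takes:

-- widen the accumulator explicitly instead of relying on SUM(float4)
SELECT sum(l_quantity::float8)  FROM lineitem;   -- float8 arithmetic, fast
SELECT sum(l_quantity::numeric) FROM lineitem;   -- exact, but noticeably slower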
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
> > ISTM that it is kind of a regression, because logically this is about the > scan state so it should be in the corresponding structure, and having two > structures to pass the scan state is not an improvement... I wasn't too happy with the extra param, either. Having moved the guts to conditional.c, I'm happy with that change and can move the stack head back to scan_state with a clear conscience. > I would suggest to also apply the advice to the example shown, including a > comment about why the variable is set on. > +1 > > Also, the last element of the tap tests should be distinct: I suggest to > use 'if syntax error' and 'elif syntax error' in place of 'syntax error' > for the two first tests. +1, shouldn't take too long.
Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)
On Sat, Feb 11, 2017 at 2:43 AM, Fabien COELHO wrote: >> Ok, so that's not just PROMPT_READY, that's every prompt...which might be >> ok. ? is a great optional cue, and you're thinking on 2 levels deep, 2nd >> level always being '.'? > > Yep. The idea is to keep it short, but to still have something to say "there > are more levels" in the stack, hence the one dot. Basically I just > compressed your 4 level proposal, and added a separator to deal with the > preceding stuff and suggest the conditional. I think we should try to make this REALLY simple. We don't really want to have everybody have to change their PROMPT1 and PROMPT2 strings for this one feature. How about just introducing a new value for %R? The documentation currently says: In prompt 1 normally =, but ^ if in single-line mode, or ! if the session is disconnected from the database (which can happen if \connect fails). ...and suppose we just extend that to add: , or @ if commands are currently being ignored because of the result of an \if test. The latter would include being in the \if section when the conditional was false as well as being in the \else section when the conditional was true. I think that's all you need here: a way to alert users as to whether commands are being ignored, or not. Putting details in about precisely why they are being ignored seems like it's too complicated; people won't remember how to decode some bizarre series of glyphs that we output. Telling them whether their next command is set to be ignored or executed is good enough; if the answer isn't what they expect, they can debug their script to figure out what they screwed up. Also, keep in mind that people don't need to know everything from the current prompt. They can try to debug things by looking back at previous prompts. They'll understand that \if is going to introduce a new nesting level and \endif is going to end one, and that \else and \elseif may change things. Aside from keeping the code simple so we can maintain it and the output simple so that users can remember what it means, I just don't believe that it's really going to be helpful to convey much detail here. People aren't going to paste in a gigaton of commands and then look only at the last line of the output and try to understand what it's telling them, or if they do that and are confused, I think nobody will really feel bad about giving them the advice "scroll up" or "try a simpler test case first". Further keep in mind that eventually somebody's going to code \while or \for or something, and then there are going to be even more possible states here. Just when you've figured out what tfzffft means, there'll be the case of a \while loop which is getting skipped because the condition at the top turned out to be false on the first iteration, or where (half-joking) we're skipping commands until we find the label that matches an executed \goto. Writing maintainable code includes leaving room open for other people to do stuff we can't even foresee today, and that means we need not to use up a disproportionate number of the glyphs that can reasonably be used in a psql prompt just on this. This is one small feature out of many that psql has, and one small hint to the user about whether it's currently causing commands to be skipped seems sufficient. All IMHO, of course. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] amcheck (B-Tree integrity checking tool)
On Thu, Feb 9, 2017 at 8:15 PM, Peter Geoghegan wrote: > BTW, aren't there cases where a PARALLEL SAFE function that needs to > acquire locks on some arbitrary relation not referenced in the query > can get locks on its own, which may only last as long as the parallel > worker's lifetime? This could happen *despite* the fact that the > author of the function may have imagined that callers could not > release relation level locks early (i.e., by not releasing them > directly, and so correctly following this "badly documented > assumption"). Yes. > It seems like the existence of PARALLEL RESTRICTED is primarily > motivated by making stuff that definitely needs the locks to last > until xact end being sure that that happens -- the docs say so. This > seems wholly inconsistent with the idea that you're not supposed to > let that happen under any circumstances. I find all this highly > confusing. Have I missed some further subtlety that applies with > PARALLEL RESTRICTED? That's by no means the only thing that could cause you to mark something PARALLEL RESTRICTED. I think you might have missed this bit of the documentation, a few paragraphs up: Similarly, functions must be marked PARALLEL RESTRICTED if they access temporary tables, client connection state, cursors, prepared statements, or miscellaneous backend-local state which the system cannot synchronize across workers. For example, setseed and random are parallel restricted for this last reason. That stuff is the main motivation. The early lock release stuff could come up for user-written functions, but the things listed in the paragraph I've quoted here come up for plenty of built-in functions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] IF NOT EXISTS option for CREATE SERVER and CREATE USER MAPPING statements
On 01/13/2017 08:36 AM, Anastasia Lubennikova wrote: > I implemented IF NOT EXISTS option for CREATE SERVER and CREATE USER > MAPPING statements > for one of our customers. > I think other users can also find it useful for scripting and > automated tasks. > The patches themselves are small and simple. Documentation is updated > as well. > This looks good and useful. Please add some regression tests. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
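The proposed syntax makes scripted setups idempotent, roughly along these lines (server name and wrapper chosen just for the example):

CREATE SERVER IF NOT EXISTS file_server FOREIGN DATA WRAPPER file_fdw;
CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER SERVER file_server;
-- on a second run both statements issue a NOTICE and succeed instead of
-- aborting the script with "already exists" errors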
[HACKERS] Re: [COMMITTERS] pgsql: Remove all references to "xlog" from SQL-callable functions in p
On Fri, Feb 10, 2017 at 5:36 AM, Robert Haas wrote: > Remove all references to "xlog" from SQL-callable functions in pg_proc. > > Commit f82ec32ac30ae7e3ec7c84067192535b2ff8ec0e renamed the pg_xlog > directory to pg_wal. To make things consistent, and because "xlog" is > terrible terminology for either "transaction log" or "write-ahead log" > rename all SQL-callable functions that contain "xlog" in the name to > instead contain "wal". (Note that this may pose an upgrade hazard for > some users.) There are still some mentions of "xlog" in the docs. Maybe we should replace them with "wal" as the attached patch does. Regards, -- Fujii Masao xlog2wal.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
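For illustration, two of the affected functions before and after the rename (picked as examples; the commit renames every "xlog" function the same way):

-- 9.6 and earlier
SELECT pg_switch_xlog();
SELECT pg_xlog_replay_pause();

-- 10 and later, after this commit
SELECT pg_switch_wal();
SELECT pg_wal_replay_pause();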
Re: [HACKERS] Sum aggregate calculation for single precsion real
On 13.02.2017 19:20, Tom Lane wrote:
> Konstantin Knizhnik writes:
>> I wonder why SUM aggregate is calculated for real (float4) type using
>> floating point accumulator?
> If you can't deal with the vagaries of floating-point arithmetic, you
> shouldn't be storing your data in float format. Use numeric.

4-byte floats are widely used, for example in trading applications, just because they are two times shorter than double and the range of the stored data is relatively small (not many significant digits are needed). At the same time the volume of stored data is very large, and switching from float4 to float8 would almost double it. It requires two times more storage and causes an almost two-times increase in query execution time. So this is not an acceptable answer.

>> Are there are reasons of using float4pl function for SUM aggregate instead of float4_accum?
> The latter is probably a good two orders of magnitude slower, and it
> wouldn't really do much to solve the inherent accuracy problems of adding
> float4 values that have a wide dynamic range.

It is not true - please notice the query execution times of these two queries:

postgres=# select sum(l_quantity) from lineitem where l_shipdate <= '1998-12-01';
     sum
-------------
 1.52688e+09
(1 row)

Time: 2858.852 ms

postgres=# select sum(l_quantity+0.0) from lineitem where l_shipdate <= '1998-12-01';
    sum
------------
 1529738036
(1 row)

Time: 3174.529 ms

It looks like the aggregate calculation itself is not a bottleneck in Postgres now, compared with the tuple deform cost.

> The expectation for SUM(float4) is that you want speed and are prepared to
> cope with the consequences. It's easy enough to cast your input to float8
> if you want a wider accumulator, or to numeric if you'd like more stable
> (not necessarily more accurate :-() results. I do not think it's the
> database's job to make those choices for you.

From my point of view this is a strange and wrong expectation. I am choosing the "float4" type for a column just because it is enough to represent the range of data I have and I need to minimize the record size. But when I calculate a sum, I expect to receive a more or less precise result. Certainly I realize that even when using double it is possible to lose precision during calculation, and the result may depend on the summation order (if we add very small and very large values). But in real use cases (for example in trading data) such a large difference in attribute values is very rare. If you have, for example, a stock price, then it is very unlikely that one company has the value 0.01 and another 1000.0. At least in the TPC-H example (which certainly deals with dummy generated data), the double type produces an "almost precise" result.

> regards, tom lane

-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] amcheck (B-Tree integrity checking tool)
On Thu, Feb 9, 2017 at 5:32 PM, Andres Freund wrote: >> > Again, some parts of the code doing something bad isn't a good argument >> > for doing again. Releasing locks early is a bad pattern, because backend >> > code isn't generally safe against it, we have special code-paths for >> > catalog tables to support it for those. >> >> I don't agree with that. The reason we keep relation locks until the >> end of the transaction is to avoid surprising application code, not >> because the system can't tolerate it. But here it's more likely that >> retaining the lock will surprise the user. > > It's both. A bunch of code paths rely on early release ony happening on > catalogs. E.g., and that's just the first thing that comes to mind, > cache invalidations which really only are guaranteed to be delivered and > visibile for other cache accesses, if the lock is only released after > the end of the transaction. Object accesses like relation_open() accept > invalidations *after* acquiring the lock. Combined with lock releases > only happening *after* transaction commits that guarantees an up2date > view of the cache; but if others release the lock earlier and have cache > invalidations, that doesn't work anymore. Now you can argue that it's > possibly harmless here because there's presumably no way this can ever > trigger cache invals - and you're probably right - but this isn't the > only place with such assumption and you'd definitely have to argue *why* > it's right. I agree that code which issues cache invalidations needs to hold a blocking lock until commit IF it's critical for other backends to see the update. That's not always the case; for example, if an ALTER TABLE statement only changes the fillfactor the worst thing that happens if some other backend fails to see the invalidations is that some transactions continue to use the old value for a while. That's why we now allow fillfactor and some other things using only ShareUpdateExclusiveLock. That's not quite the same thing but the effect is the same: we issue invalidations that other backends aren't guaranteed to see in a timely fashion, and it works OK. Yeah, some other session might not see the change for a while, but we don't view that as a big problem. But I think in this case that argument is basically a straw man. A utility that's called amcheck is obviously not issuing any cache invalidations. It's probably not even changing anything at all; if it is, maybe it should be called amrepair. So the fact that CLUSTER has to hold AEL until commit time really doesn't have anything to do with whether amcheck needs to hold AES until commit time. I don't really understand your demand for "an argument why it's right". My argument is simple: I can't think of a single thing that it would break, and I don't believe there is any such thing. Furthermore, as Peter points out, we do this kind of thing all the time in other places and it indeed does not break things. I think you're trying to invent a coding rule that doesn't exist, but I can't prove a negative. Your argument so far is that "yeah, well, maybe inval doesn't emit sinval traffic (duh?) but it might do something else that breaks, prove to me that there is no such thing". But that's just like asking me to prove that it's impossible to exceed the speed of light. 
On information and belief, nobody currently knows how to do that and there is good scientific evidence that it cannot be done, but there's no way to prove that someone won't in the future discover a refinement in the physical laws of the universe as we understand them today which sheds a different light on the situation. From my point of view, it's likely to be common to want to run amcheck in series on a bunch of indexes, like by writing a SELECT query against pg_class or pg_index and passing the OIDs to amcheck. With the retain-locks design, that query accumulates locks on a large number of objects, possibly blowing out the lock table or blocking DDL. I think that readily-foreseeable problem ought to trump concerns about purely hypothetical issues caused by releasing locks early. If you can come up with an actual issue with releasing locks early that applies to this particular case, then that's different, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
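For example, assuming the function name used by the current patch, checking every btree index in a database in one pass looks something like:

SELECT c.relname, bt_index_check(c.oid::regclass)
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_am a    ON a.oid = c.relam
WHERE a.amname = 'btree'
ORDER BY c.relname;

With locks retained until commit, a query like this holds a lock on every checked index simultaneously; with early release it only ever holds one at a time.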
Re: [HACKERS] Should we cacheline align PGXACT?
Hi All, I too have performed benchmarking of this patch on a large machine (with 128 CPU(s), 520GB RAM, intel x86-64 architecture) and would like to share my observations for the same (Please note that, as I had to reverify readings on few client counts, it did take some time for me to share these test-results.) Case1: Data fits in shared buffer, Read Only workload: --- For data fits in shared buffer, I have taken readings at 300 SF. The result sheet 'results-readonly-300-1000-SF' containing the median of 3 runs is attached with this mail. In this case, I could see very good performance improvement with the patch basically at high client counts (156 - 256). Case2: Data doesn't fit in shared buffer, Read Only workload: -- For data doesn't fit in shared buffer, I have taken readings at 1000 SF. The result sheet 'results-readonly-300-1000-SF' is attached with this mail. In this case, the performance improvement is not as in Case1, Infact it just varies in the range of 2-7%. but the good thing is that there is no regression. Case3: Data fits in shared buffer, Read-write workload: - In this case, I could see that the tps on head and patch are very close to each other with a small variation of (+-)3-4% which i assume is a run-to-run variation. PFA result sheet 'results-readwrite-300-1000-SF' containing the test-results. Case4: Data doesn't fit in shared buffer, Read-write workload: In this case as well, tps on head and patch are very close to each other with small variation of 1-2% which again is a run-to-run variation. PFA result sheet 'results-readwrite-300-1000-SF' containing the test-results. Please note that details on the non-default guc params and pgbench is all provided in the result sheets attached with this mail. Also, I have not used pg_prewarm in my test script. Thank you. On Mon, Feb 13, 2017 at 9:43 PM, Bernd Helmle wrote: > Am Montag, den 13.02.2017, 16:55 +0300 schrieb Alexander Korotkov: >> >> Thank you for testing. >> >> Yes, influence seems to be low. But nevertheless it's important to >> insure >> that there is no regression here. >> Despite pg_prewarm'ing and running tests 300s, there is still >> significant >> variation. >> For instance, with clients count = 80: >> * pgxact-result-2.txt – 474704 >> * pgxact-results.txt – 574844 > >> Could some background processes influence the tests? Or could it be >> another virtual machine? >> Also, I wonder why I can't see this variation on the graphs. >> Another issue with graphs is that we can't see details of read and >> write > > Whoops, good catch. I've mistakenly copied the wrong y-axis for these > results in the gnuplot script, shame on me. New plots attached. > > You're right, the 2nd run with the pgxact alignment patch is notable. > I've realized that there was a pgbouncer instance running from a > previous test, but not sure if that could explain the difference. > >> TPS variation on the same scale, because write TPS values are too >> low. I >> think you should draw write benchmark on the separate graph. >> > > The Linux LPAR is the only one used atm. We got some more time for > Linux now and i'm going to prepare Tomas' script to run. Not sure i can > get to it today, though. 
> > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > results-readonly-300-1000-SF.xlsx Description: MS-Excel 2007 spreadsheet results-readwrite-300-1000-SF.xlsx Description: MS-Excel 2007 spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] amcheck (B-Tree integrity checking tool)
On Fri, Feb 10, 2017 at 8:39 PM, Peter Geoghegan wrote: > On Thu, Feb 9, 2017 at 2:32 PM, Andres Freund wrote: >> Except that the proposed names aren't remotely like that... ;). > > Revision attached -- V5. We now REVOKE ALL on both functions, as > Robert suggested, instead of the previous approach of having a > hard-coded superuser check with enforcement. > >> And I proposed documenting named parameters, and >> check_btree(performing_check_requiring_exclusive_locks => true) is just >> about as expressive. > > I have not done this, nor have I renamed the functions. I still think > that this is something that we can fix by adding a boolean argument to > each function in the future, or something along those lines. I > *really* hate the idea of having one function with non-obvious, > variable requirements on locking, with locking implications that are > not knowable when we PREPARE an SQL statement calling the function. It > also removes a useful way of have superusers discriminate against the > stronger locking variant bt_index_parent_check() by not granting > execute on it (as an anti-footgun measure). I think Andres is more or less correct that "performing_check_requiring_exclusive_locks => true" is just about as expressive as calling a different function, but I think that your point that the superuser might want to grant access to one function but not the other is a good one. On the other hand, I think Andres has a concern that we might have more modes in the future and we don't want to end up with 2^n entrypoints. That also seems valid. Hmm. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Provide list of subscriptions and publications in psql's completion
On Fri, Feb 10, 2017 at 1:52 PM, Michael Paquier wrote: > On Tue, Feb 7, 2017 at 6:35 AM, Peter Eisentraut > wrote: >> On 2/6/17 10:54 AM, Fujii Masao wrote: >>> Yes, that's an option. And, if we add dbid to pg_stat_subscription, >>> I'm tempted to add all the pg_subscription's columns except subconninfo >>> into pg_stat_subscription. Since subconninfo may contain security-sensitive >>> information, it should be excluded. But it seems useful to expose other >>> columns. Then, if we expose all the columns except subconninfo, maybe >>> it's better to just revoke subconninfo column on pg_subscription instead of >>> all columns. Thought? >> >> I think previous practice is to provide a view such as pg_subscriptions >> that contains all the nonprivileged information. > > OK, I think that I see the point you are coming at: > pg_stat_get_subscription (or stat tables) should not be used for > psql's tab completion. So gathering all things discussed, we have: > - No tab completion for publications. No, the tab-completions for ALTER/DROP PUBLICATION should show the local publications because those commands drop and alter the local ones. OTOH, CREATE/ALTER SUBSCRIPTION ... PUBLICATION should show nothing because the remote publications on the publisher side should be specified there. > - Fix the meta-commands. Yes. > - Addition of a view pg_subscriptions with all the non-sensitive data. > (- Or really extend pg_stat_subscriptions with the database ID and use > it for tab completion?) Probably I failed to get Peter's point... Anyway IMO we can expose all the columns except the sensitive information (i.e., subconninfo field) in pg_subscription to even non-superusers. Then we can use pg_subscription for the tab-completion for ALTER/DROP SUBSCRIPTION. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
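For reference, the pg_subscriptions view Peter suggests could be as simple as the following sketch (column list per the current catalog definition, minus subconninfo; it may need adjusting to whatever the catalog finally contains):

CREATE VIEW pg_subscriptions AS
    SELECT subdbid, subname, subowner, subenabled,
           subslotname, subpublications
    FROM pg_subscription;

GRANT SELECT ON pg_subscriptions TO PUBLIC;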
Re: [HACKERS] log_autovacuum_min_duration doesn't log VACUUMs
On Sun, Feb 12, 2017 at 9:20 PM, Jim Nasby wrote: > On 2/10/17 2:33 PM, Robert Haas wrote: >> That having been said, I think it could certainly be useful to have >> more control over what DDL gets logged in foreground processes. > > FWIW, this is a significant problem outside of DDL. Once you're past 1-2 > levels of nesting SET client_min_messages = DEBUG becomes completely > useless. > > I think the ability to filter logging based on context would be very > valuable. AFAIK you could actually do that for manual logging with existing > plpgsql support, but obviously that won't help for anything else. Well, that's moving the goalposts a lot further and in an unrelated direction. I don't think that it's a good idea to change the semantics of log_autovacuum_min_duration in the way Simon is proposing for the reasons I noted, but I think that an acceptable patch could be 100 lines of pretty straightforward code and documentation, like a new GUC that controls this output for the vac-non-autovac case. Fine-grained control over which commands get logged in general is harder, but it's still only a moderately complex patch (I think). Filtering based on the context in which the logging is happening sounds extremely complicated; I'm not sure what sort of design for that would even be reasonable and it seems like there might be innumerable requests for additional frammishes culminating in some sort of mini-language and getting committed right around the time Zefram Cochrane makes his inaugural flight. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Remove all references to "xlog" from SQL-callable functions in p
On Mon, Feb 13, 2017 at 11:43 AM, Fujii Masao wrote: > On Fri, Feb 10, 2017 at 5:36 AM, Robert Haas wrote: >> Remove all references to "xlog" from SQL-callable functions in pg_proc. >> >> Commit f82ec32ac30ae7e3ec7c84067192535b2ff8ec0e renamed the pg_xlog >> directory to pg_wal. To make things consistent, and because "xlog" is >> terrible terminology for either "transaction log" or "write-ahead log" >> rename all SQL-callable functions that contain "xlog" in the name to >> instead contain "wal". (Note that this may pose an upgrade hazard for >> some users.) > > There are still some mentions to "xlog" in the doc. Maybe we should replace > them with "wal" as the attached patch does. +1. Patch looks good to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY IN/BOTH vs. extended query mode
On Mon, Jan 23, 2017 at 9:12 PM, Robert Haas wrote: > According to the documentation for COPY IN mode, "If the COPY command > was issued via an extended-query message, the backend will now discard > frontend messages until a Sync message is received, then it will issue > ReadyForQuery and return to normal processing." I added a similar > note to the documentation for COPY BOTH mode in > 91fa8532f4053468acc08534a6aac516ccde47b7, and the documentation > accurately describes the behavior of the server. However, this seems > to make fully correct error handling for clients using libpq almost > impossible, because PQsendQueryGuts() sends > Parse-Bind-Describe-Execute-Sync in one shot without regard to whether > the command that was just sent invoked COPY mode (cf. the note in > CopyGetData about why we ignore Flush and Sync in that function). > > So imagine that the client uses libpq to send (via the extended query > protocol) a COPY IN command (or some hypothetical command that starts > COPY BOTH mode to begin). If the server throws an error before the > Sync message is consumed, it will bounce back to PostgresMain which > will set doing_extended_query_message = true after which it will > consume messages, find the Sync, reset that flag, and send > ReadyForQuery. On the other hand, if the server enters CopyBoth mode, > consumes the Sync message in CopyGetData (or a similar function), and > *then* throws an ERROR, the server will wait for a second Sync message > from the client before issuing ReadyForQuery. There is no sensible > way of coping with this problem in libpq, because there is no way for > the client to know which part of the server code consumed the Sync > message that it already sent. In short, from the client's point of > view, if it enters COPY IN or COPY BOTH mode via the extend query > protocol, and an error occurs on the server, the server MAY OR MAY NOT > expect a further Sync message before issuing ReadyForQuery, and the > client has no way of knowing -- except maybe waiting for a while to > see what happens. > > It does not appear to me that there is any good solution to this > problem. Fixing it on the server side would require a wire protocol > change - e.g. one kind of Sync message that is used in a > Parse-Bind-Describe-Execute-Sync sequence that only terminates > non-COPY commands and another kind that is used to signal the end even > of COPY. Fixing it on the client side would require all clients to > know prior to initiating an extended-query-protocol sequence whether > or not the command was going to initiate COPY, which is an awful API > even if didn't constitute an impossible-to-contemplate backward > compatibility break. Perhaps we will have to be content to document > the fact that this part of the protocol is depressingly broken... > > ...unless of course somebody can see something that I'm missing here > and the situation isn't as bad as it currently appears to me to be. Anybody have any thoughts on this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Remove all references to "xlog" from SQL-callable functions in p
On Tue, Feb 14, 2017 at 2:06 AM, Robert Haas wrote: > On Mon, Feb 13, 2017 at 11:43 AM, Fujii Masao wrote: >> On Fri, Feb 10, 2017 at 5:36 AM, Robert Haas wrote: >>> Remove all references to "xlog" from SQL-callable functions in pg_proc. >>> >>> Commit f82ec32ac30ae7e3ec7c84067192535b2ff8ec0e renamed the pg_xlog >>> directory to pg_wal. To make things consistent, and because "xlog" is >>> terrible terminology for either "transaction log" or "write-ahead log" >>> rename all SQL-callable functions that contain "xlog" in the name to >>> instead contain "wal". (Note that this may pose an upgrade hazard for >>> some users.) >> >> There are still some mentions to "xlog" in the doc. Maybe we should replace >> them with "wal" as the attached patch does. > > +1. Patch looks good to me. Thanks! Pushed. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
On Mon, Feb 13, 2017 at 11:26 AM, Corey Huinker wrote: > ISTM that it is kind of a regression, because logically this is about the >> scan state so it should be in the corresponding structure, and having two >> structures to pass the scan state is not an improvement... > > > I wasn't too happy with the extra param, either. Having moved the guts to > conditional.c, I'm happy with that change and can move the stack head back > to scan_state with a clear conscience. > So moving the conditional stack back into PsqlScanState has some side effects: conditional.[ch] have to move to the fe_utils/ dirs, and now pgbench, which does not use conditionals, would have to link to them. Is that a small price to pay for modularity and easier-to-find code? Or should I just tuck it back into psqlscan_int.[ch]?
Re: [HACKERS] WIP: About CMake v2
On Mon, Feb 13, 2017 at 10:52 AM, Peter Eisentraut wrote: > People have all kinds of expectations on how the build system behaves. > It's not just ./configure; make; make install. All kinds of niche and > edge cases have evolved over the years. Variables you can set, > overrides, targets in some subdirectories, building in subdirectories > and having it build another subdirectory first, testing this way and > testing that way, testing with a custom locale or a custom database > name, building before testing or not, testing without reinstalling, and > so on and so on. You'd have to make sure at least some of that > continues to work or be able to explain it away. And most of it is not > really documented. ...which is another argument for just not changing anything. I mean, again, the current Windows build system is unbeautiful and occasionally takes some work, but if we switch to cmake, that has to get enough better to make up for all of the things that get worse. And it's far from obvious than this will be so, especially when you consider the fact that many people have detailed knowledge of how to tweak the current system to do what they want. As you point out here, a lot of that stuff may stop working if we replace the system wholesale. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow
On Thu, Feb 9, 2017 at 8:54 PM, Stephen Frost wrote: > Note that these views have not been consistently maintained and have > ended up including some role attributes from recent versions That's not a bug. According to the documentation, these views exist for compatibility with PostgreSQL versions before 8.1, so there's no need to update them with newer fields. Clients who are expecting to talk with a pre-8.1 PostgreSQL won't expect those fields to be present anyway. My big objection to removing these views is that it will break pgAdmin 3, which uses all three of these views. I understand that the pgAdmin community is now moving away from pgAdmin 3 and toward pgAdmin 4, but I bet that pgAdmin 3 still has significant usage and will continue to have significant usage for at least a year or three. When it's thoroughly dead, then I think we can go ahead and do this, unless there are other really important tools still depending on those views, but it's only been 3 months since the final pgAdmin 3 release. IMHO, these views aren't costing us much. It'd be nice to get rid of them eventually but a view definition doesn't really need much maintenance. (A contrib module doesn't either, but more than a view definition.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
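For what it's worth, the same information is available from the role catalogs, so tools that want to drop the dependency on the compatibility views can switch to queries roughly like these:

-- instead of pg_user / pg_shadow
SELECT rolname, rolsuper, rolcreatedb, rolcanlogin
FROM pg_roles
WHERE rolcanlogin;

-- instead of pg_group
SELECT g.rolname AS groname, array_agg(m.rolname) AS members
FROM pg_auth_members am
JOIN pg_roles g ON g.oid = am.roleid
JOIN pg_roles m ON m.oid = am.member
GROUP BY g.rolname;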
Re: [HACKERS] LWLock optimization for multicore Power machines
On 02/13/2017 03:16 PM, Bernd Helmle wrote: Am Samstag, den 11.02.2017, 15:42 +0300 schrieb Alexander Korotkov: Thus, I see reasons why in your tests absolute results are lower than in my previous tests. 1. You use 28 physical cores while I was using 32 physical cores. 2. You run tests in PowerVM while I was running test on bare metal. PowerVM could have some overhead. 3. I guess you run pgbench on the same machine. While in my tests pgbench was running on another node of IBM E880. Yeah, pgbench was running locally. Maybe we can get some resources to run them remotely. Interesting side note: If you run a second postgres instance with the same pgbench in parallel, you get nearly the same transaction throughput as a single instance. Short side note: If you run two Postgres instances concurrently with the same pgbench parameters, you get nearly the same transaction throughput for both instances each as when running against a single instance, e.g. That strongly suggests you're hitting some kind of lock. It'd be good to know which one. I see you're doing "pgbench -S" which also updates branches and other tiny tables - it's possible the sessions are trying to update the same row in those tiny tables. You're running with scale 1000, but with 100 it's still possible thanks to the birthday paradox. Otherwise it might be interesting to look at sampling wait events, which might tell us more about the locks. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
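Sampling wait events from another session while the benchmark runs is one easy way to narrow it down, e.g. run this repeatedly and eyeball the top entries:

SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE state = 'active' AND wait_event IS NOT NULL
GROUP BY 1, 2
ORDER BY count(*) DESC;

If the backends pile up on a single LWLock or heavyweight lock, it will show up at the top of that list.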
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
> > I wasn't too happy with the extra param, either. Having moved the guts to >> conditional.c, I'm happy with that change and can move the stack head back >> to scan_state with a clear conscience. >> > > That effort was creating as many headaches as it solved. Rolled back v12 except for doc paragraph, added more descriptive tap test names. Enjoy. diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index ae58708..e86b66b 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -2035,6 +2035,79 @@ hello 10 + +\if expr +\elif expr +\else +\endif + + +This group of commands implements nestable conditional blocks, like +this: + + +SELECT +EXISTS(SELECT 1 FROM customer WHERE customer_id = 123) as is_customer, +EXISTS(SELECT 1 FROM employee WHERE employee_id = 456) as is_employee +\gset +-- set ON_ERROR_STOP in case the variables are not valid boolean expressions +\set ON_ERROR_STOP on +\if :is_customer +SELECT * FROM customer WHERE customer_id = 123; +\elif :is_employee +\echo 'is not a customer but is an employee' +SELECT * FROM employee WHERE employee_id = 456; +\else +\if yes +\echo 'not a customer or employee' +\else +\echo 'this should never print' +\endif +\endif + + +Conditional blocks must begin with a \if and end +with an \endif, and the pairs must be found in +the same source file. If an EOF is reached on the main file or an +\include-ed file before all local +\if-\endif are matched, then +psql will raise an error. + + +A conditional block can have any number of +\elif clauses, which may optionally be followed by a +single \else clause. + + +The \if and \elif commands +read the rest of the line and evaluate it as a boolean expression. +Currently, expressions are limited to a single unquoted string +which are evaluated like other on/off options, so the valid values +are any unambiguous case insensitive matches for one of: +true, false, 1, +0, on, off, +yes, no. So +t, T, and tR +will all match true. + + +Expressions that do not properly evaluate to true or false will +generate an error and cause the \if or +\elif command to fail. Because that behavior may +change branching context in undesirable ways (executing code which +was intended to be skipped, causing \elif, +\else, and \endif commands to +pair with the wrong \if, etc), it is +recommended that scripts which use conditionals also set +ON_ERROR_STOP. + +Lines within false branches are not evaluated in any way: queries are +not sent to the server, non-conditional commands are not evaluated but +bluntly ignored, nested if-expressions in such branches are also not +evaluated but are tallied to check for proper nesting. + + + \ir or \include_relative filename diff --git a/src/bin/psql/Makefile b/src/bin/psql/Makefile index c53733f..7a418c6 100644 --- a/src/bin/psql/Makefile +++ b/src/bin/psql/Makefile @@ -61,8 +61,16 @@ uninstall: clean distclean: rm -f psql$(X) $(OBJS) lex.backup + rm -rf tmp_check # files removed here are supposed to be in the distribution tarball, # so do not clean them in the clean/distclean rules maintainer-clean: distclean rm -f sql_help.h sql_help.c psqlscanslash.c + + +check: + $(prove_check) + +installcheck: + $(prove_installcheck) diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index f17f610..390bccd 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -49,6 +49,7 @@ #include "psqlscanslash.h" #include "settings.h" #include "variables.h" +#include "fe_utils/psqlscan_int.h" /* * Editable database object types. 
@@ -132,7 +133,7 @@ HandleSlashCmds(PsqlScanState scan_state, status = PSQL_CMD_ERROR; } - if (status != PSQL_CMD_ERROR) + if (status != PSQL_CMD_ERROR && pset.active_branch) { /* eat any remaining arguments after a valid command */ /* note we suppress evaluation of backticks here */ @@ -194,6 +195,30 @@ read_connect_arg(PsqlScanState scan_state) return result; } +/* + * Read and interpret argument as a boolean expression. + * Return true if a boolean value was successfully parsed. + */ +static bool +read_boolean_expression(PsqlScanState scan_state, char *action, + bool *result) +{ + char*value = psql_scan_slash_option(scan_state, + OT_NORMAL, NULL, false); +
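The matching rule described in the documentation hunk above is essentially an unambiguous, case-insensitive prefix match over the accepted keywords, the same rule psql's existing ParseVariableBool() applies to other on/off options. A minimal self-contained sketch of that rule, written only to illustrate the documented behaviour and not taken from the patch, could look like this:

/*
 * Illustration only -- not psql source.  "t", "T" and "tR" all match true,
 * while a bare "o" is rejected because it could mean either "on" or "off".
 */
#include <stdbool.h>
#include <string.h>
#include <strings.h>            /* strncasecmp */

static bool
parse_psql_bool(const char *value, bool *result)
{
    size_t      len = strlen(value);

    if (len > 0 && strncasecmp(value, "true", len) == 0)
        *result = true;
    else if (len > 0 && strncasecmp(value, "false", len) == 0)
        *result = false;
    else if (len > 0 && strncasecmp(value, "yes", len) == 0)
        *result = true;
    else if (len > 0 && strncasecmp(value, "no", len) == 0)
        *result = false;
    /* require at least two characters so a bare "o" stays ambiguous */
    else if (strncasecmp(value, "on", len > 2 ? len : 2) == 0)
        *result = true;
    else if (strncasecmp(value, "off", len > 2 ? len : 2) == 0)
        *result = false;
    else if (strcmp(value, "1") == 0)
        *result = true;
    else if (strcmp(value, "0") == 0)
        *result = false;
    else
        return false;           /* not a recognizable boolean */
    return true;
}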
Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow
On 02/13/2017 11:00 AM, Robert Haas wrote: > My big objection to removing these views is that it will break pgAdmin > 3, which uses all three of these views. I understand that the pgAdmin > community is now moving away from pgAdmin 3 and toward pgAdmin 4, but > I bet that pgAdmin 3 still has significant usage and will continue to > have significant usage for at least a year or three. When it's > thoroughly dead, then I think we can go ahead and do this, unless > there are other really important tools still depending on those views, > but it's only been 3 months since the final pgAdmin 3 release. How long would you suggest is appropriate? Postgres 11? 12? Let's set a target date; that way we can communicate it more than once. -- Josh Berkus Containers & Databases Oh My! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow
Robert Haas wrote: > My big objection to removing these views is that it will break pgAdmin > 3, which uses all three of these views. I understand that the pgAdmin > community is now moving away from pgAdmin 3 and toward pgAdmin 4, but > I bet that pgAdmin 3 still has significant usage and will continue to > have significant usage for at least a year or three. When it's > thoroughly dead, then I think we can go ahead and do this, unless > there are other really important tools still depending on those views, > but it's only been 3 months since the final pgAdmin 3 release. Well, we can remove them from PG10 and pgAdmin3 (and others) be adjusted to use the new ones, conditionally on server version. Surely pgAdmin3 is going to receive further updates, given that it's still widely used? > IMHO, these views aren't costing us much. It'd be nice to get rid of > them eventually but a view definition doesn't really need much > maintenance. Maybe not, but the fact that they convey misleading information is bad. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] possibility to specify template database for pg_regress
Hi 2017-02-13 6:46 GMT+01:00 Michael Paquier : > On Sat, Feb 11, 2017 at 3:03 PM, Pavel Stehule > wrote: > > here is new update - check is done before any creating > > It may be better to do any checks before dropping existing databases > as well... It would be as well just simpler to complain with a single > error message like "database and template list lengths do not match". > next step Regards Pavel > -- > Michael > diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index d4d00d9c66..ef0542ad0c 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -68,6 +68,7 @@ const char *pretty_diff_opts = "-w -C3"; /* options settable from command line */ _stringlist *dblist = NULL; +_stringlist *templatelist = NULL; bool debug = false; char *inputdir = "."; char *outputdir = "."; @@ -1907,7 +1908,7 @@ drop_database_if_exists(const char *dbname) } static void -create_database(const char *dbname) +create_database(const char *dbname, const char *template) { _stringlist *sl; @@ -1917,10 +1918,12 @@ create_database(const char *dbname) */ header(_("creating database \"%s\""), dbname); if (encoding) - psql_command("postgres", "CREATE DATABASE \"%s\" TEMPLATE=template0 ENCODING='%s'%s", dbname, encoding, + psql_command("postgres", "CREATE DATABASE \"%s\" TEMPLATE=\"%s\" ENCODING='%s'%s", + dbname, template, encoding, (nolocale) ? " LC_COLLATE='C' LC_CTYPE='C'" : ""); else - psql_command("postgres", "CREATE DATABASE \"%s\" TEMPLATE=template0%s", dbname, + psql_command("postgres", "CREATE DATABASE \"%s\" TEMPLATE=\"%s\"%s", + dbname, template, (nolocale) ? " LC_COLLATE='C' LC_CTYPE='C'" : ""); psql_command(dbname, "ALTER DATABASE \"%s\" SET lc_messages TO 'C';" @@ -1995,6 +1998,7 @@ help(void) printf(_(" --outputdir=DIR place output files in DIR (default \".\")\n")); printf(_(" --schedule=FILE use test ordering schedule from FILE\n")); printf(_("(can be used multiple times to concatenate)\n")); + printf(_(" --template=DB use template DB (default \"template0\")\n")); printf(_(" --temp-instance=DIR create a temporary instance in DIR\n")); printf(_(" --use-existinguse an existing installation\n")); printf(_("\n")); @@ -2041,10 +2045,12 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc {"launcher", required_argument, NULL, 21}, {"load-extension", required_argument, NULL, 22}, {"config-auth", required_argument, NULL, 24}, + {"template", required_argument, NULL, 25}, {NULL, 0, NULL, 0} }; _stringlist *sl; + _stringlist *tl; int c; int i; int option_index; @@ -2154,6 +2160,16 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc case 24: config_auth_datadir = pg_strdup(optarg); break; + case 25: + +/* + * If a default template was specified, we need to remove it + * before we add the specified one. 
+ */ +free_stringlist(&templatelist); +split_to_stringlist(optarg, ",", &templatelist); +break; + default: /* getopt_long already emitted a complaint */ fprintf(stderr, _("\nTry \"%s -h\" for more information.\n"), @@ -2205,6 +2221,18 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc unlimit_core_size(); #endif + /* The length of template list should be same like db list */ + if (templatelist != NULL) + { + for (sl = dblist, tl = templatelist; sl && tl; sl = sl->next, tl = tl->next); + if (sl || tl) + { + fprintf(stderr, _("%s: database and template list lengths do not match\n"), + progname); + exit(2); + } + } + if (temp_instance) { FILE *pg_conf; @@ -2454,8 +2482,17 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc */ if (!use_existing) { - for (sl = dblist; sl; sl = sl->next) - create_database(sl->str); + if (templatelist != NULL) + { + for (sl = dblist, tl = templatelist; sl; sl = sl->next, tl = tl->next) +create_database(sl->str, tl->str); + } + else + { + for (sl = dblist; sl; sl = sl->next) +create_database(sl->str, "template0"); + } + for (sl = extraroles; sl; sl = sl->next) create_role(sl->str, dblist); } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
Hello Corey, That effort was creating as many headaches as it solved. Rolled back v12 except for doc paragraph, added more descriptive tap test names. Enjoy. The TAP tests are missing from the patch, not sure why... The stack cleanup is nicer with pop calls. Two debatable suggestions about the example: - put the on_error_stop as the very first thing in the example? - add a comment explaining what the SELECT ... \gset does? -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow
On Mon, Feb 13, 2017 at 2:42 PM, Alvaro Herrera wrote: > Robert Haas wrote: >> My big objection to removing these views is that it will break pgAdmin >> 3, which uses all three of these views. I understand that the pgAdmin >> community is now moving away from pgAdmin 3 and toward pgAdmin 4, but >> I bet that pgAdmin 3 still has significant usage and will continue to >> have significant usage for at least a year or three. When it's >> thoroughly dead, then I think we can go ahead and do this, unless >> there are other really important tools still depending on those views, >> but it's only been 3 months since the final pgAdmin 3 release. > > Well, we can remove them from PG10 and pgAdmin3 (and others) be adjusted > to use the new ones, conditionally on server version. Surely pgAdmin3 > is going to receive further updates, given that it's still widely used? According to the pgAdmin web site, no. (Yeah, that does seem odd.) >> IMHO, these views aren't costing us much. It'd be nice to get rid of >> them eventually but a view definition doesn't really need much >> maintenance. > > Maybe not, but the fact that they convey misleading information is bad. Has anyone actually been confused by them? I'm a bit skeptical about the idea that these are misleading people, because the information is no more or less misleading now than it was in PostgreSQL 8.1 when the views were introduced. And evidently it was not so misleading at that time as to make us think that a hard compatibility break was warranted. On the other hand, I suppose that the last version of pgAdmin 3 isn't likely to work with future major versions of PostgreSQL anyway unless somebody updates it, and if somebody decides to update it for the other changes in v10 then updating it for the removal of these views won't be much extra work. So maybe it doesn't matter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
On Mon, Feb 13, 2017 at 3:04 PM, Fabien COELHO wrote: > > Hello Corey, > > That effort was creating as many headaches as it solved. Rolled back v12 >> except for doc paragraph, added more descriptive tap test names. Enjoy. >> > > The TAP tests are missing from the patch, not sure why... > I went back to master and re-applied v11, something must have gotten lost in translation. > The stack cleanup is nicer with pop calls. > Yeah, dunno why I didn't do that earlier. > Two debatable suggestions about the example: > - put the on_error_stop as the very first thing in the example? > - add a comment explaining what the SELECT ... \gset does? Both are reasonable and easy. Added. diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index ae58708..7e59192 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -2035,6 +2035,81 @@ hello 10 + +\if expr +\elif expr +\else +\endif + + +This group of commands implements nestable conditional blocks, like +this: + + +-- stop the script if the condition variables are invalid boolean expressions +\set ON_ERROR_STOP on +-- check for the existence of two separate records in the database and store +-- the results in separate psql variables +SELECT +EXISTS(SELECT 1 FROM customer WHERE customer_id = 123) as is_customer, +EXISTS(SELECT 1 FROM employee WHERE employee_id = 456) as is_employee +\gset +\if :is_customer +SELECT * FROM customer WHERE customer_id = 123; +\elif :is_employee +\echo 'is not a customer but is an employee' +SELECT * FROM employee WHERE employee_id = 456; +\else +\if yes +\echo 'not a customer or employee' +\else +\echo 'this should never print' +\endif +\endif + + +Conditional blocks must begin with a \if and end +with an \endif, and the pairs must be found in +the same source file. If an EOF is reached on the main file or an +\include-ed file before all local +\if-\endif are matched, then +psql will raise an error. + + +A conditional block can have any number of +\elif clauses, which may optionally be followed by a +single \else clause. + + +The \if and \elif commands +read the rest of the line and evaluate it as a boolean expression. +Currently, expressions are limited to a single unquoted string +which are evaluated like other on/off options, so the valid values +are any unambiguous case insensitive matches for one of: +true, false, 1, +0, on, off, +yes, no. So +t, T, and tR +will all match true. + + +Expressions that do not properly evaluate to true or false will +generate an error and cause the \if or +\elif command to fail. Because that behavior may +change branching context in undesirable ways (executing code which +was intended to be skipped, causing \elif, +\else, and \endif commands to +pair with the wrong \if, etc), it is +recommended that scripts which use conditionals also set +ON_ERROR_STOP. + +Lines within false branches are not evaluated in any way: queries are +not sent to the server, non-conditional commands are not evaluated but +bluntly ignored, nested if-expressions in such branches are also not +evaluated but are tallied to check for proper nesting. 
+ + + \ir or \include_relative filename diff --git a/src/bin/psql/Makefile b/src/bin/psql/Makefile index c53733f..7a418c6 100644 --- a/src/bin/psql/Makefile +++ b/src/bin/psql/Makefile @@ -61,8 +61,16 @@ uninstall: clean distclean: rm -f psql$(X) $(OBJS) lex.backup + rm -rf tmp_check # files removed here are supposed to be in the distribution tarball, # so do not clean them in the clean/distclean rules maintainer-clean: distclean rm -f sql_help.h sql_help.c psqlscanslash.c + + +check: + $(prove_check) + +installcheck: + $(prove_installcheck) diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index f17f610..390bccd 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -49,6 +49,7 @@ #include "psqlscanslash.h" #include "settings.h" #include "variables.h" +#include "fe_utils/psqlscan_int.h" /* * Editable database object types. @@ -132,7 +133,7 @@ HandleSlashCmds(PsqlScanState scan_state, status = PSQL_CMD_ERROR; } - if (status != PSQL_CMD_ERROR) + if (status != PSQL_CMD_ERROR && pset.active_branch) { /* eat any remaining arguments after a valid command */ /* note we suppress evaluation of backticks here */ @@ -194,6 +195,30 @@ read_connect_arg(PsqlScanState scan_state) ret
Re: [HACKERS] Small improvement to parallel query docs
On Sun, Feb 12, 2017 at 7:16 PM, David Rowley wrote: > Tomas Vondra pointed out to me that there's an error in parallel.sgml > which confuses the inner and outer sides of the join. > > I've attached a patch which fixes this, although I think I'm still > missing the point to text's explanation of why Merge Join is not > included due to it having to sort the inner side in each worker. Hash > join must build a hash table for each worker, so why is that OK by > sorting is not? > > Anyway, I've attached a patch which fixes the outer/inner confusion > and cleans up a couple of other grammar mistakes and an omissions > regarding DISTINCT and ORDER BY not being supported in parallel > aggregate. I ended up rewriting a section too which was explaining > parallel aggregate, which I personally believe is a bit more clear to > read, but someone else may think otherwise. -loops or hash joins. The outer side of the join may be any kind of +loops or hash joins. The inner side of the join may be any kind of Oops. That's clearly a mistake. -table. Each worker will execute the outer side of the plan in full, which -is why merge joins are not supported here. The outer side of a merge join -will often involve sorting the entire inner table; even if it involves an -index, it is unlikely to be productive to have multiple processes each -conduct a full index scan of the inner table. +relation. Each worker will execute the inner side of the join in full, +which is why merge joins are not supported here. The inner side of a merge +join will often involve sorting the entire inner relation; even if it +involves an index, it is unlikely to be productive to have multiple +processes each conduct a full index scan of the inner side of the join. Why s/table/relation/? I don't think that's useful, especially because the first part of that very same paragraph would still say "table". Anyway, events have shown the bit about merge joins was wrong thinking on my part. See Dilip's emails about parallel merge join. Maybe we should just change to say that it isn't supported without giving a reason. I hope to commit Dilip's patch to add support for exactly that thing soon. Then we can remove the language altogether and say that it is supported. -COUNT(*), each worker could compute a total, but those totals -would need to combined in order to produce a final answer. If the query -involved a GROUP BY clause, a separate total would need to -be computed for each group. Even though aggregation can't be done entirely -in parallel, queries involving aggregation are often excellent candidates -for parallel query, because they typically read many rows but return only -a few rows to the client. Queries that return many rows to the client -are often limited by the speed at which the client can read the data, -in which case parallel query cannot help very much. +COUNT(*), each worker must compute subtotals which later must +be combined to produce an overall total in order to produce the final +answer. If the query involves a GROUP BY clause, +separate subtotals must be computed for each group seen by each parallel +worker. Each of these subtotals must then be combined into an overall +total for each group once the parallel aggregate portion of the plan is +complete. 
This means that queries which produce a low number of groups +relative to the number of input rows are often far more attractive to the +query planner, whereas queries which don't collect many rows into each +group are less attractive, due to the overhead of having to combine the +subtotals into totals, of which cannot run in parallel. I don't think "of which cannot run in parallel" is good grammar. I'm somewhat unsure whether the rest is an improvement or not. Other opinions? -a PartialAggregate node. Second, the partial results are +a Partial Aggregate node. Second, the partial results are Oops, that's clearly a mistake. -FinalizeAggregate node. +Finalize Aggregate node. That one, too. -Parallel aggregation is not supported for ordered set aggregates or when -the query involves GROUPING SETS. It can only be used when -all joins involved in the query are also part of the parallel portion -of the plan. +Parallel aggregation is not supported if any aggregate function call +contains DISTINCT or ORDER BY clause and is also +not supported for ordered set aggregates or when the query involves +GROUPING SETS. It can only be used when all joins involved in +the query are also part of the parallel portion of the plan. That chunk seems like an improvement to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpre
Re: [HACKERS] Small improvement to parallel query docs
On 14 February 2017 at 09:21, Robert Haas wrote: > On Sun, Feb 12, 2017 at 7:16 PM, David Rowley > -table. Each worker will execute the outer side of the plan in full, which > -is why merge joins are not supported here. The outer side of a merge join > -will often involve sorting the entire inner table; even if it involves an > -index, it is unlikely to be productive to have multiple processes each > -conduct a full index scan of the inner table. > +relation. Each worker will execute the inner side of the join in full, > +which is why merge joins are not supported here. The inner side of a > merge > +join will often involve sorting the entire inner relation; even if it > +involves an index, it is unlikely to be productive to have multiple > +processes each conduct a full index scan of the inner side of the join. > > Why s/table/relation/? I don't think that's useful, especially > because the first part of that very same paragraph would still say > "table". Perhaps it's not correct to do that. I did mean relation in the true relational theory sense, rather than where relkind = 'r'. I didn't really like the way it assumed the inner side was a table. Perhaps it's better to just say "involve sorting the entire inner side of the join" -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small improvement to parallel query docs
On Mon, Feb 13, 2017 at 3:29 PM, David Rowley wrote: > On 14 February 2017 at 09:21, Robert Haas wrote: >> On Sun, Feb 12, 2017 at 7:16 PM, David Rowley >> -table. Each worker will execute the outer side of the plan in full, >> which >> -is why merge joins are not supported here. The outer side of a merge >> join >> -will often involve sorting the entire inner table; even if it involves >> an >> -index, it is unlikely to be productive to have multiple processes each >> -conduct a full index scan of the inner table. >> +relation. Each worker will execute the inner side of the join in full, >> +which is why merge joins are not supported here. The inner side of a >> merge >> +join will often involve sorting the entire inner relation; even if it >> +involves an index, it is unlikely to be productive to have multiple >> +processes each conduct a full index scan of the inner side of the join. >> >> Why s/table/relation/? I don't think that's useful, especially >> because the first part of that very same paragraph would still say >> "table". > > Perhaps it's not correct to do that. I did mean relation is the true > relational theory sense, rather than where relkind = 'r'. I didn't > really like the way it assumed the inner side was a table. Perhaps its > better to just say "involve sorting the entire inner side of the join" Yeah, that seems fine. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)
Hello Robert, [...] I think we should try to make this REALLY simple. We don't really want to have everybody have to change their PROMPT1 and PROMPT2 strings for this one feature. Ok. I think that we agree that the stack was too much detail. How about just introducing a new value for %R? Yes. That is indeed one of the ideas being discussed. [...] , or @ if commands are currently being ignored because of the result of an \if test. Currently I find that %R logic is quite good, with "=" for give me something, "^" is start line regular expression for one line, "!" for beware something is amiss, and in prompt2 "-" for continuation, '"' for in double quotes, "(" for in parentheses and so on. What would be the mnemonic for "," and "@"? By shortening one of the suggestions down to two characters, we may have three cases: "?t" for "in condition, in true block" "?f" for "in condition, in false block (but true yet to come)" "?z" for "in condition, waiting for the end (true has been executed)". So no indication about the stack depth and contents. tfz for true, false and sleeping seems quite easy to infer and understand. "?" is also needed as a separator with the previous field which is the database name sometimes: calvin=> \if false calvin?f=> \echo 1 calvin?f=> \elif true calvin?t=> \echo 2 2 calvin?t=> \else calvin?z=> \echo 3 calvin?z=> \endif calvin=> With the suggested , and @: calvin=> \if false calvin,=> \echo 1 calvin,=> \elif true calvin@=> \echo 2 2 calvin@=> \else calvin,=> \echo 3 calvin,=> \endif calvin=> If I can find some simple mnemonic for "," vs "@" for being executed vs ignored, I could live with that, but nothing obvious comes to my mind. The "?" for condition and Corey's [tfz] looked quite intuitive/mnemonic to me. The drawback is that it is 2 chars vs one char above. [...] I think that's all you need here: a way to alert users as to whether commands are being ignored, or not. Yep. [...] To sum up your points: just update %R (ok), keep it simple/short (ok... but how simple [2 vs 3 states] and short [1 or 2 chars]), and no real need to be too nice with the user beyond the vital (ok, that significantly simplifies things). -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)
On Mon, Feb 13, 2017 at 11:29 AM, Robert Haas wrote: > possible states here. Just when you've figured out what tfzffft I agree with what you've said, but wanted to point out that any condition that follows a 'z' would itself be 'z'. Not that tfz is much better.
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
Hello Corey, I went back to master and re-applied v11, something must have gotten lost in translation. Probably you need "git add" for added files? About v14: patch applies, make check ok, psql tap tests ok. All seems fine to me. Test coverage is better than a lot of other features. Code & comments seem fine. Doc and example are clear enough to me. The level of messages in interactive mode is terse but informative when needed, on errors and when commands are ignored. The only missing point is about doing something to the prompt, but given the current messages ISTM that this can wait for a follow-up patch. Robert Haas's advice is to keep it simple and short and in %R. There was also some suggestion to have a "show the stack" command for debug, I think that this can wait as well. I've turned the CF entry again to "ready for committers", to see what committers think about this new and simplified version. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq Alternate Row Processor
On Mon, Feb 13, 2017 at 8:46 AM, Kyle Gearhart wrote: > On 2/9/17 7:15 PM, Jim Nasby wrote: >> Can you run a trace to see where all the time is going in the single row >> case? I don't see an obvious time-suck with a quick look through the code. >> It'd be interesting to see how things change if you eliminate the filler >> column from the SELECT. > > Traces are attached, these are with callgrind. > > profile_nofiller.txt: single row without filler column > profile_filler.txt: single row with filler column > profile_filler_callback.txt: callback with filler column > > pqResultAlloc looks to hit malloc pretty hard. The callback reduces all of > that to a single malloc for each row. Couldn't that be optimized, say, by preserving malloc'd memory when in single row mode and recycling it? (IIRC during the single row mode discussion this optimization was voted down). A barebones callback mode ISTM is a complete departure from the classic PGresult interface. This code is pretty unpleasant IMO: acct->abalance = *((int*)PQgetvalue(res, 0, i)); acct->abalance = __bswap_32(acct->abalance); Your code is faster but foists a lot of the work on the user, so it's kind of cheating in a way (although very carefully written applications might be able to benefit). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
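For what it's worth, the byte-order and alignment handling that makes the quoted snippet unpleasant can be wrapped up portably on the client side. This is only a hedged sketch using standard libpq and POSIX calls (ntohl rather than the glibc-specific __bswap_32), not code from the patch under discussion:

/*
 * Sketch: portable extraction of a binary-format int4 column from a
 * PGresult.  Assumes the column was requested in binary format; libpq
 * delivers binary int4 values in network byte order.
 */
#include <arpa/inet.h>          /* ntohl */
#include <stdint.h>
#include <string.h>
#include <libpq-fe.h>

static int32_t
get_int4_field(const PGresult *res, int row, int col)
{
    uint32_t    netval;

    /* memcpy avoids assuming the result buffer is suitably aligned */
    memcpy(&netval, PQgetvalue(res, row, col), sizeof(netval));
    return (int32_t) ntohl(netval);
}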
Re: [HACKERS] Small improvement to parallel query docs
Robert Haas wrote: > +COUNT(*), each worker must compute subtotals which later must > +be combined to produce an overall total in order to produce the final > +answer. If the query involves a GROUP BY clause, > +separate subtotals must be computed for each group seen by each parallel > +worker. Each of these subtotals must then be combined into an overall > +total for each group once the parallel aggregate portion of the plan is > +complete. This means that queries which produce a low number of groups > +relative to the number of input rows are often far more attractive to the > +query planner, whereas queries which don't collect many rows into each > +group are less attractive, due to the overhead of having to combine the > +subtotals into totals, of which cannot run in parallel. > I don't think "of which cannot run in parallel" is good grammar. I'm > somewhat unsure whether the rest is an improvement or not. Other opinions? Does this read any more clearly? +COUNT(*), each worker must compute subtotals which are later +combined in order to produce an overall total for the final answer. If +the query involves a GROUP BY clause, separate subtotals +must be computed for each group seen by each parallel worker. After the +parallel aggregate portion of the plan is complete, there is a serial step +where the group subtotals from all of the parallel workers are combined +into an overall total for each group. Because of the overhead of combining +the subtotals into totals, plans which produce few groups relative to the +number of input rows are often more attractive to the query planner +than plans which produce many groups relative to the number of input rows. I got rid of the ", of which cannot run in parallel" entirely. It was already stated that the subtotals->totals step runs "once the parallel aggregate portion of the plan is complete." which implies that it is serial. I made that explicit with "there is a serial step". Also, the purpose of the ", of which cannot run in parallel" sentence is to communicate why the planner prefers one plan over another and, if I'm reading this correctly, the subtotals->totals step is serial for both plans so that is not a reason to prefer one over the other. I think that the planner prefers plans rather than queries, so I changed that as well. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)
On Mon, Feb 13, 2017 at 3:40 PM, Fabien COELHO wrote: > > Hello Robert, > > [...] I think we should try to make this REALLY simple. We don't really >> want to have everybody have to change their PROMPT1 and PROMPT2 strings for >> this one feature. >> > > Ok. I think that we agree that the stack was too much details. > > How about just introducing a new value for %R? >> > > Yes. That is indeed one of the idea being discussed. > > [...] , or @ if commands are currently being ignored because of the result >> of an \if test. >> > ,-or-@ has one advantage over t/f/z: we cannot infer the 'z' state purely from pset.active_state, and the if-stack itself is sequestered in scan_state, which is not visible to the get_prompt() function. I suppose if somebody wanted it, a separate slash command that does a verbose printing of the current if-stack would be nice, but mostly just to explain to people how the if-stack works. > If I can find some simple mnemonic for "," vs "@" for being executed vs > ignored, I could live with that, but nothing obvious comes to my mind. > @in't gonna execute it? I'm here all week, try the veal. To sum up your points: just update %R (ok), keep it simple/short (ok... but > how simple [2 vs 3 states] and short [1 or 2 chars]), and no real need to > be too nice with the user beyond the vital (ok, that significantly > simplifies things). I'd be fine with either of these on aesthetic grounds. On technical grounds, 'z' is harder to show.
Re: [HACKERS] COPY IN/BOTH vs. extended query mode
On 14 Feb. 2017 06:15, "Robert Haas" wrote: On Mon, Jan 23, 2017 at 9:12 PM, Robert Haas wrote: > According to the documentation for COPY IN mode, "If the COPY command > was issued via an extended-query message, the backend will now discard > frontend messages until a Sync message is received, then it will issue > ReadyForQuery and return to normal processing." I added a similar > note to the documentation for COPY BOTH mode in > 91fa8532f4053468acc08534a6aac516ccde47b7, and the documentation > accurately describes the behavior of the server. However, this seems > to make fully correct error handling for clients using libpq almost > impossible, because PQsendQueryGuts() sends > Parse-Bind-Describe-Execute-Sync in one shot without regard to whether > the command that was just sent invoked COPY mode (cf. the note in > CopyGetData about why we ignore Flush and Sync in that function). > > So imagine that the client uses libpq to send (via the extended query > protocol) a COPY IN command (or some hypothetical command that starts > COPY BOTH mode to begin). If the server throws an error before the > Sync message is consumed, it will bounce back to PostgresMain which > will set doing_extended_query_message = true after which it will > consume messages, find the Sync, reset that flag, and send > ReadyForQuery. On the other hand, if the server enters CopyBoth mode, > consumes the Sync message in CopyGetData (or a similar function), and > *then* throws an ERROR, the server will wait for a second Sync message > from the client before issuing ReadyForQuery. There is no sensible > way of coping with this problem in libpq, because there is no way for > the client to know which part of the server code consumed the Sync > message that it already sent. In short, from the client's point of > view, if it enters COPY IN or COPY BOTH mode via the extend query > protocol, and an error occurs on the server, the server MAY OR MAY NOT > expect a further Sync message before issuing ReadyForQuery, and the > client has no way of knowing -- except maybe waiting for a while to > see what happens. > > It does not appear to me that there is any good solution to this > problem. Fixing it on the server side would require a wire protocol > change - e.g. one kind of Sync message that is used in a > Parse-Bind-Describe-Execute-Sync sequence that only terminates > non-COPY commands and another kind that is used to signal the end even > of COPY. Fixing it on the client side would require all clients to > know prior to initiating an extended-query-protocol sequence whether > or not the command was going to initiate COPY, which is an awful API > even if didn't constitute an impossible-to-contemplate backward > compatibility break. Perhaps we will have to be content to document > the fact that this part of the protocol is depressingly broken... > > ...unless of course somebody can see something that I'm missing here > and the situation isn't as bad as it currently appears to me to be. Anybody have any thoughts on this? I've been thinking on it a bit, but don't really have anything that can be done without a protocol version bump. We can't really disallow extended query protocol COPY, too much is likely to break. And we can't fix it without a protocol change. A warning in the docs for COPY would be appropriate, noting that clients should use the simple query protocol to issue COPY. 
It's kind of mixing layers, since many users won't see the protocol level or have any idea if their client driver uses ext or simple query, but we can at least advise libpq users. Also in the protocol docs, noting that clients sending COPY should prefer the simple query protocol due to error recovery issues with COPY and extended query protocol.
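To make that client-side advice concrete, a minimal libpq sketch of issuing COPY through the simple query protocol (PQexec) rather than the extended protocol might look like the following; the table name and data are made up for illustration and error handling is abbreviated:

/*
 * Sketch: COPY FROM STDIN driven through the simple query protocol, so the
 * server is never left waiting on a dangling extended-protocol Sync if the
 * COPY fails.  Assumes a blocking connection.
 */
#include <libpq-fe.h>

static int
copy_rows(PGconn *conn)
{
    PGresult   *res = PQexec(conn, "COPY my_table (id, val) FROM STDIN");

    if (PQresultStatus(res) != PGRES_COPY_IN)
    {
        PQclear(res);
        return -1;
    }
    PQclear(res);

    if (PQputCopyData(conn, "1\tone\n2\ttwo\n", 12) != 1 ||
        PQputCopyEnd(conn, NULL) != 1)
        return -1;

    /* drain the final command result */
    while ((res = PQgetResult(conn)) != NULL)
        PQclear(res);

    return 0;
}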
Re: [HACKERS] Small improvement to parallel query docs
On 14 February 2017 at 10:10, Brad DeJong wrote: > Robert Haas wrote: > >> +COUNT(*), each worker must compute subtotals which later >> must >> +be combined to produce an overall total in order to produce the final >> +answer. If the query involves a GROUP BY clause, >> +separate subtotals must be computed for each group seen by each parallel >> +worker. Each of these subtotals must then be combined into an overall >> +total for each group once the parallel aggregate portion of the plan is >> +complete. This means that queries which produce a low number of groups >> +relative to the number of input rows are often far more attractive to >> the >> +query planner, whereas queries which don't collect many rows into each >> +group are less attractive, due to the overhead of having to combine the >> +subtotals into totals, of which cannot run in parallel. > >> I don't think "of which cannot run in parallel" is good grammar. I'm >> somewhat unsure whether the rest is an improvement or not. Other opinions? > > Does this read any more clearly? > > +COUNT(*), each worker must compute subtotals which are later > +combined in order to produce an overall total for the final answer. If > +the query involves a GROUP BY clause, separate subtotals > +must be computed for each group seen by each parallel worker. After the > +parallel aggregate portion of the plan is complete, there is a serial > step > +where the group subtotals from all of the parallel workers are combined > +into an overall total for each group. Because of the overhead of > combining > +the subtotals into totals, plans which produce few groups relative to the > +number of input rows are often more attractive to the query planner > +than plans which produce many groups relative to the number of input > rows. Actually looking over this again I think it's getting into too much detail which is already described in the next paragraph (of which I think is very clear). I propose we just remove the whole paragraph, and mention about the planning and estimated number of groups stuff in another new paragraph. I've attached a patch to this effect, which also just removes the text about why we don't support Merge Join. I felt something needed written in its place, so I mentioned that identical hash tables are created in each worker. This is perhaps not required, but the paragraph seemed a bit empty without it. I also noticed a mistake "based on a column taken from the inner table", this "inner" I assume should be "outer" since it surely must be talking of a parameterised index scan?, in which case the parameter is from the outer side, not the inner. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services parallel_doc_fixes_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small improvement to parallel query docs
David Rowley wrote: > I propose we just remove the whole paragraph, and mention about > the planning and estimated number of groups stuff in another new paragraph. > > I've attached a patch to this effect ... s/In a worse case scenario/In the worst case scenario,/ Other than that, the phrasing in the new patch reads very smoothly. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove all references to "xlog" from SQL-callable functions in p
On Tue, Feb 14, 2017 at 2:31 AM, Fujii Masao wrote: > On Tue, Feb 14, 2017 at 2:06 AM, Robert Haas wrote: >> On Mon, Feb 13, 2017 at 11:43 AM, Fujii Masao wrote: >>> On Fri, Feb 10, 2017 at 5:36 AM, Robert Haas wrote: Remove all references to "xlog" from SQL-callable functions in pg_proc. Commit f82ec32ac30ae7e3ec7c84067192535b2ff8ec0e renamed the pg_xlog directory to pg_wal. To make things consistent, and because "xlog" is terrible terminology for either "transaction log" or "write-ahead log" rename all SQL-callable functions that contain "xlog" in the name to instead contain "wal". (Note that this may pose an upgrade hazard for some users.) >>> >>> There are still some mentions to "xlog" in the doc. Maybe we should replace >>> them with "wal" as the attached patch does. >> >> +1. Patch looks good to me. > > Thanks! Pushed. It seems like the previous review I provided for the set of renaming patches has been ignored: https://www.postgresql.org/message-id/CAB7nPqSm=PNSe3EfvnEResdFzpQMcXXgapFBcF=EFdxVW4=f...@mail.gmail.com Good that somebody else checked... -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small improvement to parallel query docs
On 14 February 2017 at 10:56, Brad DeJong wrote: > David Rowley wrote: >> I propose we just remove the whole paragraph, and mention about >> the planning and estimated number of groups stuff in another new paragraph. >> >> I've attached a patch to this effect ... > > s/In a worse case scenario/In the worst case scenario,/ > > Other than that, the phrasing in the new patch reads very smoothly. Thanks. Updated patch attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services parallel_doc_fixes_v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: [[Parallel] Shared] Hash
On Thu, Feb 9, 2017 at 2:03 AM, Ashutosh Bapat wrote: >> >> 0004-hj-refactor-batch-increases-v4.patch: >> >> Modify the existing hash join code to detect work_mem exhaustion at >> the point where chunks are allocated, instead of checking after every >> tuple insertion. This matches the logic used for estimating, and more >> importantly allows for some parallelism in later patches. > > The patch has three changes > 1. change dense_alloc() to accept respect_workmem argument and use it > within the function. > 2. Move call to ExecHashIncreaseNumBatches() into dense_alloc() from > ExecHashTableInsert() to account for memory before inserting new tuple > 3. Check growEnabled before calling ExecHashIncreaseNumBatches(). Thanks for the review! > I think checking growEnabled within ExecHashIncreaseNumBatches() is > more easy to maintain that checking at every caller. If someone is to > add a caller tomorrow, s/he has to remember to add the check. Hmm. Yeah. In the later 0010 patch ExecHashIncreaseNumBatches will be used in a slightly different way -- not for making decisions or performing the hash table shrink, but only for reallocating the batch arrays. I will see if putting the growEnabled check back in there in the 0004 patch and then refactoring in a later patch makes more sense to someone reviewing the patches independently, for the next version. > It might be better to add some comments in > ExecHashRemoveNextSkewBucket() explaining why dense_alloc() should be > called with respect_work_mem = false? ExecHashSkewTableInsert() does > call ExecHashIncreaseNumBatches() after calling > ExecHashRemoveNextSkewBucket() multiple times, so it looks like we do > expect increase in space used and thus go beyond work_mem for a short > while. Is there a way we can handle this case in dense_alloc()? Right, that needs some explanation, which I'll add for the next version. The explanation is that while 'shrinking' the hash table, we may need to go over the work_mem limit by one chunk for a short time. That is already true in master, but by moving the work_mem checks into dense_alloc I ran into the problem that dense_alloc might decide to shrink the hash table which needs to call dense alloc. Shrinking works by spinning through all the chunks copying only the tuples we want to keep into new chunks and freeing the old chunks as we go. We will temporarily go one chunk over work_mem when we allocate the first new chunk but before we've freed the first old one. We don't want shrink operations to trigger recursive shrink operations, so we disable respect for work_mem when calling it from ExecHashIncreaseNumBatches. In the course of regular hash table loading, we want to respect work_mem. Looking at the v5 patch series I posted yesterday, I see that in fact ExecHashIncreaseNumBatches calls dense_alloc with respect_work_mem = true in the 0004 patch, and then I corrected that mistake in the 0008 patch; I'll move the correction back to the 0004 patch in the next version. To reach ExecHashIncreaseNumBatches, see the "ugly" query in hj-test-queries.sql (posted with v5). In ExecHashRemoveNextSkewBucket I preserved the existing behaviour of not caring about work_mem when performing the rare operation of copying a tuple from the skew bucket into a dense_alloc memory chunk so it can be inserted into a regular (non-skew) bucket. > Is it possible that increasing the number of batches changes the > bucket number of the tuple being inserted? If so, should we > recalculate the bucket and batch of the tuple being inserted? 
No -- see the function documentation for ExecHashGetBucketAndBatch. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
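A rough sketch of the control flow being described, paraphrased from the explanation above rather than copied from the 0004 patch, with a hypothetical helper standing in for the existing chunked allocator:

/*
 * Paraphrase only, not the patch.  respect_work_mem is true on the normal
 * tuple-loading path; it is false when called while shrinking the hash
 * table or relocating skew-bucket tuples, where temporarily exceeding
 * work_mem by one chunk is acceptable.
 */
static void *
dense_alloc(HashJoinTable hashtable, Size size, bool respect_work_mem)
{
    if (respect_work_mem && hashtable->growEnabled &&
        hashtable->spaceUsed + size > hashtable->spaceAllowed)
    {
        /* may shrink the table; calls back in with respect_work_mem = false */
        ExecHashIncreaseNumBatches(hashtable);
    }

    return allocate_from_chunks(hashtable, size);   /* hypothetical helper */
}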
Re: [HACKERS] log_autovacuum_min_duration doesn't log VACUUMs
On 13 February 2017 at 17:12, Robert Haas wrote: > On Sun, Feb 12, 2017 at 9:20 PM, Jim Nasby wrote: >> On 2/10/17 2:33 PM, Robert Haas wrote: >>> That having been said, I think it could certainly be useful to have >>> more control over what DDL gets logged in foreground processes. >> >> FWIW, this is a significant problem outside of DDL. Once you're past 1-2 >> levels of nesting SET client_min_messages = DEBUG becomes completely >> useless. >> >> I think the ability to filter logging based on context would be very >> valuable. AFAIK you could actually do that for manual logging with existing >> plpgsql support, but obviously that won't help for anything else. > > Well, that's moving the goalposts a lot further and in an unrelated > direction. I don't think that it's a good idea to change the > semantics of log_autovacuum_min_duration in the way Simon is proposing > for the reasons I noted, but I think that an acceptable patch could be > 100 lines of pretty straightforward code and documentation, like a new > GUC that controls this output for the vac-non-autovac case. If my idea would not log manual ANALYZE, well, we can add that in easily. There is no reason to block the patch for such a minor foible. This is a short patch to address a specific minor issue, not a blue sky redesign of logging. If someone else wants to add more, they can, later. Incremental change, just as happens all the time everywhere else. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow
* Robert Haas (robertmh...@gmail.com) wrote: > On Thu, Feb 9, 2017 at 8:54 PM, Stephen Frost wrote: > > Note that these views have not been consistently maintained and have > > ended up including some role attributes from recent versions > > That's not a bug. According to the documentation, these views exist > for compatibility with PostgreSQL versions before 8.1, so there's no > need to update them with newer fields. Clients who are expecting to > talk with a pre-8.1 PostgreSQL won't expect those fields to be present > anyway. Yet we added bypassrls to them, after a similar discussion of how they're for backwards compat and we can't get rid of them, but we should update them with new things, blah, blah. > My big objection to removing these views is that it will break pgAdmin > 3, which uses all three of these views. I understand that the pgAdmin > community is now moving away from pgAdmin 3 and toward pgAdmin 4, but > I bet that pgAdmin 3 still has significant usage and will continue to > have significant usage for at least a year or three. When it's > thoroughly dead, then I think we can go ahead and do this, unless > there are other really important tools still depending on those views, > but it's only been 3 months since the final pgAdmin 3 release. IMHO, if it's dead enough to not get updated for such changes then we shouldn't care enough about it to maintain backwards compat views in our code-base for it. > IMHO, these views aren't costing us much. It'd be nice to get rid of > them eventually but a view definition doesn't really need much > maintenance. (A contrib module doesn't either, but more than a view > definition.) Clearly, it does need some form of maintenance and consideration or it ends up in a confusing and inconsistent state, as evidenced by the fact that that's exactly where we are. I don't really expect to actually win this argument, as I've had the experience of trying to fight this fight before, but I certainly don't agree that we should try to continue to maintain them for pgAdmin3's sake. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Feb 13, 2017 at 2:42 PM, Alvaro Herrera > wrote: > > Well, we can remove them from PG10 and pgAdmin3 (and others) be adjusted > > to use the new ones, conditionally on server version. Surely pgAdmin3 > > is going to receive further updates, given that it's still widely used? > > According to the pgAdmin web site, no. (Yeah, that does seem odd.) I really do not think the PG core project should be held hostage by an external and apparently not-really-maintained project. What if we introduce some other difference in PG10 that breaks pgAdmin3? Are we going to roll that change back? Are we sure that none exists already? And, as I understand it, pgAdmin3 hasn't got support for features introduced as far back as 9.5 either, surely it's not going to have support added to it for the publication/subscription things or declarative partitioning, should we rip those out to accommodate pgAdmin3? > >> IMHO, these views aren't costing us much. It'd be nice to get rid of > >> them eventually but a view definition doesn't really need much > >> maintenance. > > > > Maybe not, but the fact that they convey misleading information is bad. > > Has anyone actually been confused by them? This isn't something we can prove. Nor can we prove that no one has ever been confused. What we can show is that they're clearly misleading and inconsistent. Even if no one is ever confused by them, having them is bad. > I'm a bit skeptical about the idea that these are misleading people, > because the information is no more or less misleading now than it was > in PostgreSQL 8.1 when the views were introduced. And evidently it > was not so misleading at that time as to make us think that a hard > compatibility break was warranted. No, that's not how I recall the discussion going down when I introduced them in 8.1. It was more along the lines of "Here's a patch, oh, and I added these views too." The archives might prove me wrong, as memory does fade after years, but I certainly don't recall any lengthy discussion about if we should add these views or not. In short, my recollection is that we added them because it was easy to do at the time and we didn't have the foresight to realize just how hard they would become to get rid of and how much time they would suck up from people arguing about them. This is part of the bigger picture that we need to consider whenever we think about backwards-compat mis-features. > On the other hand, I suppose that the last version of pgAdmin 3 isn't > likely to work with future major versions of PostgreSQL anyway unless > somebody updates it, and if somebody decides to update it for the > other changes in v10 then updating it for the removal of these views > won't be much extra work. So maybe it doesn't matter. Indeed. Thanks! Stephen signature.asc Description: Digital signature
[HACKERS] Possible TODO: allow arbitrary expressions in event trigger WHEN
Is there a reason not to allow $SUBJECT? Specifically, it'd be nice to be able to do something like WHEN tag LIKE 'ALTER%'. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq Alternate Row Processor
On Mon, Feb 13, 2017 Merlin Moncure wrote: >A barebones callback mode ISTM is a complete departure from the classic >PGresult interface. This code is pretty unpleasant IMO: acct->abalance = *((int*)PQgetvalue(res, 0, i)); acct->abalance = __bswap_32(acct->abalance); > Your code is faster but foists a lot of the work on the user, so it's kind of cheating in a way (although very carefully written applications might be able to benefit). The bit you call out above is for single row mode. Binary mode is a slippery slope, with or without the proposed callback. Let's remember that one of the biggest, often overlooked, gains when using an ORM is that it abstracts all this mess away. The goal here is to prevent all the ORM/framework folks from having to implement protocol. Otherwise they get to wait on libpq to copy from the socket to the PGconn buffer to the PGresult structure to their buffers. The callback keeps the slowest guy on the team...on the bench. Kyle Gearhart -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
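For comparison, the single-row-mode path that both messages refer to looks roughly like this on the client side (sketch only; the pgbench_accounts query is illustrative). Every row still materializes its own PGresult, which is the per-row allocation cost the callback proposal is trying to avoid:

/*
 * Sketch: consuming a result set in libpq's single-row mode.
 */
#include <libpq-fe.h>

static void
consume_single_row_mode(PGconn *conn)
{
    PGresult   *res;

    if (PQsendQuery(conn, "SELECT abalance FROM pgbench_accounts") != 1)
        return;
    if (PQsetSingleRowMode(conn) != 1)
        return;                 /* mode could not be set; rows will arrive in one result */

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
        {
            /* process PQgetvalue(res, 0, 0) here */
        }
        PQclear(res);
    }
}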
[HACKERS] renaming pg_resetxlog to pg_resetwal has broken pg_upgrade.
Upgrading from 9.6 to dev, I now get: $ rm bisectdata -r ; bisect/bin/pg_ctl initdb -D bisectdata; bisect/bin/pg_upgrade -b /usr/local/pgsql9_6/bin/ -B bisect/bin/ -d 96 -D bisectdata/ check for "/usr/local/pgsql9_6/bin/pg_resetwal" failed: No such file or directory This looks somewhat complicated to fix. Should check_bin_dir test the old cluster version, and make a deterministic check based on that? Or just check for either spelling, and stash the successful result somewhere? Culprit is here: commit 85c11324cabaddcfaf3347df78555b30d27c5b5a Author: Robert Haas Date: Thu Feb 9 16:23:46 2017 -0500 Rename user-facing tools with "xlog" in the name to say "wal". This means pg_receivexlog because pg_receivewal, pg_resetxlog becomes pg_resetwal, and pg_xlogdump becomes pg_waldump. Cheers, Jeff
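One possible shape for a version-aware check, reusing pg_upgrade's existing helpers; this is a sketch of the idea only, not a tested patch, and the cutoff assumes the rename first appears in v10 (major version 1000 under the new numbering):

/*
 * Sketch: choose the binary name from the cluster's version instead of
 * hard-coding "pg_resetwal".
 */
static void
check_resetwal_binary(ClusterInfo *cluster)
{
    const char *binname =
        (GET_MAJOR_VERSION(cluster->major_version) >= 1000)
        ? "pg_resetwal" : "pg_resetxlog";

    validate_exec(cluster->bindir, binname);
}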
[HACKERS] Set of fixes for WAL consistency check facility
Hi all, Beginning a new thread to raise awareness... As already reported here, I had a look at what has been committed in a507b869: https://www.postgresql.org/message-id/CAB7nPqRzCQb=vdfhvmtp0hmlbhu6z1agdo4gjsup-hp8jx+...@mail.gmail.com Here are a couple of things I have noticed while looking at the code: + * Portions Copyright (c) 2016, PostgreSQL Global Development Group s/2016/2017/ in bufmask.c and bufmask.h. + if (ItemIdIsNormal(iid)) + { + + HeapTupleHeader page_htup = (HeapTupleHeader) page_item; Unnecessary newline here. +* Read the contents from the backup copy, stored in WAL record and +* store it in a temporary page. There is not need to allocate a new +* page here, a local buffer is fine to hold its contents and a mask +* can be directly applied on it. s/not need/no need/. In checkXLogConsistency(), FPWs that have the flag BKPIMAGE_APPLY set will still be checked, resulting in a FPW being compared to itself. I think that those had better be bypassed. Please find attached a patch with those fixes. I am attaching as well this patch to next CF. Regards, -- Michael consistency-checks-fix.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
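For readers following along, the bypass being proposed amounts to something like the following inside checkXLogConsistency()'s per-block loop; this is an illustrative sketch of the idea, not the attached patch:

/*
 * Blocks whose full-page image was applied verbatim during replay
 * (BKPIMAGE_APPLY) would only be compared against themselves, so skip them.
 */
for (block_id = 0; block_id <= record->max_block_id; block_id++)
{
    if (!XLogRecHasBlockImage(record, block_id))
        continue;

    if (XLogRecBlockImageApply(record, block_id))
        continue;               /* restored image == replayed page; nothing to verify */

    /* ... restore the backup image, mask both pages, and compare ... */
}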
Re: [HACKERS] Parallel bitmap heap scan
On Tue, Feb 14, 2017 at 12:48 AM, Dilip Kumar wrote: > On Mon, Feb 13, 2017 at 6:24 PM, Robert Haas > wrote: > > I don't think it's acceptable (or necessary) to move the DSA > > definitions into postgres.h. Why do you think you need to do that, > > vs. just including dsa.h in a few more places? > > I need to access dsa_pointer in tidbitmap.h, which is included from > FRONTEND as well. Now, problem is that dsa.h is including #include > "port/atomics.h", but atomic.h can not be included if FRONTEND is > defined. > > #ifndef ATOMICS_H > #define ATOMICS_H > #ifdef FRONTEND > #error "atomics.h may not be included from frontend code" > #endif > > Is there any other solution to this ? How about creating another header file with the parallel changes and include it only in necessary places? Following are my observations, while going through the patch. +#if SIZEOF_DSA_POINTER == 4 +typedef uint32 dsa_pointer; +#else +typedef uint64 dsa_pointer; +#endif I feel the declaration of the above typedef can be moved into the section above if we are going with the current move into postgres.h file. +/* + * tbm_alloc_shared + * + * Callback function for allocating the memory for hashtable elements. + * It allocates memory from DSA if tbm holds a reference to a dsa. + */ +static inline void * +pagetable_allocate(pagetable_hash *pagetable, Size size) Function name and comments mismatch? Regards, Hari Babu Fujitsu Australia
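If it helps to visualize that suggestion, the separate header could be little more than the hunk quoted above plus an include guard. The file name and placement here are hypothetical, not taken from the patch, and it assumes c.h has already been included for uint32/uint64:

/*
 * Hypothetical frontend-safe header exposing only the dsa_pointer typedef,
 * so tidbitmap.h need not pull in dsa.h and, transitively, port/atomics.h.
 */
#ifndef DSA_POINTER_H
#define DSA_POINTER_H

#if SIZEOF_DSA_POINTER == 4
typedef uint32 dsa_pointer;
#else
typedef uint64 dsa_pointer;
#endif

#endif                          /* DSA_POINTER_H */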
Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)
On Tue, Feb 14, 2017 at 2:57 AM, Konstantin Knizhnik < k.knizh...@postgrespro.ru> wrote: > Hi, > > I wonder if it is possible to somehow benchmark your clustered index > implementation. > I tried to create VCI index for lineitem table from TPC and run Q6 query. > After index creation Postgres is not using parallel execution plan any > more but speed of sequential plan is not changed > and nothing in query execution plan indicates that VCI index is used: > > > postgres=# explain select > sum(l_extendedprice*l_discount) as revenue > from > lineitem_projection > where > l_shipdate between '1996-01-01' and '1997-01-01' > and l_discount between 0.08 and 0.1 > and l_quantity < 24; > > QUERY > PLAN > > > > > --- > > - > Finalize Aggregate (cost=608333.85..608333.86 rows=1 width=4) >-> Gather (cost=608333.23..608333.84 rows=6 width=4) > Workers Planned: 6 > -> Partial Aggregate (cost=607333.23..607333.24 rows=1 width=4) >-> Parallel Seq Scan on lineitem_projection > (cost=0.00..607024.83 rows=61680 width=8) > Filter: ((l_shipdate >= '1996-01-01'::date) AND > (l_shipdate <= '1997-01-01'::date) AND (l_discount >= '0.08'::double > precision) AN > D (l_discount <= '0.1'::double precision) AND (l_quantity < '24'::double > precision)) > (6 rows) > > postgres=# select > sum(l_extendedprice*l_discount) as revenue > from > lineitem_projection > where > l_shipdate between '1996-01-01' and '1997-01-01' > and l_discount between 0.08 and 0.1 > and l_quantity < 24; >revenue > - > 6.2e+08 > (1 row) > > Time: 1171.324 ms (00:01.171) > > postgres=# create index vci_idx on lineitem_projection using > vci(l_shipdate,l_quantity,l_extendedprice,l_discount,l_tax, > l_returnflag,l_linestatus); > CREATE INDEX > Time: 4.705 ms > > > postgres=# explain select > * from > lineitem_projection > where > l_shipdate between '1996-01-01' and '1997-01-01' > and l_discount between 0.08 and 0.1 > and l_quantity < 24; > > QUERY > PLAN > > > > --- > --- > Seq Scan on lineitem_projection (cost=0.00..382077.00 rows=1 width=22) >Filter: ((l_shipdate >= '1996-01-01'::date) AND (l_shipdate <= > '1997-01-01'::date) AND (l_discount >= '0.08'::double precision) AND > (l_discount <= ' > 0.1'::double precision) AND (l_quantity < '24'::double precision)) > (2 rows) > > postgres=# select > > > sum(l_extendedprice*l_discount) as revenue > from > lineitem_projection > where > l_shipdate between '1996-01-01' and '1997-01-01' > and l_discount between 0.08 and 0.1 > and l_quantity < 24; > revenue > > 6.2112e+08 > (1 row) > > Time: 4304.355 ms (00:04.304) > > > I wonder if there is any query which can demonstrate advantages of using > VCI index? > The current patch that I shared doesn't contains the plan and executor changes to show the performance benefit of the clustered index. we used custom plan to generate the plan for the clustered index. Currently I am working on it to rebase it to current master and other necessary changes. In the current state of the patch, I cannot take any performance tests, as it needs some major changes according to the latest PostgreSQL version. I have an old performance report that is took on 9.5 attached for your reference. The current patch that is shared is to find out the best approach in developing a columnar storage in PostgreSQL, by adopting Index access methods + additional hooks or pluggable storage access methods? 
The only problem I can think of with pluggable storage methods is that, to get the full benefit of columnar storage, the planner and executor need to be changed to support vector processing. In the current model we implemented that with a custom plan and additional hooks; the same may be possible with pluggable storage methods as well. Regards, Hari Babu Fujitsu Australia VCI_DBT3_Query_Performance.xlsx Description: MS-Excel 2007 spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Index Scans
On Mon, Feb 13, 2017 at 5:47 PM, Robert Haas wrote: > On Sat, Feb 11, 2017 at 6:35 PM, Amit Kapila wrote: > Why can't we rely on _bt_walk_left? The reason is mentioned in comments, but let me try to explain with some example. When you reach that point of code, it means that either the current page (assume page number is 10) doesn't contain any matching items or it is a half-dead page, both of which indicates that we have to move to the previous page. Now, before checking if the current page contains matching items, we signal parallel machinery (via _bt_parallel_release) to allow workers to read the previous page (assume previous page number is 9). So it is quite possible that after deciding that current page (page number 10) doesn't contain any matching tuples if we directly move to the previous page (in this case it will be 9) by using _bt_walk_left, some other worker would have read page 9. In short, if we directly use _bt_walk_left(), then we are prone to returning some of the values twice as multiple workers can read the same page. >>> But ... the entire point of the seize-and-release stuff is to avoid >>> this problem. You're suppose to seize the scan, read the current >>> page, walk left, store the page you find in the scan, and then release >>> the scan. >> Exactly and that is what is done in the patch. Basically, if we found >> that the current page is half-dead or it doesn't contain any matching >> items, then release the current buffer, seize the scan, read the >> current page, walk left and so on. I am slightly confused here >> because it seems both of us agree on what is the right thing to do and >> according to me that is how it is implemented. Are you just ensuring >> about whether I have implemented as discussed or do you see a problem >> with the way it is implemented? > > Well, before, I thought you said that relying entirely on > _bt_walk_left couldn't work because then two people might end up > running it at the same time, and that would cause problems. But if > you can only run _bt_walk_left while you've got the scan seized, then > that can't happen. Evidently I'm missing something here. > I think the comment at that place is not as clear as it should be. So how about changing it as below: Existing comment: -- /* * For parallel scans, get the last page scanned as it is quite * possible that by the time we try to fetch previous page, other * worker has also decided to scan that previous page. So we * can't rely on _bt_walk_left call. */ Modified comment: -- /* * For parallel scans, it is quite possible that by the time we try to fetch * the previous page, another worker has also decided to scan that * previous page. So to avoid that we need to get the last page scanned * from shared scan descriptor before calling _bt_walk_left. */ -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
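For readers following the exchange, here is a bare-bones sketch of the sequence both sides agree on. Function names such as _bt_parallel_seize and _bt_parallel_release follow the patch under discussion; treat this as pseudocode for the ordering only, not as the actual implementation:

    /* The current page had no matching items or is half-dead: give it up. */
    _bt_relbuf(rel, so->currPos.buf);

    /* Seize the scan; only the seizing worker may read or advance shared state. */
    if (!_bt_parallel_seize(scan, &blkno))
        return false;            /* another worker already finished the scan */

    /*
     * blkno is the last page scanned according to the *shared* descriptor,
     * which may differ from the page this worker just gave up on, because
     * another worker may already have claimed that one.  Start the leftward
     * walk from blkno while the scan is still seized, so that no two workers
     * can end up on the same page.
     */
    buf = _bt_getbuf(rel, blkno, BT_READ);
    buf = _bt_walk_left(rel, buf);

    /* ... then advertise the next page in the shared descriptor and release
     * the scan (via _bt_parallel_release) before examining the page's items ... */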
Re: [HACKERS] renaming pg_resetxlog to pg_resetwal has broken pg_upgrade.
On Tue, Feb 14, 2017 at 9:09 AM, Jeff Janes wrote: > check for "/usr/local/pgsql9_6/bin/pg_resetwal" failed: No such file or > directory > > This looks somewhat complicated to fix. Should check_bin_dir test the old > cluster version, and make a deterministic check based on that? Or just > check for either spelling, and stash the successful result somewhere? The fix does not seem that complicated to me. get_bin_version() just needs pg_ctl to be present, so we could move that into check_bin_dir() after checking that pg_ctl is in a valid state, and reuse the value of bin_version to see whether the binary version is post-10 or not. Then the decision-making just depends on this value. Please see the attached patch; it passes 9.6->10 and check-world. I have also added an open item on the wiki. -- Michael pgupgrade-rename-fix.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
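For illustration, the version-based choice could be as small as the helper below. This is a hypothetical sketch, not the attached patch; it assumes pg_upgrade's existing ClusterInfo.bin_version field and GET_MAJOR_VERSION macro, and relies on the rename to pg_resetwal having happened in version 10:

    /* Pick the reset binary name expected in a cluster's bindir. */
    static const char *
    resetwal_binary_name(const ClusterInfo *cluster)
    {
        /* bin_version is e.g. 90600 for 9.6 and 100000 for 10 */
        return (GET_MAJOR_VERSION(cluster->bin_version) >= 1000) ?
            "pg_resetwal" : "pg_resetxlog";
    }

check_bin_dir() can then validate whichever spelling applies to the old cluster, instead of hard-coding the new name for both sides.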
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, On 2017-02-11 14:40:18 +0100, Tomas Vondra wrote: > On 02/11/2017 02:33 AM, Andres Freund wrote: > > > I have a hard time believing this the cache efficiency of linked lists > > > (which may or may not be real in this case) out-weights this, but if you > > > want to try, be my guest. > > > > I'm not following - why would there be overhead in anything for > > allocations bigger than 4 (or maybe 8) bytes? You can store the list > > (via chunk ids, not pointers) inside the chunks itself, where data > > otherwise would be. And I don't see why you'd need a doubly linked > > list, as the only operations that are needed are to push to the front of > > the list, and to pop from the front of the list - and both operations > > are simple to do with a singly linked list? > > > > Oh! I have not considered storing the chunk indexes (for linked lists) in > the chunk itself, which obviously eliminates the overhead concerns, and > you're right a singly-linked list should be enough. > > There will be some minimum-chunk-size requirement, depending on the block > size/chunk size. I wonder whether it makes sense to try to be smart and make > it dynamic, so that we only require 1B or 2B for cases when only that many > chunks fit into a block, or just say that it's 4B and be done with it. I doubt it's worth it - it seems likely that the added branch is more noticeable overall than the possible savings of 3 bytes. Also, won't the space be lost due to alignment *anyway*? + /* chunk, including SLAB header (both addresses nicely aligned) */ + fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize)); In that case I'd just Assert(MAXIMUM_ALIGNOF >= sizeof(slist_head)) and use a plain slist - no point in being more careful than that. > I mean 2^32 chunks ought to be enough for anyone, right? Yea, that seems enough; but given the alignment thing pointed out above, I think we can just use plain pointers - and that definitely should be enough :P > I'm still not buying the cache efficiency argument, though. One of the > reasons is that the implementation prefers blocks with fewer free chunks > when handling palloc(), so pfree() is making the block less likely to be > chosen by the next palloc(). That'll possibly de-optimize L1, but for L2 usage the higher density seems like it'll be a win. All this memory is only accessed by a single backend, so packing as densely as possible is good. > > If so, if you have e.g. 8 byte allocations and 64kb sized blocks, you > > end up with an array of 1024 doubly linked lists, which'll take up 64kb > > on its own. And there a certainly scenarios where even bigger block > > sizes could make sense. That's both memory overhead, and runtime > > overhead, because at reset-time we'll have to check the whole array > > (which'll presumably largely be empty lists). Resetting is a pretty > > common path... > > > > True, but it's not entirely clear if resetting is common for the paths where > we use those new allocators. That's fair enough. There's still the memory overhead, but I guess we can also live with that. > Also, if we accept that it might be a problem, what other solution you > propose? I don't think just merging everything into a single list is a good > idea, for the reasons I explained before (it might make the resets somewhat > less expensive, but it'll make pfree() more expensive). 
Now that I think about it, a binary heap, as suggested elsewhere, isn't entirely trivial to use for this - it's more or less trivial to "fix" the heap after changing an element's value, but it's harder to find that element first. But a two-level list approach seems like it could work quite well - basically a simplified skip-list. A top-level list has the property that all its elements have a distinct #free, and hanging off each of those are sub-lists for all the other blocks with the same number of free chunks. I think that'd be a better implementation, but I can understand if you don't immediately want to go there. > What might work is replacing the array with a list, though. So we'd have a > list of lists, which would eliminate the array overhead. That seems likely to be significantly worse, because a) iteration is more expensive b) accessing the relevant list to move between two different "freecount" lists would be O(n). - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
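To make the "store the freelist inside the chunks" idea concrete, here is a tiny standalone toy (not the patch's code; chunk size and count are arbitrary). The per-block freelist costs no memory beyond the chunks themselves, because each free chunk's payload bytes hold the link to the next free chunk, and a singly linked list suffices since only push-front and pop-front are needed:

    #include <stdio.h>
    #include <string.h>

    #define CHUNK_SIZE 32            /* must be >= sizeof(char *) */
    #define NCHUNKS    8

    typedef struct Block
    {
        char   *freehead;            /* first free chunk, NULL when block is full */
        int     nfree;
        char    chunks[NCHUNKS][CHUNK_SIZE];
    } Block;

    static void
    block_init(Block *b)
    {
        b->freehead = NULL;
        b->nfree = 0;
        for (int i = NCHUNKS - 1; i >= 0; i--)
        {
            char *c = b->chunks[i];

            /* the "next free" link lives in the free chunk's own bytes */
            memcpy(c, &b->freehead, sizeof(char *));
            b->freehead = c;
            b->nfree++;
        }
    }

    static void *
    block_alloc(Block *b)
    {
        char *c = b->freehead;

        if (c == NULL)
            return NULL;                          /* block exhausted */
        memcpy(&b->freehead, c, sizeof(char *));  /* pop the front of the list */
        b->nfree--;
        return c;
    }

    static void
    block_free(Block *b, void *p)
    {
        char *c = p;

        memcpy(c, &b->freehead, sizeof(char *));  /* push back on the front */
        b->freehead = c;
        b->nfree++;
    }

    int
    main(void)
    {
        Block   b;
        void   *a;

        block_init(&b);
        a = block_alloc(&b);
        (void) block_alloc(&b);
        block_free(&b, a);
        printf("free chunks: %d\n", b.nfree);     /* prints 7 */
        return 0;
    }

And a rough data-structure sketch of the "simplified skip-list" alternative, again illustrative only: a top-level list with one entry per distinct free count (presumably kept sorted by nfree, so a block whose free count changes by one only moves to an adjacent bucket), each entry carrying the sub-list of blocks that currently have that many free chunks:

    typedef struct SlabFreeBucket
    {
        int                     nfree;      /* distinct number of free chunks */
        struct SlabBlock       *blocks;     /* blocks with exactly nfree free chunks */
        struct SlabFreeBucket  *next;       /* next distinct free count */
    } SlabFreeBucket;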
Re: [HACKERS] Logical Replication and Character encoding
At Sat, 4 Feb 2017 21:27:32 +0100, Petr Jelinek wrote in > Hmm I wonder if we should just make the subscriber send the > client_encoding always (based on server encoding of the subscriber). > That should solve the issue in combination with your patch no? Yeah, right. I considered that a subscriber might want to set its own value for that but that is useless. The attached patch does the following things to just prevent making a logical replication connection between databases with inconsistent encodings. - added client_encoding with subscriber(or standby)'s encoding at the last of options in libpqrcv_connect. - CheckLogicalDecodingRequirements refuses connection for a request with inconsistent encodings. > ERROR: logical replication requires consistent encodings on both side > (publisher = UTF8, subscriber = EUC_JP) We could check this earlier if involving physical replication but I think this is a matter of logical replication. regards, -- Kyotaro Horiguchi NTT Open Source Software Center >From e8233c47d174261a331718e9434d5fc825523305 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 14 Feb 2017 11:18:20 +0900 Subject: [PATCH] Refuse logical replication with inconsistent encodings Logical replication requires encoding conversion of characters if the publisher and subscriber are on different encodings. We could add character conversion on-the-fly but we just hinhibit a connection for the case as the first step. --- .../replication/libpqwalreceiver/libpqwalreceiver.c| 14 ++ src/backend/replication/logical/logical.c | 13 + 2 files changed, 27 insertions(+) diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c index 44a89c7..550a76d 100644 --- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c +++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c @@ -24,6 +24,7 @@ #include "access/xlog.h" #include "miscadmin.h" #include "pgstat.h" +#include "mb/pg_wchar.h" #include "replication/logicalproto.h" #include "replication/walreceiver.h" #include "storage/proc.h" @@ -134,9 +135,22 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname, } keys[++i] = "fallback_application_name"; vals[i] = appname; + + /* + * Override clinet_encoding with the database encoding for logical + * replication. + */ + if (logical) + { + keys[++i] = "client_encoding"; + vals[i] = GetDatabaseEncodingName(); + } + keys[++i] = NULL; vals[i] = NULL; + Assert(i < 5); /* size of keys/vals */ + conn = palloc0(sizeof(WalReceiverConn)); conn->streamConn = PQconnectdbParams(keys, vals, /* expand_dbname = */ true); if (PQstatus(conn->streamConn) != CONNECTION_OK) diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c index 5529ac8..fc74ff1 100644 --- a/src/backend/replication/logical/logical.c +++ b/src/backend/replication/logical/logical.c @@ -33,6 +33,8 @@ #include "access/xact.h" #include "access/xlog_internal.h" +#include "mb/pg_wchar.h" + #include "replication/decode.h" #include "replication/logical.h" #include "replication/reorderbuffer.h" @@ -87,6 +89,17 @@ CheckLogicalDecodingRequirements(void) (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("logical decoding requires a database connection"))); + /* + * Currently logical replication refuses subscription that requires + * chacater conversion. 
+ */ + if (pg_get_client_encoding() != GetDatabaseEncoding()) + ereport(ERROR, +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("logical replication requires consistent encodings on both side (publisher = %s, subscriber = %s)", + GetDatabaseEncodingName(), + pg_get_client_encoding_name(; + /* * TODO: We got to change that someday soon... * -- 2.9.2 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 02/13/2017 06:31 AM, Michael Paquier wrote: - What should we do with REINDEX DATABASE CONCURRENTLY and the system catalog? I do not think we can reindex the system catalog concurrently safely, so what should REINDEX DATABASE do with the catalog indexes? Skip them, reindex them while taking locks, or just error out? System indexes cannot have their OIDs changed as they are used in syscache lookups. So just logging a warning looks fine to me, and the price to pay to avoid taking an exclusive lock even for a short amount of time. Good idea, I think I will add one line of warning if it finds any system index in the schema. - What level of information should be output in VERBOSE mode? Er, something like that as well, no? DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. REINDEX (VERBOSE) currently prints one such line per index, which does not really work for REINDEX (VERBOSE) CONCURRENTLY since it handles all indexes on a relation at the same time. It is not immediately obvious how this should work. Maybe one such detail line per table? This is a crasher: create table aa (a int primary key); reindex (verbose) schema concurrently public ; For invalid indexes sometimes snapshots are still active (after issuing the previous crash for example): =# reindex (verbose) table concurrently aa; WARNING: XX002: cannot reindex concurrently invalid index "public.aa_pkey_cct", skipping LOCATION: ReindexRelationConcurrently, indexcmds.c:2119 WARNING: 01000: snapshot 0x7fde12003038 still active Thanks for testing the patch! The crash was caused by things being allocated in the wrong memory context when reindexing multiple tables and therefore freed on the first intermediate commit. I have created a new memory context to handle this, in which I only allocate the lists which need to survive between transactions. Hm, when writing the above I just realized why ReindexTable/ReindexIndex did not suffer from the same bug. It is because the first transaction there allocated in the PortalHeapMemory context, which survives commit. I really need to look at whether there is a clean way to handle memory contexts in my patch. I also found the "snapshot still active" bug; it seems to have been caused by REINDEX TABLE CONCURRENTLY leaving an open snapshot which cannot be popped by PortalRunUtility(). Thanks again! Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
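A minimal sketch of that kind of fix (illustrative only, not the actual patch; indexIds and indexOid are placeholder names): parent the cross-transaction bookkeeping on a context that the intermediate commits do not destroy, for example the portal's context mentioned above, and free it explicitly once all the transactions are done.

    MemoryContext   private_context;
    MemoryContext   oldcontext;
    List           *indexIds = NIL;    /* must survive the intermediate commits */

    /* PortalContext lives across the internal commits of the command */
    private_context = AllocSetContextCreate(PortalContext,
                                            "ReindexConcurrently bookkeeping",
                                            ALLOCSET_DEFAULT_SIZES);

    oldcontext = MemoryContextSwitchTo(private_context);
    indexIds = lappend_oid(indexIds, indexOid);    /* list cells allocated in the
                                                    * long-lived context */
    MemoryContextSwitchTo(oldcontext);

    /* ... CommitTransactionCommand() / StartTransactionCommand() cycles ... */

    MemoryContextDelete(private_context);          /* after the last transaction */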
Re: [HACKERS] possibility to specify template database for pg_regress
On 2017-02-13 20:59:43 +0100, Pavel Stehule wrote: > Hi > > 2017-02-13 6:46 GMT+01:00 Michael Paquier : > > > On Sat, Feb 11, 2017 at 3:03 PM, Pavel Stehule > > wrote: > > > here is new update - check is done before any creating > > > > It may be better to do any checks before dropping existing databases > > as well... It would be as well just simpler to complain with a single > > error message like "database and template list lengths do not match". > > > > next step I still fail to see why --use-existing as suggested in https://www.postgresql.org/message-id/20170208002900.vkldujzfkwbvq...@alap3.anarazel.de isn't sufficient. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] possibility to specify template database for pg_regress
On Tue, Feb 14, 2017 at 11:36 AM, Andres Freund wrote: > On 2017-02-13 20:59:43 +0100, Pavel Stehule wrote: >> Hi >> >> 2017-02-13 6:46 GMT+01:00 Michael Paquier : >> >> > On Sat, Feb 11, 2017 at 3:03 PM, Pavel Stehule >> > wrote: >> > > here is new update - check is done before any creating >> > >> > It may be better to do any checks before dropping existing databases >> > as well... It would be as well just simpler to complain with a single >> > error message like "database and template list lengths do not match". >> > >> >> next step This looks fine to me. > I still fail to see why --use-existing as suggested in > https://www.postgresql.org/message-id/20170208002900.vkldujzfkwbvq...@alap3.anarazel.de > isn't sufficient. Some tests create objects without removing them, meaning that continuous runs would fail with only --use-existing. This patch brings value in such cases. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] possibility to specify template database for pg_regress
On 2017-02-14 11:46:52 +0900, Michael Paquier wrote: > > I still fail to see why --use-existing as suggested in > > https://www.postgresql.org/message-id/20170208002900.vkldujzfkwbvq...@alap3.anarazel.de > > isn't sufficient. > > Some tests create objects without removing them, meaning that > continuous runs would fail with only --use-existing. This patch brings > value in such cases. You can trivially script the CREATE/DROP DB outside with --use-existing. Which seems a lot more flexible than adding more and more options to pg_regress. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Possible TODO: allow arbitrary expressions in event trigger WHEN
Jim Nasby writes: > Is there a reason not to allow $SUBJECT? Specifically, it'd be nice to > be able to do something like WHEN tag LIKE 'ALTER%'. Seems like it would be a seriously bad idea for such an expression to be able to invoke arbitrary SQL code. What if it calls a user-defined function that tries to do DDL? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Provide list of subscriptions and publications in psql's completion
On Tue, Feb 14, 2017 at 2:07 AM, Fujii Masao wrote: > No, the tab-completions for ALTER/DROP PUBLICATION should show the local > publications because those commands drop and alter the local ones. OTOH, > CREATE/ALTER SUBSCRIPTOIN ... PUBLICATION should show nothing because > the remote publications in the publisher side should be specified there. Doh, yes. You are right about that. >> - Addition of a view pg_subscriptions with all the non-sensitive data. >> (- Or really extend pg_stat_subscriptions with the database ID and use >> it for tab completion?) > > Probably I failed to get Peter's point... Anyway IMO that we can expose all > the > columns except the sensitive information (i.e., subconninfo field) > in pg_subscription to even non-superusers. Then we can use pg_subscription > for the tab-completion for ALTER/DROP SUBSCRIPTION. To be honest, I find subconninfo quite useful to check where a subscription is getting its changes from, so I'd rather not touch it. It looks as well a bit overkill to just create a new view on an object type non-superusers cannot even use... There are already 1 view and 1 system catalog related to it, so I'd be of the opinion to let it fail silently with a permission error and keep it as an empty list for them. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_waldump's inclusion of backend headers is a mess
Dilip complained on another thread about the apparent difficulty of getting the system to compile after adding a reference to dsa_pointer to tidbitmap.c: https://www.postgresql.org/message-id/CAFiTN-u%3DBH_b0QE9FG%2B%2B_Ht6Q%3DF84am%3DJ-rt7fNbQkq3%2BqX3VQ%40mail.gmail.com I dug into the problem and discovered that pg_waldump is slurping up a tremendous crapload of backend headers. It includes gin.h, gist_private.h, hash_xlog.h, nbtree.h, and spgist.h, which all end up including amapi.h, which includes genam.h, which includes tidbitmap.h. When you make tidbitmap.h include utils/dsa.h, it in turn includes port/atomics.h, which has an #error preventing frontend includes. There are a number of ways to fix this problem; probably the cheapest available hack is to stick #ifndef FRONTEND around the additional stuff getting added to tidbitmap.h. But that seems to be attacking the problem from the wrong end. The problem isn't really that having tidbitmap.h refer to backend-only stuff is unreasonable; it's that pg_waldump is skating on thin ice by importing so many backend-only headers. It's only a matter of time before some other one of the many files that it directly or indirectly includes develops a similar problem. Therefore, I proposed the attached patch, which splits spgxlog.h out of spgist_private.h, nbtxlog.h out of nbtree.h, gistxlog.h out of gist_private.h, and ginxlog.h and ginblock.h out of gin_private.h. These new header files are included by pg_waldump in lieu of the "private" versions. This solves the immediate problem and I suspect it will head off future problems as well. Thoughts, comments, objections, better ideas? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company disentangle-am-xlog.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] log_autovacuum_min_duration doesn't log VACUUMs
On Mon, Feb 13, 2017 at 5:19 PM, Simon Riggs wrote: > On 13 February 2017 at 17:12, Robert Haas wrote: >> On Sun, Feb 12, 2017 at 9:20 PM, Jim Nasby wrote: >>> On 2/10/17 2:33 PM, Robert Haas wrote: That having been said, I think it could certainly be useful to have more control over what DDL gets logged in foreground processes. >>> >>> FWIW, this is a significant problem outside of DDL. Once you're past 1-2 >>> levels of nesting SET client_min_messages = DEBUG becomes completely >>> useless. >>> >>> I think the ability to filter logging based on context would be very >>> valuable. AFAIK you could actually do that for manual logging with existing >>> plpgsql support, but obviously that won't help for anything else. >> >> Well, that's moving the goalposts a lot further and in an unrelated >> direction. I don't think that it's a good idea to change the >> semantics of log_autovacuum_min_duration in the way Simon is proposing >> for the reasons I noted, but I think that an acceptable patch could be >> 100 lines of pretty straightforward code and documentation, like a new >> GUC that controls this output for the vac-non-autovac case. > > If my idea would not log manual ANALYZE, well, we can add that in > easily. There is no reason to block the patch for such a minor foible. > > This is a short patch to address a specific minor issue, not a blue > sky redesign of logging. > > If someone else wants to add more, they can, later. Incremental > change, just as happens all the time everywhere else. Please don't ignore the other points in my original response. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel bitmap heap scan
On Mon, Feb 13, 2017 at 8:48 AM, Dilip Kumar wrote: > On Mon, Feb 13, 2017 at 6:24 PM, Robert Haas wrote: >> I don't think it's acceptable (or necessary) to move the DSA >> definitions into postgres.h. Why do you think you need to do that, >> vs. just including dsa.h in a few more places? > > I need to access dsa_pointer in tidbitmap.h, which is included from > FRONTEND as well. Now, problem is that dsa.h is including #include > "port/atomics.h", but atomic.h can not be included if FRONTEND is > defined. > > #ifndef ATOMICS_H > #define ATOMICS_H > #ifdef FRONTEND > #error "atomics.h may not be included from frontend code" > #endif > > Is there any other solution to this ? Well, any problem like this generally has a bunch of solutions, so I'll say yes. I spent a good chunk of today studying the issue and started a new thread devoted specifically to it: https://www.postgresql.org/message-id/CA%2BTgmoZ%3DF%3DGkxV0YEv-A8tb%2BAEGy_Qa7GSiJ8deBKFATnzfEug%40mail.gmail.com -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow
On Mon, Feb 13, 2017 at 5:29 PM, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: >> On Thu, Feb 9, 2017 at 8:54 PM, Stephen Frost wrote: >> > Note that these views have not been consistently maintained and have >> > ended up including some role attributes from recent versions >> >> That's not a bug. According to the documentation, these views exist >> for compatibility with PostgreSQL versions before 8.1, so there's no >> need to update them with newer fields. Clients who are expecting to >> talk with a pre-8.1 PostgreSQL won't expect those fields to be present >> anyway. > > Yet we added bypassrls to them, after a similar discussion of how > they're for backwards compat and we can't get rid of them, but we should > update them with new things, blah, blah. My recollection of that conversation is a little fuzzy and I can't immediately find the email in the archives, but I don't think I argued for that; I think I may have argued against it; I don't think that I understood the point of it then and I don't now. In short, if you're feeling under some kind of obligation to update those views, don't feel obliged on my account. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Tue, Feb 14, 2017 at 11:32 AM, Andreas Karlsson wrote: > On 02/13/2017 06:31 AM, Michael Paquier wrote: >> Er, something like that as well, no? >> DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > > REINDEX (VERBOSE) currently prints one such line per index, which does not > really work for REINDEX (VERBOSE) CONCURRENTLY since it handles all indexes > on a relation at the same time. It is not immediately obvious how this > should work. Maybe one such detail line per table? Hard to recall the details here after all this time, and the fact is that a relation is reindexed by processing all of its indexes once at each step. Hm... What if ReindexRelationConcurrently() were refactored in such a way that it processes all the steps for each index individually? This way you can monitor the time it takes to completely build each index. This operation would consume more transactions, but in the event of a failure the amount of things to clean up is much reduced, particularly for relations with many indexes. This would also reduce VERBOSE to printing one line per index rebuilt. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
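An outline of what per-index processing could look like, only as a sketch of the control flow being proposed (the phase names are descriptive, not code from the patch):

    foreach(lc, indexIds)
    {
        Oid     indexOid = lfirst_oid(lc);

        /* Phase 1: create an empty concurrent copy of the index */
        /* Phase 2: build the copy while allowing concurrent writes */
        /* Phase 3: wait, then validate it against concurrently inserted tuples */
        /* Phase 4: swap the old and new index definitions */
        /* Phase 5: drop the old index */

        /*
         * Each phase runs in its own transaction(s), so a failure leaves at
         * most one extra index to clean up, and VERBOSE can report one line
         * (with timing) per index as it completes.
         */
    }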
Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow
On Mon, Feb 13, 2017 at 5:38 PM, Stephen Frost wrote: > I really do not think the PG core project should be held hostage by an > external and apparently not-really-maintained project. What if we > introduce some other difference in PG10 that breaks pgAdmin3? Are we > going to roll that change back? Are we sure that none exists already? As a general rule, I don't agree that taking account of what will break external projects constitutes being "held hostage". I think that kind of hyperbole is unhelpful. How about we call it "trying not to gratuitously break popular third-party tools"? But in this case, I conceded the exact point that you are making here later on in the exact same email to which you are replying, so I'm a little mystified by the way you wrote this response. > In short, my > recollection is that we added them because it was easy to do at the time > and we didn't have the foresight to realize just how hard they would > become to get rid of and how much time they would suck up from people > arguing about them. I'm pretty sure we've spent more time arguing about them than it would have taken to maintain them from now until 2030, and I don't really understand why you're on the warpath to get rid of them. But I don't really care about it enough to keep arguing now that I've realized pgAdmin3 isn't going to work with PG 10 either way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing subplans
On Thu, Feb 2, 2017 at 9:23 AM, Amit Kapila wrote: > On Wed, Feb 1, 2017 at 10:04 PM, Robert Haas wrote: >> On Tue, Jan 31, 2017 at 6:01 AM, Amit Kapila wrote: >>> Moved this patch to next CF. >> >> So here is what seems to be the key hunk from this patch: >> >> /* >> - * Since we don't have the ability to push subplans down to workers at >> - * present, we treat subplan references as parallel-restricted. We need >> - * not worry about examining their contents; if they are unsafe, we >> would >> - * have found that out while examining the whole tree before reduction >> of >> - * sublinks to subplans. (Really we should not see SubLink during a >> - * max_interesting == restricted scan, but if we do, return true.) >> + * We can push the subplans only if they don't contain any >> parallel-aware >> + * node as we don't support multi-level parallelism (parallel workers >> + * invoking another set of parallel workers). >> */ >> -else if (IsA(node, SubLink) || >> - IsA(node, SubPlan) || >> - IsA(node, AlternativeSubPlan)) >> +else if (IsA(node, SubPlan)) >> +return !((SubPlan *) node)->parallel_safe; >> +else if (IsA(node, AlternativeSubPlan)) >> { >> -if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context)) >> -return true; >> +AlternativeSubPlan *asplan = (AlternativeSubPlan *) node; >> +ListCell *lc; >> + >> +foreach(lc, asplan->subplans) >> +{ >> +SubPlan*splan = (SubPlan *) lfirst(lc); >> + >> +Assert(IsA(splan, SubPlan)); >> + >> +if (max_parallel_hazard_walker((Node *) splan, context)) >> +return true; >> +} >> + >> +return false; >> } >> >> I don't see the reason for having this new code that makes >> AlternativeSubPlan walk over the subplans; expression_tree_walker >> already does that. >> I have removed the check of AlternativeSubPlan so that it can be handled by expression_tree_walker. > > It is done this way mainly to avoid recursion/nested calls, but I > think you prefer to handle it via expression_tree_walker so that code > looks clean. Another probable reason could be that there is always a > chance that we miss handling of some expression of a particular node > (in this case AlternativeSubPlan), but if that is the case then there > are other places in the code which do operate on individual subplan/'s > in AlternativeSubPlan list. > >> On the flip side I don't see the reason for >> removing the max_parallel_hazard_test() call for AlternativeSubPlan or >> for SubLink. > > If we don't remove the current test of max_parallel_hazard_test() for > AlternativeSubPlan, then AlternativeSubPlan node will be considered > parallel restricted, so why do we want that after this patch. For > Sublinks, I think they would have converted to Subplans by the time we > reach this function for the parallel restricted check. Can you > elaborate what you have in mind for the handling of AlternativeSubPlan > and SubLink? > I have removed the changes for SubLink node. >> +* CTE scans are not consider for parallelism (cf >> >> >> considered >> Fixed. >> + select count(*)from tenk1 where (two, four) not in >> >> whitespace Fixed. Attached patch fixes all the comments raised till now. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pq_pushdown_subplan_v5.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers