Re: [HACKERS] Prepared statements assume text type in PG10
Jack Christensen writes:
> Pre-version 10:
> jack=# prepare ps as select $1;
> ERROR:  could not determine data type of parameter $1

> But on PG10 the type defaults to text:
> jack=# prepare ps as select $1;
> PREPARE

> I looked through the git log and couldn't find any commits referencing
> this. Is this an intended behavior change?

Yes, it is; we no longer allow any SELECT output column to remain
unresolved as unknown type.  If it would have come out like that before,
we force it to text instead.

Peter misidentified the responsible commit, though: it was 1e7c4bb00.
(I guess the commit log message is a bit misleading, because it only
talks about unknown literals ... but undetermined parameters and
undecorated NULL constants would receive the same force-to-text-type
treatment.)

			regards, tom lane


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
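[For driver authors who relied on the old error: an explicit cast in the statement text still pins the parameter type on both old and new servers. A sketch against a live session; the final query merely shows what the server inferred:]

```sql
-- PG10: an undecorated parameter now silently resolves to text
PREPARE ps_text AS SELECT $1;

-- Portable across versions: decorate the parameter explicitly
PREPARE ps_int AS SELECT $1::integer;
EXECUTE ps_int(42);

-- Inspect the inferred types of the session's prepared statements
SELECT name, parameter_types FROM pg_prepared_statements;
```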
Re: [HACKERS] Prepared statements assume text type in PG10
On Sat, Oct 7, 2017 at 4:27 PM, Peter Geoghegan wrote:
> I suspect commit d8d32d9 is involved here, though I haven't verified that.

Weirdly, there is a git hash collision here, so you'll have to put in
d8d32d9a (8 characters -- the default of 7 for a short git hash isn't
cutting it).

--
Peter Geoghegan
Re: [HACKERS] Prepared statements assume text type in PG10
On Sat, Oct 7, 2017 at 2:56 PM, Jack Christensen wrote:
> The test suite for the Go PostgreSQL driver pgx
> (https://github.com/jackc/pgx) found an unexpected behavior change in PG10.
> Previously, it was impossible to prepare a statement with an unknown or
> ambiguous parameter type.
>
> Pre-version 10:
>
> jack=# prepare ps as select $1;
> ERROR:  could not determine data type of parameter $1
>
> But on PG10 the type defaults to text:
>
> jack=# prepare ps as select $1;
> PREPARE
> Time: 0.183 ms
> jack=# execute ps('Hello, there');
>    ?column?
> --------------
>  Hello, there
> (1 row)
>
> Time: 0.437 ms
>
> I looked through the git log and couldn't find any commits referencing this.
> Is this an intended behavior change?

I suspect commit d8d32d9 is involved here, though I haven't verified that.

--
Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Peter Geoghegan wrote:
> On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera wrote:
> >> As you must have seen, Alvaro said he has a variant of Dan's original
> >> script that demonstrates that a problem remains, at least on 9.6+,
> >> even with today's fix.  I think it's the stress-test that plays with
> >> fillfactor, many clients, etc [1].
> >
> > I just execute setup.sql once and then run this shell command,
> >
> > while :; do
> >     psql -e -P pager=off -f ./repro.sql
> >     for i in `seq 1 5`; do
> >         psql -P pager=off -e --no-psqlrc -f ./lock.sql &
> >     done
> >     wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
> >     psql -P pager=off -e --no-psqlrc -f ./report.sql
> >     echo "done"
> > done
>
> I cannot reproduce the problem on my personal machine using this
> script/stress-test.  I tried to do so on the master branch git tip.
> This reinforces the theory that there is some timing sensitivity,
> because the remaining race condition is very narrow.

Hmm, I think I added a random sleep (max. 100ms) right after the
HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that
makes the race easier to hit.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera wrote:
>> As you must have seen, Alvaro said he has a variant of Dan's original
>> script that demonstrates that a problem remains, at least on 9.6+,
>> even with today's fix.  I think it's the stress-test that plays with
>> fillfactor, many clients, etc [1].
>
> I just execute setup.sql once and then run this shell command,
>
> while :; do
>     psql -e -P pager=off -f ./repro.sql
>     for i in `seq 1 5`; do
>         psql -P pager=off -e --no-psqlrc -f ./lock.sql &
>     done
>     wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
>     psql -P pager=off -e --no-psqlrc -f ./report.sql
>     echo "done"
> done

I cannot reproduce the problem on my personal machine using this
script/stress-test.  I tried to do so on the master branch git tip.
This reinforces the theory that there is some timing sensitivity,
because the remaining race condition is very narrow.

--
Peter Geoghegan
Re: [HACKERS] tablespaces inside $PGDATA considered harmful
On 26/09/17 20:44, Mark Kirkwood wrote:

> $ pg_basebackup -D .
> WARNING:  could not read symbolic link "pg_tblspc/space1": Invalid argument
> pg_basebackup: directory "/data0/pgdata/11/pg_tblspc/space1" exists but is not empty
> pg_basebackup: removing contents of data directory "."

Err - actually this example is wrong - sorry. In fact pg_basebackup is
complaining because it does not want to overwrite the contents of the
tablespace (I need to use the -T option since I'm on the same host)!

A correct example of pg_basebackup failing due to tablespaces inside
$PGDATA/pg_tblspc can easily be demonstrated by trying to set up
streaming replication on another host:

$ pg_basebackup -h 10.0.119.100 -P -D .
WARNING:  could not read symbolic link "pg_tblspc/space1": Invalid argument
pg_basebackup: could not create directory "./pg_tblspc": File exists

Fortunately this can be worked around by changing to tar format:

$ pg_basebackup -h 10.0.119.100 -Ft -P -D .
WARNING:  could not read symbolic link "pg_tblspc/space1": Invalid argument
1560632/1560632 kB (100%), 2/2 tablespaces

...however, it's not that great that the plain mode is busted.

regards

Mark
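[For completeness, the failure mode above only arises when the tablespace directory was created inside $PGDATA itself. A sketch of the difference; the paths are illustrative:]

```sql
-- Problematic: the tablespace directory lives under $PGDATA, so
-- pg_basebackup's plain mode trips over the symlink loop
-- CREATE TABLESPACE space1 LOCATION '/data0/pgdata/11/pg_tblspc/space1';

-- Safe: keep tablespace directories entirely outside the data directory
CREATE TABLESPACE space1 LOCATION '/data0/tablespaces/space1';
```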
[HACKERS] Prepared statements assume text type in PG10
The test suite for the Go PostgreSQL driver pgx
(https://github.com/jackc/pgx) found an unexpected behavior change in
PG10. Previously, it was impossible to prepare a statement with an
unknown or ambiguous parameter type.

Pre-version 10:

jack=# prepare ps as select $1;
ERROR:  could not determine data type of parameter $1

But on PG10 the type defaults to text:

jack=# prepare ps as select $1;
PREPARE
Time: 0.183 ms
jack=# execute ps('Hello, there');
   ?column?
--------------
 Hello, there
(1 row)

Time: 0.437 ms

I looked through the git log and couldn't find any commits referencing
this. Is this an intended behavior change?

Jack
Re: [HACKERS] Discussion on missing optimizations
On 07/10/17 19:59, Tom Lane wrote:
> Petr Jelinek writes:
>> On 07/10/17 18:15, Tom Lane wrote:
>>> No, I'm afraid you didn't read that comment closely enough.  This will
>>> flat out fail for cases like "select ... where x=x order by x", because
>>> there will already be a single-element EC for x and so the clause will
>>> just disappear entirely.  If that doesn't happen, then instead you're
>>> creating an EC with duplicate entries, which is an idea I seriously
>>> dislike; the semantics of that don't seem clear at all to me.
>
>> Hmm it did not disappear (and worked fine in SQL-level tests).
>
> I may not be correctly remembering what the query would need to look
> like for there to be single-element ECs in existence at this point; but
> I surely would not like this code to assume that none will exist till
> later.
>
>> I don't think I fully understand the "EC with duplicate entries" part
>> and what's the problem with it, so I'll defer to your wisdom there.
>
> Well, as one example, assume that we use your patch and consider what
> happens with
> 	where x = y and x = x
> vs
> 	where x = x and x = y
>
> In the first case we end up with an EC that's just {x,y}, because the
> second process_equivalence() will find both sides in the same EC and
> conclude that the second clause is redundant.  (Which it is, if the
> equality operators have the same notion of what to do with nulls.)
> In the second case we end up with an EC containing {x,x,y}, which
> at minimum will result in emitting redundant generated quals.  I'm
> not sure if it could have any worse consequences than that, but I'm
> not sure it couldn't either.  But this is bogus in any case, because
> those WHERE clauses surely should produce identical results.
>
> Looking further ahead, if ECs have to support being multisets rather
> than pure sets, that would put a crimp in ever improving this logic to
> use a smarter UNION-FIND algorithm.  (I've not yet seen queries where
> the number of EC members is large enough to make that a serious issue,
> but I think it could happen, particularly with inheritance/partitioning.)

Okay, that makes sense, thanks for the explanation. Your patch is the
way to go then.

>>> This passes the smell test for me in the sense of not adding any
>>> significant number of planner cycles except when the weird case occurs.
>>> It leaves something on the table in that there are some situations
>>> where X=X could be converted, but we won't do it because we don't see
>>> the clause as being a potential EC (because it's not at top level),
>>> as in the second new regression test case below.  I think that's
>>> probably all right; I don't see any way to be more thorough without
>>> adding a lot of new cycles all the time, and I don't believe this is
>>> worth that.
>
>> My code had the same issue. I think going deeper would require quite a
>> bit of new code (and cycles) for something that's even less likely to
>> happen than simple X=X, while the current patch is quite a cheap win.
>
> Yeah.  I'm not really convinced it's a big win, but it's so cheap we
> might as well do it.  The only case where it would expend cycles and
> fail to give an improvement is if we have X=X with a non-strict operator,
> which I think is a case that never arises in practice at present,
> because btree effectively assumes that all btree operators are strict.
> (Someday we might want to relax that, though, which is why I don't
> want to let the planner just assume it.)

In real life probably not, but it seems like these small bits can be
useful marketing-wise, given articles like the one which started this.
And it should not hurt anything either.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
I wrote:
> Current status is that I've filed a bug report with Apple and am waiting
> to see their response before deciding what to do next.  If they fix the
> issue promptly then there's little need for us to do anything.

Not having heard a peep from Apple yet, I decided to do a bit more
experimenting.  I found that indeed, issuing fewer and larger mmap/msync
requests helps enormously.  If you're willing to go as far as issuing
only one per file, the speed seems on par with non-fsync.  But that
requires being able to mmap 1GB-sized files, so it's probably not
something we want to do.

What I did instead was to adjust the logic in copy_file() so that the
unit of flush requests can be a multiple of the unit of read/write
requests.  (My original thought of just raising the buffer size seems
like not as good an idea; it's less cache-friendly, for one thing.)

I find that on both Linux and macOS-with-HFS, requesting a flush only
every 1MB seems to be a win compared to flushing every 64KB as we
currently do.  Actually, it seems that on macOS every increase in the
flush distance helps, but with HFS you hit diminishing returns after
1MB or so.  With APFS you need a flush distance of 32MB or more to have
credible performance.

Accordingly I propose the attached patch.  If anyone's interested in
experimenting on other platforms, we might be able to refine/complicate
the FLUSH_DISTANCE selection further, but I think this is probably good
enough for a first cut.

			regards, tom lane

diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index a5e074e..eae9f5a 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -139,10 +139,24 @@ copy_file(char *fromfile, char *tofile)
 	int			dstfd;
 	int			nbytes;
 	off_t		offset;
+	off_t		flush_offset;
 
-	/* Use palloc to ensure we get a maxaligned buffer */
+	/* Size of copy buffer (read and write requests) */
 #define COPY_BUF_SIZE (8 * BLCKSZ)
 
+	/*
+	 * Size of data flush requests.  It seems beneficial on most platforms to
+	 * do this every 1MB or so.  But macOS, at least with early releases of
+	 * APFS, is really unfriendly to small mmap/msync requests, so there we do
+	 * it only every 32MB.
+	 */
+#if defined(__darwin__)
+#define FLUSH_DISTANCE (32 * 1024 * 1024)
+#else
+#define FLUSH_DISTANCE (1024 * 1024)
+#endif
+
+	/* Use palloc to ensure we get a maxaligned buffer */
 	buffer = palloc(COPY_BUF_SIZE);
 
 	/*
@@ -163,11 +177,23 @@ copy_file(char *fromfile, char *tofile)
 	/*
 	 * Do the data copying.
 	 */
+	flush_offset = 0;
 	for (offset = 0;; offset += nbytes)
 	{
 		/* If we got a cancel signal during the copy of the file, quit */
 		CHECK_FOR_INTERRUPTS();
 
+		/*
+		 * We fsync the files later, but during the copy, flush them every so
+		 * often to avoid spamming the cache and hopefully get the kernel to
+		 * start writing them out before the fsync comes.
+		 */
+		if (offset - flush_offset >= FLUSH_DISTANCE)
+		{
+			pg_flush_data(dstfd, flush_offset, offset - flush_offset);
+			flush_offset = offset;
+		}
+
 		pgstat_report_wait_start(WAIT_EVENT_COPY_FILE_READ);
 		nbytes = read(srcfd, buffer, COPY_BUF_SIZE);
 		pgstat_report_wait_end();
@@ -190,15 +216,11 @@ copy_file(char *fromfile, char *tofile)
 					 errmsg("could not write to file \"%s\": %m", tofile)));
 		}
 		pgstat_report_wait_end();
-
-		/*
-		 * We fsync the files later but first flush them to avoid spamming the
-		 * cache and hopefully get the kernel to start writing them out before
-		 * the fsync comes.
-		 */
-		pg_flush_data(dstfd, offset, nbytes);
 	}
 
+	if (offset > flush_offset)
+		pg_flush_data(dstfd, flush_offset, offset - flush_offset);
+
 	if (CloseTransientFile(dstfd))
 		ereport(ERROR,
 				(errcode_for_file_access(),
Re: [HACKERS] Discussion on missing optimizations
On Fri, Oct 06, 2017 at 10:19:54PM -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
> >> The article in question is here:
> >> https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/
>
> > That's interesting.
>
> The impression I have in a quick scan is that probably hardly any of these
> are cases that any of the DB designers think are important in themselves.

That's true for some of those.  But some of them might become important
when you start pushing WHERE constraints from outside into inner table
sources and subqueries, as dumb-looking constraints can simply appear
from pushing non-dumb-looking constraints.

More than the no-op optimizations, these would make a big difference
for me:

 - turning subqueries into joins

 - turning ORs into UNIONs

   It is easy enough to work around the lack of this optimization in
   many cases, but it does make queries more verbose.

 - pushing WHERE constraints from outer queries into the table source
   queries (_including_ VIEWs)

 - determining that some table in a query that had WHERE constraints
   pushed into it now has a very well-filled-out lookup key, and
   therefore it's the one that should be the table source to start the
   plan with, i.e., that it should be first in the outermost loop of a
   nested loop join

   For me these two would be huge wins.  I have to resort to functions
   with roughly the same body as views just so that I can have the
   optimizer pick the correct plan.  This causes a lot of code
   duplication in my schemas.

 - pushing WHERE constraints from outer queries into HAVING and thence
   WHERE constraints on GROUP BY queries, where the outer constraints
   are on columns used to GROUP BY

   I find myself making two versions of views that do aggregation: one
   that does not, and one that does.  This allows me to use the
   non-aggregating view in contexts where I need this optimization, but
   then I have to re-code the aggregation at that layer.  Again, lots of
   duplication.

These sorts of optimizations are huge.

> Rather, they fall out of more general optimization attempts, or not,
> depending on the optimization mechanisms in use in a particular DB.
> For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
> comes out of a constant-subexpression-precalculation mechanism for us,
> whereas "WHERE column=column" doesn't fall to that approach.  ISTM it
> would be really dumb to expend planner cycles looking specifically for
> that case, so I guess that DB2 et al are finding it as a side-effect of
> some more general optimization ... I wonder what that is?

If you can reduce the number of compilations / optimization passes for
statements, then spending more time in the optimizer is not a big deal.
So, when invoked via PREPARE I would say spending more cycles looking
for this sort of thing is OK, but in many other cases it's not.

Also, sometimes these cases crop up due to pushing constraints into
VIEWs and sub-queries.  In those cases constant sub-expression
elimination can be a win.

> (edit: a few minutes later, I seem to remember that equivclass.c has
> to do something special with the X=X case, so maybe it could do
> something else special instead, with little new overhead.)

I'd expect that column = column is not trivial to turn into TRUE, not
unless those columns are NOT NULLable.

> > 9. Unneeded Self JOIN
>
> > Can't remember discussions of this.
>
> I can't get very excited about that one either.
>
> In the end, what the article fails to consider is that all of these are
> tradeoffs, not unalloyed goods.  If you spend planner cycles on every
> query to look for cases that only the most unabashedly brain-dead ORMs
> ever generate, you're not really doing your users a favor on balance.

I can't get very excited about this one either, though I do believe it
can arise as the author says, "when you build complex views and JOIN
them to each other".  Maybe I'm not excited about it because I've not
needed it :)

Nico
--
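[The OR-to-UNION item Nico lists can be emulated by hand today. A sketch; table and column names are illustrative, and the UNION form matches the OR semantics only when result rows are distinct (e.g. the table has a primary key):]

```sql
-- An OR across two independently indexed columns can defeat a single
-- index scan:
SELECT * FROM orders WHERE customer_id = 42 OR salesperson_id = 7;

-- Rewritten by hand, each arm can use its own index; UNION removes
-- the duplicates for rows satisfying both arms:
SELECT * FROM orders WHERE customer_id = 42
UNION
SELECT * FROM orders WHERE salesperson_id = 7;
```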
Re: [HACKERS] Slow synchronous logical replication
On 10/07/2017 10:42 PM, Andres Freund wrote:
> Hi,
>
> On 2017-10-07 22:39:09 +0300, konstantin knizhnik wrote:
>> In our sharded cluster project we are trying to use logical replication
>> for providing HA (maintaining redundant shard copies).
>> Using asynchronous logical replication has not so much sense in the
>> context of HA.  This is why we try to use synchronous logical
>> replication.  Unfortunately it shows very bad performance.  With 50
>> shards and level of redundancy=1 (just one copy) the cluster is 20 times
>> slower than without logical replication.  With asynchronous replication
>> it is "only" two times slower.
>>
>> As far as I understand, the reason for such bad performance is that the
>> synchronous replication mechanism was originally developed for streaming
>> replication, when all replicas have the same content and LSNs.  When it
>> is used for logical replication, it behaves very inefficiently.  Commit
>> has to wait for confirmations from all receivers mentioned in the
>> "synchronous_standby_names" list.  So we are waiting not only for our
>> own single logical replication standby, but for all other standbys as
>> well.  The number of synchronous standbys is equal to the number of
>> shards divided by the number of nodes.  To provide uniform distribution
>> the number of shards should be >> the number of nodes; for example, for
>> 10 nodes we usually create 100 shards.  As a result we get awful
>> performance, and blocking of any replication channel blocks all
>> backends.
>>
>> So my question is whether my understanding is correct and synchronous
>> logical replication cannot be efficiently used in such a manner.
>> If so, the next question is how difficult it will be to make the
>> synchronous replication mechanism more efficient for logical
>> replication, and are there plans to work in this direction?
>
> This seems to be a question that is a) about a commercial project we
> don't know much about b) hasn't received a lot of investigation.

Sorry if I was not clear.  The question was about the logical replication
mechanism in the mainstream version of Postgres.  I think that most
people are using asynchronous logical replication, and synchronous LR is
something exotic, not well tested and investigated.  It will be great if
I am wrong :)

Concerning our sharded cluster (pg_shardman) - it is not a commercial
product yet, it is in the development phase.  We are going to open its
sources when it is more or less stable.  But unlike multimaster, this
sharded cluster is mostly built from existing components: pg_pathman +
postgres_fdw + logical replication.  So we are just trying to combine
them all into some integrated system.

But currently the most obscure point is logical replication.  And the
main goal of my e-mail was to learn the opinion of authors and users of
LR on whether it is a good idea to use LR to provide fault tolerance in a
sharded cluster.  Or are other approaches, for example sharding with
redundancy or using streaming replication, preferable?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Slow synchronous logical replication
Hi,

On 2017-10-07 22:39:09 +0300, konstantin knizhnik wrote:
> In our sharded cluster project we are trying to use logical replication
> for providing HA (maintaining redundant shard copies).
> Using asynchronous logical replication has not so much sense in the
> context of HA.  This is why we try to use synchronous logical
> replication.  Unfortunately it shows very bad performance.  With 50
> shards and level of redundancy=1 (just one copy) the cluster is 20 times
> slower than without logical replication.
> With asynchronous replication it is "only" two times slower.
>
> As far as I understand, the reason for such bad performance is that the
> synchronous replication mechanism was originally developed for streaming
> replication, when all replicas have the same content and LSNs.  When it
> is used for logical replication, it behaves very inefficiently.  Commit
> has to wait for confirmations from all receivers mentioned in the
> "synchronous_standby_names" list.  So we are waiting not only for our
> own single logical replication standby, but for all other standbys as
> well.  The number of synchronous standbys is equal to the number of
> shards divided by the number of nodes.  To provide uniform distribution
> the number of shards should be >> the number of nodes; for example, for
> 10 nodes we usually create 100 shards.  As a result we get awful
> performance, and blocking of any replication channel blocks all
> backends.
>
> So my question is whether my understanding is correct and synchronous
> logical replication cannot be efficiently used in such a manner.
> If so, the next question is how difficult it will be to make the
> synchronous replication mechanism more efficient for logical
> replication, and are there plans to work in this direction?

This seems to be a question that is a) about a commercial project we
don't know much about b) hasn't received a lot of investigation.

Greetings,

Andres Freund
[HACKERS] Slow synchronous logical replication
In our sharded cluster project we are trying to use logical replication
for providing HA (maintaining redundant shard copies).
Using asynchronous logical replication has not so much sense in the
context of HA.  This is why we try to use synchronous logical
replication.  Unfortunately it shows very bad performance.  With 50
shards and level of redundancy=1 (just one copy) the cluster is 20 times
slower than without logical replication.
With asynchronous replication it is "only" two times slower.

As far as I understand, the reason for such bad performance is that the
synchronous replication mechanism was originally developed for streaming
replication, when all replicas have the same content and LSNs.  When it
is used for logical replication, it behaves very inefficiently.  Commit
has to wait for confirmations from all receivers mentioned in the
"synchronous_standby_names" list.  So we are waiting not only for our
own single logical replication standby, but for all other standbys as
well.  The number of synchronous standbys is equal to the number of
shards divided by the number of nodes.  To provide uniform distribution
the number of shards should be >> the number of nodes; for example, for
10 nodes we usually create 100 shards.  As a result we get awful
performance, and blocking of any replication channel blocks all
backends.

So my question is whether my understanding is correct and synchronous
logical replication cannot be efficiently used in such a manner.
If so, the next question is how difficult it will be to make the
synchronous replication mechanism more efficient for logical
replication, and are there plans to work in this direction?
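[One mitigation available in stock PostgreSQL 10: synchronous_standby_names accepts a quorum form, so a commit waits for acknowledgement from ANY k of the listed standbys rather than from all of them. A sketch; the standby names are illustrative, and whether quorum semantics satisfies a per-shard redundancy requirement is a separate question:]

```
# postgresql.conf
# Wait for acknowledgement from any one of the three named standbys,
# instead of from every standby in the list:
synchronous_standby_names = 'ANY 1 (standby_a, standby_b, standby_c)'
```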
[HACKERS] Help required to debug pg_repack breaking logical replication
Hello, we have received reports of, and I have experienced a couple of
times myself, pg_repack breaking logical replication.

- https://github.com/reorg/pg_repack/issues/135
- https://github.com/2ndQuadrant/pglogical/issues/113

In my experience, after the botched run, the replication slot was
"stuck", and any attempt at reading from it (including
pg_logical_slot_peek_changes()) blocked until ctrl-c.  I've tried
replicating the issue but first attempts have failed to fail.

In the above issue #113, Petr Jelinek commented:

> From quick look at pg_repack, the way it does table rewrite is almost
> guaranteed to break logical decoding unless there is zero unconsumed
> changes for a given table as it does not build the necessary mappings
> info for logical decoding that standard heap rewrite in postgres does.

unfortunately he didn't follow up to further requests for details.

I've started drilling down into the problem, observing that the swap
function, swap_heap_or_index_files() [1], was cargo-culted by the
original author from the CLUSTER command code as of PG 8.2 [2] (with a
custom addition to update the relfrozenxid, which seems backwards to me
as it sets the older frozen xid on the new table [3]).

[1] https://github.com/reorg/pg_repack/blob/ver_1.4.1/lib/repack.c#L1082
[2] https://github.com/postgres/postgres/blob/REL8_2_STABLE/src/backend/commands/cluster.c#L783
[3] https://github.com/reorg/pg_repack/issues/152

so that code is effectively missing a good 10 years of development.
Before jumping into fast-forwarding it, I would like to ask for some
help, i.e.:

- Is Petr's diagnosis right, and is the freezing of logical replication
  to be blamed on the missing mappings?
- Can you suggest a test to reproduce the issue reliably?
- What are mapped relations anyway?

Thank you in advance for any help (either info about how to fix the
issue properly, or a patch by someone who happens to really understand
what we are talking about).

--
Daniele
Re: [HACKERS] Discussion on missing optimizations
Petr Jelinek writes:
> On 07/10/17 18:15, Tom Lane wrote:
>> No, I'm afraid you didn't read that comment closely enough.  This will
>> flat out fail for cases like "select ... where x=x order by x", because
>> there will already be a single-element EC for x and so the clause will
>> just disappear entirely.  If that doesn't happen, then instead you're
>> creating an EC with duplicate entries, which is an idea I seriously
>> dislike; the semantics of that don't seem clear at all to me.

> Hmm it did not disappear (and worked fine in SQL level tests).

I may not be correctly remembering what the query would need to look
like for there to be single-element ECs in existence at this point; but
I surely would not like this code to assume that none will exist till
later.

> I don't think I fully understand the "EC with duplicate entries" part
> and what's the problem with it so I'll defer to your wisdom there.

Well, as one example, assume that we use your patch and consider what
happens with
	where x = y and x = x
vs
	where x = x and x = y

In the first case we end up with an EC that's just {x,y}, because the
second process_equivalence() will find both sides in the same EC and
conclude that the second clause is redundant.  (Which it is, if the
equality operators have the same notion of what to do with nulls.)
In the second case we end up with an EC containing {x,x,y}, which
at minimum will result in emitting redundant generated quals.  I'm
not sure if it could have any worse consequences than that, but I'm
not sure it couldn't either.  But this is bogus in any case, because
those WHERE clauses surely should produce identical results.

Looking further ahead, if ECs have to support being multisets rather
than pure sets, that would put a crimp in ever improving this logic to
use a smarter UNION-FIND algorithm.  (I've not yet seen queries where
the number of EC members is large enough to make that a serious issue,
but I think it could happen, particularly with inheritance/partitioning.)

>> This passes the smell test for me in the sense of not adding any
>> significant number of planner cycles except when the weird case occurs.
>> It leaves something on the table in that there are some situations
>> where X=X could be converted, but we won't do it because we don't see
>> the clause as being a potential EC (because it's not at top level),
>> as in the second new regression test case below.  I think that's
>> probably all right; I don't see any way to be more thorough without
>> adding a lot of new cycles all the time, and I don't believe this is
>> worth that.

> My code had same issue.  I think going deeper would require quite a bit
> of new code (and cycles) for something that's even less likely to happen
> than simple X=X while the current patch is quite cheap win.

Yeah.  I'm not really convinced it's a big win, but it's so cheap we
might as well do it.  The only case where it would expend cycles and
fail to give an improvement is if we have X=X with a non-strict operator,
which I think is a case that never arises in practice at present,
because btree effectively assumes that all btree operators are strict.
(Someday we might want to relax that, though, which is why I don't
want to let the planner just assume it.)

			regards, tom lane
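[For readers following along, the transformation Tom describes can be sketched at the SQL level against a hypothetical table t; the rewritten qual is what the patch description above promises, shown here as an illustration rather than a guaranteed plan shape:]

```sql
-- With the patch, a strict self-equality qual is thrown back as an
-- IS NOT NULL test, which the planner estimates and executes far better:
EXPLAIN (COSTS OFF) SELECT * FROM t WHERE x = x;
-- expected to show something like:  Filter: (x IS NOT NULL)

-- Why the rewrite is valid for a strict "=": when x is non-null,
-- x = x is TRUE; when x is NULL, x = x yields NULL, which WHERE
-- treats as false -- exactly the behavior of x IS NOT NULL.
```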
Re: [HACKERS] Discussion on missing optimizations
On 07/10/17 18:15, Tom Lane wrote:
> Petr Jelinek writes:
>> On 07/10/17 04:19, Tom Lane wrote:
>>> (edit: a few minutes later, I seem to remember that equivclass.c has
>>> to do something special with the X=X case, so maybe it could do
>>> something else special instead, with little new overhead.)
>
>> So I wrote a prototype of achieving this optimization and it seems to be
>> really quite simple code-wise (see attached). I did only limited
>> manual testing of this but I don't see any negative impact on planning time.
>
> No, I'm afraid you didn't read that comment closely enough. This will
> flat out fail for cases like "select ... where x=x order by x", because
> there will already be a single-element EC for x and so the clause will
> just disappear entirely. If that doesn't happen, then instead you're
> creating an EC with duplicate entries, which is an idea I seriously
> dislike; the semantics of that don't seem clear at all to me.

Hmm, it did not disappear (and worked fine in SQL-level tests). I don't
think I fully understand the "EC with duplicate entries" part and what
the problem with it is, so I'll defer to your wisdom there.

> What I was imagining was that having detected X=X, instead of "throwing
> back" the clause as-is we could throw back an X IS NOT NULL clause,
> along the lines of the attached.

Right, I wanted to avoid messing with the incoming RestrictInfo, but if
we don't want to call the code below or write tons of code for this, I
guess it's the best we can do.

> This passes the smell test for me in the sense of not adding any
> significant number of planner cycles except when the weird case occurs.
> It leaves something on the table in that there are some situations
> where X=X could be converted, but we won't do it because we don't see
> the clause as being a potential EC (because it's not at top level),
> as in the second new regression test case below.
> I think that's
> probably all right; I don't see any way to be more thorough without
> adding a lot of new cycles all the time, and I don't believe this is
> worth that.

My code had the same issue. I think going deeper would require quite a
bit of new code (and cycles) for something even less likely to occur
than a simple X=X, while the current patch is a quite cheap win.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issue with logical replication: MyPgXact->xmin already is valid
On 07/10/17 18:23, Konstantin Knizhnik wrote: > On 10/07/2017 04:26 PM, Petr Jelinek wrote: >> >> Hmm so you start transaction (you have to when running >> CREATE_REPLICATION_SLOT with USE_SNAPSHOT parameter). And while the slot >> is being created the config is reloaded. And since now you are in >> transaction the tsearch hook for GUC processing tries to access catalogs >> which sets the xmin for the transaction. > > Actually this slot is implicitly created by LogicalRepSyncTableStart to > perform initial data sync. > That does not change what I said above. >> >> That's not good, but I can't really say I have idea about what to do >> with it other than to set some kind of flag saying that logical decoding >> snapshot is being built and using that to skip catalog access which does >> not seem very pretty. >> > It is not quite clear from the comment why it is not possible to > overwrite MyPgXact->xmin or just use existed value. > We can't use existing value because the snapshot used to read data would no longer correspond to the one we just built as consistent start point. About overwriting, well we may be able to do that if there was no active snapshot in the transaction, ie we could try doing something like SnapshotResetXmin() or possibly rather InvalidateCatalogSnapshot() since the xmin was set for reading catalog snapshot before checking the MyPgXact->xmin. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] separate serial_schedule useful?
I wrote:
> Robert Haas writes:
>> There's no reason why pg_regress couldn't have a
>> --bail-if-group-size-exceeds=N argument, or why we couldn't have a
>> separate Perl script to validate the schedule file as part of the
>> build process.

> I'd go for the former approach; seems like less new code and fewer cycles
> used to enforce the rule.

Concretely, how about the attached?  (Obviously we'd have to fix
parallel_schedule before committing this.)

			regards, tom lane

diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index b923ea1..56cd202 100644
*** a/src/test/regress/GNUmakefile
--- b/src/test/regress/GNUmakefile
*************** tablespace-setup:
*** 124,130 ****
  ##
  ## Run tests
  ##
! REGRESS_OPTS = --dlpath=. $(EXTRA_REGRESS_OPTS)
  
  check: all tablespace-setup
  	$(pg_regress_check) $(REGRESS_OPTS) --schedule=$(srcdir)/parallel_schedule $(MAXCONNOPT) $(EXTRA_TESTS)
--- 124,130 ----
  ##
  ## Run tests
  ##
! REGRESS_OPTS = --dlpath=. --max-concurrent-tests=20 $(EXTRA_REGRESS_OPTS)
  
  check: all tablespace-setup
  	$(pg_regress_check) $(REGRESS_OPTS) --schedule=$(srcdir)/parallel_schedule $(MAXCONNOPT) $(EXTRA_TESTS)
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index abb742b..ff979b8 100644
*** a/src/test/regress/pg_regress.c
--- b/src/test/regress/pg_regress.c
*************** char	   *launcher = NULL;
*** 78,83 ****
--- 78,84 ----
  static _stringlist *loadlanguage = NULL;
  static _stringlist *loadextension = NULL;
  static int	max_connections = 0;
+ static int	max_concurrent_tests = 0;
  static char *encoding = NULL;
  static _stringlist *schedulelist = NULL;
  static _stringlist *extra_tests = NULL;
*************** run_schedule(const char *schedule, test_
*** 1691,1696 ****
--- 1692,1704 ----
  			wait_for_tests(pids, statuses, NULL, 1);
  			/* status line is finished below */
  		}
+ 		else if (max_concurrent_tests > 0 && max_concurrent_tests < num_tests)
+ 		{
+ 			/* can't print scbuf here, it's already been trashed */
+ 			fprintf(stderr, _("too many tests (more than %d) in schedule file \"%s\" line %d\n"),
+ 					max_concurrent_tests, schedule, line_num);
+ 			exit(2);
+ 		}
  		else if (max_connections > 0 && max_connections < num_tests)
  		{
  			int			oldest = 0;
*************** help(void)
*** 1999,2004 ****
--- 2007,2014 ----
  	printf(_("                            tests; can appear multiple times\n"));
  	printf(_("  --max-connections=N       maximum number of concurrent connections\n"));
  	printf(_("                            (default is 0, meaning unlimited)\n"));
+ 	printf(_("  --max-concurrent-tests=N  maximum number of concurrent tests in schedule\n"));
+ 	printf(_("                            (default is 0, meaning unlimited)\n"));
  	printf(_("  --outputdir=DIR           place output files in DIR (default \".\")\n"));
  	printf(_("  --schedule=FILE           use test ordering schedule from FILE\n"));
  	printf(_("                            (can be used multiple times to concatenate)\n"));
*************** regression_main(int argc, char *argv[],
*** 2048,2053 ****
--- 2058,2064 ----
  		{"launcher", required_argument, NULL, 21},
  		{"load-extension", required_argument, NULL, 22},
  		{"config-auth", required_argument, NULL, 24},
+ 		{"max-concurrent-tests", required_argument, NULL, 25},
  		{NULL, 0, NULL, 0}
  	};
  
*************** regression_main(int argc, char *argv[],
*** 2161,2166 ****
--- 2172,2180 ----
  			case 24:
  				config_auth_datadir = pg_strdup(optarg);
  				break;
+ 			case 25:
+ 				max_concurrent_tests = atoi(optarg);
+ 				break;
  			default:
  				/* getopt_long already emitted a complaint */
  				fprintf(stderr, _("\nTry \"%s -h\" for more information.\n"),

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issue with logical replication: MyPgXact->xmin already is valid
On 10/07/2017 04:26 PM, Petr Jelinek wrote: Hmm so you start transaction (you have to when running CREATE_REPLICATION_SLOT with USE_SNAPSHOT parameter). And while the slot is being created the config is reloaded. And since now you are in transaction the tsearch hook for GUC processing tries to access catalogs which sets the xmin for the transaction. Actually this slot is implicitly created by LogicalRepSyncTableStart to perform initial data sync. That's not good, but I can't really say I have idea about what to do with it other than to set some kind of flag saying that logical decoding snapshot is being built and using that to skip catalog access which does not seem very pretty. It is not quite clear from the comment why it is not possible to overwrite MyPgXact->xmin or just use existed value. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Discussion on missing optimizations
David Rowleywrites: > It would be much nicer if you'd at least wait for benchmarks before > shooting. Benchmarks of what? We'd have to expend quite a bit of effort just to get to a place where we'd have something to benchmark. I do not think it's unreasonable of me to express an opinion that that would likely be wasted effort. If you disagree, and are willing to expend such effort speculatively, I'm not stopping you. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Discussion on missing optimizations
Petr Jelinekwrites: > On 07/10/17 04:19, Tom Lane wrote: >> (edit: a few minutes later, I seem to remember that equivclass.c has >> to do something special with the X=X case, so maybe it could do >> something else special instead, with little new overhead.) > So I wrote prototype of achieving this optimization and it seems to be > really quite simple code-wise (see attached). I did only a limited > manual testing of this but I don't see any negative impact on planning time. No, I'm afraid you didn't read that comment closely enough. This will flat out fail for cases like "select ... where x=x order by x", because there will already be a single-element EC for x and so the clause will just disappear entirely. If that doesn't happen, then instead you're creating an EC with duplicate entries, which is an idea I seriously dislike; the semantics of that don't seem clear at all to me. What I was imagining was that having detected X=X, instead of "throwing back" the clause as-is we could throw back an X IS NOT NULL clause, along the lines of the attached. This passes the smell test for me in the sense of not adding any significant number of planner cycles except when the weird case occurs. It leaves something on the table in that there are some situations where X=X could be converted, but we won't do it because we don't see the clause as being a potential EC (because it's not at top level), as in the second new regression test case below. I think that's probably all right; I don't see any way to be more thorough without adding a lot of new cycles all the time, and I don't believe this is worth that. 
			regards, tom lane

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 7997f50..a225414 100644
*** a/src/backend/optimizer/path/equivclass.c
--- b/src/backend/optimizer/path/equivclass.c
***************
*** 27,32 ****
--- 27,33 ----
  #include "optimizer/paths.h"
  #include "optimizer/planmain.h"
  #include "optimizer/prep.h"
+ #include "optimizer/restrictinfo.h"
  #include "optimizer/var.h"
  #include "utils/lsyscache.h"
*************** static bool reconsider_full_join_clause(
*** 71,78 ****
   * any delay by an outer join, so its two sides can be considered equal
   * anywhere they are both computable; moreover that equality can be
   * extended transitively.  Record this knowledge in the EquivalenceClass
!  * data structure.  Returns TRUE if successful, FALSE if not (in which
!  * case caller should treat the clause as ordinary, not an equivalence).
   *
   * If below_outer_join is true, then the clause was found below the nullable
   * side of an outer join, so its sides might validly be both NULL rather than
--- 72,85 ----
   * any delay by an outer join, so its two sides can be considered equal
   * anywhere they are both computable; moreover that equality can be
   * extended transitively.  Record this knowledge in the EquivalenceClass
!  * data structure, if applicable.  Returns TRUE if successful, FALSE if not
!  * (in which case caller should treat the clause as ordinary, not an
!  * equivalence).
!  *
!  * In some cases, although we cannot convert a clause into EquivalenceClass
!  * knowledge, we can still modify it to a more useful form than the original.
!  * Then, *p_restrictinfo will be replaced by a new RestrictInfo, which is what
!  * the caller should use for further processing.
   *
   * If below_outer_join is true, then the clause was found below the nullable
   * side of an outer join, so its sides might validly be both NULL rather than
*************** static bool reconsider_full_join_clause(
*** 104,112 ****
   * memory context.
   */
  bool
! process_equivalence(PlannerInfo *root, RestrictInfo *restrictinfo,
  					bool below_outer_join)
  {
  	Expr	   *clause = restrictinfo->clause;
  	Oid			opno,
  				collation,
--- 111,121 ----
   * memory context.
   */
  bool
! process_equivalence(PlannerInfo *root,
! 					RestrictInfo **p_restrictinfo,
  					bool below_outer_join)
  {
+ 	RestrictInfo *restrictinfo = *p_restrictinfo;
  	Expr	   *clause = restrictinfo->clause;
  	Oid			opno,
  				collation,
*************** process_equivalence(PlannerInfo *root, R
*** 154,169 ****
  									 collation);
  
  	/*
! 	 * Reject clauses of the form X=X.  These are not as redundant as they
! 	 * might seem at first glance: assuming the operator is strict, this is
! 	 * really an expensive way to write X IS NOT NULL.  So we must not risk
! 	 * just losing the clause, which would be possible if there is already a
! 	 * single-element EquivalenceClass containing X.  The case is not common
! 	 * enough to be worth contorting the EC machinery for, so just reject the
! 	 * clause and let it be processed as a normal restriction clause.
  	 */
  	if (equal(item1, item2))
! 		return false;			/* X=X is not a useful equivalence */
  
  	/*
  	 * If below outer join, check for
Re: [HACKERS] Proposal: Local indexes for partitioned table
07.10.17 16:34, Robert Haas wrote: On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrerawrote: One thing I'm a bit worried about is how to name these subordinate indexes. They have to have names because that's how pg_class works, and those names can't all be the same, again because that's how pg_class works. There's no problem right away when you first create the partitioned index, because you can just pick names out of a hat using whatever name-generating algorithm seems best. However, when you dump-and-restore (including but not limited to the pg_upgrade case) you've got to preserve those names. If you just generate a new name that may or may not be the same as the old one, then it may collide with a user-specified name that only occurs later in the dump. Also, you'll have trouble if the user has applied a COMMENT or a SECURITY LABEL to the index because that command works by name, or if the user has a reference to the index name inside a function or whatever. These are pretty annoying corner-case bugs because they're not likely to come up very often. Most people won't notice or care if the index name changes. But I don't think it's acceptable to just ignore the problem. An idea I had was to treat the abstract index - to use your term - sort of the way we treat an extension. Normally, when you create an index on a partitioned table, it cascades, but for dump and restore purpose, we tag on some syntax that says "well, don't actually create the subordinate indexes, i'll tell you about those later". Then for each subordinate index we issue a separate CREATE INDEX command followed by ALTER INDEX abstract_index ATTACH PARTITION concrete_index or something of that sort. That means you can't absolutely count on the parent index to have all of the children it's supposed to have but maybe that's OK. AFAICS, the main problem with naming is generating new unique names for subordinate indexes on the stage of migrating data scheme (pg_dump, pg_upgrade, etc). 
And we cannot specify these names in the 'CREATE INDEX partitioned_index' statement, therefore we have to regenerate them. In this case I propose to restore the index name hierarchy *bottom-up*, i.e. first create the indexes for the leaf partitions and then create the ones for parents up to the root, explicitly specifying names. When creating an index on a parent table we would have to check whether there exists an index on a child table that could serve as the child index (identical criteria). If so, we do not generate a new index but implicitly attach that index to the parent one. If we have an incomplete index hierarchy, e.g. because we previously dropped some partitions' indexes, then recreating the parent's index would regenerate (not attach) indexes for those partitions. We could drop those extra generated indexes after building the parent's index. This approach is not straightforward, but it lets us treat the 'CREATE INDEX partitioned_table' statement as a cascade operation. As a result, we can specify the name for each concrete index while recreating the whole hierarchy. -- Regards, Maksim Milyutin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Discussion on missing optimizations
On 7 October 2017 at 15:19, Tom Lane wrote:
>> 9. Unneeded Self JOIN
>
>> Can't remember discussions of this.
>
> I can't get very excited about that one either.
>
> In the end, what the article fails to consider is that all of these are
> tradeoffs, not unalloyed goods.  If you spend planner cycles on every
> query to look for cases that only the most unabashedly brain-dead ORMs
> ever generate, you're not really doing your users a favor on balance.

I think that final sentence lacks imagination. I've seen plenty of views
being called where some column is unavailable, but the caller joins the
very same table again on the primary key to add the column. There was no
brain-dead ORM involved, just perhaps questionable design. This was very
common in my last job, where we had some rat's nest of views several
layers deep, the core of which often had some complex logic that nobody
dared to try to replicate.

It would be fairly cheap to check if any of the rtekind==RTE_RELATION
joinlist items have more than one RangeTblEntry with the same relid. The
joinlist is never that big anyway, and if it were, the join search would
be slow. The more expensive part would be to build the join clauses,
check if the expressions on either side of each OpExpr match and that
nothing else will cause a non-match, then perform the uniqueness check
on those OpExpr operands. We do have some infrastructure to do the
unique checks. Likely the slowdown in planning would apply only to cases
with a legitimately useful self-join; I doubt checking for a duplicate
RangeTblEntry->relid would cause much of a concern.

Anyway, this is giving me some feeling of déjà vu. Perhaps we need some
pg_stats view that shows us planning time and execution time so that we
can get a better idea of how much these things matter in the average
case. We tend to never fare so well in these sorts of comparisons with
commercial databases.
It's not hard to imagine someone with migration plans loading some rat's
nest of views into Postgres, taking it for a spin, and finding that
performance is not quite what they need. It's a bit sad that often the
people with the loudest voices are always so fast to stomp on ideas for
improvement. It would be much nicer if you'd at least wait for
benchmarks before shooting.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
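The pattern David describes — a view that hides a column, and a caller that self-joins the base table on its primary key just to recover it — might look like this (hypothetical schema, shown only to illustrate the redundant-self-join case):

```sql
-- Hypothetical schema: a view that omits a column, and a caller that
-- joins the same table back on its primary key just to recover it.
CREATE TABLE orders (id int PRIMARY KEY, customer_id int, total numeric,
                     created_at timestamptz);

CREATE VIEW order_summary AS
    SELECT o.id, o.customer_id, o.total
    FROM orders o;

-- The caller needs created_at, so it self-joins orders on the PK:
SELECT s.id, s.total, o.created_at
FROM order_summary s
JOIN orders o ON o.id = s.id;   -- same relation, joined on a unique key

-- A self-join-elimination pass could rewrite this to a single scan,
-- roughly: SELECT id, total, created_at FROM orders;
```

Because the join is on a unique key of the same relation, each outer row matches exactly one inner row, which is what makes the elimination safe.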
Re: [HACKERS] separate serial_schedule useful?
Robert Haas writes:
> On Fri, Oct 6, 2017 at 4:16 PM, Tom Lane wrote:
>> The other routine mistake, which I see Robert just made again,
>> is to break the at-most-twenty-parallel-tests-at-once convention.
>> I wonder if we can get in some sort of automated check for that.

> There's no reason why pg_regress couldn't have a
> --bail-if-group-size-exceeds=N argument, or why we couldn't have a
> separate Perl script to validate the schedule file as part of the
> build process.

I'd go for the former approach; seems like less new code and fewer cycles
used to enforce the rule.

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Fri, Oct 6, 2017 at 7:08 AM, Amit Kapila wrote:
> I have fixed the other review comment related to using safe_param_list
> in the attached patch.  I think I have fixed all comments given by
> you, but let me know if I have missed anything or you have any other
> comment.

-		Param	   *param = (Param *) node;
+		if (list_member_int(context->safe_param_ids, ((Param *) node)->paramid))
+			return false;

-		if (param->paramkind != PARAM_EXEC ||
-			!list_member_int(context->safe_param_ids, param->paramid))
-		{
-			if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
-				return true;
-		}
-		return false;			/* nothing to recurse to */
+		if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
+			return true;

I think your revised logic is wrong here because it drops the paramkind
test, and I don't see any justification for that.  IIUC, param IDs are
tracked separately for extern and exec parameters, so why would we
ignore that distinction here?

But I'm also wondering if we're missing a trick here, because isn't a
PARAM_EXTERN parameter always safe, given SerializeParamList?  If so,
this || ought to be &&, but if that adjustment is needed, it seems like
a separate patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
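Robert's two points — keep the paramkind test, and note that a PARAM_EXTERN parameter can always be shipped to workers via SerializeParamList — can be sketched as a small predicate. This is a hypothetical Python model of the decision logic only; the real check is C code in the planner's parallel-safety walker:

```python
PARAM_EXTERN = "extern"   # parameters supplied by the client ($1, $2, ...)
PARAM_EXEC = "exec"       # parameters computed at runtime (e.g. by initplans)

def param_is_parallel_safe(paramkind, paramid, safe_param_ids):
    """Sketch of the rule under discussion: PARAM_EXTERN parameters are
    always serializable to parallel workers, while a PARAM_EXEC parameter
    is safe only if its ID is on the known-safe list.  Hypothetical helper,
    not PostgreSQL's actual code."""
    if paramkind == PARAM_EXTERN:
        return True
    return paramkind == PARAM_EXEC and paramid in safe_param_ids

print(param_is_parallel_safe(PARAM_EXTERN, 7, []))      # True
print(param_is_parallel_safe(PARAM_EXEC, 3, [3, 5]))    # True
print(param_is_parallel_safe(PARAM_EXEC, 4, [3, 5]))    # False
```

The point of keeping the paramkind test is visible in the last case: an exec param whose ID is not on the safe list must still be treated as a parallel hazard.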
Re: [HACKERS] Proposal: Local indexes for partitioned table
On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrerawrote: > 2. create one index for each existing partition. These would be >identical to what would happen if you created the index directly on >each partition, except that there is an additional dependency to the >parent's abstract index. One thing I'm a bit worried about is how to name these subordinate indexes. They have to have names because that's how pg_class works, and those names can't all be the same, again because that's how pg_class works. There's no problem right away when you first create the partitioned index, because you can just pick names out of a hat using whatever name-generating algorithm seems best. However, when you dump-and-restore (including but not limited to the pg_upgrade case) you've got to preserve those names. If you just generate a new name that may or may not be the same as the old one, then it may collide with a user-specified name that only occurs later in the dump. Also, you'll have trouble if the user has applied a COMMENT or a SECURITY LABEL to the index because that command works by name, or if the user has a reference to the index name inside a function or whatever. These are pretty annoying corner-case bugs because they're not likely to come up very often. Most people won't notice or care if the index name changes. But I don't think it's acceptable to just ignore the problem. An idea I had was to treat the abstract index - to use your term - sort of the way we treat an extension. Normally, when you create an index on a partitioned table, it cascades, but for dump and restore purpose, we tag on some syntax that says "well, don't actually create the subordinate indexes, i'll tell you about those later". Then for each subordinate index we issue a separate CREATE INDEX command followed by ALTER INDEX abstract_index ATTACH PARTITION concrete_index or something of that sort. 
That means you can't absolutely count on the parent index to have all of the children it's supposed to have but maybe that's OK. Another thing that would let you do is CREATE INDEX CONCURRENTLY replacement_concrete_index; ALTER INDEX abstract_index DETACH PARTITION old_concrete_index, ATTACH PARTITION replacement_concrete_index; DROP INDEX CONCURRENTLY old_concrete_index, which seems like a thing someone might want to do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
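Putting Robert's idea together, the dump/restore sequence might look like the following. This is strawman syntax under discussion in the thread, not implemented behavior at the time, and the table/index names are hypothetical:

```sql
-- Create the abstract (parent) index without cascading to partitions
-- ("ON ONLY" is one possible spelling; the thread leaves it open):
CREATE INDEX parent_idx ON ONLY measurement (logdate);

-- Restore each concrete index under its original name, then attach it:
CREATE INDEX measurement_y2017_idx ON measurement_y2017 (logdate);
ALTER INDEX parent_idx ATTACH PARTITION measurement_y2017_idx;

-- The same mechanism would allow concurrent per-partition rebuilds:
CREATE INDEX CONCURRENTLY measurement_y2017_idx2 ON measurement_y2017 (logdate);
ALTER INDEX parent_idx DETACH PARTITION measurement_y2017_idx;
ALTER INDEX parent_idx ATTACH PARTITION measurement_y2017_idx2;
DROP INDEX CONCURRENTLY measurement_y2017_idx;
```

Because each concrete index is created by its own CREATE INDEX statement, pg_dump can preserve the original names, COMMENTs, and SECURITY LABELs that the cascading form would lose.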
Re: [HACKERS] On markers of changed data
Alvaro, Michael, * Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote: > Michael Paquier wrote: > > That’s actually what pg_rman is doing for what it calls incremental > > backups (perhaps that would be differential backup in PG > > terminology?), and the performance is bad as you can imagine. We could > > have a dedicated LSN map to do such things with 4 bytes per page. I am > > still not convinced that this much facility and the potential bug > > risks are worth it though, Postgres already knows about differential > > backups if you shape it as a delta of WAL segments. I think that, in > > order to find a LSN map more convincing, we should find first other > > use cases where it could become useful. Some use cases may pop up with > > VACUUM, but I have not studied the question hard enough... > > The case I've discussed with barman developers is a large database > (couple dozen of TBs should be enough) where a large fraction (say 95%) > is read-only but there are many changes to the active part of the data, > so that WAL is more massive than size of active data. Yes, we've seen environments like that also. Thanks! Stephen signature.asc Description: Digital signature
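A rough sketch of the per-page LSN-map idea Michael mentions (4 bytes per page), and why it is attractive for the large, mostly read-only databases Alvaro describes. This is an illustrative Python model only, not an existing PostgreSQL facility:

```python
PAGE_SIZE = 8192  # bytes per heap page

def pages_changed_since(lsn_map, backup_start_lsn):
    """Given a per-page LSN map (page number -> last-modification LSN),
    return the pages a differential backup must copy: those whose LSN
    advanced past the previous backup's start LSN.  Sketch of the
    'dedicated LSN map' idea from the thread, not real PostgreSQL code."""
    return sorted(page for page, lsn in lsn_map.items()
                  if lsn > backup_start_lsn)

# A mostly read-only relation: only two hot pages changed since LSN 1000,
# so only those pages need to be copied.
lsn_map = {0: 420, 1: 1100, 2: 980, 3: 1250}
print(pages_changed_since(lsn_map, 1000))  # [1, 3]

# Map overhead at 4 bytes per page, for a 1 TB relation:
pages = (1 << 40) // PAGE_SIZE
print(pages * 4 // (1 << 20), "MB")  # 512 MB
```

The 512 MB figure for a terabyte of heap shows the trade-off: the map is small relative to the data, and when WAL volume exceeds the active working set (the 95%-read-only case above), scanning the map beats replaying or shipping WAL.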
Re: [HACKERS] Issue with logical replication: MyPgXact->xmin already is valid
On 06/10/17 16:46, Konstantin Knizhnik wrote: > > > On 06.10.2017 15:29, Petr Jelinek wrote: >> On 06/10/17 12:16, Konstantin Knizhnik wrote: >>> When creating logical replication slots we quite often get the following >>> error: >>> >>> ERROR: cannot build an initial slot snapshot when MyPgXact->xmin >>> already is valid >>> >>> which cause restart of WAL sender. >>> The comment to this line doesn't clarify much: >>> >>> /* so we don't overwrite the existing value */ >>> if (TransactionIdIsValid(MyPgXact->xmin)) >>> elog(ERROR, "cannot build an initial slot snapshot when >>> MyPgXact->xmin already is valid"); >>> I wonder if it is normal situation or something goes wrong? >>> >> Hi, >> >> no it's not normal situation, it seems you are doing something that >> assigns xid before you run the CREATE_REPLICATION_SLOT command on that >> connection. > > I have not doing something in this connection: it is wal sender > executing CREATE_REPLICATION_SLOT replication command. > Please look at the stack in my original e-mail. It shows who and when is > setting MyPgXact->xmin. > It is GetSnapshotData called because of reloading configuration settings. > And configuration setting are reloaded because our application is > updating "synchronous_standby_names". > > Hmm so you start transaction (you have to when running CREATE_REPLICATION_SLOT with USE_SNAPSHOT parameter). And while the slot is being created the config is reloaded. And since now you are in transaction the tsearch hook for GUC processing tries to access catalogs which sets the xmin for the transaction. That's not good, but I can't really say I have idea about what to do with it other than to set some kind of flag saying that logical decoding snapshot is being built and using that to skip catalog access which does not seem very pretty. 
-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Discussion on missing optimizations
On 07/10/17 04:19, Tom Lane wrote:
> Andres Freund writes:
>> On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
>>> The article in question is here:
>>> https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/
>
>> That's interesting.
>
> The impression I have in a quick scan is that probably hardly any of these
> are cases that any of the DB designers think are important in themselves.
> Rather, they fall out of more general optimization attempts, or not,
> depending on the optimization mechanisms in use in a particular DB.
> For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
> comes out of a constant-subexpression-precalculation mechanism for us,
> whereas "WHERE column=column" doesn't fall to that approach.  ISTM it
> would be really dumb to expend planner cycles looking specifically for
> that case, so I guess that DB2 et al are finding it as a side-effect of
> some more general optimization ... I wonder what that is?
>
> (edit: a few minutes later, I seem to remember that equivclass.c has
> to do something special with the X=X case, so maybe it could do
> something else special instead, with little new overhead.)

What it actually does is specifically skip the processing for X=X (the
const expression will be simplified by
estimate_expression_value/eval_const_expressions separately). There is a
comment there that specifically states it's not worth processing this,
as it's a rare clause that is equivalent to X IS NOT NULL.

I don't actually agree with the argument of that comment, since in
practice, if the "silly" equality is not there, we just waste an equal()
call, and if it is there, the optimization seems worth it as it will
lead to orders-of-magnitude better estimates in many cases.

So I wrote a prototype of achieving this optimization and it seems to be
really quite simple code-wise (see attached).
I did only limited manual testing of this, but I don't see any negative
impact on planning time.

Thoughts?

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

From a7d6f3fe0a5d42a96c711cc33e6962b9a2f6511f Mon Sep 17 00:00:00 2001
From: Petr Jelinek
Date: Sat, 7 Oct 2017 15:10:54 +0200
Subject: [PATCH] Transform X=X expressions into X IS NOT NULL

---
 src/backend/optimizer/path/equivclass.c | 12 ------------
 src/backend/optimizer/plan/initsplan.c  | 11 +++++++++++
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 7997f50c18..d4ffdf66f2 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -154,18 +154,6 @@ process_equivalence(PlannerInfo *root, RestrictInfo *restrictinfo,
 									 collation);
 
 	/*
-	 * Reject clauses of the form X=X.  These are not as redundant as they
-	 * might seem at first glance: assuming the operator is strict, this is
-	 * really an expensive way to write X IS NOT NULL.  So we must not risk
-	 * just losing the clause, which would be possible if there is already a
-	 * single-element EquivalenceClass containing X.  The case is not common
-	 * enough to be worth contorting the EC machinery for, so just reject the
-	 * clause and let it be processed as a normal restriction clause.
-	 */
-	if (equal(item1, item2))
-		return false;			/* X=X is not a useful equivalence */
-
-	/*
 	 * If below outer join, check for strictness, else reject.
 	 */
 	if (below_outer_join)
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 9931dddba4..1c3672a978 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2347,6 +2347,17 @@ process_implied_equality(PlannerInfo *root,
 			return;
 		}
 	}
+	else if (equal(item1, item2))
+	{
+		NullTest   *ntest;
+		ntest = makeNode(NullTest);
+		ntest->arg = item1;
+		ntest->nulltesttype = IS_NOT_NULL;
+		ntest->argisrow = false;
+		ntest->location = exprLocation((Node *) clause);
+
+		clause = (Expr *) ntest;
+	}
 
 	/*
 	 * Push the new clause into all the appropriate restrictinfo lists.
-- 
2.11.0

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
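The user-visible effect Petr is after can be seen in a session like the following (hypothetical table t with a mostly NULL column; the second query is what the patch would effectively plan for the first, given a strict equality operator):

```sql
-- Hypothetical table where 90% of x values are NULL:
CREATE TABLE t (x integer);
INSERT INTO t
    SELECT CASE WHEN i % 10 = 0 THEN i END
    FROM generate_series(1, 100000) i;
ANALYZE t;

-- Without the transformation, x = x gets generic equality selectivity,
-- badly underestimating the true result (all non-NULL rows):
EXPLAIN SELECT * FROM t WHERE x = x;

-- With the transformation it is planned as the equivalent, and
-- well-estimated, null test:
EXPLAIN SELECT * FROM t WHERE x IS NOT NULL;
```

The row-count estimate for the second form comes straight from the column's null fraction in pg_statistic, which is where the "orders of magnitude" improvement Petr mentions comes from.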
Re: [HACKERS] separate serial_schedule useful?
On Fri, Oct 6, 2017 at 4:16 PM, Tom Lane wrote:
> The other routine mistake, which I see Robert just made again,
> is to break the at-most-twenty-parallel-tests-at-once convention.
> I wonder if we can get in some sort of automated check for that.

Argh. We can argue about whether that's my mistake or Ashutosh's mistake, but I do try to catch these things. It's just that there are so many rules that require a committer to (a) know the rule and (b) remember to enforce the rule that it's really easy to miss one. And I do know that rule, but it slipped my mind in the course of trying to make sure that we'd covered all the bases in terms of the feature itself.

There's no reason why pg_regress couldn't have a --bail-if-group-size-exceeds=N argument, or why we couldn't have a separate Perl script to validate the schedule file as part of the build process. I feel like the need to manually enforce so many tedious coding rules is a real limiting factor on our ability to (a) involve new people in the project and (b) get their work committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
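A schedule-file checker along the lines suggested above could be quite small. This sketch parses the pg_regress schedule format (each `test:` line names one parallel group); the limit constant, function name, and sample input are invented for illustration:

```python
MAX_PARALLEL = 20  # the at-most-twenty-parallel-tests-at-once convention

def check_schedule(text):
    """Return (line_number, group_size) for every over-large group."""
    violations = []
    for n, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line.startswith("test:"):
            continue  # skip comments, blank lines, "ignore:" lines
        tests = line[len("test:"):].split()
        if len(tests) > MAX_PARALLEL:
            violations.append((n, len(tests)))
    return violations

# A tiny made-up schedule: one comment, one small group, one 25-wide group.
sched = ("# comment\n"
         "test: a b c\n"
         "test: " + " ".join("t%d" % i for i in range(25)) + "\n")
assert check_schedule(sched) == [(3, 25)]
```

Hooking something like this into the build (or a pg_regress flag) would turn the convention from folklore into an enforced rule.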
Re: [HACKERS] [POC] hash partitioning
On Fri, Oct 6, 2017 at 5:35 PM, Jesper Pedersen wrote:
> Hi Amul,
>
> Could you rebase on latest master ?

Sure, will post that soon, but before that I need to test hash partitioning with the recent partition-wise join commit (f49842d1ee), thanks.

Regards,
Amul
[HACKERS] Predicate Locks for writes?
SERIALIZABLE looks for chains of rw cases.

When we perform UPDATEs and DELETEs we search for rows and then modify them. The current implementation views that as a read followed by a write, because we issue PredicateLockTuple() during the index fetch.

Is it correct that a statement that only changes data will add predicate locks for the rows that it modifies?

PredicateLockTuple() specifically avoids adding an SIRead lock if the tuple already has a write lock on it, so surely it must also be correct to skip the SIRead lock if we are just about to update the row?

I am suggesting that we ignore PredicateLockTuple() for cases where we are about to update or delete the found tuple.

ISTM that a Before Row Trigger would need to add an SIRead lock since that is clearly a read.

Thoughts?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
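The proposed rule (skip the SIRead lock when the fetched tuple is about to be written) can be pictured with a toy bookkeeping model. This is purely illustrative: the class, method names, and string tuple IDs are invented, and the real logic lives in predicate.c, not in anything like this:

```python
# Toy model of the suggestion: a transaction that is about to UPDATE or
# DELETE a tuple need not also take an SIRead lock on it, because the
# write lock already covers rw-conflict detection for that tuple.
class ToyPredicateLocks:
    def __init__(self):
        self.siread = set()   # tuples with SIRead (predicate read) locks
        self.writes = set()   # tuples this transaction will write

    def lock_tuple_for_read(self, tup):
        # Mirrors the existing PredicateLockTuple() behavior described
        # above: no SIRead lock if a write lock is already held.
        if tup in self.writes:
            return
        self.siread.add(tup)

    def fetch_for_update(self, tup):
        # Proposed behavior: the index fetch that feeds an UPDATE/DELETE
        # records the write intent instead of taking an SIRead lock.
        self.writes.add(tup)

txn = ToyPredicateLocks()
txn.fetch_for_update("t1")
txn.lock_tuple_for_read("t1")   # no-op: t1 is already a write target
txn.lock_tuple_for_read("t2")   # plain read: SIRead lock taken
assert txn.siread == {"t2"}
```

The open question in the mail (a BEFORE ROW trigger reading the old row) would correspond to a genuine `lock_tuple_for_read` that must not be skipped.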
Re: [HACKERS] parallel worker (PID ) exited with exit code 1
Michael Paquier writes:
> On Fri, Oct 6, 2017 at 9:19 PM, tushar wrote:
>> ERROR: recovery is not in progress
>
> Perhaps there is a way to blacklist some functions depending on the
> server context. This question may be better asked directly where the
> project is maintained then: https://github.com/anse1/sqlsmith. I am
> adding as well Andreas in CC, he works on sqlsmith.

Blacklisting when testing with sqlsmith typically happens on the error logging side: logging into a database via --log-to with the schema shipped with sqlsmith filters out boring error messages with triggers. The ERROR reported here is actually covered by the filter tables that ship with sqlsmith.

I don't think adding such product-specific filtering on the generating side is necessarily the right thing, as it would reduce code coverage while testing. We might miss a core dump when something becomes buggy in that path.

I did make an exception for syntax errors: sqlsmith notices when grammar productions consistently lead to errors. This feature was added to allow testing other products or older versions of Postgres, so sqlsmith doesn't waste time generating upsert statements on 9.1. It's most apparent when testing sqlite3: the error rate is 90% on startup, but shrinks to 2% or so after a couple thousand statements have been generated.

regards,
Andreas
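The syntax-error feedback Andreas describes could be tracked roughly like this. This is an invented sketch, not sqlsmith's actual implementation; the class name, the warm-up threshold, and the weighting scheme are all assumptions:

```python
# Rough sketch of grammar-production feedback: productions whose
# generated statements consistently fail get a lower selection weight,
# so the generator stops wasting time on them (e.g. ON CONFLICT on 9.1).
class ProductionStats:
    WARMUP = 10  # invented: minimum tries before we trust the failure rate

    def __init__(self):
        self.tried = {}
        self.failed = {}

    def record(self, prod, ok):
        self.tried[prod] = self.tried.get(prod, 0) + 1
        if not ok:
            self.failed[prod] = self.failed.get(prod, 0) + 1

    def weight(self, prod):
        tried = self.tried.get(prod, 0)
        if tried < self.WARMUP:
            return 1.0  # not enough data yet; keep trying it
        return 1.0 - self.failed.get(prod, 0) / tried

stats = ProductionStats()
for _ in range(50):
    stats.record("insert_on_conflict", ok=False)  # target rejects the syntax
    stats.record("plain_select", ok=True)
assert stats.weight("insert_on_conflict") == 0.0
assert stats.weight("plain_select") == 1.0
```

A scheme like this explains the behavior Andreas reports for sqlite3: a very high initial error rate that drops sharply once enough statements have been scored.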
Re: [HACKERS] On markers of changed data
Michael Paquier wrote:
> That’s actually what pg_rman is doing for what it calls incremental
> backups (perhaps that would be differential backup in PG
> terminology?), and the performance is bad as you can imagine. We could
> have a dedicated LSN map to do such things with 4 bytes per page. I am
> still not convinced that this much facility and the potential bug
> risks are worth it though, Postgres already knows about differential
> backups if you shape it as a delta of WAL segments. I think that, in
> order to find a LSN map more convincing, we should find first other
> use cases where it could become useful. Some use cases may pop up with
> VACUUM, but I have not studied the question hard enough...

The case I've discussed with barman developers is a large database (a couple dozen TBs should be enough) where a large fraction (say 95%) is read-only but there are many changes to the active part of the data, so that the WAL is more massive than the size of the active data.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
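The summary-LSN idea discussed in this thread (one LSN per group of pages, so a differential backup can skip whole groups untouched since the previous backup) can be sketched as follows. The group size, integer LSNs, and all names here are assumptions for illustration, not a proposal for an on-disk format:

```python
# Sketch: keep the maximum page LSN per fixed-size group of pages.
# A differential backup then reads a group only if its summary LSN is
# newer than the LSN at which the previous backup was taken.
GROUP_SIZE = 1024  # invented: pages summarized per map entry

class LsnMap:
    def __init__(self):
        self.group_lsn = {}  # group number -> max page LSN seen

    def page_dirtied(self, pageno, lsn):
        g = pageno // GROUP_SIZE
        self.group_lsn[g] = max(self.group_lsn.get(g, 0), lsn)

    def groups_to_backup(self, since_lsn):
        # Groups never dirtied, or dirtied only before since_lsn, are skipped.
        return sorted(g for g, lsn in self.group_lsn.items() if lsn > since_lsn)

m = LsnMap()
m.page_dirtied(0, lsn=100)      # group 0: last change predates the backup
m.page_dirtied(5000, lsn=900)   # group 4: changed since the backup
assert m.groups_to_backup(since_lsn=500) == [4]
```

For the 95%-read-only database Álvaro describes, almost all groups would fail the `lsn > since_lsn` test, so the backup would read only the active fraction rather than replaying a WAL volume larger than that data.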
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Peter Geoghegan wrote:
> On Fri, Oct 6, 2017 at 2:09 PM, Wong, Yi Wen wrote:
>> Yesterday, I've been spending time with pg_visibility on the pages when I
>> reproduce the issue in 9.6.
>> None of the all-frozen or all-visible bits are necessarily set in
>> problematic pages.
>
> Since this happened yesterday, I assume it was with an unfixed version?
>
> As you must have seen, Alvaro said he has a variant of Dan's original
> script that demonstrates that a problem remains, at least on 9.6+,
> even with today's fix. I think it's the stress-test that plays with
> fillfactor, many clients, etc [1].

I just execute setup.sql once and then run this shell command:

while :; do
	psql -e -P pager=off -f ./repro.sql
	for i in `seq 1 5`; do
		psql -P pager=off -e --no-psqlrc -f ./lock.sql &
	done
	wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
	psql -P pager=off -e --no-psqlrc -f ./report.sql
	echo "done"
done

Note that you need to use pg10's psql because of the \if lines in lock.sql. For some runs I change the values to compare random() to, and originally the commented-out section in lock.sql was not commented out, but I'm fairly sure the failures I saw were with this version. Also, I sometimes change the 5 in the `seq` command to higher values (180, 250). I didn't find the filler column to have any effect, so I took that out.
-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

The attached scripts follow (file boundaries inferred from their content and the shell command above):

lock.sql:

begin;
values ('for key share') /*, ('') */ order by random() limit 1 \gset
select pg_sleep(random() * 0.03);
select id from t where id=3 :column1 ;
select random() > .0 as update \gset
\if :update
	select pg_sleep(random() * 0.03);
	update t set x=x+1 where id=3;
\endif
/*
select random() > .0 as update2 \gset
\if :update2
	savepoint foo;
	update t set x=x+1 where id=3;
\endif
select random() > .5 as rollback_to \gset
\if :rollback_to
	rollback to foo;
\endif
*/
select random() > .0 as commit \gset
\if :commit
	commit;
\else
	rollback;
\endif
select pg_sleep(random() * 0.03);

reindex.sql:

vacuum freeze t;
reindex index t_pkey;

report.sql:

with pages (blkno) as (
	select generate_series::int
	from generate_series(0,
		pg_relation_size('t')/current_setting('block_size')::int - 1)),
rawpages (blkno, pg) as (
	select blkno, get_raw_page from pages, get_raw_page('t', blkno)),
heapitems as (
	select blkno, heap_page_items.* from rawpages, heap_page_items(pg))
select blkno, lp, lp_flags, lp_off, t_xmin, t_xmax, t_ctid,
	infomask(t_infomask, 1) as infomask,
	infomask(t_infomask2, 2) as infomask2
from heapitems
where lp_off <> 0;

repro.sql:

drop table t;
create table t (id int primary key, name char(3), x integer)
-- with (fillfactor = 10)
;
insert into t values (1, '111', 0);
insert into t values (3, '333', 0);

setup.sql:

create type infomask_bit_desc as (mask varbit, symbol text);

create or replace function infomask(msk int, which int) returns text
language plpgsql as $$
declare
	r infomask_bit_desc;
	str text = '';
	append_bar bool = false;
begin
	for r in select * from infomask_bits(which) loop
		if (msk::bit(16) & r.mask)::int <> 0 then
			if append_bar then
				str = str || '|';
			end if;
			append_bar = true;
			str = str || r.symbol;
		end if;
	end loop;
	return str;
end;
$$;

create or replace function infomask_bits(which int) returns setof infomask_bit_desc
language plpgsql as $$
begin
	if which = 1 then
		return query values
			(x'8000'::varbit, 'MOVED_IN'),
			(x'4000', 'MOVED_OFF'),
			(x'2000', 'UPDATED'),
			(x'1000', 'XMAX_IS_MULTI'),
			(x'0800', 'XMAX_INVALID'),
			(x'0400', 'XMAX_COMMITTED'),
			(x'0200', 'XMIN_INVALID'),
			(x'0100', 'XMIN_COMMITTED'),
			(x'0080', 'XMAX_LOCK_ONLY'),
			(x'0040', 'EXCL_LOCK'),
			(x'0020', 'COMBOCID'),
			(x'0010', 'XMAX_KEYSHR_LOCK'),
			(x'0008', 'HASOID'),
			(x'0004', 'HASEXTERNAL'),
			(x'0002', 'HASVARWIDTH'),
			(x'0001', 'HASNULL');
	elsif which = 2 then
		return query values
			(x'2000'::varbit, 'UPDATE_KEY_REVOKED'),
			(x'4000', 'HOT_UPDATED'),
			(x'8000', 'HEAP_ONLY_TUPLE');
	end if;
end;
$$;

create extension if not exists pageinspect;
Re: [HACKERS] On markers of changed data
> On 6 Oct 2017, at 23:44, Alvaro Herrera wrote:
>
> Michael Paquier wrote:
>
>> The only sane method for Postgres is really to scan the
>> page header LSNs, and of course you already know that.
>
> I hope the idea is not to have to scan every single page in the
> database, because that would be too slow. It should be possible to
> build this so that a single summary LSN is kept for a largish group of
> pages, allowing large portions of the database to be skipped from even
> being read if they are known not to contain any page newer than the
> previous backup.

That’s actually what pg_rman is doing for what it calls incremental backups (perhaps that would be differential backup in PG terminology?), and the performance is bad as you can imagine. We could have a dedicated LSN map to do such things with 4 bytes per page. I am still not convinced that this much facility and the potential bug risks are worth it, though; Postgres already knows about differential backups if you shape it as a delta of WAL segments. I think that, in order to find an LSN map more convincing, we should first find other use cases where it could become useful. Some use cases may pop up with VACUUM, but I have not studied the question hard enough...

-- 
Michael