Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing
Amit Kapila writes: > Any other ideas? Given that the crash is so far down inside __dlopen(), and that there's a clear reference to the string we presumably passed to that: #11 0x7f518485e489 in _dl_open (file=0x55b692f2d2b0 "/home/smith/postgres/inst/master/lib/pgcrypto.so", mode=-2147483390, caller_dlopen=0x55b691cb4c7e < I don't actually believe that this is Postgres' fault. I suspect that what we're looking at here is a low-memory bug in dlopen itself, probably something strdup'ing an input string and forgetting to check for a null result. Presumably somebody could dig into the libc source code and prove or disprove this, though it would sure help to know exactly what platform and version Andreas is testing on. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Fri, Sep 29, 2017 at 9:12 PM, Robert Haas wrote: > > It's possible that we might find that neither of the above approaches > are practical and that the performance benefits of resolving the > transaction from the original connection are large enough that we want > to try to make it work anyhow. However, I think we can postpone that > work to a future time. Any general solution to this problem at least > needs to be ABLE to resolve transactions at a later time from a > different session, so let's get that working first, and then see what > else we want to do. > +1. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing
On Tue, Oct 3, 2017 at 8:31 AM, Amit Kapila wrote: > On Tue, Oct 3, 2017 at 3:04 AM, Andreas Seltenreich > wrote: >> Hi, >> >> doing low-memory testing with REL_10_STABLE at 1f19550a87 also produced >> a couple of parallel worker core dumps with the backtrace below. >> Although most of the backtrace is inside the dynamic linker, it looks >> like it was passed a pointer to gone-away shared memory. >> > > It appears to be some dangling pointer, but not sure how it is > possible. Can you provide some more details, like do you have any > other library which you want to get loaded in the backend (like by > using shared_preload_libraries or by some other way)? I think without > that we shouldn't try to load anything in the parallel worker. > Another possibility could be that the memory for the library space has been overwritten, either in the master backend or in the worker backend. I think that is possible under low-memory conditions if somewhere we write to memory without first ensuring that space is allocated. I have browsed the nearby code and didn't find any such instance. One idea to narrow down the problem is to see if the other members in the worker backend are sane; for example, can you try printing the value of MyFixedParallelState, as we get that value from shared memory similar to the library space. It seems from the call stack that the memory of the library space is corrupted, so we can move the call to lookup/RestoreLibraryState immediately after we assign MyFixedParallelState. If after this the memory for the library space is still corrupted, then probably something bad has happened in the master backend. Any other ideas? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] PoC: full merge join on comparison clause
On Thu, Sep 28, 2017 at 8:57 PM, Alexander Kuzmenkov wrote: > Hi Ashutosh, > > Thanks for the review. > > Jeff, I'm copying you because this is relevant to our discussion about what > to do with mergeopfamilies when adding new merge join types. > > You have renamed RestrictInfo member mergeopfamilies as > equivopfamilies. I don't think that's a good name; it doesn't convey > that these are opfamilies containing merge operators. The changes in > check_mergejoinable() suggest that an operator may act as equality > operator in few operator families and comparison operator in others. > That looks odd. Actually an operator family contains operators other > than equality operators, so you may want to retain this member and add > a new member to specify whether the clause is an equality clause or > not. > > > For mergeopfamilies, I'm not sure what is the best thing to do. I'll try to > explain my understanding of the situation, please correct me if I'm wrong. > > Before the patch, mergeopfamilies was used for two things: creating > equivalence classes and performing merge joins. > > For equivalence classes: we look at the restriction clauses, and if they > have mergeopfamilies set, it means that these clauses are based on an > equality operator, and the left and right variables must be equal. To record > this fact, we create an equivalence class. The variables might be equal for > one equality operator and not equal for another, so we record the particular > operator families to which our equality operator belongs. > > For merge join: we look at the join clauses, and if they have > mergeopfamilies set, it means that these clauses are based on an equality > operator, and we can try performing this particular join as merge join. > These opfamilies are also used beforehand to create the equivalence classes > for left and right variables. The equivalence classes are used to match the > join clauses to pathkeys describing the ordering of join inputs. 
> > So, if we want to start doing merge joins for operators other than equality, > we still need to record their opfamilies, but only use them for the second > case and not the first. I chose to put these opfamilies to different > variables, and > name the one used for equivalence classes 'equivopfamilies' and the one used > for merge joins 'mergeopfamilies'. The equality operators are used for both > cases, so we put their opfamilies into both of these variables. > > I agree this might look confusing. Indeed, we could keep a single variable > for opfamilies, and add separate flags that show how they can be used, be > that for equivalence classes, merge joins, range joins or some combination > of them. This is similar to what Jeff did in his range merge join patch [1]. > I will think more about this and try to produce an updated patch. > I think we have (ab?)used mergeopfamilies to indicate equality condition, which needs some changes. Maybe these two patches are where we can do those changes. > > In mergejoinscansel() you have just removed Assert(op_strategy == > BTEqualStrategyNumber); Probably this function is written considering > only equality operators. But now that we are using this for all other > operators, we will need more changes to this function. That may be the > reason why INNER join in your earlier example doesn't choose right > costing. > > > I changed mergejoinscansel() slightly to reflect the fact that the inner > relation is scanned from the beginning if we have an inequality merge > clause. > > > The comment change in final_cost_mergejoin() needs more work. n1, n2, > n3 are number of rows on inner side with values 1, 2, 3 resp. So n1 + > n2 + n3 + ... = size of inner relation is correct. In that context I > am not able to understand your change > +* If the merge clauses contain inequality, (n1 + n2 + ...) ~= > +* (size of inner relation)^2. > > > I extended the comment in final_cost_mergejoin(). 
Not sure if that > approximation makes any sense, but this is the best I could think of. > > Style problems are fixed. > > Attached please find the new version of the patch that addresses all the > review comments except mergeopfamilies. > > The current commitfest is ending, but I'd like to continue working on this > patch, so I am moving it to the next one. > > Thanks for working on the comments. I am interested to continue reviewing it in the next commitfest. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Conversion error
> Tatsuo Ishii writes: >> I saw this while restoring 9.6 database to 10.0 database. >> \connect: FATAL: conversion between UTF8 and MULE_INTERNAL is not supported >> Is this expected? I saw nothing related to this in the release notes. > > Don't think that's ever been supported. Sorry, that was my misunderstanding. All versions of PostgreSQL allow creating a MULE_INTERNAL database while executing a restore, but after that the restore fails while loading rows into the MULE_INTERNAL database. So, if there's no table in the database, the restore succeeds. That's why I got the wrong impression. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Possible SSL improvements for a newcomer to tackle
On Tue, Oct 3, 2017 at 1:15 PM, Zeus Kronion wrote: > I previously made one minuscule contribution to the project two years ago. > I'm interested in doing some more, and I'm trying to figure out what to > focus on. Two SSL-related projects caught my attention: > 1) Allow automatic selection of SSL client certificates from a certificate > store (https://www.postgresql.org/message-id/8766.1241799...@sss.pgh.pa.us). > It seems relatively straightforward to support an additional file format for > key-value pairs in postgresql.crt/.key, and I think this is something I > could take on if it's still desired. > 2) I was surprised to learn the following from the docs: One other thing that could be improved, and that has already been asked for, is passphrase handling, particularly since SSL parameters can now be reloaded: for example, a new GUC parameter that calls a shell command which outputs the passphrase to stdout. It could be tricky to implement, as the postmaster should still be able to handle requests while launching the command. But I think you get the idea. >> By default, PostgreSQL will not perform any verification of the server >> certificate. This means that it is possible to spoof the server identity >> (for example by modifying a DNS record or by taking over the server IP >> address) without the client knowing. In order to prevent spoofing, SSL >> certificate verification must be used. > > Is there a technical reason to perform no verification by default? Wouldn't > a safer default be desirable? It would be nice to get to a stronger default, with "require" at least; the recommendation is to use at least "verify-ca" for any serious deployment. Note that not long ago there were arguments about how the default value of sslmode, 'prefer', is good at giving a false sense of security, but this led nowhere (can't put my hands on that thread now..). 
-- Michael
Re: [HACKERS] Possible SSL improvements for a newcomer to tackle
Zeus Kronion writes: > 2) I was surprised to learn the following from the docs: >> By default, PostgreSQL will not perform any verification of the server >> certificate. > Is there a technical reason to perform no verification by default? Wouldn't > a safer default be desirable? I'm not an SSL expert, so insert appropriate grain of salt, but AIUI the question is what are you going to verify against? You need some local notion of which are your trusted root certificates before you can verify anything. So to default to verification would be to default to failing to connect at all until user has created a ~/.postgresql/root.crt file with valid, relevant entries. That seems like a nonstarter. It's possible that we could adopt some policy like "if the root.crt file exists then default to verify" ... but that seems messy and unreliable, so I'm not sure it would really add any security. regards, tom lane
Re: [HACKERS] Conversion error
Tatsuo Ishii writes: > I saw this while restoring 9.6 database to 10.0 database. > \connect: FATAL: conversion between UTF8 and MULE_INTERNAL is not supported > Is this expected? I saw nothing related to this in the release notes. Don't think that's ever been supported. regards, tom lane
Re: [HACKERS] Conversion error
> I saw this while restoring 9.6 database to 10.0 database. > > \connect: FATAL: conversion between UTF8 and MULE_INTERNAL is not supported > > Is this expected? I saw nothing related to this in the release notes. This had been allowed in 9.6. So I think 10.0 silently drops the feature. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Logging idle checkpoints
At Tue, 3 Oct 2017 10:23:08 +0900, Michael Paquier wrote in > On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frost wrote: > > I certainly don't care for the idea of adding log messages saying we > > aren't doing anything just to match a count that's incorrectly claiming > > that checkpoints are happening when they aren't. > > > > The down-thread suggestion of keeping track of skipped checkpoints might > > be interesting, but I'm not entirely convinced it really is. We have > > time to debate that, of course, but I don't really see how that's > > helpful. At the moment, it seems like the suggestion to add that column > > is based on the assumption that we're going to start logging skipped > > checkpoints and having that column would allow us to match up the count > > between the new column and the "skipped checkpoint" messages in the logs > > and I can not help but feel that this is a ridiculous amount of effort > > being put into the analysis of something that *didn't* happen. > > Being able to look at how many checkpoints are skipped can be used as > a tuning indicator of max_wal_size and checkpoint_timeout, or in short > increase them if those remain idle. We usually adjust the GUCs based on how often checkpoints are *executed* and how many of the executed checkpoints have been triggered by xlog progress (or with a shorter interval than timeout). That seems enough. Counting skipped checkpoints gives just a rough estimate of how long the system was getting no substantial updates. I doubt that users get something valuable by counting skipped checkpoints. > Since their introduction in > 335feca4, m_timed_checkpoints and m_requested_checkpoints track the > number of checkpoint requests, not if a checkpoint has been actually > executed or not, I am not sure that this should be changed after 10 > years. 
So, to put it in other words, wouldn't we want a way to track > checkpoints that are *executed*, meaning that we could increment a > counter after doing the skip checks in CreateRestartPoint() and > CreateCheckPoint()? This sounds reasonable to me. CreateRestartPoint() is already returning ckpt_performed, which is used to let the checkpointer retry in 15 seconds rather than waiting for the next checkpoint_timeout. Checkpoint might deserve the same treatment on skipping. By the way, CreateRestartPoint emits DEBUG2 messages on skipping. Although restartpoint has different characteristics from checkpoint, if we change the message level for CreateCheckPoint (currently DEBUG1), CreateRestartPoint should probably get the same change. (Otherwise, at least they ought to have the same message level?) regards, -- Kyotaro Horiguchi NTT Open Source Software Center
[HACKERS] Possible SSL improvements for a newcomer to tackle
I previously made one minuscule contribution to the project two years ago. I'm interested in doing some more, and I'm trying to figure out what to focus on. Two SSL-related projects caught my attention: 1) Allow automatic selection of SSL client certificates from a certificate store (https://www.postgresql.org/message-id/8766.1241799...@sss.pgh.pa.us). It seems relatively straightforward to support an additional file format for key-value pairs in postgresql.crt/.key, and I think this is something I could take on if it's still desired. 2) I was surprised to learn the following from the docs: > By default, PostgreSQL will not perform any verification of the server certificate. This means that it is possible to spoof the server identity (for example by modifying a DNS record or by taking over the server IP address) without the client knowing. In order to prevent spoofing, SSL certificate verification must be used. Is there a technical reason to perform no verification by default? Wouldn't a safer default be desirable?
Re: [HACKERS] path toward faster partition pruning
Hi David. Thanks a lot for your review comments and sorry it took me a while to reply. On 2017/09/28 18:16, David Rowley wrote: > On 27 September 2017 at 14:22, Amit Langote > wrote: >> - 0001 includes refactoring that Dilip proposed upthread [1] (added him as >> an author). I slightly tweaked his patch -- renamed the function >> get_matching_clause to match_clauses_to_partkey, similar to >> match_clauses_to_index. > > Hi Amit, > > I've made a pass over the 0001 patch while trying to get myself up to > speed with the various developments that are going on in partitioning > right now. > > These are just my thoughts from reading over the patch. I understand > that there's quite a bit up in the air right now about how all this is > going to work, but I have some thoughts about that too, which I > wouldn't mind some feedback on as my line of thought may be off. > > Anyway, I will start with some small typos that I noticed while reading: > > partition.c: > > + * telling what kind of NullTest has been applies to the corresponding > > "applies" should be "applied" Will fix. > plancat.c: > > * might've occurred to satisfy the query. Rest of the fields are set in > > "Rest of the" should be "The remaining" Will fix. > And onto more serious stuff: > > allpaths.c: > > get_rel_partitions() > > I wonder if this function does not belong in partition.c. I'd have > expected a function to exist per partition type, RANGE and LIST, I'd > imagine should have their own function in partition.c to eliminate > partitions > which cannot possibly match, and return the remainder. I see people > speaking of HASH partitioning, but we might one day also want > something like RANDOM or ROUNDROBIN (I'm not really kidding, imagine > routing records to be processed into foreign tables where you never > need to query them again). It would be good if we could easily expand > this list and not have to touch files all over the optimizer to do > that. 
> Of course, there would be other work to do in the executor to > implement any new partitioning method too. I think there will have to be at least some code in the optimizer. That is, the code to match the query to the table's partition keys and to use the constant values therein to look up the table's partitions. More importantly, once the partitions (viz. their offsets in the table's partition descriptor) have been looked up by partition.c, we must be able to then map them to their planner data structures viz. their AppendRelInfo, and subsequently RelOptInfo. This last part will have to be in the optimizer. In fact, that was the role of get_rel_partitions in the initial versions of the patch, when neither of the code for matching keys and for pruning using constants was implemented. One may argue that the first part, that is, matching clauses to the partition key and subsequent lookup of partitions using constants, could both be implemented in partition.c, but it seems better to me to keep at least the part of matching clauses to the partition keys within the planner (just like matching clauses to indexes is done by the planner). Looking up partitions using constants cannot be done outside partition.c anyway, so that's something we have to implement there. > I know there's mention of it somewhere about get_rel_partitions() not > being so smart about WHERE partkey > 20 AND partkey > 10, the code > does not understand what's more restrictive. I think you could > probably follow the same rules here that are done in > eval_const_expressions(). Over there I see that evaluate_function() > will call anything that's not marked as volatile. Hmm, unless I've missed it, I don't see in evaluate_function() anything about determining which clause is more restrictive. 
AFAIK, such determination depends on the btree operator class semantics (at least in the most common cases where, say, ">" means greater than in a sense that btree code uses it), so I was planning to handle it the way btree code handles it in _bt_preprocess_keys(). In fact, the existing code in predtest.c, which makes similar decisions, also relies on btree semantics. It's OK to do so, because partitioning methods that exist today and for which we'd like to have such smarts use btree semantics to partition the data. Also, we won't need to optimize such cases for hash partitioning anyway. > I'd imagine, for > each partition key, you'd want to store a Datum with the minimum and > maximum possible value based on the quals processed. If either the > minimum or maximum is still set to NULL, then it's unbounded at that > end. If you encounter partkey = Const, then minimum and maximum can be > set to the value of that Const. From there you can likely ignore any > other quals for that partition key, as if the query did contain > another qual with partkey = SomeOtherConst, then that would have > become a gating qual during the constant folding process. This way
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Andres Freund writes: > On 2017-10-02 19:50:51 -0400, Tom Lane wrote: >> What I saw was that the backend process was consuming 100% of (one) CPU, >> while the I/O transaction rate viewed by "iostat 1" started pretty low >> --- under 10% of what the machine is capable of --- and dropped from >> there as the copy proceeded. I did not think to check if that was user >> or kernel-space CPU, but I imagine it has to be the latter. > So that's pretty clearly a kernel bug... Hm. I wonder if it's mmap() or > msync() that's the problem here. I guess you didn't run a profile? Interestingly, profiling with Activity Monitor seems to blame the problem entirely on munmap() ... which squares with the place I hit every time when randomly stopping the process with gdb^Hlldb, so I'm inclined to believe it. This still offers no insight as to why CREATE DATABASE is hitting the problem while regular flush activity doesn't. > One interesting thing here is that in the CREATE DATABASE case there'll > probably be a lot larger contiguous mappings than in *_flush_after > cases. So it might be related to the size of the mapping / flush "unit". Meh, the mapping is only 64K in this case vs. 8K in the other. Hard to credit that it breaks that easily. regards, tom lane
Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing
On Tue, Oct 3, 2017 at 3:04 AM, Andreas Seltenreich wrote: > Hi, > > doing low-memory testing with REL_10_STABLE at 1f19550a87 also produced > a couple of parallel worker core dumps with the backtrace below. > Although most of the backtrace is inside the dynamic linker, it looks > like it was passed a pointer to gone-away shared memory. > It appears to be some dangling pointer, but not sure how it is possible. Can you provide some more details, like do you have any other library which you want to get loaded in the backend (like by using shared_preload_libraries or by some other way)? I think without that we shouldn't try to load anything in the parallel worker. Also, if you can get the failed query (check in server log), it would be great. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [PATCH] Tests for reloptions
On Sun, Oct 1, 2017 at 5:18 AM, Nikolay Shaplov wrote: > src/backend/access/common/reloptions.c got only 7 lines, it was quite covered > by existing tests, but almost all of the access methods get some coverage > increase: > > src/backend/access/brin 1268 -> 1280 (+18) > src/backend/access/gin 2924 -> 2924 (0) > src/backend/access/gist 1673 -> 2108 (+435) > src/backend/access/hash 1580 -> 1638 (+58) > src/backend/access/heap 2863 -> 2866 (+3) > src/backend/access/nbtree 2565 -> 2647 (+82) > src/backend/access/spgist 2066 -> 2068 (+2) > > Though I should say that the incredible coverage boost for gist is the result of > unsteady results of the test run. The real value should be much less... +-- Test buffering and fillfactor reloption takes valid values +create index gist_pointidx2 on gist_point_tbl using gist(p) with (buffering = on, fillfactor=50); +create index gist_pointidx3 on gist_point_tbl using gist(p) with (buffering = off); +create index gist_pointidx4 on gist_point_tbl using gist(p) with (buffering = auto); Those are the important bits that visibly produce the boost in gist. To be kept. -CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops); +CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops) WITH (fillfactor=60); Woah. So that alone increases hash by 58 steps. +CREATE INDEX bloomidx2 ON tst USING bloom (i, t) WITH (length=0); +ERROR: value 0 out of bounds for option "length" +DETAIL: Valid values are between "1" and "4096". contrib/bloom contributes to the coverage of reloptions.c thanks to its coverage of the add_ routines when the library is loaded. And those don't actually improve any coverage, right? > Nevertheless the tests touch the reloptions related code, check for proper > error handling, and that is good. Per your measurements reloptions.c is improved by 1.7%, or 7 lines as you say. Five of them are in parse_one_reloption for integers (2) and reals (2), plus one error at the top. 
Two are in transformRelOptions for valid namespace handling. In your patch reloptions.sql is 141 lines. Couldn't it be shorter with the same improvements? It looks like a lot of the tests overlap with existing ones. > I think we should commit it. My take is that a lighter version could be committed instead. My suggestion is to keep the new reloptions test minimal so that it tests the relopt types and their bounds, and I think that you could remove the same bounding checks in the other tests that you added: bloom, gist, etc. -- Michael
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Thu, Sep 21, 2017 at 8:21 AM, Ashutosh Bapat wrote: > Here's set of rebased patches. The patch with extra tests is not for > committing. All other patches, except the last one, will need to be > committed together. The last patch may be committed along with other > patches or as a separate patch. In set_append_rel_size, is it necessary to set attr_needed = bms_copy(rel->attr_needed[index]) rather than just pointing to the existing value? If so, perhaps the comments should explain the reasons. I would have thought that the values wouldn't change after this point, in which case it might not be necessary to copy them. Regarding nomenclature and my previous griping about wisdom, I was wondering about just calling this a "partition join" like you have in the regression test. So the GUC would be enable_partition_join, you'd have generate_partition_join_paths(), etc. Basically just delete "wise" throughout. The elog(DEBUG3) in try_partition_wise_join() doesn't follow message style guidelines and I think should just be removed. It was useful for development, I'm sure, but it's time for it to go. +elog(ERROR, "unrecognized path node type %d", (int) nodeTag(path)); I think we should use the same formulation as elsewhere, namely "unrecognized node type: %d". And likewise probably "unexpected join type: %d". partition_join_extras.sql has a bunch of whitespace damage, although it doesn't really matter since, as you say, that's not for commit. (This is not a full review, just a few thoughts.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
On Tue, Oct 3, 2017 at 12:54 AM, Alvaro Herrera wrote: > Michael Paquier wrote: >> On Mon, Oct 2, 2017 at 11:44 PM, Robert Haas wrote: >> > On Mon, Oct 2, 2017 at 10:19 AM, Michael Paquier >> > wrote: >> >> So those bits could be considered for integration. >> > >> > Yes, and they also tend to suggest that the rest of the patch has value. >> >> Well, there are cases where you don't need any locking checks, and the >> proposed patch ignores that. Take for example pageinspect which works >> on a copy of a page, or just WAL replay which serializes everything, >> and in both cases PageGetLSN can be used safely. So for compatibility >> reasons I think that PageGetLSN should be kept alone to not surprise >> anybody using it, or at least an equivalent should be introduced. It >> would be interesting to make BufferGetLSNAtomic hold tighter checks, >> like something that makes use of LWLockHeldByMe for example. Jacob, here are some ideas to make this thread move on. I would suggest to produce a set of patches that do things incrementally: 1) One patch that changes the calls of PageGetLSN to BufferGetLSNAtomic which are now not appropriate. You have spotted some of them in the btree and gist code, but not all based on my first lookup. There is still one in gistFindCorrectParent and one in btree _bt_split. The monitoring of the other calls (sequence.c and vacuumlazy.c) looked safe. There is another one in XLogRecordAssemble that should be fixed I think. 2) A second patch that strengthens a bit checks around BufferGetLSNAtomic. One idea would be to use LWLockHeldByMe, as you are originally suggesting. A comment could be as well added in bufpage.h for PageGetLSN to let users know that it should be used carefully, in the vein of what is mentioned in src/backend/access/transam/README. > I'd argue about this in the same direction I argued about > BufferGetPage() needing an LSN check that's applied separately: if it's > too easy for a developer to do the wrong thing (i.e. 
fail to apply the > separate LSN check after reading the page) then the routine should be > patched to do the check itself, and another routine should be offered > for the cases that explicitly can do without the check. > > I was eventually outvoted in the BufferGetPage() saga, though, so I'm > not sure that that discussion sets precedent. Oh... I don't recall this discussion. A quick lookup at the archives does not show me a specific thread either. -- Michael
[HACKERS] Combining expr{Type,Typmod,Collation}() into one function.
Hi, I'd recently noticed that expr* functions actually show up in profiles because we use them at some very common paths (e.g. ExecTypeFromTLInternal()) and that we commonly call all the three variants from $subject in sequence. Looking at their code I was wondering whether it's reasonable to combine them into one function. The performance effects in my case were negligible, but it might still be worth it just to reduce duplication. I've attached my *WIP* patch to do so. Unless somebody finds this interesting and prods me, I don't plan to do something further with this. If we were to go with this, some cleanup would be needed. Greetings, Andres Freund >From 04dc797d6e99d7fe5902a31ae5fdd73950c8c48f Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 21 Sep 2017 12:13:38 -0700 Subject: [PATCH] WIP: Combine expr{Type,Typmod,Collation} into one function. Slight speedup, removal of some duplication. Not entirely sure it's worth it. --- src/backend/executor/execTuples.c | 12 +- src/backend/nodes/nodeFuncs.c | 929 +- src/include/nodes/nodeFuncs.h | 1 + 3 files changed, 435 insertions(+), 507 deletions(-) diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c index 51d2c5d166..1035efb4aa 100644 --- a/src/backend/executor/execTuples.c +++ b/src/backend/executor/execTuples.c @@ -919,18 +919,24 @@ ExecTypeFromTLInternal(List *targetList, bool hasoid, bool skipjunk) foreach(l, targetList) { TargetEntry *tle = lfirst(l); + Oid type; + int32 typmod; + Oid collation; if (skipjunk && tle->resjunk) continue; + exprTypeInfo((Node *) tle->expr, + &type, &typmod, &collation); + TupleDescInitEntry(typeInfo, cur_resno, tle->resname, - exprType((Node *) tle->expr), - exprTypmod((Node *) tle->expr), + type, + typmod, 0); TupleDescInitEntryCollation(typeInfo, cur_resno, - exprCollation((Node *) tle->expr)); + collation); cur_resno++; } diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c index 8e6f27e153..caf5ee6a42 100644 --- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c @@ -35,69 +35,134 @@ static bool planstate_walk_members(List *plans, PlanState **planstates, /* - * exprType - - * returns the Oid of the type of the expression's result. + * exprTypeInfo - + * lookup an expression result's type, typmod, collation. */ -Oid -exprType(const Node *expr) +void +exprTypeInfo(const Node *expr, Oid *type, int32 *typmod, Oid *collation) { - Oid type; + /* initialize to defaults, overridden where appropriate */ + *type = InvalidOid; + *typmod = -1; + *collation = InvalidOid; if (!expr) - return InvalidOid; + return; switch (nodeTag(expr)) { case T_Var: - type = ((const Var *) expr)->vartype; - break; + { +const Var *var = (const Var *) expr; +*type = var->vartype; +*typmod = var->vartypmod; +*collation = var->varcollid; +break; + } case T_Const: - type = ((const Const *) expr)->consttype; - break; + { +const Const *c = (const Const *) expr; +*type = c->consttype; +*typmod = c->consttypmod; +*collation = c->constcollid; +break; + } case T_Param: - type = ((const Param *) expr)->paramtype; - break; + { +const Param *p = (const Param *) expr; +*type = p->paramtype; +*typmod = p->paramtypmod; +*collation = p->paramcollid; +break; + } case T_Aggref: - type = ((const Aggref *) expr)->aggtype; - break; + { +const Aggref *a = (const Aggref *) expr; +*type = a->aggtype; +*collation = a->aggcollid; +break; + } case T_GroupingFunc: - type = INT4OID; + *type = INT4OID; break; case T_WindowFunc: - type = ((const WindowFunc *) expr)->wintype; - break; + { +const WindowFunc *w = (const WindowFunc *) expr; +*type = w->wintype; +*collation = w->wincollid; +break; + } case T_ArrayRef: { const ArrayRef *arrayref = (const ArrayRef *) expr; /* slice and/or store operations yield the array type */ if (arrayref->reflowerindexpr || arrayref->refassgnexpr) - type = arrayref->refarraytype; + *type = arrayref->refarraytype; else - type = arrayref->refelemtype; + *type = arrayref->refelemtype; + +*typmod = arrayref->reftypmod; 
+*collation = arrayref->refcollid; } break; case T_FuncExpr: - type = ((const FuncExpr *) expr)->funcresulttype; - break; + { +const FuncExpr *func = (const FuncExpr *) expr; +int32 coercedTypmod; + +*type = func->funcresulttype; + +/* Be smart about length-coercion functions... */ +if (exprIsLengthCoercion(expr, &coercedTypmod)) + *typmod = coercedTypmod; +else + *typmod = -1; +*collation = func->funccollid; +break; + } case T_NamedArgExpr: - type = exprType((Node *) ((const NamedArgExpr *)
Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256
On Tue, Oct 3, 2017 at 1:30 AM, Robert Haaswrote: > On Fri, Sep 15, 2017 at 6:29 PM, Michael Paquier > wrote: >> I would like to point out that per the RFC, if the client attempts a >> SSL connection with SCRAM and that the server supports channel >> binding, then it has to publish the SASL mechanism for channel >> binding, aka SCRAM-PLUS. If the client tries to force the use of SCRAM >> even if SCRAM-PLUS is specified, this is seen as a downgrade attack by >> the server which must reject the connection. So this parameter has >> meaning only if you try to connect to a PG10 server using a PG11 >> client (assuming that channel binding gets into PG11). If you connect >> with a PG11 client to a PG11 server with SSL, the server publishes >> SCRAM-PLUS, the client has to use it, hence this turns out to make >> cbind=disable and prefer meaningless in the long-term. If the client >> does not use SSL, then there is no channel binding, and cbind=require >> loses its value. So cbind's fate is actually linked to sslmode. > > That seems problematic. What if the client supports SCRAM but not > channel binding? Peter has outlined here that my interpretation of the RFC was wrong on the client side to begin with: https://www.postgresql.org/message-id/f74525e4-6c53-c653-6860-a8cc8d7c8...@2ndquadrant.com If a client does not support channel binding (it is not compiled with SSL or the connection is done without SSL), it should not send 'y' but 'n'. It should be up to the client to decide if it wants to use channel binding or not. libpq is also going to need some extra logic to send 'y' when it thinks that the server should have channel binding support. This can be done by looking at the backend version. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type
Hi, Sorry, I saw this once but somehow my attention was drawn away along the way. At Tue, 3 Oct 2017 02:41:34 +0300, Alexander Korotkov wrote in > On Tue, Oct 3, 2017 at 12:20 AM, Maksim Milyutin > wrote: > > > I have tested the following case: > > > > create type pair as (x int, y int); > > prepare test as select json_populate_record(null::pair, '{"x": 1, "y": > > 2}'::json); > > drop type pair cascade; > > > > execute test; > > > > -- The following output is obtained before patch > > ERROR: cache lookup failed for type 16419 > > > > -- After applying patch > > ERROR: type "pair" does not exist > > > > But after recreating 'pair' type I'll get the following message: > > ERROR: cached plan must not change result type > > > > I don't know whether it's right behavior. Anyhow your point is a good > > motivation to experiment and investigate different scenarios of work with > > cached plan that depends on non-stable type. Thanks for that. > > > > I think ideally, cached plan should be automatically invalidated and stored > procedure should work without error. > Not really sure if it's feasible... Without the patch, dropping a table used in a prepared query results in a similar error. So I suppose it's the desired behavior in this case. execute test; | ERROR: relation "t3" does not exist The first thought that patch gave me is that the problem is not limited to constants. Actually the following sequence also reproduces a similar failure even with this patch. create table t2 (x int , y int); create type pair as (x int, y int); prepare test as select row(x, y)::pair from t2; drop type pair; execute test; | ERROR: cache lookup failed for type 16410 In this case the causal expression is in the following form.
TargetEntry ( expr = ( RowExpr: typeid = 16410, row_format = COERCE_EXPLICIT_CAST, args = List (Var(t2.x), Var(t2.y)) ) ) regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest 201709 is now closed
On Tue, Oct 3, 2017 at 3:12 AM, Robert Haas wrote: > On Mon, Oct 2, 2017 at 11:57 AM, Tom Lane wrote: > > Daniel Gustafsson writes: > >> Thanks to everyone who participated, and to everyone who have responded > to my > >> nagging via the CF app email function. This is clearly an awesome > community. > > > > And thanks to you for your hard work as CFM! That's tedious and > > largely thankless work, but it's needed to keep things moving. > > +1. > > I think we need to figure out some plan for tackling the backlog of > Ready for Committer patches, though. There are currently 42 of them, > 10 of which are listed (maybe not all correctly) as bug fixes. That's > not great. How about having an extra week (a "commit week", perhaps a new state in the CF app) after the commitfest closes, dedicated to the Ready for Committer patches, so they can get feedback from committers? If some rework is required, it could then be ready by the time the next commitfest begins. I feel something along these lines may get earlier feedback to the patches. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Logging idle checkpoints
On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frostwrote: > I certainly don't care for the idea of adding log messages saying we > aren't doing anything just to match a count that's incorrectly claiming > that checkpoints are happening when they aren't. > > The down-thread suggestion of keeping track of skipped checkpoints might > be interesting, but I'm not entirely convinced it really is. We have > time to debate that, of course, but I don't really see how that's > helpful. At the moment, it seems like the suggestion to add that column > is based on the assumption that we're going to start logging skipped > checkpoints and having that column would allow us to match up the count > between the new column and the "skipped checkpoint" messages in the logs > and I can not help but feel that this is a ridiculous amount of effort > being put into the analysis of something that *didn't* happen. Being able to look at how many checkpoints are skipped can be used as a tuning indicator of max_wal_size and checkpoint_timeout, or in short increase them if those remain idle. Since their introduction in 335feca4, m_timed_checkpoints and m_requested_checkpoints track the number of checkpoint requests, not if a checkpoint has been actually executed or not, I am not sure that this should be changed after 10 years. So, to put it in other words, wouldn't we want a way to track checkpoints that are *executed*, meaning that we could increment a counter after doing the skip checks in CreateRestartPoint() and CreateCheckPoint()? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 64-bit queryId?
On Tue, Oct 3, 2017 at 9:07 AM, Alexander Korotkov wrote: > +1, > I see 3 options there: > 1) Drop high-order bit, as you proposed. > 2) Allow negative queryIds. > 3) Implement unsigned 64-type. > > #1 causes minor loss of precision which looks rather insignificant in given > context. > #2 might be rather unexpected for users who previously had non-negative > queryIds. Changing queryId from 32-bit to 64-bit itself might require some > adaptation from monitoring software. But queryIds are user-visible, and > negative queryIds would look rather nonlogical. Per the principle of least astonishment perhaps: https://en.wikipedia.org/wiki/Principle_of_least_astonishment Negative values tend to be considered as error codes as well. > #3 would be taking on a hard, long-term problem for insufficient reason. > Thus, #1 looks like the most harmless solution. In this case going for #1 looks like the safest bet. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
On Tue, Oct 3, 2017 at 6:54 AM, Tom Lane wrote: > I wrote: >> If this is the only problem then I'd agree we should stick to a spinlock >> (I assume the strings in question can't be very long). I was thinking >> more about what to do if we find other violations that are harder to fix. I don't think that there is any need to switch to an LWLock. Any issues that need to be dealt with here don't require it, if we are fine with the memcpy method of course. > I took a quick look through walreceiver.c, and there are a couple of > obvious problems of the same ilk in WalReceiverMain, which is doing this: > > walrcv->lastMsgSendTime = walrcv->lastMsgReceiptTime = > walrcv->latestWalEndTime = GetCurrentTimestamp(); > > (ie, a potential kernel call) inside a spinlock. But there seems no > real problem with just collecting the timestamp before we enter that > critical section. No problems seen either from here. > I also don't especially like the fact that just above there it reaches > elog(PANIC) with the lock still held, though at least that's a case that > should never happen. This part has been around since the beginning in 1bb2558. I agree that the lock should be released before doing the logging. > Further down, it's doing a pfree() inside the spinlock, apparently > for no other reason than to save one "if (tmp_conninfo)". Check. > I don't especially like the Asserts inside spinlocks, either. Personally, > I think if those conditions are worth testing then they're worth testing > for real (in production). Variables that are manipulated by multiple > processes are way more likely to assume unexpected states than local > variables. Those could be replaced by similar checks using some PG_USED_FOR_ASSERTS_ONLY out of the spinlock sections, though I am not sure if those are worth worrying about. What do others think? > I'm also rather befuddled by the fact that this code sets and clears > walrcv->latch outside the critical sections.
If that field is used > by any other process, surely that's completely unsafe. If it isn't, > why is it being kept in shared memory? Do you mean the introduction of WalRcvForceReply by 314cbfc? This is more recent, and has been discussed during the review of the remote_apply patch here to avoid sending SIGUSR1 too much from the startup process to the WAL receiver: https://www.postgresql.org/message-id/CA+TgmobPsROS-gFk=_KJdW5scQjcKtpiLhsH9Cw=uwh1htf...@mail.gmail.com I am attaching a patch that addresses the bugs for the spin lock sections. -- Michael walreceiver-spin-calls.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Conversion error
I saw this while restoring a 9.6 database to a 10.0 database. \connect: FATAL: conversion between UTF8 and MULE_INTERNAL is not supported Is this expected? I saw nothing related to this in the release notes. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest 201709 is now closed
On 2017/10/03 7:16, Michael Paquier wrote: > On Tue, Oct 3, 2017 at 1:23 AM, Alexander Korotkov >wrote: >> On Mon, Oct 2, 2017 at 6:57 PM, Tom Lane wrote: >>> >>> Daniel Gustafsson writes: Thanks to everyone who participated, and to everyone who have responded to my nagging via the CF app email function. This is clearly an awesome community. >>> >>> And thanks to you for your hard work as CFM! That's tedious and >>> largely thankless work, but it's needed to keep things moving. >> >> +1, >> Thank you very much, Daniel! It was a pleasure working with you at >> commitfest. > > Thanks for doing a bunch of work, Daniel. This is a difficult task. +1. Thank you, Daniel! Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 64-bit queryId?
On Tue, Oct 3, 2017 at 12:32 AM, Tom Lane wrote: > Peter Geoghegan writes: > > You need to change the SQL interface as well, although I'm not sure > > exactly how. The problem is that you are now passing a uint64 queryId > > to Int64GetDatumFast() within pg_stat_statements_internal(). That > > worked when queryId was a uint32, because you can easily represent > > values <= UINT_MAX as an int64/int8. However, you cannot represent the > > second half of the range of uint64 within a int64/int8. I think that > > this will behave different depending on USE_FLOAT8_BYVAL, if nothing > > else. > > Maybe intentionally drop the high-order bit, so that it's a 63-bit ID? > +1, I see 3 options there: 1) Drop high-order bit, as you proposed. 2) Allow negative queryIds. 3) Implement unsigned 64-type. #1 causes minor loss of precision which looks rather insignificant in given context. #2 might be rather unexpected for users who previously had non-negative queryIds. Changing queryId from 32-bit to 64-bit itself might require some adaptation from monitoring software. But queryIds are user-visible, and negative queryIds would look rather nonlogical. #3 would be taking on a hard, long-term problem for insufficient reason. Thus, #1 looks like the most harmless solution. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
On 2017-10-02 19:50:51 -0400, Tom Lane wrote: > Andres Freundwrites: > > On 2017-10-02 18:33:17 -0400, Tom Lane wrote: > >> I'm kind of surprised that machine B doesn't show obvious tanking in this > >> test given that msync() makes it suck so badly at copying a database. > >> I wonder what is different from the kernel's standpoint ... maybe the > >> sheer number of different files mmap'd by a single process during the > >> copy? > > > Yea, that's curious. I've really no clue about OSX, so pardon my > > question: With HEAD and CREATE DATABASE, is it IO blocked or kernel cpu > > blocked? > > What I saw was that the backend process was consuming 100% of (one) CPU, > while the I/O transaction rate viewed by "iostat 1" started pretty low > --- under 10% of what the machine is capable of --- and dropped from > there as the copy proceeded. I did not think to check if that was user > or kernel-space CPU, but I imagine it has to be the latter. So that's pretty clearly a kernel bug... Hm. I wonder if it's mmap() or msync() that's the problem here. I guess you didn't run a profile? One interesting thing here is that in the CREATE DATABASE case there'll probably be a lot larger contiguous mappings than in *_flush_after cases. So it might be related to the size of the mapping / flush "unit". Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Andres Freundwrites: > On 2017-10-02 18:33:17 -0400, Tom Lane wrote: >> I'm kind of surprised that machine B doesn't show obvious tanking in this >> test given that msync() makes it suck so badly at copying a database. >> I wonder what is different from the kernel's standpoint ... maybe the >> sheer number of different files mmap'd by a single process during the >> copy? > Yea, that's curious. I've really no clue about OSX, so pardon my > question: With HEAD and CREATE DATABASE, is it IO blocked or kernel cpu > blocked? What I saw was that the backend process was consuming 100% of (one) CPU, while the I/O transaction rate viewed by "iostat 1" started pretty low --- under 10% of what the machine is capable of --- and dropped from there as the copy proceeded. I did not think to check if that was user or kernel-space CPU, but I imagine it has to be the latter. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type
On Tue, Oct 3, 2017 at 12:20 AM, Maksim Milyutinwrote: > I have tested the following case: > > create type pair as (x int, y int); > prepare test as select json_populate_record(null::pair, '{"x": 1, "y": > 2}'::json); > drop type pair cascade; > > execute test; > > -- The following output is obtained before patch > ERROR: cache lookup failed for type 16419 > > -- After applying patch > ERROR: type "pair" does not exist > > But after recreating 'pair' type I'll get the following message: > ERROR: cached plan must not change result type > > I don't know whether it's right behavior. Anyhow your point is a good > motivation to experiment and investigate different scenarios of work with > cached plan that depends on non-stable type. Thanks for that. > I think ideally, cached plan should be automatically invalidated and stored procedure should work without error. Not really sure if it's feasible... -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Hi, On 2017-10-02 18:33:17 -0400, Tom Lane wrote: > Andres Freundwrites: > > To demonstrate what I'm observing here, on linux with a fairly fast ssd: > > ... > > I tried to replicate this test as closely as I could on the Mac hardware > I have laying about. Thanks! > I only bothered with the synchronous_commit=off case, though, since > you say that shows the worst effects. That's the case on linux at least, but I'd suspect it's a much more general thing - you just can't get that much data dirty with pgbench otherwise. > I'm kind of surprised that machine B doesn't show obvious tanking in this > test given that msync() makes it suck so badly at copying a database. > I wonder what is different from the kernel's standpoint ... maybe the > sheer number of different files mmap'd by a single process during the > copy? Yea, that's curious. I've really no clue about OSX, so pardon my question: With HEAD and CREATE DATABASE, is it IO blocked or kernel cpu blocked? > If we could arrange to not use pg_flush_after in copydir.c on macOS, > I'd be okay with leaving it alone for the configurable flush_after > calls. But I can't think of a way to do that that wouldn't be a > complete kluge. I don't much want to do > > +#ifndef __darwin__ > pg_flush_data(dstfd, offset, nbytes); > +#endif > > but I don't see any better alternative ... What we could do is introduce a guc (~create_database_flush_data) that controls whether we flush here, and have the defaults set differently on OSX. Not sure if that's actually any better. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] list of credits for release notes
On Mon, Oct 2, 2017 at 02:12:50PM -0400, Stephen Frost wrote: > > How should this be handled for the Postgres 11 release notes? > > Ideally, we would let the individuals choose how to be recognized in > release notes, and anywhere else we recognize them. We have the start > of that in https://postgresql.org/account/profile but that isn't very > easily tied to things in the commit history or elsewhere, yet. I'd > suggest that we try to improve on that by: My smaller question is how will this list be generated in PG 11? From the commit log when the release notes are created, or some other method? -- Bruce Momjianhttp://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Andres Freundwrites: > To demonstrate what I'm observing here, on linux with a fairly fast ssd: > ... I tried to replicate this test as closely as I could on the Mac hardware I have laying about. I only bothered with the synchronous_commit=off case, though, since you say that shows the worst effects. I used the same parameters you did and the same pgbench settings. I attach the pgbench output for six cases, flush_after disabled or enabled on three different machines: (A) 2016 MacBook Pro, 2.7GHz i7 + SSD, Sierra, HFS+ file system (B) 2013 MacBook Pro, 2.3GHz i7 + SSD, High Sierra, APFS file system (C) 2012 Mac Mini, 2.3GHz i7 + 5400-RPM SATA, High Sierra, HFS+ file system There is some benefit on the SSD machines, but it's in the range of a few percent --- clearly, these kernels are not as subject to the basic I/O-scheduling problem as Linux is. The spinning-rust machine shows a nice gain in overall TPS with flush enabled, but it's actually a bit worse off in terms of the worst-case slowdown --- note that only that case shows things coming to a complete halt. It'd be interesting to check the behavior of a pre-High-Sierra kernel with spinning rust, but I don't have any remotely modern machine answering that description. I'm kind of surprised that machine B doesn't show obvious tanking in this test given that msync() makes it suck so badly at copying a database. I wonder what is different from the kernel's standpoint ... maybe the sheer number of different files mmap'd by a single process during the copy? > What I'm basically wondering is whether we're screwing somebody over > that made the effort to manually configure this on OSX. It's fairly > obvious we need to find a way to disable the msync() by default. I suspect that anybody who cares about DB performance on macOS will be running it on SSD-based hardware these days. 
The benefit seen on the Mac Mini would have been worth the trouble of a custom configuration a few years ago, but I'm dubious that it matters in the real world anymore. If we could arrange to not use pg_flush_after in copydir.c on macOS, I'd be okay with leaving it alone for the configurable flush_after calls. But I can't think of a way to do that that wouldn't be a complete kluge. I don't much want to do +#ifndef __darwin__ pg_flush_data(dstfd, offset, nbytes); +#endif but I don't see any better alternative ... regards, tom lane [Attachment: per-second pgbench progress output (tps and latency), roughly 7,500-9,900 tps; truncated here.]
Re: [HACKERS] Commitfest 201709 is now closed
On Tue, Oct 3, 2017 at 1:23 AM, Alexander Korotkovwrote: > On Mon, Oct 2, 2017 at 6:57 PM, Tom Lane wrote: >> >> Daniel Gustafsson writes: >> > Thanks to everyone who participated, and to everyone who have responded >> > to my >> > nagging via the CF app email function. This is clearly an awesome >> > community. >> >> And thanks to you for your hard work as CFM! That's tedious and >> largely thankless work, but it's needed to keep things moving. > > +1, > Thank you very much, Daniel! It was a pleasure working with you at > commitfest. Thanks for doing a bunch of work, Daniel. This is a difficult task. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Binary search in fmgr_isbuiltin() is a bottleneck.
On 2017-10-02 17:57:51 -0400, Tom Lane wrote: > Andres Freundwrites: > > Done that way. It's a bit annoying, because we've to take care to > > initialize the "unused" part of the array with a valid signalling it's > > an unused mapping. Can't use 0 for that because fmgr_builtins[0] is a > > valid entry. > > The prototype code I posted further upthread just used -1 as the "unused" > marker. There's no reason the array can't be int16 rather than uint16, > and "if (index < 0)" is probably a faster test anyway. Right, but whether we use -1 or UINT16_MAX or such doesn't matter. The relevant bit is that we can't use 0, so we can't rely on the rest of the array being zero initialized, but instead of to initialize all of it explicitly. I've no real feelings about using -1 or UINT16_MAX - I'd be very surprised if there's any sort of meaningful performance difference. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Binary search in fmgr_isbuiltin() is a bottleneck.
Andres Freundwrites: > Done that way. It's a bit annoying, because we've to take care to > initialize the "unused" part of the array with a valid signalling it's > an unused mapping. Can't use 0 for that because fmgr_builtins[0] is a > valid entry. The prototype code I posted further upthread just used -1 as the "unused" marker. There's no reason the array can't be int16 rather than uint16, and "if (index < 0)" is probably a faster test anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
I wrote:
> If this is the only problem then I'd agree we should stick to a spinlock
> (I assume the strings in question can't be very long). I was thinking
> more about what to do if we find other violations that are harder to fix.

I took a quick look through walreceiver.c, and there are a couple of
obvious problems of the same ilk in WalReceiverMain, which is doing this:

	walrcv->lastMsgSendTime = walrcv->lastMsgReceiptTime =
		walrcv->latestWalEndTime = GetCurrentTimestamp();

(ie, a potential kernel call) inside a spinlock. But there seems no real
problem with just collecting the timestamp before we enter that critical
section.

I also don't especially like the fact that just above there it reaches
elog(PANIC) with the lock still held, though at least that's a case that
should never happen.

Further down, it's doing a pfree() inside the spinlock, apparently for no
other reason than to save one "if (tmp_conninfo)".

I don't especially like the Asserts inside spinlocks, either. Personally,
I think if those conditions are worth testing then they're worth testing
for real (in production). Variables that are manipulated by multiple
processes are way more likely to assume unexpected states than local
variables.

I'm also rather befuddled by the fact that this code sets and clears
walrcv->latch outside the critical sections. If that field is used by any
other process, surely that's completely unsafe. If it isn't, why is it
being kept in shared memory?

regards, tom lane
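The suggested fix — collect the timestamp before entering the critical section — looks roughly like this toy model. It is not the actual walreceiver code: the struct, the busy-wait lock, and time() here stand in for WalRcvData, SpinLockAcquire(), and GetCurrentTimestamp().

```c
#include <assert.h>
#include <time.h>

/* Mock of the shared walreceiver state; fields mirror the ones named above. */
typedef struct
{
	time_t		lastMsgSendTime;
	time_t		lastMsgReceiptTime;
	time_t		latestWalEndTime;
} WalRcvMock;

static volatile int mock_lock = 0;

static void
set_initial_timestamps(WalRcvMock *walrcv)
{
	/* The potentially slow kernel call happens with no lock held... */
	time_t		now = time(NULL);

	/* ...so only cheap assignments occur inside the critical section. */
	while (__sync_lock_test_and_set(&mock_lock, 1))
		;						/* stands in for SpinLockAcquire */
	walrcv->lastMsgSendTime = now;
	walrcv->lastMsgReceiptTime = now;
	walrcv->latestWalEndTime = now;
	__sync_lock_release(&mock_lock); /* stands in for SpinLockRelease */
}
```

The point of the reordering is purely to minimize what runs while the lock is held; the three fields still get one consistent timestamp.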
Re: [HACKERS] Binary search in fmgr_isbuiltin() is a bottleneck.
On 2017-09-28 19:06:27 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-09-28 18:52:28 -0400, Tom Lane wrote:
> >> Uh, what? Access to fmgr_nbuiltins shouldn't be part of any critical path
> >> anymore after this change.
>
> > Indeed. But the size of the the oid -> fmgr_builtins index array is
> > relevant now. We could of course just make that dependent on
> > FirstBootstrapObjectId, but that'd waste some memory.
>
> Not enough to notice, considering there are pg_proc OIDs up in the 8K
> range already. We blow 2KB of never-accessed space for far less good
> reason than this.

Done that way. It's a bit annoying, because we've to take care to
initialize the "unused" part of the array with a valid signalling it's
an unused mapping. Can't use 0 for that because fmgr_builtins[0] is a
valid entry. We could introduce a dummy element at that position, but
that doesn't really seem nice either.

Therefore the first attached commit moves find_defined_symbol from
genbki to Catalog.pm, so we can easily extract FirstBootstrapObjectId in
Gen_fmgrtab.pl.

Greetings,

Andres Freund

>From 16e35356cead73291d676764072abfebc2efa79b Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Mon, 2 Oct 2017 14:14:42 -0700
Subject: [PATCH 1/2] Move genbki.pl's find_defined_symbol to Catalog.pm.

Will be used in Gen_fmgrtab.pl in a followup commit.
---
 src/backend/catalog/Catalog.pm | 35 ++-
 src/backend/catalog/genbki.pl  | 34 --
 src/backend/utils/Makefile     |  2 +-
 3 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 7abfda3d3a..54f83533b6 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -19,7 +19,7 @@ use warnings;
 require Exporter;
 our @ISA = qw(Exporter);
 our @EXPORT= ();
-our @EXPORT_OK = qw(Catalogs SplitDataLine RenameTempFile);
+our @EXPORT_OK = qw(Catalogs SplitDataLine RenameTempFile FindDefinedSymbol);
 
 # Call this function with an array of names of header files to parse.
 # Returns a nested data structure describing the data in the headers.
@@ -252,6 +252,39 @@ sub RenameTempFile
 	rename($temp_name, $final_name) || die "rename: $temp_name: $!";
 }
 
+
+# Find a symbol defined in a particular header file and extract the value.
+#
+# The include path has to be passed as a reference to an array.
+sub FindDefinedSymbol
+{
+	my ($catalog_header, $include_path, $symbol) = @_;
+
+	for my $path (@$include_path)
+	{
+
+		# Make sure include path ends in a slash.
+		if (substr($path, -1) ne '/')
+		{
+			$path .= '/';
+		}
+		my $file = $path . $catalog_header;
+		next if !-f $file;
+		open(my $find_defined_symbol, '<', $file) || die "$file: $!";
+		while (<$find_defined_symbol>)
+		{
+			if (/^#define\s+\Q$symbol\E\s+(\S+)/)
+			{
+				return $1;
+			}
+		}
+		close $find_defined_symbol;
+		die "$file: no definition found for $symbol\n";
+	}
+	die "$catalog_header: not found in any include directory\n";
+}
+
+
 # verify the number of fields in the passed-in DATA line
 sub check_natts
 {
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 2eebb061b7..256a9c9c93 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -87,9 +87,11 @@ open my $shdescr, '>', $shdescrfile . $tmpext
 # NB: make sure that the files used here are known to be part of the .bki
 # file's dependencies by src/backend/catalog/Makefile.
 my $BOOTSTRAP_SUPERUSERID =
-	find_defined_symbol('pg_authid.h', 'BOOTSTRAP_SUPERUSERID');
+	Catalog::FindDefinedSymbol('pg_authid.h', \@include_path,
+		'BOOTSTRAP_SUPERUSERID');
 my $PG_CATALOG_NAMESPACE =
-	find_defined_symbol('pg_namespace.h', 'PG_CATALOG_NAMESPACE');
+	Catalog::FindDefinedSymbol('pg_namespace.h', \@include_path,
+		'PG_CATALOG_NAMESPACE');
 
 # Read all the input header files into internal data structures
 my $catalogs = Catalog::Catalogs(@input_files);
@@ -500,34 +502,6 @@ sub emit_schemapg_row
 	return $row;
 }
 
-# Find a symbol defined in a particular header file and extract the value.
-sub find_defined_symbol
-{
-	my ($catalog_header, $symbol) = @_;
-	for my $path (@include_path)
-	{
-
-		# Make sure include path ends in a slash.
-		if (substr($path, -1) ne '/')
-		{
-			$path .= '/';
-		}
-		my $file = $path . $catalog_header;
-		next if !-f $file;
-		open(my $find_defined_symbol, '<', $file) || die "$file: $!";
-		while (<$find_defined_symbol>)
-		{
-			if (/^#define\s+\Q$symbol\E\s+(\S+)/)
-			{
-				return $1;
-			}
-		}
-		close $find_defined_symbol;
-		die "$file: no definition found for $symbol\n";
-	}
-	die "$catalog_header: not found in any include directory\n";
-}
-
 sub usage
 {
	die <
[HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing
Hi,

doing low-memory testing with REL_10_STABLE at 1f19550a87 also produced
a couple of parallel worker core dumps with the backtrace below.
Although most of the backtrace is inside the dynamic linker, it looks
like it was passed a pointer to gone-away shared memory.

regards,
Andreas

Core was generated by `postgres: bgworker: parallel worker for PID 24326 '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x7f5184852a36 in fillin_rpath (rpath=, rpath@entry=0x55b692f0d360 "/home/smith/postgres/inst/master/lib", result=result@entry=0x55b692f1b380, sep=sep@entry=0x7f5184868060 ":", check_trusted=check_trusted@entry=0, what=what@entry=0x7f51848683bd "RUNPATH", where=where@entry=0x55b692f2d2f0 "/home/smith/postgres/inst/master/lib/pgcrypto.so", l=0x55b692f2d330) at dl-load.c:444
#2  0x7f5184852daf in decompose_rpath (sps=sps@entry=0x55b692f2d6d8, rpath=, l=l@entry=0x55b692f2d330, what=what@entry=0x7f51848683bd "RUNPATH") at dl-load.c:618
#3  0x7f5184852ef7 in cache_rpath (l=l@entry=0x55b692f2d330, sp=sp@entry=0x55b692f2d6d8, tag=tag@entry=29, what=what@entry=0x7f51848683bd "RUNPATH") at dl-load.c:652
#4  0x7f5184853c62 in cache_rpath (what=0x7f51848683bd "RUNPATH", tag=29, sp=0x55b692f2d6d8, l=0x55b692f2d330) at dl-load.c:2307
#5  _dl_map_object (loader=0x55b692f2d330, name=0x7f517f300cc3 "libz.so.1", type=2, trace_mode=0, mode=, nsid=) at dl-load.c:2314
#6  0x7f5184857e70 in openaux (a=a@entry=0x7ffd4f686130) at dl-deps.c:63
#7  0x7f518485a4f4 in _dl_catch_error (objname=objname@entry=0x7ffd4f686128, errstring=errstring@entry=0x7ffd4f686120, mallocedp=mallocedp@entry=0x7ffd4f68611f, operate=operate@entry=0x7f5184857e40 , args=args@entry=0x7ffd4f686130) at dl-error.c:187
#8  0x7f51848580df in _dl_map_object_deps (map=map@entry=0x55b692f2d330, preloads=preloads@entry=0x0, npreloads=npreloads@entry=0, trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648) at dl-deps.c:254
#9  0x7f518485ea02 in dl_open_worker (a=a@entry=0x7ffd4f6863c0) at dl-open.c:280
#10 0x7f518485a4f4 in _dl_catch_error (objname=objname@entry=0x7ffd4f6863b0, errstring=errstring@entry=0x7ffd4f6863b8, mallocedp=mallocedp@entry=0x7ffd4f6863af, operate=operate@entry=0x7f518485e8f0 , args=args@entry=0x7ffd4f6863c0) at dl-error.c:187
#11 0x7f518485e489 in _dl_open (file=0x55b692f2d2b0 "/home/smith/postgres/inst/master/lib/pgcrypto.so", mode=-2147483390, caller_dlopen=0x55b691cb4c7e, nsid=-2, argc=, argv=, env=0x55b692eef880) at dl-open.c:660
#12 0x7f5184020ee9 in dlopen_doit (a=a@entry=0x7ffd4f6865f0) at dlopen.c:66
#13 0x7f518485a4f4 in _dl_catch_error (objname=0x55b692eef6d0, errstring=0x55b692eef6d8, mallocedp=0x55b692eef6c8, operate=0x7f5184020e90 , args=0x7ffd4f6865f0) at dl-error.c:187
#14 0x7f5184021521 in _dlerror_run (operate=operate@entry=0x7f5184020e90 , args=args@entry=0x7ffd4f6865f0) at dlerror.c:163
#15 0x7f5184020f82 in __dlopen (file=, mode=mode@entry=258) at dlopen.c:87
#16 0x55b691cb4c7e in internal_load_library (libname=libname@entry=0x7f51848be7f8 ) at dfmgr.c:231
#17 0x55b691cb5928 in RestoreLibraryState (start_address=0x7f51848be7f8 ) at dfmgr.c:754
#18 0x55b6919459d9 in ParallelWorkerMain (main_arg=) at parallel.c:1030
#19 0x55b691b23746 in StartBackgroundWorker () at bgworker.c:835
#20 0x55b691b2faf5 in do_start_bgworker (rw=0x55b692f0e050) at postmaster.c:5680
#21 maybe_start_bgworkers () at postmaster.c:5884
#22 0x55b691b305c8 in sigusr1_handler (postgres_signal_arg=) at postmaster.c:5073
#23
#24 0x7f5183a5f273 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#25 0x55b6918b8c0b in ServerLoop () at postmaster.c:1717
#26 0x55b691b31c65 in PostmasterMain (argc=3, argv=0x55b692eea5f0) at postmaster.c:1361
#27 0x55b6918bac4d in main (argc=3, argv=0x55b692eea5f0) at main.c:228
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
Andres Freund writes:
> On 2017-10-02 17:30:25 -0400, Tom Lane wrote:
>> Or replace the spinlock with an LWLock?

> That'd probably be a good idea, but I'm loathe to do so in the back
> branches. Not at this callsite, but some others, there's some potential
> for contention.

If this is the only problem then I'd agree we should stick to a spinlock
(I assume the strings in question can't be very long). I was thinking
more about what to do if we find other violations that are harder to fix.

regards, tom lane
Re: [HACKERS] 64-bit queryId?
Peter Geoghegan writes:
> You need to change the SQL interface as well, although I'm not sure
> exactly how. The problem is that you are now passing a uint64 queryId
> to Int64GetDatumFast() within pg_stat_statements_internal(). That
> worked when queryId was a uint32, because you can easily represent
> values <= UINT_MAX as an int64/int8. However, you cannot represent the
> second half of the range of uint64 within a int64/int8. I think that
> this will behave different depending on USE_FLOAT8_BYVAL, if nothing
> else.

Maybe intentionally drop the high-order bit, so that it's a 63-bit ID?

regards, tom lane
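The masking idea above can be sketched in a few lines. This is an illustrative fragment, not the actual pg_stat_statements change; the function name is made up, and plain C99 integer macros are used in place of PostgreSQL's UINT64CONST.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clear the high-order bit so every queryId fits in a signed int64 and
 * can be handed to an int8-returning SQL interface (Int64GetDatumFast
 * in the real code) without ever appearing negative.
 */
static int64_t
queryid_to_int64(uint64_t queryId)
{
	uint64_t	masked = queryId & UINT64_C(0x7FFFFFFFFFFFFFFF);

	return (int64_t) masked;	/* provably non-negative */
}
```

The cost is one bit of hash space, which barely moves the collision math compared with going from 32 to 64 bits in the first place.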
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
On 2017-10-02 17:30:25 -0400, Tom Lane wrote:
> Andres Freund writes:
> > Yes, that'd be a bad idea. It's not great to have memcpys in a critical
> > section, but it's way better than pallocs. So we need to use some local
> > buffers that this get copied to.
>
> Or replace the spinlock with an LWLock?

That'd probably be a good idea, but I'm loathe to do so in the back
branches. Not at this callsite, but some others, there's some potential
for contention.

Greetings,

Andres Freund
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
Andres Freund writes:
> On 2017-10-02 22:56:49 +0200, Andreas Seltenreich wrote:
>> low-memory testing with REL_10_STABLE at 1f19550a87 produced the
>> following PANIC:
>> stuck spinlock detected at pg_stat_get_wal_receiver, walreceiver.c:1397

> Ugh.

Egad.

> Yes, that'd be a bad idea. It's not great to have memcpys in a critical
> section, but it's way better than pallocs. So we need to use some local
> buffers that this get copied to.

Or replace the spinlock with an LWLock?

In any case, I think it would be a good idea to look at every other
critical section touching that lock to see if there are any other
blatant coding-rule violations.

regards, tom lane
Re: [HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type
Hi, Alexander!

Thanks for the comments.

02.10.17 20:02, Alexander Korotkov wrote:
> Please, register this patch at upcoming commitfest to ensure it
> wouldn't be forgotten.
> Regression test changes (both .sql and .out) are essential parts of the
> patch. I see no point in posting them separately. Please, incorporate
> them into your patch.

OK, I'll take your advice.

> Did you check this patch with manually created composite type (made by
> CREATE TYPE typname AS ...)?

I have tested the following case:

create type pair as (x int, y int);
prepare test as select json_populate_record(null::pair, '{"x": 1, "y": 2}'::json);
drop type pair cascade;
execute test;

-- The following output is obtained before patch
ERROR: cache lookup failed for type 16419

-- After applying patch
ERROR: type "pair" does not exist

But after recreating the 'pair' type I get the following message:

ERROR: cached plan must not change result type

I don't know whether that's the right behavior. Anyhow, your point is a
good motivation to experiment with and investigate different scenarios
of work with a cached plan that depends on a non-stable type. Thanks
for that.

--
Regards, Maksim Milyutin
Re: [HACKERS] 64-bit queryId?
On Mon, Oct 2, 2017 at 9:10 AM, Robert Haas wrote:
> On Mon, Oct 2, 2017 at 11:02 AM, Joshua D. Drake wrote:
>> +1 to both of these as well.
>
> OK, so here's a patch. Review appreciated.

You need to change the SQL interface as well, although I'm not sure
exactly how. The problem is that you are now passing a uint64 queryId
to Int64GetDatumFast() within pg_stat_statements_internal(). That
worked when queryId was a uint32, because you can easily represent
values <= UINT_MAX as an int64/int8. However, you cannot represent the
second half of the range of uint64 within an int64/int8. I think that
this will behave differently depending on USE_FLOAT8_BYVAL, if nothing
else.

The background here is that we used int8 as the output type for the
function when queryId was first exposed via the SQL interface because
there was no 32-bit unsigned int type that we could have used (except
possibly Oid, but that's weird). You see the same pattern in a couple
of other places.

--
Peter Geoghegan
Re: [HACKERS] list of credits for release notes
> On Wed, Sep 27, 2017 at 8:29 PM, Michael Paquier wrote:
>> Looking at this list, the first name is followed by the family name,
>> so there are inconsistencies with some Japanese names:
>> - Fujii Masao should be Masao Fujii.
>> - KaiGai Kohei should be Kohei Kaigai.
>
> But his emails say Fujii Masao, not Masao Fujii.

Michael is correct. Sometimes people choose family name first in their
emails. However, I am sure "Fujii" is the family name and "Masao" is
the first name.

> KaiGai's case is a bit trickier, as his email name has changed over time.

Michael is correct about Kaigai too.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
On 2017-10-02 22:56:49 +0200, Andreas Seltenreich wrote:
> Hi,
>
> low-memory testing with REL_10_STABLE at 1f19550a87 produced the
> following PANIC:
>
> stuck spinlock detected at pg_stat_get_wal_receiver, walreceiver.c:1397

Ugh.

> I was about to wrap the pstrdup()s with a PG_TRY block, but I can't find
> a spinlock being released in a PG_CATCH block anywhere, so maybe that's
> a bad idea?

Yes, that'd be a bad idea. It's not great to have memcpys in a critical
section, but it's way better than pallocs. So we need to use some local
buffers that this get copied to.

This seems to have been introduced as part of b1a9bad9e74 and then
9ed551e0a4f. Authors CCed.

Greetings,

Andres Freund
Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
On Tue, Oct 3, 2017 at 9:56 AM, Andreas Seltenreich wrote:
> low-memory testing with REL_10_STABLE at 1f19550a87 produced the
> following PANIC:
>
> stuck spinlock detected at pg_stat_get_wal_receiver, walreceiver.c:1397
>
> I was about to wrap the pstrdup()s with a PG_TRY block, but I can't find
> a spinlock being released in a PG_CATCH block anywhere, so maybe that's
> a bad idea?

No comment on what might be holding the spinlock there, but perhaps the
spinlock-protected code should strncpy into stack-local buffers instead
of calling pstrdup()? The buffers could be statically sized with
NAMEDATALEN and MAXCONNINFO.

--
Thomas Munro
http://www.enterprisedb.com
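The pattern Thomas suggests can be sketched as follows. This is an illustrative mock, not the real pg_stat_get_wal_receiver: the struct and function names are invented, the spinlock calls are shown as comments, and only the sizing constants mirror PostgreSQL's actual limits. The key property is that while the (commented-out) spinlock would be held, nothing runs that can allocate or elog() and thereby longjmp away with the lock still taken.

```c
#include <assert.h>
#include <string.h>

/* Sizes mirroring the PostgreSQL limits named above. */
#define NAMEDATALEN 64
#define MAXCONNINFO 1024

/* Hypothetical stand-in for the shared walreceiver state. */
typedef struct
{
	char		slotname[NAMEDATALEN];
	char		conninfo[MAXCONNINFO];
} WalRcvShared;

static void
snapshot_walrcv(const WalRcvShared *shared,
				char *slotname_out, char *conninfo_out)
{
	/* SpinLockAcquire(&walrcv->mutex); */

	/* Copy into fixed-size local buffers: no allocation, cannot fail. */
	strncpy(slotname_out, shared->slotname, NAMEDATALEN - 1);
	slotname_out[NAMEDATALEN - 1] = '\0';
	strncpy(conninfo_out, shared->conninfo, MAXCONNINFO - 1);
	conninfo_out[MAXCONNINFO - 1] = '\0';

	/* SpinLockRelease(&walrcv->mutex); */

	/* ...only after the release is it safe to pstrdup() the copies. */
}
```

Any pstrdup()/palloc() then operates on the local copies after the lock is gone, so an out-of-memory elog() can no longer leave the spinlock stuck.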
[HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
Hi,

low-memory testing with REL_10_STABLE at 1f19550a87 produced the
following PANIC:

stuck spinlock detected at pg_stat_get_wal_receiver, walreceiver.c:1397

I was about to wrap the pstrdup()s with a PG_TRY block, but I can't find
a spinlock being released in a PG_CATCH block anywhere, so maybe that's
a bad idea?

regards,
Andreas
Re: [HACKERS] 64-bit queryId?
On 03/10/17 04:02, Joshua D. Drake wrote:
> On 10/01/2017 04:22 PM, Robert Haas wrote:
>> On Sun, Oct 1, 2017 at 3:48 PM, Greg Stark wrote:
>>> Well these kinds of monitoring systems tend to be used by operations
>>> people who are a lot more practical and a lot less worried about
>>> theoretical concerns like that.
>>
>> +1, well said.
>>
>>> In context the point was merely that the default
>>> pg_stat_statements.max of 5000 isn't sufficient to argue that 32-bit
>>> values are enough. It wouldn't be hard for there to be 64k different
>>> queries over time and across all the databases in a fleet and at that
>>> point it becomes likely there'll be a 32-bit collision.
>>
>> Yeah. I think Alexander Korotkov's points are quite good, too.
>
> +1 to both of these as well.
>
> jD

Did a calculation:

# statements    probability of collision
54561           0.43
54562           0.55

Essentially, you hit a greater than 50% chance of a collision before
you get to 55 thousand statements.

Cheers,
Gavin
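For anyone wanting to rerun the arithmetic, here is a self-contained check using the exact birthday-problem product for n values drawn uniformly from N = 2^32 possible 32-bit IDs. (The figures quoted in the message above may have been computed with different assumptions; this is just the standard model.)

```c
#include <assert.h>

/*
 * Exact probability that at least two of n uniformly random 32-bit IDs
 * collide: 1 - prod_{k=1}^{n-1} (N - k) / N, with N = 2^32.
 */
static double
collision_probability(long n)
{
	const double N = 4294967296.0;	/* 2^32 */
	double		p_all_distinct = 1.0;

	for (long k = 1; k < n; k++)
		p_all_distinct *= (N - k) / N;
	return 1.0 - p_all_distinct;
}
```

Under this model, 65,536 distinct statements already carry roughly a 39% chance of at least one collision, and the 50% point falls near 77,000 values — broadly consistent with the upthread argument that fleet-wide workloads make 32-bit collisions likely, whatever the precise cutoff.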
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
On 2017-10-02 15:59:05 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-10-02 15:54:43 -0400, Tom Lane wrote:
> >> Should I expect there to be any difference at all? We don't enable
> >> *_flush_after by default on non-Linux platforms.
>
> > Right, you'd have to enable that. But your patch would neuter an
> > intentionally enabled config too, no?
>
> Well, if you want to suggest a specific scenario to try, I'm happy to.
> I am not going to guess as to what will satisfy you.

To demonstrate what I'm observing here, on linux with a fairly fast ssd:

with:
-c autovacuum_analyze_threshold=2147483647 # to avoid analyze snapshot issue
-c fsync=on -c synchronous_commit=on
-c shared_buffers=4GB
-c max_wal_size=30GB
-c checkpoint_timeout=30s
-c checkpoint_flush_after=0
-c bgwriter_flush_after=0

and

pgbench -i -s 100 -q
pgbench -M prepared -c 8 -j 8 -n -P1 -T 100

often has periods like:

synchronous_commit = on:
progress: 73.0 s, 395.0 tps, lat 20.029 ms stddev 4.001
progress: 74.0 s, 289.0 tps, lat 23.730 ms stddev 23.337
progress: 75.0 s, 88.0 tps, lat 104.029 ms stddev 178.038
progress: 76.0 s, 400.0 tps, lat 20.055 ms stddev 4.844
latency average = 21.599 ms
latency stddev = 13.865 ms
tps = 370.346506 (including connections establishing)
tps = 370.372550 (excluding connections establishing)

with synchronous_commit=off those periods are a lot worse:

progress: 57.0 s, 21104.3 tps, lat 0.379 ms stddev 0.193
progress: 58.0 s, 9994.1 tps, lat 0.536 ms stddev 3.140
progress: 59.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 60.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 61.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 62.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 63.0 s, 3319.6 tps, lat 12.860 ms stddev 253.664
progress: 64.0 s, 20997.0 tps, lat 0.381 ms stddev 0.190
progress: 65.0 s, 20409.1 tps, lat 0.392 ms stddev 0.303
...
latency average = 0.745 ms
latency stddev = 20.470 ms
tps = 10743.53 (including connections establishing)
tps = 10743.815591 (excluding connections establishing)

contrasting that to checkpoint_flush_after=256kB and
bgwriter_flush_after=512kB:

synchronous_commit=on worst:
progress: 87.0 s, 298.0 tps, lat 26.874 ms stddev 26.691
latency average = 21.898 ms
latency stddev = 6.416 ms
tps = 365.308180 (including connections establishing)
tps = 365.318793 (excluding connections establishing)

synchronous_commit=off worst:
progress: 30.0 s, 7026.8 tps, lat 1.137 ms stddev 11.070
latency average = 0.550 ms
latency stddev = 5.599 ms
tps = 14547.842213 (including connections establishing)
tps = 14548.325102 (excluding connections establishing)

If you do the same on rotational disks, the stall periods can get a
*lot* worse (multi-minute stalls with pretty much no activity).

What I'm basically wondering is whether we're screwing somebody over
that made the effort to manually configure this on OSX. It's fairly
obvious we need to find a way to disable the msync() by default.

Greetings,

Andres Freund
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Andres Freund writes:
> On 2017-10-02 15:54:43 -0400, Tom Lane wrote:
>> Should I expect there to be any difference at all? We don't enable
>> *_flush_after by default on non-Linux platforms.

> Right, you'd have to enable that. But your patch would neuter an
> intentionally enabled config too, no?

Well, if you want to suggest a specific scenario to try, I'm happy to.
I am not going to guess as to what will satisfy you.

regards, tom lane
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
On 2017-10-02 15:54:43 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-10-02 15:42:25 -0400, Tom Lane wrote:
> >> I experimented with this further by seeing whether the msync() code path
> >> is of any value on Sierra either. The answer seems to be "no": cloning
> >> a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
> >> laptop using unmodified HEAD, but if I dike out the msync() logic then
> >> it takes 16-17 seconds. Both numbers jump around a little, but using
> >> msync is strictly worse.
>
> > Well, that's only measuring one type of workload. Could you run a normal
> > pgbench with -P1 or so for 2-3 checkpoint cycles and see how big the
> > latency differences are?
>
> Should I expect there to be any difference at all? We don't enable
> *_flush_after by default on non-Linux platforms.

Right, you'd have to enable that. But your patch would neuter an
intentionally enabled config too, no?

Greetings,

Andres Freund
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Andres Freund writes:
> On 2017-10-02 15:42:25 -0400, Tom Lane wrote:
>> I experimented with this further by seeing whether the msync() code path
>> is of any value on Sierra either. The answer seems to be "no": cloning
>> a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
>> laptop using unmodified HEAD, but if I dike out the msync() logic then
>> it takes 16-17 seconds. Both numbers jump around a little, but using
>> msync is strictly worse.

> Well, that's only measuring one type of workload. Could you run a normal
> pgbench with -P1 or so for 2-3 checkpoint cycles and see how big the
> latency differences are?

Should I expect there to be any difference at all? We don't enable
*_flush_after by default on non-Linux platforms.

regards, tom lane
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
On 2017-10-02 15:42:25 -0400, Tom Lane wrote:
> I wrote:
> > In short, therefore, APFS cannot cope with the way we're using msync().
>
> I experimented with this further by seeing whether the msync() code path
> is of any value on Sierra either. The answer seems to be "no": cloning
> a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
> laptop using unmodified HEAD, but if I dike out the msync() logic then
> it takes 16-17 seconds. Both numbers jump around a little, but using
> msync is strictly worse.

Well, that's only measuring one type of workload. Could you run a normal
pgbench with -P1 or so for 2-3 checkpoint cycles and see how big the
latency differences are?

> I propose therefore that an appropriate fix is to unconditionally disable
> the msync code path on Darwin, as we have already done for Windows. When
> and if Apple changes their kernel so that this path is actually of some
> value, we can figure out how to detect whether to use it.

I'm inclined to think you're right. This is a surprisingly massive
regression for a new OS release - we're not the only database that uses
msync...

Greetings,

Andres Freund
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Thanks for this breakdown Tom!

FWIW - I'm on Postgres 9.6.5 as bundled with Postgres.app (2.0.5)
running on a 2013 MBP (2.7GHz i7 / 16GB / SSD) setup. It looks like this
might be a priority for an upcoming release, so I might try to hold out
for downstream, but thanks for the patch. It will help if we need to get
custom builds out to fellow devs if this becomes too unbearable.

On Mon, Oct 2, 2017 at 1:42 PM, Tom Lane wrote:
> I wrote:
> > In short, therefore, APFS cannot cope with the way we're using msync().
>
> I experimented with this further by seeing whether the msync() code path
> is of any value on Sierra either. The answer seems to be "no": cloning
> a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
> laptop using unmodified HEAD, but if I dike out the msync() logic then
> it takes 16-17 seconds. Both numbers jump around a little, but using
> msync is strictly worse.
>
> I propose therefore that an appropriate fix is to unconditionally disable
> the msync code path on Darwin, as we have already done for Windows. When
> and if Apple changes their kernel so that this path is actually of some
> value, we can figure out how to detect whether to use it.
>
> The msync logic seems to date back to this thread:
>
> https://www.postgresql.org/message-id/flat/alpine.DEB.2.10.150601132.28433%40sto
>
> wherein Andres opined
> >> I think this patch primarily needs:
> >> * Benchmarking on FreeBSD/OSX to see whether we should enable the
> >> mmap()/msync(MS_ASYNC) method by default. Unless somebody does so, I'm
> >> inclined to leave it off till then.
>
> but so far as I can tell from the thread, only testing on FreeBSD ever
> got done. So there's no evidence that this was ever beneficial on macOS,
> and we now have evidence that it's between counterproductive and
> catastrophic depending on which kernel version you look at.
Re: [HACKERS] issue: record or row variable cannot be part of multiple-item INTO list
On Mon, Oct 2, 2017 at 12:44 PM, Tom Lane wrote:
>> And I'm saying - that argument is bogus. Regardless of what people
>> want or what we have historically done in the case where the
>> record/row is the only INTO target, when there are multiple targets it
>> seems clear that they want to match up the query's output columns with
>> the INTO targets 1:1. So let's just do that.
>
> Arguing that that's what people want (even if I granted your argument,
> which I do not) does not make the inconsistency magically disappear,
> nor will it stop people from being confused by that inconsistency.
> Furthermore, if we do it like this, we will be completely backed into
> a backwards-compatibility corner if someone does come along and say
> "hey, I wish I could do the other thing".

That is not really true. Even if we define INTO a, b, c as I am
proposing (and Pavel, too, I think), I think we can later define INTO
*a, INTO (a), INTO a ..., INTO a MULTIPLE, INTO a STRANGELY, and INTO
%@!a??ppp#zorp to mean anything we like (unless one or more of those
already have some semantics already, in which case pick something that
doesn't). If we're really on the fence about which behavior people will
want, we could implement both with a syntax marker for each, say SELECT
... INTO a #rhaas for the behavior I like and SELECT ... INTO a #tgl_ftw
for the other behavior, and require specifying one of those decorations.
Maybe that's more or less what you were proposing.

If we're going to have a default, though, I think it should be the one
you described as "inconsistent with the single row case" rather than the
one you described as "very bug-prone", because I agree with those
characterizations but feel that the latter is a much bigger problem than
the former and, again, not what anybody actually wants.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
I wrote:
> In short, therefore, APFS cannot cope with the way we're using msync().

I experimented with this further by seeing whether the msync() code path
is of any value on Sierra either. The answer seems to be "no": cloning
a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
laptop using unmodified HEAD, but if I dike out the msync() logic then
it takes 16-17 seconds. Both numbers jump around a little, but using
msync is strictly worse.

I propose therefore that an appropriate fix is to unconditionally disable
the msync code path on Darwin, as we have already done for Windows. When
and if Apple changes their kernel so that this path is actually of some
value, we can figure out how to detect whether to use it.

The msync logic seems to date back to this thread:

https://www.postgresql.org/message-id/flat/alpine.DEB.2.10.150601132.28433%40sto

wherein Andres opined
>> I think this patch primarily needs:
>> * Benchmarking on FreeBSD/OSX to see whether we should enable the
>> mmap()/msync(MS_ASYNC) method by default. Unless somebody does so, I'm
>> inclined to leave it off till then.

but so far as I can tell from the thread, only testing on FreeBSD ever
got done. So there's no evidence that this was ever beneficial on macOS,
and we now have evidence that it's between counterproductive and
catastrophic depending on which kernel version you look at.

regards, tom lane
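The shape of the proposed fix — a compile-time platform guard that forces the msync() path off on Darwin the way it already is on Windows — might look something like the sketch below. This is not the actual patch; the function and strategy names are invented purely to illustrate the selection logic.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical compile-time choice of flush-hint implementation,
 * mirroring the idea above: prefer sync_file_range() on Linux, and
 * force the no-op path on Windows and Darwin even if msync() exists,
 * because on those platforms it is useless or actively harmful.
 */
static const char *
flush_strategy(void)
{
#if defined(__linux__)
	return "sync_file_range";
#elif defined(WIN32) || defined(__darwin__) || defined(__APPLE__)
	return "none";				/* msync() path deliberately disabled */
#else
	return "msync";				/* other POSIX platforms keep msync(MS_ASYNC) */
#endif
}
```

The important property is that the decision is made once, per platform, rather than asking users on macOS to know that they must avoid the *_flush_after settings.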
Re: [HACKERS] generated columns
So yes, distinguishing stored vs. not stored computed columns is useful, especially if the expression can refer to other columns of the same row, though not only then. Examples: -- useful only if stored (assuming these never get updated) inserted_at TIMESTAMP WITHOUT TIME ZONE AS (clock_timestamp()) -- useful only if stored uuid uuid AS (uuid_generate_v4()) -- useful only if stored who_done_it TEXT AS (current_user) -- useful especially if not stored user_at_host TEXT AS (user || '@' || host) -- useful if stored original_user_at_host TEXT AS (user || '@' || host) I assume once set, a stored computed column cannot be updated, though maybe being able to allow this would be ok. Obviously all of this can be done with triggers and VIEWs... The most useful case is where a computed column is NOT stored, because it saves you having to have a table and a view, while support for the stored case merely saves you having to have triggers. Of course, triggers for computing columns are rather verbose, so not having to write those would be convenient. Similarly with RLS. RLS is not strictly necessary since VIEWs and TRIGGERs allow one to accomplish much the same results, but it's a lot of work to get that right, while RLS makes most policies very pithy. (RLS for *update* policies, however, still can't refer to NEW and OLD, so one still has to resort to triggers for updates in many cases). Nico
Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index
On Mon, Oct 2, 2017 at 9:11 PM, Andrew Borodin wrote: > On Mon, Oct 2, 2017 at 8:00 PM, Alexander Korotkov > wrote: > > What happens if exactly this "continue" fires? > > > >> if (GistFollowRight(stack->page)) > >> { > >> if (!xlocked) > >> { > >> LockBuffer(stack->buffer, GIST_UNLOCK); > >> LockBuffer(stack->buffer, GIST_EXCLUSIVE); > >> xlocked = true; > >> /* someone might've completed the split when we unlocked */ > >> if (!GistFollowRight(stack->page)) > >> continue; > > > > > > In this case we might get xlocked == true without calling > > CheckForSerializableConflictIn(). > Indeed! I've overlooked it. I remember this issue; we were > considering not fixing splits in Serializable isolation, but > dropped the idea. > Yeah, the current insert algorithm assumes that a split must be fixed before we can correctly traverse the tree downwards. > CheckForSerializableConflictIn() must be after every exclusive lock. > I'm not sure that fixing a split is a case that necessarily requires calling CheckForSerializableConflictIn(). This lock on the leaf page is not taken to modify the page. It's just taken to ensure that nobody else is fixing this split at the same time. After fixing the split, it might appear that the insert would go to another page. > I think it would be rather safe and easy for understanding to move > CheckForSerializableConflictIn() directly before gistinserttuple(). > The difference is that after lock we have conditions to change page, > and before gistinserttuple() we have actual intent to change page. > > From the point of future development the first version is better (if some > new calls occasionally spawn in), but it has 3 calls while your > proposal has 2 calls. > It seems to me that CheckForSerializableConflictIn() before > gistinserttuple() is better as for now. > Agree. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] generated columns
On Mon, Oct 02, 2017 at 02:30:38PM -0400, Tom Lane wrote: > Nico Williams writes: > > On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote: > >> So for me, i'd rather default to compute on read, as long storing the > >> pre-computed value is an option when necessary. > > > Sure, I agree. I was just wondering whether there might be any other > > difference besides performance characteristics. The answer to that is, > > I think, "no". > > What about non-immutable functions in the generation expression? Aha, thanks! Yes, that would be noticeable. Nico
Re: [HACKERS] generated columns
On Mon, Oct 02, 2017 at 02:30:38PM -0400, Tom Lane wrote: > Nico Williams writes: > > On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote: > >> So for me, i'd rather default to compute on read, as long storing the > >> pre-computed value is an option when necessary. > > > Sure, I agree. I was just wondering whether there might be any other > > difference besides performance characteristics. The answer to that is, > > I think, "no". > > What about non-immutable functions in the generation expression? Assuming they're permitted, which...well, I could make a case, they should be mutually exclusive with the cached option. I guess documenting the behavior in the manual would suffice, tempting as it would be to include a NOTICE when the table goes from having 0 or more generated columns all of which are immutable to having at least one that's not. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] generated columns
Nico Williams writes: > On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote: >> So for me, i'd rather default to compute on read, as long storing the >> pre-computed value is an option when necessary. > Sure, I agree. I was just wondering whether there might be any other > difference besides performance characteristics. The answer to that is, > I think, "no". What about non-immutable functions in the generation expression? regards, tom lane
Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra
Brent Dearth writes: > I just recently "upgraded" to High Sierra and experiencing horrendous CREATE > DATABASE performance. Creating a database from a 3G template DB used to > take ~1m but post-upgrade is taking ~22m at a sustained write of around > 4MB/s. Occasionally, attempting to create an empty database hangs > indefinitely as well. When this happens, restarting the Postgres server > allows empty database initialization in ~1s. What PG version are you running? I tried to reproduce this, using HEAD and what I had handy: (a) Late 2016 MacBook Pro, 2.7GHz i7, still on Sierra (b) Late 2013 MacBook Pro, 2.3GHz i7, High Sierra, drive is converted to APFS I made a ~7.5GB test database using "pgbench -i -s 500 bench" and then cloned it with "create database b2 with template bench". Case 1: fsync off. Machine A did the clone in 5.6 seconds, machine B in 12.9 seconds. Considering the CPU speed difference and the fact that Apple put significantly faster SSDs into the 2016 models, I'm not sure this difference is due to anything but better hardware. Case 2: fsync on. Machine A did the clone in 7.5 seconds, machine B in 2523.5 sec (42 min!). So something is badly busted in APFS' handling of fsync, and/or we're doing it in a bad way. Interestingly, pg_test_fsync shows only about a factor-of-2 difference in the timings for regular file fsyncs. So I poked into non-fsync logic that we'd added recently, and after awhile found that diking out the msync code path in pg_flush_data reduces machine B's time to an entirely reasonable 11.5 seconds. In short, therefore, APFS cannot cope with the way we're using msync(). I observe that the copy gets slower and slower as it runs (watching the transfer rate with "iostat 1"), which makes me wonder if there's some sort of O(N^2) issue in the kernel logic for this.
But anyway, as a short-term workaround you might try diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index b0c174284b..af35de5f7d 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@ -451,7 +451,7 @@ pg_flush_data(int fd, off_t offset, off_t nbytes) return; } #endif -#if !defined(WIN32) && defined(MS_ASYNC) +#if 0 { void *p; static int pagesize = 0; regards, tom lane
Re: [HACKERS] list of credits for release notes
Bruce, * Bruce Momjian (br...@momjian.us) wrote: > On Fri, Sep 29, 2017 at 12:00:05PM -0400, Peter Eisentraut wrote: > > On 9/29/17 11:35, Robert Haas wrote: > > > On Wed, Sep 27, 2017 at 8:29 PM, Michael Paquier > > > wrote: > > >> Looking at this list, the first name is followed by the family name, > > >> so there are inconsistencies with some Japanese names: > > >> - Fujii Masao should be Masao Fujii. > > >> - KaiGai Kohei should be Kohei Kaigai. > > > > > > But his emails say Fujii Masao, not Masao Fujii. > > > > > > KaiGai's case is a bit trickier, as his email name has changed over time. > > > > Yes, I used the form that the person used in their emails. > > How should this be handled for the Postgres 11 release notes? Ideally, we would let the individuals choose how to be recognized in release notes, and anywhere else we recognize them. We have the start of that in https://postgresql.org/account/profile but that isn't very easily tied to things in the commit history or elsewhere, yet. I'd suggest that we try to improve on that by: - Allowing anyone to include contributor information in their .Org account, even if they aren't listed on the Contributors page. - Adding a field along the lines of "Name to be used publicly" and letting them decide what they'd like, possibly even with the option to not be publicly listed at all. - Adding tracking of multiple email addresses into the .Org profile (somehow sync'd with the pglister system) - Starting to include contributor email addresses in commit messages along with names. I don't think we're that far off from this being possible to do in a more formal way that minimizes the risk of getting things incorrect, missing someone, or mis-attributing something. This all involves mostly work on the .Org system, which we do have some folks working on now but is also open source and it certainly wouldn't hurt to have more people involved, if there are others who are interested.
The place to actually start the discussion of such changes would be -www though. Thanks! Stephen
Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index
On Mon, Oct 2, 2017 at 8:00 PM, Alexander Korotkov wrote: > What happens if exactly this "continue" fires? > >> if (GistFollowRight(stack->page)) >> { >> if (!xlocked) >> { >> LockBuffer(stack->buffer, GIST_UNLOCK); >> LockBuffer(stack->buffer, GIST_EXCLUSIVE); >> xlocked = true; >> /* someone might've completed the split when we unlocked */ >> if (!GistFollowRight(stack->page)) >> continue; > > > In this case we might get xlocked == true without calling > CheckForSerializableConflictIn(). Indeed! I've overlooked it. I remember this issue; we were considering not fixing splits in Serializable isolation, but dropped the idea. CheckForSerializableConflictIn() must be after every exclusive lock. > I think it would be rather safe and easy for understanding to move > CheckForSerializableConflictIn() directly before gistinserttuple(). The difference is that after lock we have conditions to change page, and before gistinserttuple() we have actual intent to change page. From the point of future development the first version is better (if some new calls occasionally spawn in), but it has 3 calls while your proposal has 2 calls. It seems to me that CheckForSerializableConflictIn() before gistinserttuple() is better as for now. Best regards, Andrey Borodin.
Re: [HACKERS] generated columns
On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote: > I know that for my use-cases, having both options available would be very > appreciated. The vast majority of the computed columns I would use in my > database would be okay to compute on read. But there are for sure some > which would be performance prohibitive to have compute on read, so i'd > rather have those stored. > > So for me, i'd rather default to compute on read, as long storing the > pre-computed value is an option when necessary. Sure, I agree. I was just wondering whether there might be any other difference besides performance characteristics. The answer to that is, I think, "no".
Re: [HACKERS] list of credits for release notes
On Fri, Sep 29, 2017 at 12:00:05PM -0400, Peter Eisentraut wrote: > On 9/29/17 11:35, Robert Haas wrote: > > On Wed, Sep 27, 2017 at 8:29 PM, Michael Paquier > > wrote: > >> Looking at this list, the first name is followed by the family name, > >> so there are inconsistencies with some Japanese names: > >> - Fujii Masao should be Masao Fujii. > >> - KaiGai Kohei should be Kohei Kaigai. > > > > But his emails say Fujii Masao, not Masao Fujii. > > > > KaiGai's case is a bit trickier, as his email name has changed over time. > > Yes, I used the form that the person used in their emails. How should this be handled for the Postgres 11 release notes? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?
On Mon, Oct 2, 2017 at 9:42 AM, Peter Eisentraut wrote: > I understand where you are coming from. My experience with developing > this feature has been that it is quite fragile in some ways. We have > had this out there for testing for many months, and we have fixed many > bugs, and follow-up bugs, up to very recently. We don't have good > automatic test coverage, so we need to rely on user testing. Making > significant code and design changes a week or two before the final > release is just too late, even if the changes were undisputed. And this > wasn't just one specific isolated change, it was a set of overlapping > changes that seemingly kept changing and expanding. For the record, I don't accept that summary. We could have not bothered with the sanitization thing at all, and I wouldn't have cared very much. I even revised my proposal such that you would never have needed to do the language tag mapping at all [1] (albeit while desupporting ICU versions prior to 4.6). And, as recently as 7 days ago, you yourself said: "Reading over this code again, it is admittedly not quite clear why this "canonicalization" is in there right now. I think it had something to do with how we built the keyword variants at one point. It might not make sense. I'd be glad to take that out and use the result straight from uloc_getAvailable() for collcollate." [2] This would have been far more radical than what I proposed. > I'm satisfied that what we are shipping is "good enough". It has been > in development for over a year, it has been reviewed and tested for > months, and a lot of credit for that goes to you. It is "good enough", I suppose. I am disappointed by this particular outcome, but that's mostly because it could have been avoided. I'm still very happy that you took on this project, and I don't want that to be overshadowed by this particular disagreement.
> I'm available to discuss the changes you are envisioning, preferably one > at a time, in the near future as part of the PG11 development process. I don't really think that there is anything more to be done about the issues raised on this thread, with the possible exception of the DEBUG1 display name string thing. The sanitization just isn't valuable enough to add after the fact. Apart from the risks of adding it after the fact, another reason not to bother with it is that it's *incredibly* forgiving. So forgiving that you could reasonably argue that we might as well not have it at all. I may well do more on ICU with you in the future, but not in the area of how things are canonicalized or sanitized. It's too late for that now. [1] https://www.postgresql.org/message-id/CAH2-WzmVtRyNg2gT=u=ktEC-jM3aKq4bYzJ0u2=osxe+o3k...@mail.gmail.com [2] https://www.postgresql.org/message-id/f6c0fca7-e277-3f46-c0c1-adc001bff...@2ndquadrant.com -- Peter Geoghegan
[HACKERS] Issues with logical replication
I have faced two issues with logical replication. Repro scenario: 1. start two Postgres instances (I start both at local machine). 2. Initialize pgbench tables at both instances: pgbench -i -s 10 postgres 3. Create publication of pgbench_accounts table at one node: create publication pub for table pgbench_accounts; 4. Create subscription at another node: delete from pgbench_accounts; CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost port=5432 sslmode=disable' publication pub; CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$ BEGIN -- RETURN NEW; END $$ LANGUAGE plpgsql; CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE replication_trigger_proc(); ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger; 5. Start pgbench at primary node: pgbench -T 1000 -P 2 -c 10 postgres Please notice the commented "return new" statement. Invocation of this function causes an error, and in the log we see repeated messages: 2017-10-02 16:47:46.764 MSK [32129] LOG: logical replication table synchronization worker for subscription "sub", table "pgbench_accounts" has started 2017-10-02 16:47:46.771 MSK [32129] ERROR: control reached end of trigger procedure without RETURN 2017-10-02 16:47:46.771 MSK [32129] CONTEXT: PL/pgSQL function replication_trigger_proc() COPY pgbench_accounts, line 1: "1 1 0 " 2017-10-02 16:47:46.772 MSK [28020] LOG: worker process: logical replication worker for subscription 17264 sync 17251 (PID 32129) exited with exit code 1 ... After a few minutes of work the primary node (with the publication) crashes with the following stack trace: Program terminated with signal SIGABRT, Aborted. #0 0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt #0 0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x7f3608f92028 in __GI_abort () at abort.c:89 #2 0x009f5740 in ExceptionalCondition (conditionName=0xbf6b30 "!(((xid) != ((TransactionId) 0)))", errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c", lineNumber=582) at assert.c:54 #3 0x0086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0, oper=XLTW_None) at lmgr.c:582 #4 0x0081c9f7 in SnapBuildWaitSnapshot (running=0x2749198, cutoff=898498) at snapbuild.c:1400 #5 0x0081c7c7 in SnapBuildFindSnapshot (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1311 #6 0x0081c339 in SnapBuildProcessRunningXacts (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106 #7 0x0080a1c7 in DecodeStandbyOp (ctx=0x27ef870, buf=0x7ffd301858d0) at decode.c:314 #8 0x00809ce1 in LogicalDecodingProcessRecord (ctx=0x27ef870, record=0x27efb30) at decode.c:117 #9 0x0080ddf9 in DecodingContextFindStartpoint (ctx=0x27ef870) at logical.c:458 #10 0x00823968 in CreateReplicationSlot (cmd=0x27483a8) at walsender.c:934 #11 0x008246ee in exec_replication_command ( cmd_string=0x27b9520 "CREATE_REPLICATION_SLOT \"sub_17264_sync_17251\" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at walsender.c:1511 #12 0x0088eb44 in PostgresMain (argc=1, argv=0x275b738, dbname=0x275b578 "postgres", username=0x272b800 "knizhnik") at postgres.c:4086 #13 0x007ef9a1 in BackendRun (port=0x27532a0) at postmaster.c:4357 #14 0x007ef0cb in BackendStartup (port=0x27532a0) at postmaster.c:4029 #15 0x007eb68b in ServerLoop () at postmaster.c:1753 #16 0x007eac77 in PostmasterMain (argc=3, argv=0x2729670) at postmaster.c:1361 #17 0x00728552 in main (argc=3, argv=0x2729670) at main.c:228 Now fix the trigger function: CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$ BEGIN RETURN NEW; END $$ LANGUAGE plpgsql; And manually perform at master two updates inside one transaction: postgres=# begin; BEGIN 
postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26; UPDATE 1 postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26; UPDATE 1 postgres=# commit; and in the replica's log we see: 2017-10-02 18:40:26.094 MSK [2954] LOG: logical replication apply worker for subscription "sub" has started 2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible tuple 2017-10-02 18:40:26.102 MSK [2882] LOG: worker process: logical replication worker for subscription 16403 (PID 2954) exited with exit code 1 Error happens in trigger.c: #3 0x0069bddb in GetTupleForTrigger (estate=0x2e36b50, epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac, lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at trigger.c:3103 #4 0x0069b259 in ExecBRUpdateTriggers
Re: [HACKERS] issue: record or row variable cannot be part of multiple-item INTO list
2017-10-02 18:44 GMT+02:00 Tom Lane: > Robert Haas writes: > > On Mon, Oct 2, 2017 at 12:28 PM, Tom Lane wrote: > >> I'm not sure if that's true or not. I am sure, though, that since > >> we've done B for twenty years we can't just summarily change to A. > > > I agree, but so what? You said that we couldn't adopt Pavel's > > proposal for this reason: > > > === > > IIRC, the reason for disallowing that is that it's totally unclear what > > the semantics ought to be. Is that variable a single target (demanding > > a compatible composite-valued column from the source query), or does it > > eat one source column per field within the record/row? The former is > 100% > > inconsistent with what happens if the record/row is the only INTO target; > > while the latter would be very bug-prone, and it's especially unclear > what > > ought to happen if it's an as-yet-undefined record variable. > > === > > > And I'm saying - that argument is bogus. Regardless of what people > > want or what we have historically done in the case where the > > record/row is the only INTO target, when there are multiple targets it > > seems clear that they want to match up the query's output columns with > > the INTO targets 1:1. So let's just do that. > > Arguing that that's what people want (even if I granted your argument, > which I do not) does not make the inconsistency magically disappear, > nor will it stop people from being confused by that inconsistency. > Furthermore, if we do it like this, we will be completely backed into > a backwards-compatibility corner if someone does come along and say > "hey, I wish I could do the other thing". > > I'm fine with doing something where we add new notation to dispel > the ambiguity. I don't want to put in another layer of inconsistency > and then have even more backwards-compatibility problems constraining > our response to the inevitable complaints. > I didn't talk about record type. 
I talked only about composite variables (ROW in our terminology). I don't think special syntax is necessary for this case, although we could use a parallel assignment with different semantics here. What is the motivation for this thread? I had to migrate a lot of Oracle procedures where there were usually two OUT variables - first, a known composite type (some state variable), and second, a result (text or int variable). Now, the CALL of this function in Postgres is: SELECT fx() INTO rec; var_state := rec.state; var_result := rec.result; It works, Ora2pg supports it, plpgsql_check is able to check it, but it is not elegant and is less readable. So, when the target is not a plain REC or ROW, I think we can allow assignment with a few limits: 1. The REC type should not be used. 2. The target and source fields should be the same - this assignment should not be tolerant like now. Because this situation is not supported now, there is no compatibility risk. Some modern and now well-known languages like Go support parallel assignment. Could that be the special syntax requested by Tom? So there are two proposals: 1. Implement a safe, restrictive SELECT INTO where the target can be a combination of REC or scalars. 2. Parallel assignment with new behavior, which allows any list of REC, ROW, or scalar as target - but a composite should be attached to a composite var, and a scalar to a scalar. A list of scalars as target for a composite value should be a) always disallowed, or b) disallowed when some target var is a composite. The differences between the assignment command and the INTO command can be messy too. So the best solution would be one set of rules for := and INTO - but I am not sure if that is possible. Comments? Regards Pavel > regards, tom lane >
Re: [HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type
On Mon, Oct 2, 2017 at 3:43 PM, Maksim Milyutin wrote: > On 26.09.2017 23:25, Maksim Milyutin wrote: > > Updated patchset contains more transparent definition of composite type > for constant node and regression test for described above buggy case. > > Is there any interest on the problem in this thread? > Probably everybody is too busy with the upcoming release and other patches. Since this patch is a bug fix, it definitely should be considered. Please register this patch at the upcoming commitfest to ensure it won't be forgotten. Regression test changes (both .sql and .out) are essential parts of the patch. I see no point in posting them separately. Please incorporate them into your patch. Did you check this patch with a manually created composite type (made by CREATE TYPE typname AS ...)? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Issues with logical replication
On 02/10/17 17:49, Konstantin Knizhnik wrote: > I have faced two issues with logical replication. > Reproducing scenario: > > 1. start two Postgres instances (I start both at local machine). > 2. Initialize pgbench tables at both instances: > pgbench -i -s 10 postgres > 3. Create publication of pgbench_accounts table at one node: > create publication pub for table pgbench_accounts; > 4. Create subscription at another node: > delete from pgbench_accounts; > CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost > port=5432 sslmode=disable' publication pub; > CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS > TRIGGER AS $$ > BEGIN > -- RETURN NEW; > END $$ LANGUAGE plpgsql; > CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE > ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE > replication_trigger_proc(); > ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger; > 5. Start pgbench at primary node: > pgbench -T 1000 -P 2 -c 10 postgres > > > Please notice commented "return new" statement. Invocation of this > function cause error and in log we see repeated messages: > > 2017-10-02 16:47:46.764 MSK [32129] LOG: logical replication table > synchronization worker for subscription "sub", table "pgbench_accounts" > has started > 2017-10-02 16:47:46.771 MSK [32129] ERROR: control reached end of > trigger procedure without RETURN > 2017-10-02 16:47:46.771 MSK [32129] CONTEXT: PL/pgSQL function > replication_trigger_proc() > COPY pgbench_accounts, line 1: "1 1 0 " > 2017-10-02 16:47:46.772 MSK [28020] LOG: worker process: logical > replication worker for subscription 17264 sync 17251 (PID 32129) exited > with exit code 1 > ... > > > After few minutes of work primary node (with publication) is crashed > with the following stack trace: > > Program terminated with signal SIGABRT, Aborted. 
> #0 0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at > ../nptl/sysdeps/unix/sysv/linux/raise.c:56 > 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. > (gdb) bt > #0 0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at > ../nptl/sysdeps/unix/sysv/linux/raise.c:56 > #1 0x7f3608f92028 in __GI_abort () at abort.c:89 > #2 0x009f5740 in ExceptionalCondition (conditionName=0xbf6b30 > "!(((xid) != ((TransactionId) 0)))", > errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c", > lineNumber=582) at assert.c:54 > #3 0x0086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0, > oper=XLTW_None) at lmgr.c:582 > #4 0x0081c9f7 in SnapBuildWaitSnapshot (running=0x2749198, > cutoff=898498) at snapbuild.c:1400 > #5 0x0081c7c7 in SnapBuildFindSnapshot (builder=0x2807910, > lsn=798477760, running=0x2749198) at snapbuild.c:1311 > #6 0x0081c339 in SnapBuildProcessRunningXacts > (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106 Hmm this would suggest that XLOG_RUNNING_XACTS record contains invalid transaction ids but we specifically skip those in GetRunningTransactionData(). Can you check pg_waldump output for that record (lsn is shown in the backtrace)? 
> > Now fix the trigger function: > CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$ > BEGIN > RETURN NEW; > END $$ LANGUAGE plpgsql; > > And manually perform at master two updates inside one transaction: > > postgres=# begin; > BEGIN > postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26; > UPDATE 1 > postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26; > UPDATE 1 > postgres=# commit; > > > and in replica log we see: > 2017-10-02 18:40:26.094 MSK [2954] LOG: logical replication apply > worker for subscription "sub" has started > 2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible > tuple > 2017-10-02 18:40:26.102 MSK [2882] LOG: worker process: logical > replication worker for subscription 16403 (PID 2954) exited with exit > code 1 > > Error happens in trigger.c: > > #3 0x0069bddb in GetTupleForTrigger (estate=0x2e36b50, > epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac, > lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at > trigger.c:3103 > #4 0x0069b259 in ExecBRUpdateTriggers (estate=0x2e36b50, > epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac, > fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748 > #5 0x006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50, > epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240) > at execReplication.c:461 > #6 0x00820894 in apply_handle_update (s=0x7ffc442163b0) at > worker.c:736 We have locked the same tuple in RelationFindReplTupleByIndex() just before this gets called and didn't get the same error. I guess we do something wrong with snapshots. Will need to investigate more. -- Petr Jelinek
Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)
On Sun, Oct 1, 2017 at 11:53 AM, Shubham Barai wrote: > Yes, page-level predicate locking should happen only when fast update is > off. > Actually, I forgot to put conditions in the updated patch. Does everything > else look ok? > I think the isolation tests should be improved. It doesn't seem that any posting tree would be generated by the tests that you've provided, because all the TIDs could fit in a single posting list. Note that you can get some insight into GIN physical structure using the pageinspect contrib module. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] generated columns
I know that for my use cases, having both options available would be much appreciated. The vast majority of the computed columns I would use in my database would be fine to compute on read, but there are certainly some for which computing on read would be prohibitively expensive, so I'd rather have those stored. So for me, I'd default to compute-on-read, as long as storing the precomputed value is an option when necessary.

Just my $0.02.

Thanks,
-Adam
Re: [HACKERS] issue: record or row variable cannot be part of multiple-item INTO list
Robert Haas writes:
> On Mon, Oct 2, 2017 at 12:28 PM, Tom Lane wrote:
>> I'm not sure if that's true or not. I am sure, though, that since
>> we've done B for twenty years we can't just summarily change to A.

> I agree, but so what? You said that we couldn't adopt Pavel's
> proposal for this reason:
> ===
> IIRC, the reason for disallowing that is that it's totally unclear what
> the semantics ought to be. Is that variable a single target (demanding
> a compatible composite-valued column from the source query), or does it
> eat one source column per field within the record/row? The former is 100%
> inconsistent with what happens if the record/row is the only INTO target;
> while the latter would be very bug-prone, and it's especially unclear what
> ought to happen if it's an as-yet-undefined record variable.
> ===
> And I'm saying - that argument is bogus. Regardless of what people
> want or what we have historically done in the case where the
> record/row is the only INTO target, when there are multiple targets it
> seems clear that they want to match up the query's output columns with
> the INTO targets 1:1. So let's just do that.

Arguing that that's what people want (even if I granted your argument, which I do not) does not make the inconsistency magically disappear, nor will it stop people from being confused by that inconsistency. Furthermore, if we do it like this, we will be completely backed into a backwards-compatibility corner if someone does come along and say "hey, I wish I could do the other thing".

I'm fine with doing something where we add new notation to dispel the ambiguity. I don't want to put in another layer of inconsistency and then have even more backwards-compatibility problems constraining our response to the inevitable complaints.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?
On 10/1/17 14:26, Peter Geoghegan wrote: > It does seem too late. It's disappointing that we did not do better > here. This problem was entirely avoidable. I understand where you are coming from. My experience with developing this feature has been that it is quite fragile in some ways. We have had this out there for testing for many months, and we have fixed many bugs, and follow-up bugs, up to very recently. We don't have good automatic test coverage, so we need to rely on user testing. Making significant code and design changes a week or two before the final release is just too late, even if the changes were undisputed. And this wasn't just one specific isolated change, it was a set of overlapping changes that seemingly kept changing and expanding. Analyzing and reviewing that under pressure, and then effectively owning it going forward, too, is not something I could commit to. I'm satisfied that what we are shipping is "good enough". It has been in development for over a year, it has been reviewed and tested for months, and a lot of credit for that goes to you. The "old" locale support took many years to take shape, and this one probably will, too, but at some point I feel we have to let it be for a while and take a breather and get some feedback and field experience. I'm available to discuss the changes you are envisioning, preferably one at a time, in the near future as part of the PG11 development process. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] issue: record or row variable cannot be part of multiple-item INTO list
On Mon, Oct 2, 2017 at 12:28 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Sep 19, 2017 at 3:18 PM, Tom Lane wrote:
>>> That's certainly a case that we ought to support somehow. The problem is
>>> staying reasonably consistent with the two-decades-old precedent of the
>>> existing behavior for one target variable.
>
>> My point is that you objected to Pavel's proposal saying "it's not
>> clear whether users want A or B". But I think they always want A.
>
> I'm not sure if that's true or not. I am sure, though, that since
> we've done B for twenty years we can't just summarily change to A.

I agree, but so what? You said that we couldn't adopt Pavel's proposal for this reason:

===
IIRC, the reason for disallowing that is that it's totally unclear what the semantics ought to be. Is that variable a single target (demanding a compatible composite-valued column from the source query), or does it eat one source column per field within the record/row? The former is 100% inconsistent with what happens if the record/row is the only INTO target; while the latter would be very bug-prone, and it's especially unclear what ought to happen if it's an as-yet-undefined record variable.
===

And I'm saying - that argument is bogus. Regardless of what people want or what we have historically done in the case where the record/row is the only INTO target, when there are multiple targets it seems clear that they want to match up the query's output columns with the INTO targets 1:1. So let's just do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
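The A-vs-B ambiguity quoted above can be made concrete with a small model. This is a purely illustrative Python sketch — the function names are invented and this is not PL/pgSQL's implementation — contrasting option A (each INTO target consumes exactly one output column, so a record target must receive one composite-valued column) with option B (a record target eats one source column per field):

```python
# Illustrative model (not PL/pgSQL itself) of the two candidate semantics
# for "SELECT ... INTO x, rec" when one target is a record/row variable.

def into_one_to_one(columns, targets):
    """Option A: each INTO target consumes exactly one output column."""
    if len(columns) != len(targets):
        raise ValueError("column count must match target count")
    return dict(zip(targets, columns))

def into_field_expansion(columns, targets, record_fields):
    """Option B: a record target eats one source column per field."""
    result, i = {}, 0
    for t in targets:
        if t in record_fields:
            width = len(record_fields[t])
            result[t] = columns[i:i + width]   # record absorbs `width` columns
        else:
            width = 1
            result[t] = columns[i]             # scalar absorbs one column
        i += width
    if i != len(columns):
        raise ValueError("column count does not match expanded targets")
    return result

# "SELECT a, b, c INTO x, rec" where rec has two fields:
cols = ["a", "b", "c"]
print(into_field_expansion(cols, ["x", "rec"], {"rec": ["f1", "f2"]}))
# → {'x': 'a', 'rec': ['b', 'c']}
# Under option A the same query errors out: 3 columns, 2 targets.
```

The bug-proneness Tom quotes falls out directly: under option B, dropping a field from `rec` silently reassigns which source columns go where.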
Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256
On Fri, Sep 15, 2017 at 6:29 PM, Michael Paquier wrote:
> I would like to point out that per the RFC, if the client attempts an
> SSL connection with SCRAM and the server supports channel binding,
> then it has to publish the SASL mechanism for channel binding, aka
> SCRAM-PLUS. If the client tries to force the use of SCRAM even if
> SCRAM-PLUS is specified, this is seen as a downgrade attack by the
> server, which must reject the connection. So this parameter has
> meaning only if you try to connect to a PG10 server using a PG11
> client (assuming that channel binding gets into PG11). If you connect
> with a PG11 client to a PG11 server with SSL, the server publishes
> SCRAM-PLUS, the client has to use it, hence this turns out to make
> cbind=disable and prefer meaningless in the long term. If the client
> does not use SSL, then there is no channel binding, and cbind=require
> loses its value. So cbind's fate is actually linked to sslmode.

That seems problematic. What if the client supports SCRAM but not channel binding?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
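The downgrade rule Michael describes can be sketched as a tiny negotiation check. This is a simplified, hypothetical model of that behavior — not libpq or server code, and the function name is invented; mechanism names follow the SCRAM-SHA-256 / SCRAM-SHA-256-PLUS convention of RFC 7677:

```python
# Simplified model of the negotiation rule described above: if the server
# advertised the -PLUS (channel-binding) mechanism over SSL, a client that
# nevertheless picks plain SCRAM is treated as a downgrade attack and
# rejected.

def server_accepts(server_mechs, client_choice, over_ssl):
    if client_choice not in server_mechs:
        return False                    # unknown mechanism
    if (over_ssl
            and "SCRAM-SHA-256-PLUS" in server_mechs
            and client_choice == "SCRAM-SHA-256"):
        return False                    # downgrade: -PLUS was available
    return True

# Robert's question in this model: a client that supports SCRAM but not
# channel binding cannot connect over SSL to a -PLUS-capable server.
```

Usage: `server_accepts({"SCRAM-SHA-256", "SCRAM-SHA-256-PLUS"}, "SCRAM-SHA-256", True)` returns False under this rule, which is exactly the case Robert flags as problematic.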
Re: [HACKERS] issue: record or row variable cannot be part of multiple-item INTO list
Robert Haas writes:
> On Tue, Sep 19, 2017 at 3:18 PM, Tom Lane wrote:
>> That's certainly a case that we ought to support somehow. The problem is
>> staying reasonably consistent with the two-decades-old precedent of the
>> existing behavior for one target variable.

> My point is that you objected to Pavel's proposal saying "it's not
> clear whether users want A or B". But I think they always want A.

I'm not sure if that's true or not. I am sure, though, that since we've done B for twenty years we can't just summarily change to A.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest 201709 is now closed
On Mon, Oct 2, 2017 at 6:57 PM, Tom Lane wrote:
> Daniel Gustafsson writes:
>> Thanks to everyone who participated, and to everyone who have responded to my
>> nagging via the CF app email function. This is clearly an awesome community.
>
> And thanks to you for your hard work as CFM! That's tedious and
> largely thankless work, but it's needed to keep things moving.

+1. Thank you very much, Daniel! It was a pleasure working with you this commitfest.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] issue: record or row variable cannot be part of multiple-item INTO list
On Tue, Sep 19, 2017 at 3:18 PM, Tom Lane wrote:
> Robert Haas writes:
>> I think the fact that single-target INTO lists and multiple-target
>> INTO lists are handled completely differently is extremely poor
>> language design. It would have been far better, as you suggested
>> downthread, to have added some syntax up front to let people select
>> the behavior that they want, but I think there's little hope of
>> changing this now without creating even more pain.
>
> How so? The proposal I gave is fully backwards-compatible. It's
> likely not the way we'd do it in a green field, but we don't have
> a green field.
>
>> I have a really hard time, however, imagining that anyone writes
>> SELECT a, b, c, d, e, f, g, h, i, j, k INTO x, y, z and wants some of
>> a-k to go into x, some more to go into y, and some more to go into z
>> (and heaven help you if you drop a column from x or y -- now the whole
>> semantics of the query change, yikes). What's reasonable is to write
>> SELECT a, b, c INTO x, y, z and have those correspond 1:1.
>
> That's certainly a case that we ought to support somehow. The problem is
> staying reasonably consistent with the two-decades-old precedent of the
> existing behavior for one target variable.

My point is that you objected to Pavel's proposal saying "it's not clear whether users want A or B". But I think they always want A.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files
On 2017-10-02 23:24:30, "Alexander Korotkov" wrote:

> On Sun, Oct 1, 2017 at 8:27 PM, chenhj wrote:
>
> Now, this patch looks good for me. It applies cleanly, builds cleanly,
> passes regression tests, and the new functionality is covered by
> regression tests. Code is OK for me, and docs too. I'm marking this
> patch as "Ready for Committer". BTW, the authors field in the
> commitfest app is empty (https://commitfest.postgresql.org/15/1302/).
> Please put your name there. I hope this patch will be committed during
> the 2017-11 commitfest. Be ready to rebase this patch if needed.
> Thank you for your work.

I have filled in the authors field for this patch in the commitfest app, and will rebase the patch if needed. Thank you for your help!

--
Best Regards,
Chen Huajun
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
On Mon, Oct 2, 2017 at 11:05 AM, Michael Paquier wrote:
> Well, there are cases where you don't need any locking checks, and the
> proposed patch ignores that.

I understand that, but shouldn't we then look for a way to adjust the patch so that it doesn't have that issue any longer, rather than just kicking it to the curb? I mean, just saying "patch suxxor, next" doesn't seem like the right approach to something that has apparently already found real problems.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest 201709 is now closed
On Mon, Oct 2, 2017 at 11:57 AM, Tom Lane wrote:
> Daniel Gustafsson writes:
>> Thanks to everyone who participated, and to everyone who have responded to my
>> nagging via the CF app email function. This is clearly an awesome community.
>
> And thanks to you for your hard work as CFM! That's tedious and
> largely thankless work, but it's needed to keep things moving.

+1. I think we need to figure out some plan for tackling the backlog of Ready for Committer patches, though. There are currently 42 of them, 10 of which are listed (maybe not all correctly) as bug fixes. That's not great.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 64-bit queryId?
On Mon, Oct 2, 2017 at 11:02 AM, Joshua D. Drake wrote:
> +1 to both of these as well.

OK, so here's a patch. Review appreciated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

64-bit-queryid-v1.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest 201709 is now closed
Daniel Gustafsson writes:
> Thanks to everyone who participated, and to everyone who have responded to my
> nagging via the CF app email function. This is clearly an awesome community.

And thanks to you for your hard work as CFM! That's tedious and largely thankless work, but it's needed to keep things moving.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
Michael Paquier wrote:
> On Mon, Oct 2, 2017 at 11:44 PM, Robert Haas wrote:
>> On Mon, Oct 2, 2017 at 10:19 AM, Michael Paquier wrote:
>>> So those bits could be considered for integration.
>>
>> Yes, and they also tend to suggest that the rest of the patch has value.
>
> Well, there are cases where you don't need any locking checks, and the
> proposed patch ignores that. Take for example pageinspect which works
> on a copy of a page, or just WAL replay which serializes everything,
> and in both cases PageGetLSN can be used safely. So for compatibility
> reasons I think that PageGetLSN should be kept alone to not surprise
> anybody using it, or at least an equivalent should be introduced. It
> would be interesting to make BufferGetLSNAtomic hold tighter checks,
> like something that makes use of LWLockHeldByMe for example.

I'd argue about this in the same direction I argued about BufferGetPage() needing an LSN check that's applied separately: if it's too easy for a developer to do the wrong thing (i.e. fail to apply the separate LSN check after reading the page), then the routine should be patched to do the check itself, and another routine should be offered for the cases that explicitly can do without the check.

I was eventually outvoted in the BufferGetPage() saga, though, so I'm not sure that that discussion sets precedent.

--
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Horrible CREATE DATABASE Performance in High Sierra
I just recently "upgraded" to High Sierra and am experiencing horrendous CREATE DATABASE performance. Creating a database from a 3 GB template DB used to take ~1m, but post-upgrade it takes ~22m at a sustained write rate of around 4 MB/s. Occasionally, attempting to create an empty database hangs indefinitely as well; when this happens, restarting the Postgres server allows empty database initialization in ~1s.

I had been running on an encrypted APFS volume (FileVault), but after dropping encryption the task dropped to about ~15m per run. Still a far cry from the previous ~1m threshold. A multi-threaded pg_restore sustains writes of ~38 MB/s and completes in about the same time as pre-upgrade (Sierra), so I'm not sure it's entirely related to APFS / disk IO.

I've completely rebuilt the Postgres data directory, re-installed Postgres (Postgres.app 2.0.5), etc. I don't have any reasonable explanation for what could have broken so catastrophically. A coworker has seen the exact same issue. Has anyone else experienced this yet, or have any insight as to what could be happening?

Thanks in advance!
[HACKERS] Issues with logical replication
I have faced two issues with logical replication. Reproducing scenario:

1. Start two Postgres instances (I start both on the local machine).
2. Initialize pgbench tables at both instances:
   pgbench -i -s 10 postgres
3. Create a publication of the pgbench_accounts table at one node:
   create publication pub for table pgbench_accounts;
4. Create a subscription at the other node:
   delete from pgbench_accounts;
   CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost port=5432 sslmode=disable' publication pub;
   CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
   BEGIN
       -- RETURN NEW;
   END $$ LANGUAGE plpgsql;
   CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE replication_trigger_proc();
   ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger;
5. Start pgbench at the primary node:
   pgbench -T 1000 -P 2 -c 10 postgres

Please note the commented-out "RETURN NEW" statement. Invoking this function causes an error, and in the log we see repeated messages:

2017-10-02 16:47:46.764 MSK [32129] LOG:  logical replication table synchronization worker for subscription "sub", table "pgbench_accounts" has started
2017-10-02 16:47:46.771 MSK [32129] ERROR:  control reached end of trigger procedure without RETURN
2017-10-02 16:47:46.771 MSK [32129] CONTEXT:  PL/pgSQL function replication_trigger_proc()
	COPY pgbench_accounts, line 1: "110 "
2017-10-02 16:47:46.772 MSK [28020] LOG:  worker process: logical replication worker for subscription 17264 sync 17251 (PID 32129) exited with exit code 1
...

After a few minutes of work, the primary node (with the publication) crashes with the following stack trace:

Program terminated with signal SIGABRT, Aborted.
#0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f3608f92028 in __GI_abort () at abort.c:89
#2  0x009f5740 in ExceptionalCondition (conditionName=0xbf6b30 "!(((xid) != ((TransactionId) 0)))", errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c", lineNumber=582) at assert.c:54
#3  0x0086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0, oper=XLTW_None) at lmgr.c:582
#4  0x0081c9f7 in SnapBuildWaitSnapshot (running=0x2749198, cutoff=898498) at snapbuild.c:1400
#5  0x0081c7c7 in SnapBuildFindSnapshot (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1311
#6  0x0081c339 in SnapBuildProcessRunningXacts (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106
#7  0x0080a1c7 in DecodeStandbyOp (ctx=0x27ef870, buf=0x7ffd301858d0) at decode.c:314
#8  0x00809ce1 in LogicalDecodingProcessRecord (ctx=0x27ef870, record=0x27efb30) at decode.c:117
#9  0x0080ddf9 in DecodingContextFindStartpoint (ctx=0x27ef870) at logical.c:458
#10 0x00823968 in CreateReplicationSlot (cmd=0x27483a8) at walsender.c:934
#11 0x008246ee in exec_replication_command (cmd_string=0x27b9520 "CREATE_REPLICATION_SLOT \"sub_17264_sync_17251\" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at walsender.c:1511
#12 0x0088eb44 in PostgresMain (argc=1, argv=0x275b738, dbname=0x275b578 "postgres", username=0x272b800 "knizhnik") at postgres.c:4086
#13 0x007ef9a1 in BackendRun (port=0x27532a0) at postmaster.c:4357
#14 0x007ef0cb in BackendStartup (port=0x27532a0) at postmaster.c:4029
#15 0x007eb68b in ServerLoop () at postmaster.c:1753
#16 0x007eac77 in PostmasterMain (argc=3, argv=0x2729670) at postmaster.c:1361
#17 0x00728552 in main (argc=3, argv=0x2729670) at main.c:228

Now fix the trigger function:

CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
BEGIN
    RETURN NEW;
END $$ LANGUAGE plpgsql;

And manually perform at master two updates inside one transaction:

postgres=# begin;
BEGIN
postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
UPDATE 1
postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
UPDATE 1
postgres=# commit;

and in replica log we see:

2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply worker for subscription "sub" has started
2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible tuple
2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical replication worker for subscription 16403 (PID 2954) exited with exit code 1

Error happens in trigger.c:

#3  0x0069bddb in GetTupleForTrigger (estate=0x2e36b50, epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac, lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at trigger.c:3103
#4  0x0069b259 in ExecBRUpdateTriggers
[HACKERS] Commitfest 201709 is now closed
We have now entered October, which marks the end of Commitfest 201709. All patches have now been dealt with and the commitfest is closed.

Before starting on moving and closing patches, the stats for this commitfest were:

Needs review: 69.
Waiting on Author: 22.
Ready for Committer: 40.
Committed: 88.
Moved to next CF: 11.
Rejected: 8.
Returned with Feedback: 18.
===
Total: 256.

29 patches were in need of review but didn't have a reviewer assigned (and no review posted on -hackers); these have all been moved to the next CF. On top of that, 40 patches were Ready for Committer and have been moved to the next commitfest. The next commitfest is thus quite the smorgasbord for committers already.

A few patches had been "Waiting on Author" during the entire commitfest without seeing any updates, and were thus closed as "Returned with Feedback". A few other patches were Returned with Feedback too, due to either not moving from Waiting on Author, or due to review comments being made. And last, but not least, a lot of patches were pushed to the next commitfest. What I've tried to do is move patches such that every patch submitted has a fair chance at a good review. Also, patches being actively discussed and worked on were of course moved. With the volumes in mind, I'm absolutely certain I've botched one or two up though. If you have a patch that was moved, and you intend to rewrite enough of it to warrant a resubmission, then please go in and close your entry.

In summary, this commitfest saw a large number of patches, and while lots of patches got reviewed and committed, there is a lot of improvement to be had when it comes to closing them. We clearly need more bandwidth among reviewers and committers to cope with the increasing size of the commitfests. There is a clear risk that too much gets moved to the next commitfest, making each new commitfest progressively larger and harder to handle.
I don’t have any good answers, but we probably need to think of something going forward. Thanks to everyone who participated, and to everyone who have responded to my nagging via the CF app email function. This is clearly an awesome community. cheers ./daniel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files
On Sun, Oct 1, 2017 at 8:27 PM, chenhj wrote:
> On 2017-10-01 04:09:19, "Alexander Korotkov" wrote:
>
> On Sat, Sep 30, 2017 at 8:18 PM, chenhj wrote:
>
>> On 2017-09-30 02:17:54, "Alexander Korotkov" wrote:
>>
>> Great. Now code of this patch looks good for me.
>> However, we forgot about documentation.
>>
>>> The result is equivalent to replacing the target data directory with the
>>> source one. Only changed blocks from relation files are copied;
>>> all other files are copied in full, including configuration files. The
>>> advantage of pg_rewind over taking a new base backup, or
>>> tools like rsync, is that pg_rewind does
>>> not require reading through unchanged blocks in the cluster. This makes
>>> it a lot faster when the database is large and only a small
>>> fraction of blocks differ between the clusters.
>>
>> At least, this paragraph needs to be adjusted, because it states which
>> files are copied. And probably the later paragraphs which talk about WAL
>> files.
>>
>> You are right.
>> I wrote a draft as follows, but I'm afraid whether the English
>> statement is accurate.
>
> I'm not a native English speaker too :(
>
>> Only the WAL files between the point of divergence and the current WAL
>> insert location of the source server are copied, *for* other WAL files are
>> useless for the target server.
>
> I'm not sure about this usage of the word *for*. For me, it probably should
> be just removed. Rest of the changes looks good for me. Please integrate
> them into the patch.
>
> I have removed the *for*; please check the new patch again.

Now, this patch looks good for me. It applies cleanly, builds cleanly, passes regression tests, and the new functionality is covered by regression tests. Code is OK for me, and docs too. I'm marking this patch as "Ready for Committer".

BTW, the authors field in the commitfest app is empty (https://commitfest.postgresql.org/15/1302/). Please put your name there.
I hope this patch will be committed during 2017-11 commitfest. Be ready to rebase this patch if needed. Thank you for your work. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
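The filtering rule the patch's documentation change describes — copy only the WAL files between the point of divergence and the source's current insert location — can be sketched roughly as follows. This is an illustrative Python model, not pg_rewind's actual code: it reduces WAL segments to sequential numbers, assumes the default 16 MB segment size, and the helper names are invented:

```python
# Illustrative sketch of the copy filter discussed above: skip WAL
# segment files outside [divergence point, current insert location],
# since the rewound target has no use for them.

WAL_SEG_SIZE = 16 * 1024 * 1024   # default segment size (assumption)

def seg_no(lsn):
    """Map an LSN (byte position in the WAL stream) to a segment number."""
    return lsn // WAL_SEG_SIZE

def segment_needed(seg, divergence_lsn, insert_lsn):
    """Keep a segment only if it overlaps the divergence..insert range."""
    return seg_no(divergence_lsn) <= seg <= seg_no(insert_lsn)

def filter_segments(segments, divergence_lsn, insert_lsn):
    return [s for s in segments
            if segment_needed(s, divergence_lsn, insert_lsn)]
```

For example, with a divergence point inside segment 2 and an insert location at the start of segment 5, only segments 2 through 5 out of 0..9 would be copied.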
Re: [HACKERS] generated columns
On Thu, Aug 31, 2017 at 12:16:43AM -0400, Peter Eisentraut wrote: > In previous discussions, it has often been a source of confusion whether > these generated columns are supposed to be computed on insert/update and > stored, or computed when read. The SQL standard is not explicit, but > appears to lean toward stored. DB2 stores. Oracle computes on read. Question: How would one know the difference between storing computed columns vs. computing them on read? Answer?: Performance. If the computation is slow, then you'll really notice on read. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
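The performance difference Nico points to can be illustrated with a toy model — this is not PostgreSQL internals, just a hypothetical sketch in which a "stored" generated column evaluates its expression once per write while a "virtual" one evaluates it on every read:

```python
# Toy model of the stored-vs-computed-on-read tradeoff for generated
# columns: same visible value, different number of expression evaluations.

class Row:
    def __init__(self, base, compute, stored):
        self.compute = compute
        self.stored = stored
        self.calls = 0            # how often the expression was evaluated
        self.set(base)

    def set(self, base):
        self.base = base
        if self.stored:
            self._cached = self._eval()   # pay the cost at write time

    def _eval(self):
        self.calls += 1
        return self.compute(self.base)

    def generated(self):
        # stored: return the write-time result; virtual: recompute now
        return self._cached if self.stored else self._eval()

expensive = lambda x: x * x       # stand-in for a slow expression

stored = Row(3, expensive, stored=True)
virtual = Row(3, expensive, stored=False)
for _ in range(5):
    assert stored.generated() == virtual.generated() == 9
print(stored.calls, virtual.calls)   # 1 vs 5 evaluations
```

Which is cheaper depends exactly on the read/write ratio and the cost of the expression, which is the tradeoff the thread is weighing.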
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
On Mon, Oct 2, 2017 at 11:44 PM, Robert Haas wrote:
> On Mon, Oct 2, 2017 at 10:19 AM, Michael Paquier wrote:
>> So those bits could be considered for integration.
>
> Yes, and they also tend to suggest that the rest of the patch has value.

Well, there are cases where you don't need any locking checks, and the proposed patch ignores that. Take for example pageinspect, which works on a copy of a page, or just WAL replay, which serializes everything; in both cases PageGetLSN can be used safely. So for compatibility reasons I think that PageGetLSN should be kept alone so as not to surprise anybody using it, or at least an equivalent should be introduced. It would be interesting to make BufferGetLSNAtomic hold tighter checks, like something that makes use of LWLockHeldByMe for example.

--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 64-bit queryId?
On 10/01/2017 04:22 PM, Robert Haas wrote:
> On Sun, Oct 1, 2017 at 3:48 PM, Greg Stark wrote:
>> Well these kinds of monitoring systems tend to be used by operations
>> people who are a lot more practical and a lot less worried about
>> theoretical concerns like that.
>
> +1, well said.
>
>> In context the point was merely that the default pg_stat_statements.max
>> of 5000 isn't sufficient to argue that 32-bit values are enough. It
>> wouldn't be hard for there to be 64k different queries over time and
>> across all the databases in a fleet and at that point it becomes likely
>> there'll be a 32-bit collision.
>
> Yeah. I think Alexander Korotkov's points are quite good, too.

+1 to both of these as well.

jD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own. *

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
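Greg's 64k-queries point is just the birthday bound. A quick back-of-the-envelope check (the standard approximation, nothing Postgres-specific) shows why 32 bits is uncomfortable at that scale while 64 bits is not:

```python
import math

def collision_prob(n, bits):
    """Birthday approximation: P(at least one collision) among n
    uniformly random `bits`-bit hash values."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * 2 ** bits))

n = 64 * 1024   # 64k distinct queries across a fleet, as suggested above
p32 = collision_prob(n, 32)
p64 = collision_prob(n, 64)
print(f"32-bit: {p32:.2%}, 64-bit: {p64:.2e}")
```

With 64k distinct queries the chance of at least one 32-bit collision is roughly 39%, while a 64-bit queryId keeps it around 1e-10.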
Re: [HACKERS] Logging idle checkpoints
Vik, all, * Vik Fearing (vik.fear...@2ndquadrant.com) wrote: > I recently had a sad because I noticed that checkpoint counts were > increasing in pg_stat_bgwriter, but weren't accounted for in my logs > with log_checkpoints enabled. > After some searching, I found that it was the idle checkpoints that > weren't being logged. I think this is a missed trick in 6ef2eba3f57. > Attached is a one-liner fix. I realize how imminent we are from > releasing v10 but I hope there is still time for such a minor issue as this. Idle checkpoints aren't, well, really checkpoints though. If anything, seems like we shouldn't be including skipped checkpoints in the pg_stat_bgwriter count because we aren't actually doing a checkpoint. I certainly don't care for the idea of adding log messages saying we aren't doing anything just to match a count that's incorrectly claiming that checkpoints are happening when they aren't. The down-thread suggestion of keeping track of skipped checkpoints might be interesting, but I'm not entirely convinced it really is. We have time to debate that, of course, but I don't really see how that's helpful. At the moment, it seems like the suggestion to add that column is based on the assumption that we're going to start logging skipped checkpoints and having that column would allow us to match up the count between the new column and the "skipped checkpoint" messages in the logs and I can not help but feel that this is a ridiculous amount of effort being put into the analysis of something that *didn't* happen. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index
Hi, Andrew!

On Mon, Oct 2, 2017 at 1:40 PM, Andrew Borodin wrote:
> Thanks for looking into the patch!
>
> On Thu, Sep 28, 2017 at 3:59 PM, Alexander Korotkov wrote:
>> In gistdoinsert() you do CheckForSerializableConflictIn() only if the page
>> wasn't exclusively locked before (xlocked is false).
>>
>>> if (!xlocked)
>>> {
>>>     LockBuffer(stack->buffer, GIST_UNLOCK);
>>>     LockBuffer(stack->buffer, GIST_EXCLUSIVE);
>>>     CheckForSerializableConflictIn(r, NULL, stack->buffer);
>>>     xlocked = true;
>>
>> However, the page might be exclusively locked before. And in this case
>> CheckForSerializableConflictIn() would be skipped. That happens very
>> rarely (someone fixes an incomplete split before we do), but nevertheless.
>
> If xlocked = true, the page was already checked for conflict after setting
> an exclusive lock on its buffer. I still do not see any problem here...

What happens if exactly this "continue" fires?

> if (GistFollowRight(stack->page))
> {
>     if (!xlocked)
>     {
>         LockBuffer(stack->buffer, GIST_UNLOCK);
>         LockBuffer(stack->buffer, GIST_EXCLUSIVE);
>         xlocked = true;
>         /* someone might've completed the split when we unlocked */
>         if (!GistFollowRight(stack->page))
>             continue;

In this case we might get xlocked == true without calling CheckForSerializableConflictIn(). This is a very rare codepath, but still... I think it would be rather safe and easy to understand to move CheckForSerializableConflictIn() directly before gistinserttuple().

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
On Mon, Oct 2, 2017 at 10:19 AM, Michael Paquier wrote:
> So those bits could be considered for integration.

Yes, and they also tend to suggest that the rest of the patch has value.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Thu, Sep 14, 2017 at 10:45 PM, Amit Kapila wrote:
> The latest patch again needs to be rebased. Find rebased patch
> attached with this email.

I read through this patch this morning. Copying Tom in the hopes that he might chime in on the following two issues in particular:

1. Is there any superior alternative to adding ptype to ParamExecData? I think the reason why we don't have this already is that the plan tree that populates the Param must output the right type and the plan tree that reads the Param must expect the right type, and there's no need for anything else. But for serialization and deserialization this seems to be necessary. I wonder whether it would be better to try to capture this at the time the Param is generated (e.g. var->vartype in assign_param_for_var) rather than deriving it at execution time by applying exprType(), but I'm not sure how that would work exactly, or whether it's really any better.

2. Do max_parallel_hazard_walker and set_param_references() have the right idea about which parameters are acceptable candidates for this optimization? The idea seems to be to collect the setParam sets for all initplans between the current query level and the root. That looks correct to me, but I'm not an expert on this parameter stuff.

Some other comments:

- I think parallel_hazard_walker should likely work by setting safe_param_ids to the right set of parameter IDs, rather than testing whether the parameter is safe by checking either whether it is in safe_param_ids or some other condition is met.

- contains_parallel_unsafe_param() sounds like a function that merely returns true or false, but it actually has major side effects. Those side effects also look unsafe: mutating previously-generated paths can corrupt the rel's pathlist, because it will no longer have the sort order and other characteristics that add_path() creates and upon which other code relies.
- Can't is_initplan_is_below_current_query_level() be confused when there are multiple subqueries in the tree? For example, if the toplevel query has subqueries a and b, and a has a sub-subquery aa which has an initplan, won't this function think that b has an initplan below the current query level? If not, maybe a comment is in order explaining why not?

- Why do we even need contains_parallel_unsafe_param() and is_initplan_is_below_current_query_level() in the first place, anyway? I think there's been some discussion of that on this thread, but I'm not sure I completely understand it, and the comments in the patch don't help me understand why we now need this restriction.

- The new code in ExplainPrintPlan() needs a comment.

- I have typically referred in comments to "Gather or Gather Merge" rather than "gather or gather merge".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
On Mon, Oct 2, 2017 at 9:09 PM, Robert Haas wrote:
> I think the first question we ought to be asking ourselves is whether
> any of the PageGetLSN -> BufferGetLSNAtomic changes the patch
> introduces are live bugs. If they are, then we ought to fix those
> separately (and probably back-patch). If they are not, then we need
> to think about how to adjust the patch so that it doesn't complain
> about things that are in fact OK.

Looking one by one at each item the patch touches, based on the contract defined in transam/README...

--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -640,7 +640,7 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, GISTSTATE *giststate)
         }
         stack->page = (Page) BufferGetPage(stack->buffer);
-        stack->lsn = PageGetLSN(stack->page);
+        stack->lsn = BufferGetLSNAtomic(stack->buffer);

There is an incorrect use of PageGetLSN here: a shared lock can be taken on the page, but no buffer header lock is used when calling PageGetLSN.

@@ -890,7 +890,7 @@ gistFindPath(Relation r, BlockNumber child, OffsetNumber *downlinkoffnum)
             break;
         }
-        top->lsn = PageGetLSN(page);
+        top->lsn = BufferGetLSNAtomic(buffer);

Here as well, only a shared lock is taken on the page.

@@ -61,7 +61,7 @@ gistkillitems(IndexScanDesc scan)
      * read. killedItems could be not valid so LP_DEAD hints applying is not
      * safe.
      */
-    if (PageGetLSN(page) != so->curPageLSN)
+    if (BufferGetLSNAtomic(buffer) != so->curPageLSN)
     {
         UnlockReleaseBuffer(buffer);
         so->numKilled = 0;      /* reset counter */

@@ -384,7 +384,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
      * safe to apply LP_DEAD hints to the page later. This allows us to drop
      * the pin for MVCC scans, which allows vacuum to avoid blocking.
      */
-    so->curPageLSN = PageGetLSN(page);
+    so->curPageLSN = BufferGetLSNAtomic(buffer);

Those two as well only use a shared lock.
@@ -249,7 +249,7 @@ gistbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
             ptr = (GistBDItem *) palloc(sizeof(GistBDItem));
             ptr->blkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
-            ptr->parentlsn = PageGetLSN(page);
+            ptr->parentlsn = BufferGetLSNAtomic(buffer);
             ptr->next = stack->next;
             stack->next = ptr;

Here also only a shared lock is taken; that's a VACUUM code path for GiST indexes.

+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -1224,7 +1224,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
      * safe to apply LP_DEAD hints to the page later. This allows us to drop
      * the pin for MVCC scans, which allows vacuum to avoid blocking.
      */
-    so->currPos.lsn = PageGetLSN(page);
+    so->currPos.lsn = BufferGetLSNAtomic(so->currPos.buf);

Here the caller of _bt_readpage should have taken a lock, but the page is not necessarily pinned. Still, _bt_getbuf makes sure that the pin is taken.

+++ b/src/backend/access/nbtree/nbtutils.c
@@ -1772,7 +1772,7 @@ _bt_killitems(IndexScanDesc scan)
         return;
     page = BufferGetPage(buf);
-    if (PageGetLSN(page) == so->currPos.lsn)
+    if (BufferGetLSNAtomic(buf) == so->currPos.lsn)
         so->currPos.buf = buf;

Same here, thanks to _bt_getbuf.

So those bits could be considered for integration. Also, if I read the GiST code correctly, there is one other case in gistFindCorrectParent. And in btree, there is one occurrence in _bt_split. In XLogRecordAssemble, there could be more checks regarding the locks taken on the registered buffers.

--
Michael