Re: AS OF queries
On 26.01.2018 03:55, Bruce Momjian wrote:
On Sat, Dec 23, 2017 at 11:53:19PM +0300, konstantin knizhnik wrote:
On Dec 23, 2017, at 2:08 AM, Greg Stark wrote:
On 20 December 2017 at 12:45, Konstantin Knizhnik wrote:

It seems to me that it will not be so difficult to implement them in Postgres - we already have versions of tuples. Looks like we only need to do three things: 1. Disable autovacuum (autovacuum = off)

"The Wheel of Time turns, and Ages come and pass, leaving memories that become legend. Legend fades to myth, and even myth is long forgotten when the Age that gave it birth comes again"

I think you'll find it a lot harder to get this to work than just disabling autovacuum. Notably HOT updates can get cleaned up (and even non-HOT updates can now leave tombstone dead line pointers, IIRC) even if vacuum hasn't run.

Yeah, I suspected that just disabling autovacuum was not enough. I heard (but do not know too much) about microvacuum and HOT updates. This is why I was a little bit surprised when my test didn't show loss of updated versions. Maybe it is because of vacuum_defer_cleanup_age.

Well, vacuum and single-page pruning do 3 things:
1. remove expired updated rows
2. remove deleted rows
3. remove rows from aborted transactions
While time travel doesn't want #1 and #2, it probably wants #3.

Rows of aborted transactions are in any case excluded by visibility checks. Certainly skipping them costs some time, so a large percentage of aborted transactions may affect query speed. But query speed is reduced in any case if, in order to support time travel, we prohibit or postpone vacuum. What is the expected ratio of committed to aborted transactions? I expected that it should be much bigger than one (especially if we take into account only read-write transactions which have really updated the database). In this case the number of versions created by aborted transactions should be much smaller than the number of versions created by updates/deletes of successful transactions. 
So they should not have a significant impact on performance.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
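The distinction drawn in this thread (time travel wants cleanup task #3 but not #1 and #2) can be sketched with a toy version-chain model. This is illustrative Python only, with invented names, and is in no way PostgreSQL pruning code:

```python
# Toy model of the three cleanup tasks (hypothetical structure, not
# PostgreSQL internals): a time-travel-friendly pruner may not remove
# expired or deleted committed versions, since AS OF queries can still
# see them, but versions from aborted transactions are invisible to
# every snapshot and can always go.

def prune_for_time_travel(chain):
    """Drop only versions created by aborted transactions."""
    return [v for v in chain if v["xmin_status"] != "aborted"]

chain = [
    {"value": 1, "xmin_status": "committed"},  # expired update: must keep
    {"value": 2, "xmin_status": "aborted"},    # aborted: safe to remove
    {"value": 3, "xmin_status": "committed"},  # current version: keep
]

pruned = prune_for_time_travel(chain)
print([v["value"] for v in pruned])  # -> [1, 3]
```

The point of the sketch is that the aborted version can be reclaimed without affecting any AS OF result, which is why a high abort rate need not bloat a time-travel store.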
Re: [HACKERS] GnuTLS support
> On 26 Jan 2018, at 02:10, Michael Paquier wrote: > On Fri, Jan 26, 2018 at 12:27:16AM +0100, Daniel Gustafsson wrote: >> While only tangentially related to the issue this patch solves, converting >> be_tls_get_peerdn_name() to return const char * seems reasonable too to keep >> the API consistent. > > Why? This is not used for error message generation yet. We could always > change the API as needed later on. Mainly to keep the be_tls_* interface consistent in how the routines return data. I don’t have any strong opinions though, it just seemed like a good time to do it. cheers ./daniel
Re: JIT compiling with LLVM v9.0
Hi,

I've spent the last weeks working on my LLVM compilation patchset. In the course of that I *heavily* revised it. While still a good bit away from committable, it's IMO definitely not a prototype anymore.

Below are results on my system for Q1, TPC-H scale 10 (~13GB database):

Options                   Time
Default                   20075
jit_expressions=on        16105
jit_tuple_deforming=on    14734
jit_perform_inlining=on   13441

Also I noticed that parallel execution disables JIT. On my computer with 4 cores, the time of Q1 with parallel execution is 6549. Are there any principal problems with combining JIT and parallel execution?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
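For reference, the timings reported above imply roughly the following speedups. This is plain arithmetic on the posted numbers (units are whatever the original measurement used):

```python
# Speedups relative to the non-JIT baseline, from the table above.
timings = {
    "default": 20075,
    "jit_expressions=on": 16105,
    "jit_tuple_deforming=on": 14734,
    "jit_perform_inlining=on": 13441,
    "parallel (4 cores)": 6549,
}

baseline = timings["default"]
for name, t in timings.items():
    print(f"{name:25s} {baseline / t:.2f}x")

# Full JIT gives roughly a 1.49x speedup; parallel execution on 4 cores
# reaches about 3.07x, which is why combining the two looks attractive.
```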
Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?
On 21 January 2018 at 19:21, David Rowley wrote: > On 20 January 2018 at 18:50, Tom Lane wrote: >> Stephen Froehlich writes: >>> Are custom statistics in PG10 retained in LIKE (INCLUDING ALL) or do I need >>> to recreate the statistics by hand each time I create a new table? >> >> LIKE doesn't know about those, no. Perhaps it should. > > Agreed. ALL does not mean MOST. (Moving to -hackers) The attached implements this. Looking at "LIKE ALL" in more detail in the docs it claims to be equivalent to "INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING COMMENTS", so it didn't seem right to magically have LIKE ALL include something that there's no option to include individually, so I added INCLUDING STATISTICS. This also required writing code to allow statistics to be created without a name. The code I added to choose the default name is in the form ___stat. I'm sure someone will want something else, but that's just what I've personally standardised on. I'd also be fine with "stx" or "sta". Making this work required a bit of shuffling of error checking in CreateStatistics() so that we generate a name before complaining about any duplicate name. I'm unsure if this should be categorised as a bug fix or a new feature. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services create_table_like_statistics_v1.patch Description: Binary data
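The default-name scheme described above can be sketched as follows. This is a hypothetical illustration of a "<table>_<columns>_stat" pattern with a numeric suffix on collision; the suffix behavior and helper name are assumptions, not taken from the patch:

```python
# Hypothetical sketch of choosing a default name for an extended
# statistics object: <table>_<col1>_<col2>_stat, appending a number
# when the name is already taken (the real patch resolves duplicates
# inside CreateStatistics()).

def choose_stats_name(table, columns, existing):
    base = f"{table}_{'_'.join(columns)}_stat"
    name, n = base, 1
    while name in existing:
        name = f"{base}{n}"
        n += 1
    return name

print(choose_stats_name("orders", ["a", "b"], set()))
# -> orders_a_b_stat
print(choose_stats_name("orders", ["a", "b"], {"orders_a_b_stat"}))
# -> orders_a_b_stat1
```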
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Fri, Jan 26, 2018 at 12:00 PM, Peter Geoghegan wrote: > On Thu, Jan 25, 2018 at 10:00 PM, Amit Kapila wrote: >>> At this point, my preferred solution is for someone to go implement >>> Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems >>> like the logical person for the job). >>> >> >> I can implement it and share a prototype patch with you which you can >> use to test parallel sort stuff. > > That would be great. Thank you. > >> I would like to highlight the >> difference which you will see with WaitForParallelWorkersToAttach as >> compare to WaitForParallelWorkersToFinish() is that the former will >> give you how many of nworkers_launched workers are actually launched >> whereas latter gives an error if any of the expected workers is not >> launched. I feel former is good and your proposed way of calling it >> after the leader is done with its work has alleviated the minor >> disadvantage of this API which is that we need for workers to startup. > > I'm not sure that it makes much difference, though, since in the end > WaitForParallelWorkersToFinish() is called anyway, much like > nodeGather.c. Have I missed something? > Nopes, you are right. I had in my mind that if we have something like what I am proposing, then we don't even need to detect failures in WaitForParallelWorkersToFinish and we can finish the work without failing. > I had imagined that WaitForParallelWorkersToAttach() would give me an > error in the style of WaitForParallelWorkersToFinish(), without > actually waiting for the parallel workers to finish. > I think that is also doable. I will give it a try and report back if I see any problem with it. However, it might take me some time as I am busy with a few other things, and I am planning to take two days off for some personal reasons. OTOH if it turns out to be simple (which I expect it should be), then I will report back early. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
On Mon, 2018-01-08 at 11:18 -0500, Robert Haas wrote: > I also don't agree with the idea that we should reject syntax that > doesn't appear in the SQL standard. We have a great deal of such > syntax already, and we add more of it in every release, and a good > deal of it is contributed by you and your colleagues. I don't see > why this patch should be held to a stricter standard than we do in > general. I agree that there is some possibility for pain if the SQL > standards committee adopts syntax that is similar to whatever we pick > but different in detail, but I don't think we should be too worried > about that unless other database systems, such as Oracle, have syntax > that is similar to what is proposed here but different in > detail. The > SQL standards committee seems to like standardizing on whatever > companies with a lot of money have already implemented; it's unlikely > that they are going to adopt something totally different from any > existing system but inconveniently similar to ours. We agree with you. Best regards, Anton, Johann, Michael, Peter
Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
On Sun, 2018-01-07 at 09:58 +, Simon Riggs wrote:
> > The attached README explains the ALIGN operation step-by-step with
> > a TEMPORAL LEFT OUTER JOIN example. That is, we start from a query
> > input, show how we rewrite it during parser stage, and show how the
> > final execution generates result tuples.
> Sorry, this was too complex for me.
>
> Can we get a much simpler example please?

Please see the following simpler example:

DROP TABLE budg;
CREATE TABLE budg(name VARCHAR(5), amnt INTEGER, t DATERANGE);
INSERT INTO budg VALUES ('Joe', 5, '[2012/2/1,2012/9/1)');
INSERT INTO budg VALUES ('Ann', 7, '[2012/5/1,2012/9/1)');
INSERT INTO budg VALUES ('Per', 3, '[2012/4/1,2012/10/1)');
SELECT * FROM budg AS r;

SELECT *
FROM ( budg r ALIGN budg s ON s.amnt > r.amnt WITH (t,t)) r
WHERE NOT EXISTS (
    SELECT *
    FROM (budg s ALIGN budg r ON s.amnt > r.amnt WITH (t,t)) s
    WHERE s.amnt > r.amnt AND r.t = s.t );

--  name | amnt |            t
-- ------+------+-------------------------
--  Joe  |    5 | [2012-02-01,2012-05-01)
--  Ann  |    7 | [2012-05-01,2012-09-01)
--  Per  |    3 | [2012-09-01,2012-10-01)

Best regards, Anton, Johann, Michael, Peter
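For this particular data set, the query's output can be reproduced by a per-instant "highest amnt wins" computation over the elementary intervals induced by all range boundaries. This is an illustrative Python sketch only; the real ALIGN operator is implemented as a rewritten join and works quite differently:

```python
from datetime import date

# Reproduce the example's result: at each instant, the tuple with the
# highest amnt wins; adjacent fragments won by the same tuple merge.
rows = [
    ("Joe", 5, (date(2012, 2, 1), date(2012, 9, 1))),
    ("Ann", 7, (date(2012, 5, 1), date(2012, 9, 1))),
    ("Per", 3, (date(2012, 4, 1), date(2012, 10, 1))),
]

bounds = sorted({b for _, _, (s, e) in rows for b in (s, e)})
result = []
for lo, hi in zip(bounds, bounds[1:]):
    # tuples whose range covers the elementary interval [lo, hi)
    live = [(n, a) for n, a, (s, e) in rows if s <= lo and hi <= e]
    if not live:
        continue
    name, amnt = max(live, key=lambda x: x[1])
    if result and result[-1][0] == name and result[-1][3] == lo:
        result[-1] = (name, amnt, result[-1][2], hi)  # merge fragments
    else:
        result.append((name, amnt, lo, hi))

for name, amnt, s, e in result:
    print(name, amnt, f"[{s},{e})")
# -> Joe 5 [2012-02-01,2012-05-01)
#    Ann 7 [2012-05-01,2012-09-01)
#    Per 3 [2012-09-01,2012-10-01)
```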
Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
On Sat, 2018-01-06 at 20:29 +, Simon Riggs wrote:
> * various PostJoinSetProjection() functions to be called as needed
> So the idea is we enable Postgres to allow major new functionality,
> as was done for PostGIS so successfully.

If we used a post-join approach, many intermediate result tuples that do not contribute to the final result would be emitted, and thus performance would suffer.

Best regards, Anton, Johann, Michael, Peter
Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
On Thu, 2017-11-30 at 12:26 -0500, Robert Haas wrote:
> I wonder if we could instead think about R NORMALIZE S ON R.x = S.y
> WITH (R.time, S.time) as an ordinary join on R.x = S.y with some
> extra processing afterwards. After finding all the join partners for
> a row in R, extract all the lower and upper bounds from the S.time
> fields of the join partners and use that to emit as many rows from R
> as may be necessary.

We now have a new approach to plan and execute NORMALIZE as a special join node of type NORMALIZE, an append plan on the inner join path, and a merge-join executor. For the latter, we would need to extend nodeMergejoin.c with a point-in-range-containment join. That is, we create a new join path within sort_inner_and_outer (joinpath.c): first, two projection nodes extract the start and end timepoints of the range type used as interval, and above them an append plan merges both subplans. In detail, outer_path is just a sort, whereas inner_path is the append of a (B, ts) projection with a (B, te) projection. Here, B is the set of non-temporal attributes used in join equality predicates, and [ts,te) forms the valid-time interval. Non-equality predicates must be handled separately as a filter step. Do you think it is a good idea to extend sort_inner_and_outer() with a new branch where jointype == NORMALIZE, and add the projection and append sub-paths there?

> The main problem that I see with that approach is that it
> seems desirable to separate the extra-processing-afterwards step into
> a separate executor node, and there's no way for a separate type of
> plan node to know when the join advances the outer side of the join.
> That's the purpose that row_id() is serving as you have it; we'd have
> to invent some other kind of signalling mechanism, which might not be
> entirely trivial. :-(

One advantage of the new approach is that row_id() is no longer needed. The executor knows that it is inside a new "group" after every NEXTOUTER; within a group, the whole loop over the inner relation has the same row_id. The whole mechanism could be handled through the MergeJoinState struct.

> If we could do it, though, it might be quite a bit more efficient,
> because it would avoid scanning S twice and performing a UNION on the
> results of those scans. Also, we wouldn't necessarily need to sort
> the whole set of rows from R, which I suspect is unavoidable in your
> implementation. We'd just need to sort the individual groups of rows
> from S, and my guess is that in many practical cases those groups are
> fairly small.

We would need a special SEQSCAN node/executor that does the projection and append steps in a single read. Do you have any suggestions how to implement this?

> I wonder what identities hold for NORMALIZE. It does not seem to be
> commutative, but it looks like it might be associative... i.e. (R
> NORMALIZE S) NORMALIZE T produces the same output as R NORMALIZE (S
> NORMALIZE T), perhaps?

It is neither commutative nor associative. Assume relations that contain one tuple each, as follows:

R={[1, 100)}, S={[50, 100)}, and T={[10, 20)}

(R NORMALIZE S) NORMALIZE T gives {[1, 10), [10, 20), [20, 50), [50, 100)}
while R NORMALIZE (S NORMALIZE T) gives {[1, 50), [50, 100)}

Best regards, Anton, Johann, Michael, Peter
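The counterexample can be checked with a small sketch that models NORMALIZE as interval splitting. This is a deliberate simplification (it ignores the join predicate, so every tuple matches) and the helper name is invented:

```python
# Simplified NORMALIZE: each interval in R is split at every boundary
# of S that falls strictly inside it.  Intervals are half-open [a, b).

def normalize(R, S):
    out = []
    for a, b in R:
        pts = [a] + sorted({p for s in S for p in s if a < p < b}) + [b]
        out.extend(zip(pts, pts[1:]))
    return out

R, S, T = [(1, 100)], [(50, 100)], [(10, 20)]

left = normalize(normalize(R, S), T)   # (R NORMALIZE S) NORMALIZE T
right = normalize(R, normalize(S, T))  # R NORMALIZE (S NORMALIZE T)

print(left)   # -> [(1, 10), (10, 20), (20, 50), (50, 100)]
print(right)  # -> [(1, 50), (50, 100)]
```

The two results differ exactly as stated in the email, confirming that the operator is not associative.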
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Thu, Jan 25, 2018 at 10:00 PM, Amit Kapila wrote: >> At this point, my preferred solution is for someone to go implement >> Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems >> like the logical person for the job). >> > > I can implement it and share a prototype patch with you which you can > use to test parallel sort stuff. That would be great. Thank you. > I would like to highlight the > difference which you will see with WaitForParallelWorkersToAttach as > compare to WaitForParallelWorkersToFinish() is that the former will > give you how many of nworkers_launched workers are actually launched > whereas latter gives an error if any of the expected workers is not > launched. I feel former is good and your proposed way of calling it > after the leader is done with its work has alleviated the minor > disadvantage of this API which is that we need for workers to startup. I'm not sure that it makes much difference, though, since in the end WaitForParallelWorkersToFinish() is called anyway, much like nodeGather.c. Have I missed something? I had imagined that WaitForParallelWorkersToAttach() would give me an error in the style of WaitForParallelWorkersToFinish(), without actually waiting for the parallel workers to finish. > However, now I see that you and Thomas are trying to find a different > way to overcome this problem differently, so not sure if I should go > ahead or not. I have seen that you told you wanted to look at > Thomas's proposed stuff carefully tomorrow, so I will wait for you > guys to decide which way is appropriate. I suspect that the overhead of Thomas' experimental approach is going to causes problems in certain cases. Cases that are hard to foresee. That patch makes HandleParallelMessages() set ParallelMessagePending artificially, pending confirmation of having launched all workers. It was an interesting experiment, but I think that your WaitForParallelWorkersToAttach() idea has a better chance of working out. 
>> Once that's committed, I can >> post a new version of the patch that uses that new infrastructure -- >> I'll add a call to the new function, without changing anything else. >> Failing that, we could actually just use >> WaitForParallelWorkersToFinish(). I still don't want to use a barrier, >> mostly because it complicates parallel_leader_participation=off, >> something that Amit is in agreement with [2][3]. >> > > I think if we want we can use barrier API's to solve this problem, but > I kind of have a feeling that it doesn't seem to be the most > appropriate API, especially because existing API like > WaitForParallelWorkersToFinish() can serve the need in a similar way. I can't see a way in which using a barrier can have less complexity. I think it will have quite a bit more, and I suspect that you share this feeling. > Just to conclude, following are proposed ways to solve this problem: > > 1. Implement a new API WaitForParallelWorkersToAttach and use that to > solve this problem. Peter G. and Amit thinks, this is a good way to > solve this problem. > 2. Use existing API WaitForParallelWorkersToFinish to solve this > problem. Peter G. feels that if API mentioned in #1 is not available, > we can use this to solve the problem and I agree with that position. > Thomas is not against it. > 3. Use Thomas's new way to detect such failures. It is not clear to > me at this stage if any one of us have accepted it to be the way to > proceed, but Thomas and Peter G. want to investigate it further. > 4. Use of Barrier API to solve this problem. Robert appears to be > strongly in favor of this approach. That's a good summary. The next revision of the patch will make the leader-participates-as-worker spool/Tuplesortstate start and finish sorting before the main leader spool/Tuplesortstate is even started. 
I did this with the intention of making it very clear that my approach does not assume a number of participants up-front -- that is actually something we only need a final answer on at the point that the leader merges, which is logically the last possible moment. Hopefully this will reassure Robert. It is quite a small change, but leads to a slightly cleaner organization within nbtsort.c, since _bt_begin_parallel() is the only point that has to deal with leader participation. Another minor advantage is that this makes the trace_sort overheads/duration for each of the two tuplesort within the leader non-overlapping (when the leader participates as a worker). -- Peter Geoghegan
Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
On Wed, Jan 24, 2018 at 12:44 PM, amul sul wrote:
> On Tue, Jan 23, 2018 at 7:01 PM, Amit Kapila wrote:
>> On Fri, Jan 12, 2018 at 11:43 AM, amul sul wrote:
> []
>> I have asked to change the message "tuple to be updated .." after
>> heap_lock_tuple call not in nodeModifyTable.c, so please revert the
>> message in nodeModifyTable.c.
>
> Understood, fixed in the attached version, sorry I'd missed it.
>
>> Have you verified the changes in execReplication.c in some way? It is
>> not clear to me how you ensure setting the special value
>> (InvalidBlockNumber) in CTID for a delete operation via logical
>> replication. The logical replication worker uses the function
>> ExecSimpleRelationDelete to perform Delete and there is no way it can
>> pass the correct value of row_moved in heap_delete. Am I missing
>> something due to which we don't need to do this?
>
> You are correct: from ExecSimpleRelationDelete, heap_delete will always
> receive row_moved = false, so InvalidBlockNumber will never be set.

So, this means that in case of logical replication, it won't generate the error this patch is trying to introduce. I think if we want to handle this we need some changes in WAL and logical decoding as well. Robert, others, what do you think? I am not very comfortable leaving this unaddressed; if we don't want to do anything about it, at least we should document it.

> I didn't find any test case that hits the changes in execReplication.c.
> I am not sure what we are supposed to do here, or how bad having just a
> LOG message would be either.

I think you can manually (via debugger) hit this by using PUBLICATION/SUBSCRIPTION syntax for logical replication. I think what you need to do is: in node-1, create a partitioned table and subscribe it on node-2. Now, perform an Update on node-1, then stop the logical replication worker before it calls heap_lock_tuple. Now, in node-2, update the same row such that it moves the row. Now, continue the logical replication worker. I think it should hit your new code; if not, then we need to think of some other way.

> What do you think, should we add an assert like EvalPlanQualFetch() here,
> or is the current LOG message fine?

I think first let's try to hit this case.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: General purpose hashing func in pgbench
Hello Ildar,

Applies, compiles, runs. I still have a few very minor comments, sorry for this (hopefully) last iteration from my part. I'm kind of iterative...

The XML documentation source should avoid a paragraph on one very long line, and should rather be indented like other sections.

I'd propose simplifying the second part:

  Hash functions can be used, for example, to modify the distribution of random_zipfian or random_exponential functions in order to obtain scattered distribution. Thus the following pgbench script simulates possible real world workload typical for social media and blogging platforms where few accounts generate excessive load:

->

  Hash functions can be used to scatter the distribution of random functions such as random_zipfian or random_exponential. For instance, the following pgbench script simulates possible real world workload typical for social media and blogging platforms where few accounts generate excessive load:

Comment the Assert(0) as an internal error that cannot happen.

I'd suggest compacting the execution code by declaring an int64 variable and coercing to int in one go, like the integer bitwise functions. I'm in favor of keeping them in their own case and not reusing this one.

--
Fabien.
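The behavior the proposed documentation text describes can be illustrated outside pgbench: hashing spreads the clustered hot keys of a skewed distribution across the whole key space. This sketch uses a simple multiplicative hash as a stand-in for pgbench's hash functions, which use different algorithms:

```python
# random_zipfian tends to return small, adjacent key values very often;
# applying a hash scatters those hot keys over the full 32-bit range,
# so the load no longer concentrates on one region of the table.

def mult_hash(x, mult=2654435761):  # Knuth's multiplicative constant
    return (x * mult) % (1 << 32)

hot_keys = [1, 2, 3, 4, 5]          # the clustered output of a zipfian draw
scattered = [mult_hash(k) for k in hot_keys]

print(scattered)
assert scattered != sorted(scattered)        # no longer adjacent/ordered
assert len(set(scattered)) == len(hot_keys)  # still distinct keys
```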
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Fri, Jan 26, 2018 at 11:30 AM, Amit Kapila wrote: > On Thu, Jan 25, 2018 at 1:24 AM, Peter Geoghegan wrote: >> On Tue, Jan 23, 2018 at 9:43 PM, Amit Kapila wrote: >>> Right, but what if the worker dies due to something proc_exit(1) or >>> something like that before calling BarrierArriveAndWait. I think this >>> is part of the problem we have solved in >>> WaitForParallelWorkersToFinish such that if the worker exits abruptly >>> at any point due to some reason, the system should not hang. >> >> I have used Thomas' chaos-monkey-fork-process.patch to verify: >> >> 1. The problem of fork failure causing nbtsort.c to wait forever is a >> real problem. Sure enough, the coding pattern within >> _bt_leader_heapscan() can cause us to wait forever even with commit >> 2badb5afb89cd569500ef7c3b23c7a9d11718f2f, more or less as a >> consequence of the patch not using tuple queues (it uses the new >> tuplesort sharing thing instead). >> >> 2. Simply adding a single call to WaitForParallelWorkersToFinish() >> within _bt_leader_heapscan() before waiting on our condition variable >> fixes the problem -- errors are reliably propagated, and we never end >> up waiting forever. >> >> 3. This short term fix works just as well with >> parallel_leader_participation=off. >> >> At this point, my preferred solution is for someone to go implement >> Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems >> like the logical person for the job). >> > > I can implement it and share a prototype patch with you which you can > use to test parallel sort stuff. I would like to highlight the > difference which you will see with WaitForParallelWorkersToAttach as > compare to WaitForParallelWorkersToFinish() is that the former will > give you how many of nworkers_launched workers are actually launched > whereas latter gives an error if any of the expected workers is not > launched. 
I feel former is good and your proposed way of calling it > after the leader is done with its work has alleviated the minor > disadvantage of this API which is that we need for workers to startup. > /we need for workers to startup./we need to wait for workers to startup. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Thu, Jan 25, 2018 at 1:24 AM, Peter Geoghegan wrote: > On Tue, Jan 23, 2018 at 9:43 PM, Amit Kapila wrote: >> Right, but what if the worker dies due to something proc_exit(1) or >> something like that before calling BarrierArriveAndWait. I think this >> is part of the problem we have solved in >> WaitForParallelWorkersToFinish such that if the worker exits abruptly >> at any point due to some reason, the system should not hang. > > I have used Thomas' chaos-monkey-fork-process.patch to verify: > > 1. The problem of fork failure causing nbtsort.c to wait forever is a > real problem. Sure enough, the coding pattern within > _bt_leader_heapscan() can cause us to wait forever even with commit > 2badb5afb89cd569500ef7c3b23c7a9d11718f2f, more or less as a > consequence of the patch not using tuple queues (it uses the new > tuplesort sharing thing instead). > > 2. Simply adding a single call to WaitForParallelWorkersToFinish() > within _bt_leader_heapscan() before waiting on our condition variable > fixes the problem -- errors are reliably propagated, and we never end > up waiting forever. > > 3. This short term fix works just as well with > parallel_leader_participation=off. > > At this point, my preferred solution is for someone to go implement > Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems > like the logical person for the job). > I can implement it and share a prototype patch with you which you can use to test parallel sort stuff. I would like to highlight the difference which you will see with WaitForParallelWorkersToAttach as compare to WaitForParallelWorkersToFinish() is that the former will give you how many of nworkers_launched workers are actually launched whereas latter gives an error if any of the expected workers is not launched. I feel former is good and your proposed way of calling it after the leader is done with its work has alleviated the minor disadvantage of this API which is that we need for workers to startup. 
However, now I see that you and Thomas are trying to find a different way to overcome this problem differently, so not sure if I should go ahead or not. I have seen that you told you wanted to look at Thomas's proposed stuff carefully tomorrow, so I will wait for you guys to decide which way is appropriate. > Once that's committed, I can > post a new version of the patch that uses that new infrastructure -- > I'll add a call to the new function, without changing anything else. > Failing that, we could actually just use > WaitForParallelWorkersToFinish(). I still don't want to use a barrier, > mostly because it complicates parallel_leader_participation=off, > something that Amit is in agreement with [2][3]. > I think if we want we can use barrier API's to solve this problem, but I kind of have a feeling that it doesn't seem to be the most appropriate API, especially because existing API like WaitForParallelWorkersToFinish() can serve the need in a similar way. Just to conclude, following are proposed ways to solve this problem: 1. Implement a new API WaitForParallelWorkersToAttach and use that to solve this problem. Peter G. and Amit thinks, this is a good way to solve this problem. 2. Use existing API WaitForParallelWorkersToFinish to solve this problem. Peter G. feels that if API mentioned in #1 is not available, we can use this to solve the problem and I agree with that position. Thomas is not against it. 3. Use Thomas's new way to detect such failures. It is not clear to me at this stage if any one of us have accepted it to be the way to proceed, but Thomas and Peter G. want to investigate it further. 4. Use of Barrier API to solve this problem. Robert appears to be strongly in favor of this approach. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Redefining inet_net_ntop
Craig Ringer writes: > Should we be using our own if the OS has it? I'm thinking of adding a test > to configure and omitting our own version if configure finds it. Objections? I can't imagine that there's any real upside here. The amount of code involved is barely over a kilobyte, and we'd be exposing ourselves to indeterminate version discrepancies. Having said that, we got that code from bind, and the release process docs suggest that we should check for upstream changes every so often. I don't think we've done so in a long time :-( regards, tom lane
RE: Temporary tables prevent autovacuum, leading to XID wraparound
From: Robert Haas [mailto:robertmh...@gmail.com]
> I think we should consider having backends try to remove their temporary
> schema on startup; then, if a temp table in a backend is old enough that
> it's due for vacuum for wraparound, have autovacuum kill the connection.
> The former is necessary to prevent sessions from being killed on account
> of temp tables they "inherited" from a backend that didn't exit cleanly.

That seems to be the only reasonable solution. One might find it annoying to emit WAL during connection establishment to delete the temp schema, but even now client authentication can emit WAL for hot pruning while scanning pg_database, pg_authid, etc. Thanks.

> The in-place update idea won't work for a couple of reasons. First, a
> command shouldn't see the results of modifications made earlier in the same
> command. Second, using cursors, it's possible to have more than one
> distinct snapshot open against a temporary table at the same time.

You're right. And if the transaction rolls back, it needs to see the old tuples, which requires an undo log.

Regards
Takayuki Tsunakawa
RE: Temporary tables prevent autovacuum, leading to XID wraparound
From: Masahiko Sawada [mailto:sawada.m...@gmail.com]
> On Thu, Jan 25, 2018 at 3:14 PM, Tsunakawa, Takayuki wrote:
> > * Why does autovacuum launcher always choose only one database when that
> > database needs vacuuming for XID wraparound? Shouldn't it also choose
> > other databases?
>
> Yeah, I'd also like to fix this issue. This can be a problem in other cases
> too; suppose there are two databases that require anti-wraparound vacuum,
> and one of them has a very large table that could take a long time to
> vacuum. In this case, if autovacuum chooses the database having the big
> table first, the other database would not be selected until an autovacuum
> worker completes vacuum on the large table. To deal with it, I think we can
> make autovacuum workers tell the autovacuum launcher, before exiting, that
> it is no longer necessary to launch a new autovacuum worker on the
> database. For example, an autovacuum worker reports both the total number
> of relations and the number of relations that require an anti-wraparound
> vacuum to the stats collector.

Thanks for commenting. I believe you have deep knowledge and experience with vacuum because you did great work for the freeze map in 9.6, so I appreciate your help!

How would you use those two counts? How about just modifying do_start_worker(), so that the launcher chooses a database in the following order?

1. wraparound-risky database not being vacuumed by any worker
2. non-wraparound-risky database not being vacuumed by any worker
3. wraparound-risky database being vacuumed by some worker
4. non-wraparound-risky database being vacuumed by some worker

Regards
Takayuki Tsunakawa
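The four-tier ordering proposed above can be sketched as a single sort key. This is a hypothetical illustration of the selection policy only; the actual change would live in do_start_worker() and use the launcher's shared-memory state, not a structure like this:

```python
# Pick the next database for an autovacuum worker, preferring databases
# with no worker on them, and within each group preferring the ones at
# wraparound risk: (risky, idle) > (non-risky, idle) > (risky, busy)
# > (non-risky, busy).

def pick_database(dbs):
    # dbs: list of (name, wraparound_risky, being_vacuumed)
    order = sorted(
        dbs,
        key=lambda d: (d[2], not d[1]),  # idle before busy, risky before not
    )
    return order[0][0] if order else None

dbs = [
    ("db_plain",      False, False),  # rule 2
    ("db_risky",      True,  False),  # rule 1: chosen first
    ("db_risky_busy", True,  True),   # rule 3
    ("db_plain_busy", False, True),   # rule 4
]
print(pick_database(dbs))  # -> db_risky
```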
Re: Redefining inet_net_ntop
On 1/25/18 22:24, Craig Ringer wrote: > port.h declares inet_net_ntop and we always compile our own > from port/inet_net_ntop.c . > > But it's part of -lresolv on Linux, and more importantly, it's declared > in . > > Should we be using our own if the OS has it? I'm thinking of adding a > test to configure and omitting our own version if configure finds it. ISTR that ours is hacked to produce specific output. Do the regression tests still pass if you use the one from the OS? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Covering + unique indexes.
On Fri, Jan 26, 2018 at 3:01 AM, Anastasia Lubennikova wrote: > Thanks for the reminder. Rebased patches are attached. This is a really cool and also difficult feature. Thanks for working on it! Here are a couple of quick comments on the documentation, since I noticed it doesn't build after the SGML->XML change: (1) empty closing tags ("</>") are no longer accepted, (2) "<xref>" now needs to be written "<xref/>", and (3) xref IDs are now case-sensitive.

+ PRIMARY KEY ( column_name [, ... ] ) index_parameters INCLUDE (column_name [, ...]) |

I hadn't seen that use of "<optional>" before. Almost everywhere else we use explicit [ and ] characters, but I see that there are other examples, and it is rendered as [ and ] in the output. OK, cool, but I think there should be some extra whitespace so that it comes out as:

[ INCLUDE ... ]

instead of:

[INCLUDE ...]

to fit with the existing convention.

+... This also allows UNIQUE indexes to be defined on
+one set of columns, which can include another set of columns in the
+INCLUDE clause, on which the uniqueness is not enforced.
+It's the same with other constraints (PRIMARY KEY and EXCLUDE). This can
+also can be used for non-unique indexes as any columns which are not required
+for the searching or ordering of records can be used in the
+INCLUDE clause, which can slightly reduce the size of the index.

Can I suggest rewording these three sentences a bit? Just an idea:

UNIQUE indexes, PRIMARY KEY constraints and EXCLUDE constraints can be defined with extra columns in an INCLUDE clause, in which case uniqueness is not enforced for the extra columns. Moving columns that are not needed for searching, ordering or uniqueness into the INCLUDE clause can sometimes reduce the size of the index while retaining the possibility of using a faster index-only scan.

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] generated columns
On 1/19/18 00:18, Michael Paquier wrote: > Instead of leaving bits for a feature that may or may not be > implemented, have you considered just blocking STORED at parsing level > and remove those bits? There is little point in keeping the equivalent > of dead code in the tree. So I would suggest a patch simplification: > - Drop the VIRTUAL/STORED parsing from the grammar for now. > - Define VIRTUAL as the default for the future. fixed > =# CREATE TABLE gen_1 (a int, b int GENERATED ALWAYS AS (a * 2) > VIRTUAL); > CREATE TABLE > =# insert into gen_1 values (20); > INSERT 0 1 > =# select * from gen_1 ; > ERROR: 22003: integer out of range > Overflow checks do not happen when inserting, which makes the following > SELECT to fail. This could be confusing for the user because the row has > been inserted. Perhaps some evaluation of virtual tuples at insert phase > should happen. The existing behavior makes sense as well as virtual > values are only evaluated when read, so a test would be at least > welcome. added test > Does the SQL spec mention the matter? How do other systems > handle such cases? In Oracle you get the same overflow error. > CHECK constraints can be combined, still.. I think you can compare this to a view. A view can produce overflow errors on read. But a CHECK constraint is more like a CHECK option on a view that checks as values are put into the view. > The last patch crashes for typed tables: > =# CREATE TYPE itest_type AS (f1 integer, f2 text, f3 bigint); > CREATE TYPE > =# CREATE TABLE itest12 OF itest_type (f1 WITH OPTIONS GENERATED ALWAYS > AS (f2 *2)); > [... boom ...] > And for partitions: > =# CREATE TABLE itest_parent (f1 date NOT NULL, f2 text, f3 bigint) >PARTITION BY RANGE (f1); > CREATE TABLE > =# CREATE TABLE itest_child PARTITION OF itest_parent (f3 WITH OPTIONS >GENERATED ALWAYS AS (f3)) FOR VALUES FROM ('2016-07-01') TO >('2016-08-01'); > [... boom ...] 
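The view analogy above can be made concrete with a sketch (hypothetical tables, not from the patch's regression tests): a plain view evaluates expressions on read, while a CHECK OPTION validates values as they are written through the view.

```sql
-- A plain view evaluates its expressions on read, so an overflow
-- surfaces at SELECT time even though the INSERT itself succeeded...
CREATE TABLE t (a int);
CREATE VIEW v AS SELECT a, a * 2 AS b FROM t;
INSERT INTO t VALUES (2000000000);  -- succeeds
SELECT * FROM v;                    -- ERROR: integer out of range

-- ...whereas a CHECK OPTION validates values at write time,
-- analogous to a table CHECK constraint.
CREATE VIEW v2 AS SELECT a FROM t WHERE a < 1000000
  WITH CHECK OPTION;
INSERT INTO v2 VALUES (5000000);    -- rejected at write time
```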
> Like what we did in 005ac298, I would suggest to throw > ERRCODE_FEATURE_NOT_SUPPORTED. Please also add some tests. done > +SELECT a, c FROM gtest12; -- FIXME: should be allowed > +ERROR: permission denied for function gf1 This is quite hard to fix and I would like to leave this for a future release. > +ALTER TABLE gtest27 ALTER COLUMN b DROP DEFAULT; -- FIXME > +ERROR: column "b" of relation "gtest27" is a generated column That FIXME seems to have been a mistake. I have removed it. > + if (get_attgenerated(relationId, attno)) > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("index creation on generated columns is not > supported"))); > Shouldn't such messages mention explicitly "virtually"-generated > columns? For stored columns the support would not be complicated in this > case. done > +-- domains > +CREATE DOMAIN gtestdomain1 AS int CHECK (VALUE < 10); > +CREATE TABLE gtest24 (a int PRIMARY KEY, b gtestdomain1 GENERATED > ALWAYS AS (a * 2)); -- prohibited > +ERROR: virtual generated column "b" cannot have a domain type > CHECK constraints can be used, so I find this restriction confusing. We currently don't have infrastructure to support this. We run all CHECK constraints whenever a row is changed, so that is easy. But we don't have facilities to recheck the domain constraint in column b when column a is updated. This could be done, but it's extra work. > No test coverage for DELETE triggers? done > +UPDATE gtest26 SET a = a * -2; > +INFO: gtest1: old = (-2,) > +INFO: gtest1: new = (4,) > +INFO: gtest3: old = (-2,) > +INFO: gtest3: new = (4,) > +INFO: gtest4: old = (3,) > +INFO: gtest4: new = (-6,) > OLD and NEW values for generated columns don't show up. Am I missing > something or they should be available? This was already discussed a few times in the thread. I don't know what a good solution is. I have in this patch added facilities to handle this better in other PLs.
So the virtual generated column doesn't show up there in the trigger data. This is possible because we copy the trigger data from the internal structures into language-specific hashes/dictionaries/etc. In PL/pgSQL, this is a bit more difficult, because we handle the table's row type there. We can't just "hide" the generated column when looking at the row type. Actually, we could do it quite easily, but that would probably raise other weirdnesses. This also raises the question of how a row type with generated columns should behave. I think a generation expression is a property of a table, so it does not apply in a row type. (Just like a default is a property of a table and does not apply in row types.) > Please use brackets here if you use a comment in the if() block... > [/nit_mode] done > COPY as you are proposing looks sensible to me. I am not sure about any > options though as it is possible to enforce the selection of generated > columns using COPY (SELECT). removed that comment > Per my tests, gene
Redefining inet_net_ntop
Hi folks port.h declares inet_net_ntop and we always compile our own from port/inet_net_ntop.c . But it's part of -lresolv on Linux, and more importantly, it's declared in . Should we be using our own if the OS has it? I'm thinking of adding a test to configure and omitting our own version if configure finds it. Objections? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
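The configure probe being proposed might look something like the following minimal sketch (assumption: standard Autoconf idioms, not an actual patch; and per Peter's reply above, the substitution is only safe if our replacement's output really matches the OS version, i.e. the regression tests still pass):

```m4
# Sketch for configure.in: use the system inet_net_ntop (in libc or
# -lresolv) when available; otherwise fall back to compiling our own
# replacement from port/inet_net_ntop.c via LIBOBJS.
AC_SEARCH_LIBS(inet_net_ntop, [resolv])
AC_REPLACE_FUNCS([inet_net_ntop])
```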
Re: [HACKERS] GnuTLS support
On 1/25/18 20:10, Michael Paquier wrote: > Peter, could you change ssl_version() and ssl_cipher() in sslinfo at the > same time please? I think that those should use the generic backend-side > APIs as well. sslinfo depends heavily on OpenSSL, OK, but if possible > getting this code more generic will help users of sslinfo to get > something partially working with other SSL implementations natively. sslinfo is currently entirely dependent on OpenSSL, so I don't think it's useful to throw in one or two isolated API changes. I'm thinking maybe we should get rid of sslinfo and fold everything into pg_stat_ssl. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist
On 1/25/18 09:40, Tom Lane wrote: > Peter Eisentraut writes: >> On 1/23/18 13:39, Robert Haas wrote: >>> I don't understand. AAUI, it is currently the case that if you set >>> the options before the TOAST table exists, they are lost. > >> Well, that's also weird. > > Well, maybe the right answer is to address that. It's clear to me > why that would happen if we store these things as reloptions on the > toast table, but can't they be stored on the parent table? I don't think there is a case where this can currently be a problem with the current options set, but here is what I was thinking about: Imagine there were a setting toast.fillfactor or something like that that affects the storage layout of the TOAST table. If you ran ALTER TABLE mytbl ADD COLUMN foo text DEFAULT somelongvalue(); then you would conceivably want to set TOAST-related reloptions before the TOAST table is created and have them apply as the TOAST table is being first populated. Again, currently not a problem, I think. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Logical decoding of TRUNCATE
On 1/24/18 07:53, Petr Jelinek wrote: > That depends on if we consider this to be part of sequence handling or > truncate statement replication handling. It's true that if we had > sequence replication, the restart would get propagated that way anyway. > OTOH, if we'll want to add option in the future to propagate cascade (to > any additional tables on downstream), it may need to reset sequences > which are not even present upstream and as such would not be handled by > sequence replication. Logical replication, as it currently is designed, replicates discrete actions, not commands. A delete command that deletes five rows and cascades to delete three more rows somewhere else ends up being replicated as eight changes. So I think a TRUNCATE command that cascades to four tables and resets six sequences should end up being replicated as ten changes. (Since we currently don't handle sequences at all, we'll omit the six sequence changes for now.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Subscription code improvements
On 1/24/18 02:33, Masahiko Sawada wrote: > Thank you for notification. Since it seems to me that no one is > interested in this patch, it would be better to close out this patch. > This patch is a refactoring patch but subscription code seems to work > fine so far. If a problem appears around subscriptions, I might > propose it again. I have looked at the patches again. They seem generally reasonable, but I don't see quite why we want or need them. More details would help review them. Do they fix any bugs, or are they just rearranging code? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files
On Thu, Jan 25, 2018 at 07:38:37AM -0500, Stephen Frost wrote: > Michael, all, > > * Michael Paquier (michael.paqu...@gmail.com) wrote: >> On Wed, Jan 24, 2018 at 12:43:51PM -0500, Stephen Frost wrote: >>> * chenhj (chjis...@163.com) wrote: At 2018-01-23 09:56:48, "Stephen Frost" wrote: > I've only read through the thread to try and understand what's going on > and the first thing that comes to mind is that you're changing > pg_rewind to not remove the WAL from before the divergence (split) > point, but I'm not sure why. As noted, that WAL isn't needed for > anything (it's from before the split, after all), so why keep it? Is > there something in this optimization that depends on the old WAL being > there and, if so, what and why? After running pg_rewind, the first startup of postgres will do crash recovery. And crash recovery will begin from the previous redo point preceding the divergence. So, the WAL after the redo point and before the divergence is needed. >>> >>> Right. >> >> Most of the time, and particularly since v11 has removed the need to >> retain more past segments than one completed checkpoint, those segments >> have less chance of being on the source server, limiting more the impact >> of the patch discussed on this thread. > > Good point. Actually, it is worse than that, because after doing a promotion one has to issue a checkpoint on the promoted server so as to update its control file (pg_rewind fetches the new TLI number from the on-disk file), so we can never ensure that needed WAL segments will be on the primary. Hence the proposed patch loses most of its value, because there will always be a WAL segment hole anyway. It can also help in reducing the load in creating new segments if min_wal_size is large enough on the rewound server but that is a narrow benefit. In short, it really seems to me that we should reject the approach as proposed, and replace it with something that prevents the fetching of any WAL segments from the source server.
I think that we could consider as well removing all WAL segments on the primary from the point WAL forked, as those created between the last checkpoint before WAL forked up to the point where WAL forked are useful anyway. But those are bonus points to keep a minimalistic amount of space for the rewound node as they end up being recycled anyway. For those reasons I think that the patch should be marked as returned with feedback. >> Yes, superuser is necessary now, if we could get to a point where only a >> replication permission is needed that would be nice. Now we could do >> things differently. We could have a system role dedicated to pg_rewind >> which works only on the functions from genfile.c that pg_rewind needs, >> in order to leverage the need of a superuser. > > Actually, the other work I'm doing nearby wrt removing the explicit > superuser() checks in those functions would allow a non-superuser role > to be created which could be used by pg_rewind. We could even add an > explicit 'pg_rewind' default role if we wanted to, but is that the route > we want to go with this or should we be thinking about changing > pg_rewind to use the replication protocol and a replication user > instead..? The only reason that it doesn't today is that there isn't an > easy way for it to do so with the existing replication protocol, as I > understand it. Yes, actually with the piece of work we are doing, we don't need to work more on the replication protocol, and pg_rewind does not require heavy refactoring either, so from a maintenance point of view we would gain much more in the long term by using a dedicated role. If your patch to have the file-read catalog user gets in, we should add in the docs of pg_rewind a small set of ACL commands to allow pg_rewind to work with a non-superuser role as long as it gains access to the minimal set of functions necessary.
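If the hard-coded superuser() checks in genfile.c are indeed relaxed, the documented ACL setup might look something like this sketch (hypothetical role name; the exact function set pg_rewind needs is an assumption here):

```sql
-- Hypothetical setup for a non-superuser rewind role, assuming the
-- explicit superuser() checks in the genfile.c functions are removed.
CREATE ROLE rewind_user LOGIN;
GRANT EXECUTE ON FUNCTION pg_ls_dir(text, boolean, boolean) TO rewind_user;
GRANT EXECUTE ON FUNCTION pg_stat_file(text, boolean) TO rewind_user;
GRANT EXECUTE ON FUNCTION pg_read_binary_file(text) TO rewind_user;
GRANT EXECUTE ON FUNCTION pg_read_binary_file(text, bigint, bigint, boolean) TO rewind_user;
```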
> Then again, there's this whole question of if we should even keep the > replication protocol or if we should be getting rid of it in favor of > have regular connections that can support replication. There's been > discussion of that recently, as I recall, though I can't remember > where. Hm. I don't recall this one.. > Having the rewound server decide by itself what it needs actually sounds > like it would be better and would allow whatever the restore command is > to fetch any missing WAL necessary (which it might be able to do more > efficiently, from an archive server and possibly even in parallel as > compared to pg_rewind doing it serially from the primary...). We don't > have any particularly easy way to prevent needed WAL from disappearing > off of the primary anyway, so it seems like it would be necessary to > have a working restore command for pg_rewind to be reliable. While we > could create a physical replication slot for pg_rewind when it starts, > that may already be too late. The only approach I can think of that
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
Craig, Michael, all, * Craig Ringer (cr...@2ndquadrant.com) wrote: > On 21 December 2017 at 11:31, Michael Paquier > wrote: > > > On Thu, Dec 21, 2017 at 11:46 AM, Alvaro Herrera > > wrote: > > > Michael Paquier wrote: > > >> Well, the idea is really to get rid of that as there are already > > >> facilities of this kind for CREATE TABLE LIKE in the parser and ALTER > > >> TABLE when rewriting a relation. It is not really attractive to have a > > >> 3rd method in the backend code to do the same kind of things, for a > > >> method that is even harder to maintain than the other two. > > > > > > I dislike the backend code that uses SPI and manufacturing node to > > > re-creates indexes. IMO we should get rid of it. Let's not call it > > > "facilities", but rather "grotty hacks". > > > > Aha. You are making my day here ;) > > > > > I think before suggesting to add even more code to perpetuate that idea, > > > we should think about going in the other direction. I have not tried to > > > write the code, but it should be possible to have an intermediate > > > function called by ProcessUtility* which transforms the IndexStmt into > > > an internal representation, then calls DefineIndex. This way, all this > > > code that wants to create indexes for backend-internal reasons can > > > create the internal representation directly then call DefineIndex, > > > instead of the horrible hacks they use today creating parse nodes by > > > hand. > > > > Yeah, that would be likely possible. I am not volunteering for that in > > the short term though.. > > It sounds like that'd make some of ALTER TABLE a bit less ... upsetting ... > too. I'm a big fan of this patch but it doesn't appear to have made any progress in quite a while. Is there any chance we can get an updated patch and perhaps get another review before the end of this CF...? 
Refactoring this to have an internal representation between ProcessUtility() and DefineIndex doesn't sound too terrible and if it means the ability to reuse that, seems like it'd be awful nice to do so.. Thanks! Stephen
Re: \describe*
On 01/26/2018 02:11 AM, Corey Huinker wrote: > Some of the discussions about making psql more user friendly (more tab > completions help, exit, etc) got me thinking about other ways that psql > could be more friendly, and the one that comes to mind is our terse but > cryptic \d* commands. > > I think it would be helpful and instructive to have corresponding > long-form describe commands. > > Take describing schemas. Is \dn intuitive? Not really. In hindsight, you > may think "yeah, a schema is a namespace", but you never guessed 'n' on > the first try, or the second. At first blush, I support this idea. > Looking over exec_command_d() a bit, I think it's a bit of a stretch do > have each command handle a long form like this: > > \describe table my_table > or > \describe table verbose my_table > > because then each \d-variant has to account for objects named "table" > and "verbose" and that's a path to unhappiness. We're already being verbose so we can easily require \describe table table for the first case, and if you move "verbose" to before the object, then we can have \describe verbose table verbose So basically, the grammar would be "\describe [verbose] [system] object name" instead of "\dXS[+] name" where X is the object. One thing not addressed here is a long version of \ditvS+. 
Maybe something like \describe verbose system index, table, view > But if we dash-separated them, then all of the strcmps would be in the > 'e' subsection, and each one would just have to know it's long to short > translation, and call exec_command_d with the corresponding short command > > describe => d > describe-verbose => d+ > describe-aggregates-verbose => da+ > describe-roles => du -1 > We could even presume the verbose flag in all cases (after all, the user > was being verbose...), which would also cut down on tab-completion > results, and we could check for interactive mode and display a message like > > \describe-schemas (short: \dn+) > > so that the person has the opportunity to learn the corresponding short > command. -1 on this, too. If we presume "verbose", we need to add a "terse". If the user is interested in the short forms, they can issue a \? like everybody else. > In additional to aiding tab completion discovery of the commands (i.e. > typing "\desc" and then hitting tab, it would also make scripts a little > more self-documenting. I always use long versions of options when writing scripts specifically because they are self-documenting (see 0be22457d7) so I certainly support this argument. Note: I am not volunteering to implement any of this, but I'll happily review it. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
RE: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
From: Robert Haas [mailto:robertmh...@gmail.com] > If I understand correctly, those results are all just pg_test_fsync results. > That's not reflective of what will happen when the database is actually > running. When you use open_sync or open_datasync, you force WAL write and > WAL flush to happen simultaneously, instead of letting the WAL flush be > delayed. Yes, that's pg_test_fsync output. Isn't pg_test_fsync the tool to determine the value for wal_sync_method? Is this manual misleading? https://www.postgresql.org/docs/devel/static/pgtestfsync.html -- pg_test_fsync - determine fastest wal_sync_method for PostgreSQL pg_test_fsync is intended to give you a reasonable idea of what the fastest wal_sync_method is on your specific system, as well as supplying diagnostic information in the event of an identified I/O problem. -- Anyway, I'll use pgbench, and submit a patch if open_datasync is better than fdatasync. I guess the current tweak of making fdatasync the default is a holdover from the era before ext4 and XFS became prevalent. > I don't have the results handy at the moment. We found it to be faster > on a database benchmark where the WAL was stored on an NVRAM device. Oh, NVRAM. Interesting. Then I'll try open_datasync/fdatasync comparison on HDD and SSD/PCIe flash with pgbench. Regards Takayuki Tsunakawa
Re: Boolean partitions syntax
Hi Stephen. On 2018/01/26 10:16, Stephen Frost wrote: > * Amit Langote (langote_amit...@lab.ntt.co.jp) wrote: > Still compiles and passes regression tests, which is good. Thanks for looking at this. >>> I extended your test a bit to check whether partitions over booleans are >>> useful. >>> Note specifically the 'explain' output, which does not seem to restrict the >>> scan >>> to just the relevant partitions. You could easily argue that this is >>> beyond the scope >>> of your patch (and therefore not your problem), but I doubt it makes much >>> sense >>> to have boolean partitions without planner support for skipping partitions >>> like is >>> done for tables partitioned over other data types. >> >> Yeah. Actually, I'm aware that the planner doesn't work this way. While >> constraint exclusion (planner's current method of skipping partitions) >> does not work with IS TRUE/FALSE/UNKNOWN clauses, the new partition >> pruning patch [1] addresses that. In fact, I started this thread prompted >> by some discussion about Boolean partitions on that thread [2]. >> >> That said, someone might argue that we should also fix constraint >> exclusion (the current method of partition pruning) so that partition >> skipping works correctly for Boolean partitions. > > For my 2c, at least, I don't think we need to fix constraint exclusion > to work for this case and hopefully we'll get the partition pruning > patch in but I'm not sure that we really need to wait for that either. > Worst case, we can simply document that the planner won't actually > exclude boolean-based partitions in this release and then fix it in the > future. Yeah, I meant this just as a tiny syntax extension patch. > Looking over this patch, it seems to be in pretty good shape to me > except that I'm not sure why you went with the approach of naming the > function 'NoCast'.
There's a number of other functions just above > makeBoolAConst() that don't include a TypeCast and it seems like having > makeBoolConst() and makeBoolConstCast() would be more in-line with the > existing code (see makeStringConst() and makeStringConstCast() for > example, but also makeIntConst(), makeFloatConst(), et al). That would > require updating the existing callers that really want a TypeCast result > even though they're calling makeBoolAConst(), but that seems like a good > improvement to be making. Agreed, done. > I could see an argument that we should have two patches (one to rename > the existing function, another to add support for boolean) but that's > really up to whatever committer picks this up. For my 2c, I don't think > it's really necessary to split it into two patches. OK, I kept the function name change part with the main patch. Attached updated patch. Thanks, Amit

>From 3ebd9387c4515cc5272e12e0138cd8035fedaaea Mon Sep 17 00:00:00 2001
From: amit
Date: Tue, 12 Dec 2017 10:33:11 +0900
Subject: [PATCH v3] Allow Boolean values in partition FOR VALUES clause

---
 doc/src/sgml/ref/create_table.sgml         |  6 +++---
 src/backend/parser/gram.y                  | 24 +++-
 src/test/regress/expected/create_table.out | 14 ++
 src/test/regress/sql/create_table.sql      |  7 +++
 4 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index a0c9a6d257..eaa79ae333 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -86,9 +86,9 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 and partition_bound_spec is:

-IN ( { numeric_literal | string_literal | NULL } [, ...] ) |
-FROM ( { numeric_literal | string_literal | MINVALUE | MAXVALUE } [, ...] )
-  TO ( { numeric_literal | string_literal | MINVALUE | MAXVALUE } [, ...] ) |
+IN ( { numeric_literal | string_literal | TRUE | FALSE | NULL } [, ...] ) |
+FROM ( { numeric_literal | string_literal | TRUE | FALSE | MINVALUE | MAXVALUE } [, ...] )
+  TO ( { numeric_literal | string_literal | TRUE | FALSE | MINVALUE | MAXVALUE } [, ...] ) |
 WITH ( MODULUS numeric_literal, REMAINDER numeric_literal )

 index_parameters in UNIQUE, PRIMARY KEY, and EXCLUDE constraints are:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 459a227e57..15b5c85a6e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -152,6 +152,7 @@ static Node *makeFloatConst(char *str, int location);
 static Node *makeBitStringConst(char *str, int location);
 static Node *makeNullAConst(int location);
 static Node *makeAConst(Value *v, int location);
+static Node *makeBoolAConstCast(bool state, int location);
 static Node *makeBoolAConst(bool state, int location);
 static RoleSpec *makeRoleSpec(RoleSpecType type, int location);
 static void check_qualified_name(List *names, core_yyscan_t yyscanner);
@@ -2793,6 +2794,8 @@ partbound_datum: Sconst { $$ = makeStri
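With the patch applied, the thread's examples boil down to something like this (a sketch based on the regression tests the patch touches; table names are illustrative):

```sql
-- The FOR VALUES clause accepts TRUE/FALSE literals directly.
CREATE TABLE bools (b bool) PARTITION BY LIST (b);
CREATE TABLE bools_t PARTITION OF bools FOR VALUES IN (TRUE);
CREATE TABLE bools_f PARTITION OF bools FOR VALUES IN (FALSE);
-- Note: as discussed above, constraint exclusion does not yet prune
-- partitions for IS TRUE/FALSE clauses; that is left to the separate
-- partition pruning patch.
```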
Re: [PATCH] Logical decoding of TRUNCATE
On Thu, Jan 25, 2018 at 8:36 PM, Petr Jelinek wrote: > On 26/01/18 02:34, Robert Haas wrote: >> On Wed, Jan 17, 2018 at 12:07 PM, Petr Jelinek >> wrote: >>> The patch will cascade truncation on downstream if cascade was specified >>> on the upstream, that can potentially be dangerous and we either should >>> not do it and only truncate the tables which were truncated upstream >>> (but without restricting because of FKs), leaving the data inconsistent >>> on downstream (like we do already with DELETE or UPDATE). Or maybe make >>> it into either subscription or publication option so that user can chose >>> the behaviour here as I am sure some people will want it to cascade (but >>> the default should still IMHO be to not cascade as that's safer). >> >> Maybe I'm not understanding what is being proposed here, but it sounds >> like you're saying that if somebody removes a bunch of data on the >> logical master, replication will remove only some of it from the >> servers to which the change is replicated. That seems crazy. Then >> replication can't be counted on to produce a replica. >> > No, I was talking about extra tables that might be present on downstream > which weren't truncated on upstream. Oh. That's different. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Fix a typo in autoprewarm.c
On Fri, Jan 26, 2018 at 10:13 AM, Bruce Momjian wrote: > On Fri, Dec 22, 2017 at 02:30:56PM +0900, Masahiko Sawada wrote: >> Hi, >> >> Attached a patch for fixing a typo in autoprewarm.c. I'm sorry if I'm >> mistaken. >> >> s/withs/with/ > > Patch applied to head, thanks. Thank you! Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
RE: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
From: Michael Paquier [mailto:michael.paqu...@gmail.com] > Or to put it short, the lack of granular syncs in ext3 kills performance > for some workloads. Tomas Vondra's presentation on such matters is a really cool read by the way: > https://www.slideshare.net/fuzzycz/postgresql-on-ext4-xfs-btrfs-and-zfs Yeah, I saw this recently, too. That was cool. Regards Takayuki Tsunakawa
Re: [PATCH] Logical decoding of TRUNCATE
On 26/01/18 02:34, Robert Haas wrote: > On Wed, Jan 17, 2018 at 12:07 PM, Petr Jelinek > wrote: >> The patch will cascade truncation on downstream if cascade was specified >> on the upstream, that can potentially be dangerous and we either should >> not do it and only truncate the tables which were truncated upstream >> (but without restricting because of FKs), leaving the data inconsistent >> on downstream (like we do already with DELETE or UPDATE). Or maybe make >> it into either subscription or publication option so that user can chose >> the behaviour here as I am sure some people will want it to cascade (but >> the default should still IMHO be to not cascade as that's safer). > > Maybe I'm not understanding what is being proposed here, but it sounds > like you're saying that if somebody removes a bunch of data on the > logical master, replication will remove only some of it from the > servers to which the change is replicated. That seems crazy. Then > replication can't be counted on to produce a replica. > No, I was talking about extra tables that might be present on downstream which weren't truncated on upstream. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
On Thu, Jan 25, 2018 at 8:32 PM, Tsunakawa, Takayuki wrote: > As I showed previously, regular file writes on PCIe flash, *not writes using > PMDK on persistent memory*, was 20% faster with open_datasync than with > fdatasync. If I understand correctly, those results are all just pg_test_fsync results. That's not reflective of what will happen when the database is actually running. When you use open_sync or open_datasync, you force WAL write and WAL flush to happen simultaneously, instead of letting the WAL flush be delayed. > And you said open_datasync was significantly faster than fdatasync. Could > you show your results? What device and filesystem did you use? I don't have the results handy at the moment. We found it to be faster on a database benchmark where the WAL was stored on an NVRAM device. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
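The distinction Robert draws can be illustrated outside PostgreSQL with a small sketch (not a benchmark, and not PostgreSQL code): an O_DSYNC-style open makes every write() durable immediately, while the fdatasync style lets several writes share one deferred flush.

```python
import os
import tempfile

# O_DSYNC is POSIX but not universal; fall back gracefully on platforms
# lacking it (the flag simply becomes a no-op in that case).
O_DSYNC = getattr(os, "O_DSYNC", 0)
flush = getattr(os, "fdatasync", os.fsync)  # fdatasync is absent on some systems

def write_open_datasync(path, blocks):
    """Each write() is also a flush: write and sync happen together."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | O_DSYNC, 0o600)
    try:
        for block in blocks:
            os.write(fd, block)
    finally:
        os.close(fd)

def write_fdatasync(path, blocks):
    """Flushing is deferred and batched into a single sync at the end."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for block in blocks:
            os.write(fd, block)  # flush is deferred...
        flush(fd)                # ...and batched here
    finally:
        os.close(fd)

blocks = [b"x" * 8192] * 4  # four 8kB "WAL" writes
sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, writer in [("open_datasync", write_open_datasync),
                         ("fdatasync", write_fdatasync)]:
        path = os.path.join(tmp, name)
        writer(path, blocks)
        sizes[name] = os.path.getsize(path)
```

Both styles write the same bytes; the difference is purely when durability is forced, which is why the single-write and two-write pg_test_fsync cases quoted in this thread diverge.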
Re: [PATCH] Logical decoding of TRUNCATE
On Wed, Jan 17, 2018 at 12:07 PM, Petr Jelinek wrote: > The patch will cascade truncation on downstream if cascade was specified > on the upstream, that can potentially be dangerous and we either should > not do it and only truncate the tables which were truncated upstream > (but without restricting because of FKs), leaving the data inconsistent > on downstream (like we do already with DELETE or UPDATE). Or maybe make > it into either subscription or publication option so that user can chose > the behaviour here as I am sure some people will want it to cascade (but > the default should still IMHO be to not cascade as that's safer). Maybe I'm not understanding what is being proposed here, but it sounds like you're saying that if somebody removes a bunch of data on the logical master, replication will remove only some of it from the servers to which the change is replicated. That seems crazy. Then replication can't be counted on to produce a replica. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
RE: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
From: Robert Haas [mailto:robertmh...@gmail.com] > On Thu, Jan 25, 2018 at 7:08 PM, Tsunakawa, Takayuki > wrote: > > No, I'm not saying we should make the persistent memory mode the default. > I'm simply asking whether it's time to make open_datasync the default > setting. We can write a notice in the release note for users who still > use ext3 etc. on old systems. If there's no objection, I'll submit a patch > for the next CF. > > Well, like I said, I think that will degrade performance for users of SSDs > or spinning disks. As I showed previously, regular file writes on PCIe flash, *not writes using PMDK on persistent memory*, were 20% faster with open_datasync than with fdatasync. In addition, regular file writes on HDD with ext4 were also 10% faster:

--
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync             3408.905 ops/sec     293 usecs/op
        fdatasync                 3111.621 ops/sec     321 usecs/op
        fsync                     3609.940 ops/sec     277 usecs/op
        fsync_writethrough                    n/a
        open_sync                 3356.362 ops/sec     298 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync             1892.157 ops/sec     528 usecs/op
        fdatasync                 3284.278 ops/sec     304 usecs/op
        fsync                     3066.655 ops/sec     326 usecs/op
        fsync_writethrough                    n/a
        open_sync                 1853.415 ops/sec     540 usecs/op
--

And you said open_datasync was significantly faster than fdatasync. Could you show your results? What device and filesystem did you use? Regards Takayuki Tsunakawa
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
On Thu, Jan 25, 2018 at 09:30:45AM -0500, Robert Haas wrote: > On Wed, Jan 24, 2018 at 10:31 PM, Tsunakawa, Takayuki > wrote: >>> This is just a guess, of course. You didn't mention what the underlying >>> storage for your test was? >> >> Uh, your guess was correct. My file system was ext3, where fsync() writes >> all dirty buffers in page cache. > > Oh, ext3 is terrible. I don't think you can do any meaningful > benchmark results on ext3. Use ext4 or, if you prefer, xfs. Or to put it short, the lack of granular syncs in ext3 kills performance for some workloads. Tomas Vondra's presentation on such matters is a really cool read by the way: https://www.slideshare.net/fuzzycz/postgresql-on-ext4-xfs-btrfs-and-zfs (I would have loved seeing this presentation live). -- Michael
Re: Temporary tables prevent autovacuum, leading to XID wraparound
On Thu, Jan 25, 2018 at 1:14 AM, Tsunakawa, Takayuki wrote: > * I think temporary tables should not require vacuuming for XID wraparound. > Furthermore, should updates/deletes to temporary tables be in-place instead > of creating garbage, so that any form of vacuum is unnecessary? Other > sessions do not need to read temporary tables. Temporary tables contain XIDs, so they need to be vacuumed for XID wraparound. Otherwise, queries against those tables by the session that created them could yield wrong answers. However, autovacuum can't perform that vacuuming; it would have to be done by the session. I think we should consider having backends try to remove their temporary schema on startup; then, if a temp table in a backend is old enough that it's due for vacuum for wraparound, have autovacuum kill the connection. The former is necessary to prevent sessions from being killed on account of temp tables they "inherited" from a backend that didn't exit cleanly. The in-place update idea won't work for a couple of reasons. First, a command shouldn't see the results of modifications made earlier in the same command. Second, using cursors, it's possible to have more than one distinct snapshot open against a temporary table at the same time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
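The cursor objection can be illustrated with a minimal sketch (hypothetical table and cursor names; the FETCH comment describes the MVCC behavior that an in-place update would break):

```sql
BEGIN;
CREATE TEMP TABLE tt (x int);
INSERT INTO tt VALUES (1);

DECLARE c CURSOR FOR SELECT x FROM tt;
UPDATE tt SET x = 2;   -- under MVCC this creates a new tuple version

FETCH c;               -- must still return 1 from the cursor's snapshot;
                       -- an in-place update would have destroyed that version
COMMIT;
```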
Re: Documentation of pgcrypto AES key sizes
On Fri, Jan 26, 2018 at 12:33:41PM +1300, Thomas Munro wrote: > I noticed that the documentation for encrypt()/decrypt() says "aes — > AES (Rijndael-128)", but in fact 192 and 256 bit keys are also > supported, whether you build --with-openssl or --without-openssl. > Should that say "AES (Rijndael-128, -192 or -256)" instead? Indeed. Instead of using the key size as a suffix, I would personally find it less confusing if written as "AES (Rijndael with key sizes of 128, 192 or 256 bits)" instead of the phrasing you are proposing. Well, it is true that "Rijndael-128" and friends are wordings that can be found here and there. -- Michael
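A quick hedged illustration in SQL (assumes the pgcrypto extension is installed; the key length in bytes selects the AES variant, with 16, 24 and 32 bytes giving 128-, 192- and 256-bit keys):

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

SELECT encrypt('secret', 'kkkkkkkkkkkkkkkk', 'aes');                  -- 16-byte key: AES-128
SELECT encrypt('secret', 'kkkkkkkkkkkkkkkkkkkkkkkk', 'aes');          -- 24-byte key: AES-192
SELECT encrypt('secret', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk', 'aes');  -- 32-byte key: AES-256
```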
Re: Boolean partitions syntax
Greetings Amit, * Amit Langote (langote_amit...@lab.ntt.co.jp) wrote: > On 2017/12/20 6:46, Mark Dilger wrote: > >> On Dec 12, 2017, at 10:32 PM, Amit Langote > >> wrote: > >> Added to CF: https://commitfest.postgresql.org/16/1410/ > > > > This compiles and passes the regression tests for me. > > Thanks for the review. Still compiles and passes regression tests, which is good. > > I extended your test a bit to check whether partitions over booleans are > > useful. > > Note specifically the 'explain' output, which does not seem to restrict the > > scan > > to just the relevant partitions. You could easily argue that this is > > beyond the scope > > of your patch (and therefore not your problem), but I doubt it makes much > > sense > > to have boolean partitions without planner support for skipping partitions > > like is > > done for tables partitioned over other data types. > > Yeah. Actually, I'm aware that the planner doesn't work this way. While > constraint exclusion (planner's current method of skipping partitions) > does not work with IS TRUE/FALSE/UNKNOWN clauses, the new partition > pruning patch [1] addresses that. In fact, I started this thread prompted > by some discussion about Boolean partitions on that thread [2]. > > That said, someone might argue that we should also fix constraint > exclusion (the current method of partition pruning) so that partition > skipping works correctly for Boolean partitions. For my 2c, at least, I don't think we need to fix constraint exclusion to work for this case and hopefully we'll get the partition pruning patch in, but I'm not sure that we really need to wait for that either. Worst case, we can simply document that the planner won't actually exclude boolean-based partitions in this release and then fix it in the future. Looking over this patch, it seems to be in pretty good shape to me except that I'm not sure why you went with the approach of naming the function 'NoCast'. 
There's a number of other functions just above makeBoolAConst() that don't include a TypeCast and it seems like having makeBoolConst() and makeBoolConstCast() would be more in-line with the existing code (see makeStringConst() and makeStringConstCast() for example, but also makeIntConst(), makeFloatConst(), et al). That would require updating the existing callers that really want a TypeCast result even though they're calling makeBoolAConst(), but that seems like a good improvement to be making. I could see an argument that we should have two patches (one to rename the existing function, another to add support for boolean) but that's really up to whatever committer picks this up. For my 2c, I don't think it's really necessary to split it into two patches. Thanks! Stephen
Re: list partition constraint shape
Fujita-san, Thanks for the review. On 2018/01/25 21:17, Etsuro Fujita wrote: > Thanks for the updated patch! Some minor comments: > > + /* > + * Construct an ArrayExpr for the non-null partition > + * values > + */ > + arrexpr = makeNode(ArrayExpr); > + arrexpr->array_typeid = > + !type_is_array(key->parttypid[0]) > + ? get_array_type(key->parttypid[0]) > + : key->parttypid[0]; > > We test the type_is_array() above in this bit, so I don't think we need to > test that again here. Ah, you're right. Fixed. > + arrexpr->array_collid = key->parttypcoll[0]; > + arrexpr->element_typeid = key->parttypid[0]; > > We can assume that keynum=0 here, so it would be okay to specify zero as > the offset. But ISTM that that makes code a bit less readable, so I'd > vote for using keynum as the offset and maybe adding an assertion that > keynum should be zero, somewhere in the PARTITION_STRATEGY_LIST block. Agreed, done. > > * Both comments in the following in get_qual_for_list needs to be updated, > because the expression made there isn't necessarily = ANY anymore. > > if (elems) > { > /* Generate the main expression, i.e., keyCol = ANY (arr) */ > opexpr = make_partition_op_expr(key, 0, BTEqualStrategyNumber, > keyCol, (Expr *) elems); > } > else > { > /* If there are no partition values, we don't need an = ANY expr */ > opexpr = NULL; > } Fixed those. Attached updated patch. Thanks again. Regards, Amit From 358b7c0fbbc646939b43b94405aeca96e56ccb9b Mon Sep 17 00:00:00 2001 From: amit Date: Fri, 8 Dec 2017 19:00:46 +0900 Subject: [PATCH v3] Change how list partition constraint is emitted In its current form (key = ANY ()), a list partition constraint expression is not always accepted by the backend's parser. 
--- src/backend/catalog/partition.c | 102 ++ src/test/regress/expected/create_table.out| 18 - src/test/regress/expected/foreign_data.out| 6 +- src/test/regress/expected/partition_prune.out | 8 +- src/test/regress/sql/create_table.sql | 6 ++ 5 files changed, 97 insertions(+), 43 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 8adc4ee977..c82a52fc1b 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1625,18 +1625,67 @@ make_partition_op_expr(PartitionKey key, int keynum, { case PARTITION_STRATEGY_LIST: { - ScalarArrayOpExpr *saopexpr; - - /* Build leftop = ANY (rightop) */ - saopexpr = makeNode(ScalarArrayOpExpr); - saopexpr->opno = operoid; - saopexpr->opfuncid = get_opcode(operoid); - saopexpr->useOr = true; - saopexpr->inputcollid = key->partcollation[keynum]; - saopexpr->args = list_make2(arg1, arg2); - saopexpr->location = -1; - - result = (Expr *) saopexpr; + ListCell *lc; + List *elems = (List *) arg2, + *elemops = NIL; + + Assert(keynum == 0); + if (type_is_array(key->parttypid[keynum])) + { + foreach (lc, elems) + { + Expr *elem = lfirst(lc), + *elemop; + + elemop = make_opclause(operoid, + BOOLOID, + false, + arg1, elem, + InvalidOid, + key->partcollation[keynum]); + elemops = lappend(elemops, elemop); + } + + result = list_length(elemops) > 1 + ? makeBoolExpr(OR_EXPR, elemops, -1) +
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
On Thu, Jan 25, 2018 at 7:08 PM, Tsunakawa, Takayuki wrote: > No, I'm not saying we should make the persistent memory mode the default. > I'm simply asking whether it's time to make open_datasync the default > setting. We can write a notice in the release note for users who still use > ext3 etc. on old systems. If there's no objection, I'll submit a patch for > the next CF. Well, like I said, I think that will degrade performance for users of SSDs or spinning disks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Removing WITH clause support in CREATE FUNCTION, for isCachable and isStrict
On Thu, Jan 25, 2018 at 06:30:04PM -0500, Tom Lane wrote: > I poked into the git log and confirmed Michael's statement that > CREATE FUNCTION ... WITH has been documented as deprecated since > 7.3 (commit 94bdc4855 to be exact). Thanks for the confirmation. > I think the original intention was that we'd keep it for PG-specific > function attributes (ie those not found in the SQL spec), so as to > avoid creating keywords for every attribute we thought of. That > intention has been roundly ignored since then, though, since COST > and ROWS and LEAKPROOF and PARALLEL (UN)SAFE have all been implemented > as separate keywords not WITH options. I'm dubious that that was a good > plan, but it's probably too late to change direction now. :( > In short, I'm on board with removing the WITH clause. I've not > reviewed the patch in detail, but will do so and push it if there's > not objections pretty soon. Glad to hear that, thanks! -- Michael
Re: Fix a typo in autoprewarm.c
On Fri, Dec 22, 2017 at 02:30:56PM +0900, Masahiko Sawada wrote: > Hi, > > Attached a patch for fixing a typo in autoprewarm.c. I'm sorry if I'm > mistaken. > > s/withs/with/ Patch applied to head, thanks. --- > > Regards, > > -- > Masahiko Sawada > NIPPON TELEGRAPH AND TELEPHONE CORPORATION > NTT Open Source Software Center > diff --git a/contrib/pg_prewarm/autoprewarm.c > b/contrib/pg_prewarm/autoprewarm.c > index 006c315..abf8f03 100644 > --- a/contrib/pg_prewarm/autoprewarm.c > +++ b/contrib/pg_prewarm/autoprewarm.c > @@ -368,7 +368,7 @@ apw_load_buffers(void) > if (current_db != blkinfo[i].database) > { > /* > - * Combine BlockRecordInfos for global objects > withs those of > + * Combine BlockRecordInfos for global objects > with those of >* the database. >*/ > if (current_db != InvalidOid) -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: [HACKERS] GnuTLS support
On Fri, Jan 26, 2018 at 12:27:16AM +0100, Daniel Gustafsson wrote: >> On 25 Jan 2018, at 15:07, Peter Eisentraut >> wrote: >> >> On 1/19/18 13:43, Peter Eisentraut wrote: >>> Comparing the existing {be,fe}-secure-openssl.c with the proposed >>> {be,fe}-secure-gnutls.c, and with half an eye on the previously proposed >>> Apple Secure Transport implementation, I have identified a few more >>> areas of refactoring that should be done in order to avoid excessive >>> copy-and-pasting in the new implementations: >> >> And here is another place that needs cleaning up, where the OpenSSL API >> was used directly. > > +1 on these cleanups. Peter, could you change ssl_version() and ssl_cipher() in sslinfo at the same time please? I think that those should use the generic backend-side APIs as well. sslinfo depends heavily on OpenSSL, OK, but if possible getting this code more generic will help users of sslinfo to get something partially working with other SSL implementations natively. > Regarding this hunk: > > extern int be_tls_get_cipher_bits(Port *port); > extern bool be_tls_get_compression(Port *port); > -extern void be_tls_get_version(Port *port, char *ptr, size_t len); > -extern void be_tls_get_cipher(Port *port, char *ptr, size_t len); > +extern const char *be_tls_get_version(Port *port); > +extern const char *be_tls_get_cipher(Port *port); > extern void be_tls_get_peerdn_name(Port *port, char *ptr, size_t len); > > While only tangentially related to the issue this patch solves, converting > be_tls_get_peerdn_name() to return const char * seems reasonable too to keep > the API consistent. Why? This is not used for error message generation yet. We could always change the API as needed later on. -- Michael
\describe*
Some of the discussions about making psql more user friendly (more tab completions, help, exit, etc.) got me thinking about other ways that psql could be more friendly, and the one that comes to mind is our terse but cryptic \d* commands. I think it would be helpful and instructive to have corresponding long-form describe commands. Take describing schemas. Is \dn intuitive? Not really. In hindsight, you may think "yeah, a schema is a namespace", but you never guessed 'n' on the first try, or the second. Looking over exec_command_d() a bit, I think it's a bit of a stretch to have each command handle a long form like this: \describe table my_table or \describe table verbose my_table because then each \d-variant has to account for objects named "table" and "verbose" and that's a path to unhappiness. But if we dash-separated them, then all of the strcmps would be in the 'e' subsection, and each one would just have to know its long-to-short translation, and call exec_command_d with the corresponding short command:

describe => d
describe-verbose => d+
describe-aggregates-verbose => da+
describe-roles => du

We could even presume the verbose flag in all cases (after all, the user was being verbose...), which would also cut down on tab-completion results, and we could check for interactive mode and display a message like \describe-schemas (short: \dn+) so that the person has the opportunity to learn the corresponding short command. In addition to aiding tab-completion discovery of the commands (i.e. typing "\desc" and then hitting tab), it would also make scripts a little more self-documenting. Thoughts?
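A toy sketch of that translation step, written in Python purely for illustration (psql itself is C), using only the mappings listed above:

```python
# Illustrative only: map each long-form \describe-* command to the
# short \d command it would dispatch to via exec_command_d().
LONG_TO_SHORT = {
    "describe": "d",
    "describe-verbose": "d+",
    "describe-aggregates-verbose": "da+",
    "describe-roles": "du",
    "describe-schemas": "dn+",
}

def expand_describe(cmd):
    """Return the short command for a long-form name; reject unknown names."""
    if cmd not in LONG_TO_SHORT:
        raise ValueError("unrecognized command: \\" + cmd)
    return LONG_TO_SHORT[cmd]
```

With this shape, every long form lives in one table in the 'e' subsection, and each entry only has to know its short equivalent.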
Re: PATCH: psql tab completion for SELECT
On 26 January 2018 at 13:44, Vik Fearing wrote: > On 01/26/2018 01:28 AM, Edmund Horner wrote: >> The patch mentioned attempts to put savepoints around the tab >> completion query where appropriate. > > I am -1 on this idea. May I ask why? It doesn't stop psql working against older versions, as it checks that the server supports savepoints.
Re: [HACKERS] GnuTLS support
On Fri, Jan 19, 2018 at 01:55:30PM -0500, Robert Haas wrote: > On Wed, Jan 17, 2018 at 10:02 PM, Tom Lane wrote: >> Also, this isn't really a good argument against using uniform names >> for parameters that every implementation is certain to have, like >> ssl_key_file. > > Even then, it's not that hard to imagine minor variations between what > different implementations will accept. The most obvious difference is > probably that they might expect different file formats, but it's also > possible that a Windows-specific implementation might allow omitting > the file extension while some other implementation does not, for > example. I agree that it would probably be fairly low-risk to use one > parameter for the key file for every implementation, but I suggest > that it would be cleaner and less prone to confusion if we enforce a > full separation of parameters. That also spares us having to make a > judgement call about which parameters have semantics close enough that > we need not separate them. Having to debate about the most correct default values for one common parameter depending on N SSL implementations will likely prove at the end that the most correct answer is just to split all parameters from the start. Some implementations are platform-specific; the file format is one thing that matters. There are also other things like ssl_ciphers which could depend on what kind of cipher types an SSL implementation is able to support... Another thing is that we cannot be 100% sure that one parameter can be set to, say, GUC_SIGHUP for one SSL implementation but needs other modes with other implementations. Of course this can be solved with some ifdefs, but splitting brings clarity. At the end, I agree with the position of Robert to split everything and rename the existing ssl_* parameters to openssl_* if v11 gains a new SSL implementation. -- Michael
Re: AS OF queries
On Sat, Dec 23, 2017 at 11:53:19PM +0300, konstantin knizhnik wrote: > > On Dec 23, 2017, at 2:08 AM, Greg Stark wrote: > > > On 20 December 2017 at 12:45, Konstantin Knizhnik > > wrote: > > > >> It seems to me that it will be not so difficult to implement them in > >> Postgres - we already have versions of tuples. > >> Looks like we only need to do three things: > >> 1. Disable autovacuum (autovacuum = off) > > > > "The Wheel of Time turns, and Ages come and pass, leaving memories > > that become legend. Legend fades to myth, and even myth is long > > forgotten when the Age that gave it birth comes again" > > > > I think you'll find it a lot harder to get this to work than just > > disabling autovacuum. Notably HOT updates can get cleaned up (and even > > non-HOT updates can now leave tombstone dead line pointers iirc) even > > if vacuum hasn't run. > > > > Yeah, I suspected that just disabling autovacuum was not enough. > I heard (but do not know too much) about microvacuum and hot updates. > This is why I was a little bit surprised when my test didn't show loss of > updated versions. > Maybe it is because of vacuum_defer_cleanup_age. Well, vacuum and single-page pruning do 3 things: 1. remove expired updated rows 2. remove deleted rows 3. remove rows from aborted transactions While time travel doesn't want #1 and #2, it probably wants #3. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
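A toy model of that distinction (Python, purely illustrative, not PostgreSQL code): expired and deleted versions are exactly the history an AS OF query needs, while versions from aborted transactions are invisible to every snapshot and remain safe to prune:

```python
# Each tuple version records the fate of its creating transaction and
# whether the version was later superseded (updated) or deleted.
versions = [
    {"val": 1, "xmin_status": "committed", "superseded": True},  # 1: expired update
    {"val": 2, "xmin_status": "committed", "deleted": True},     # 2: deleted row
    {"val": 3, "xmin_status": "aborted"},                        # 3: aborted xact
    {"val": 4, "xmin_status": "committed"},                      # live row
]

def prunable_with_time_travel(v):
    """Under time-travel semantics only case #3 may be removed; cases
    #1 and #2 are the history itself and must be retained."""
    return v["xmin_status"] == "aborted"

kept = [v["val"] for v in versions if not prunable_with_time_travel(v)]
```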
Re: PATCH: psql tab completion for SELECT
On 01/26/2018 01:28 AM, Edmund Horner wrote: > On 19 January 2018 at 05:37, Vik Fearing wrote: >> On 01/18/2018 01:07 AM, Tom Lane wrote: >>> Edmund Horner writes: On 15 January 2018 at 15:45, Andres Freund wrote: > All worries like this are supposed to check the server version. >>> In psql there are around 200 such tab completion queries, none of which checks the server version. Many would cause the user's transaction to abort if invoked on an older server. Identifying the appropriate server versions for each one would be quite a bit of work. >>> Is there a better way to make this more robust? >>> >>> Maybe it'd be worth the effort to wrap tab completion queries in >>> SAVEPOINT/RELEASE SAVEPOINT if we're inside a user transaction >>> (which we could detect from libpq's state, I believe). >>> >>> That seems like an independent patch, but it'd be a prerequisite >>> if you want to issue tab completion queries with version dependencies. >>> >>> A bigger point here is: do you really want SELECT tab completion >>> to work only against the latest and greatest server version? >>> That would become an argument against committing the feature at all; >>> maybe not enough to tip the scales against it, but still a demerit. >> >> I don't really want such a patch. I use new psql on old servers all the >> time. > > I'm not sure where we got with this. I really should have put this > patch in a separate thread, since it's an independent feature. Yes, it should have been a separate thread. > The patch mentioned attempts to put savepoints around the tab > completion query where appropriate. I am -1 on this idea. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: PATCH: psql tab completion for SELECT
On 19 January 2018 at 05:37, Vik Fearing wrote: > On 01/18/2018 01:07 AM, Tom Lane wrote: >> Edmund Horner writes: >>> On 15 January 2018 at 15:45, Andres Freund wrote: All worries like this are supposed to check the server version. >> >>> In psql there are around 200 such tab completion queries, none of >>> which checks the server version. Many would cause the user's >>> transaction to abort if invoked on an older server. Identifying the >>> appropriate server versions for each one would be quite a bit of work. >> >>> Is there a better way to make this more robust? >> >> Maybe it'd be worth the effort to wrap tab completion queries in >> SAVEPOINT/RELEASE SAVEPOINT if we're inside a user transaction >> (which we could detect from libpq's state, I believe). >> >> That seems like an independent patch, but it'd be a prerequisite >> if you want to issue tab completion queries with version dependencies. >> >> A bigger point here is: do you really want SELECT tab completion >> to work only against the latest and greatest server version? >> That would become an argument against committing the feature at all; >> maybe not enough to tip the scales against it, but still a demerit. > > I don't really want such a patch. I use new psql on old servers all the > time. I'm not sure where we got with this. I really should have put this patch in a separate thread, since it's an independent feature. The patch mentioned attempts to put savepoints around the tab completion query where appropriate.
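The pattern under discussion amounts to wrapping the internal query roughly like this (hedged sketch; the savepoint name is made up, and psql would only issue it when a user transaction is already open):

```sql
SAVEPOINT _psql_complete;

-- the tab-completion catalog query, e.g.:
SELECT c.relname
FROM pg_catalog.pg_class c
WHERE c.relname LIKE 'pre%';

-- on success:
RELEASE SAVEPOINT _psql_complete;
-- on error (say, the query uses syntax the old server lacks), instead:
-- ROLLBACK TO SAVEPOINT _psql_complete;
```

This way a failed completion query rolls back only to the savepoint instead of aborting the user's whole transaction.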
Re: ALTER TABLE ADD COLUMN fast default
On Thu, Jan 25, 2018 at 6:10 PM, Thomas Munro wrote: > On Wed, Jan 17, 2018 at 2:21 AM, Andrew Dunstan > wrote: >> Yeah, got caught by the bki/pg_attribute changes on Friday. Here's an >> updated version. Thanks for looking. > > A boring semi-automated update: this no longer compiles, because > 8561e4840c8 added a new call to heap_attisnull(). Pretty sure it just > needs NULL in the third argument. > Yeah, thanks. Revised patch attached. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PATCH: psql tab completion for SELECT
On 23 January 2018 at 21:47, Vik Fearing wrote: > Don't forget to include the list :-) > I'm quoting the entirety of the message---which I would normally trim---for > the archives. Thanks for spotting that. Sorry list! > If this were my patch, I'd have one query for 8.0, and then queries for all > currently supported versions if needed. If 9.2 only gets 8.0-level > features, that's fine by me. That limits us to a maximum of six queries. > Just because we keep support for dead versions doesn't mean we have to > retroactively develop for them. Au contraire. >> I'll make another patch version, with these few differences. I managed to do testing against servers on all the versions >= 8.0 and I discovered that prior to version 8.1 they don't support ALL/ANY on oidvectors; so I added another query form. But I don't like having so many different queries, so I am in favour of trimming it down. Can I leave that up to the committer to remove some cases or do we need another patch? > Great! Here's another.
RE: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
From: Robert Haas [mailto:robertmh...@gmail.com] > On Wed, Jan 24, 2018 at 10:31 PM, Tsunakawa, Takayuki > wrote: > > As you said, open_datasync was 20% faster than fdatasync on RHEL7.2, on > a LVM volume with ext4 (mounted with options noatime, nobarrier) on a PCIe > flash memory. > > So does that mean it was faster than your PMDK implementation? The PMDK patch is not mine, but is from people in NTT Lab. I'm very curious about the comparison of open_datasync and PMDK, too. > > What do you think about changing the default value of wal_sync_method > on Linux in PG 11? I can understand the concern that users might hit > performance degradation if they are using PostgreSQL on older systems. But > it's also a shame (mottainai) that many users don't notice the benefits of > wal_sync_method = open_datasync on new systems. > > Well, some day persistent memory may be a common enough storage technology > that such a change makes sense, but these days most people have either SSD > or spinning disks, where the change would probably be a net negative. It > seems more like something we might think about changing in PG 20 or PG 30. No, I'm not saying we should make the persistent memory mode the default. I'm simply asking whether it's time to make open_datasync the default setting. We can write a notice in the release note for users who still use ext3 etc. on old systems. If there's no objection, I'll submit a patch for the next CF. Regards Takayuki Tsunakawa
Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Alvaro, * Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote: > Stephen Frost wrote: > > Alvaro, Tom, > > > > * Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote: > > > > Crazy idea: maybe a large fraction of that test could be replaced with > > > comparisons of the "pg_restore -l" output file rather than pg_dump's > > > text output (i.e. the TOC entry for each object, rather than the > > > object's textual representation.) Sounds easier in code than current > > > implementation. Separately, verify that textual representation for each > > > TOC entry type is what we expect. > > > > I'm not sure how that's different..? We do check that the textual > > representation is what we expect, both directly from pg_dump and from > > pg_restore output, and using the exact same code to check both (which I > > generally think is a good thing since we want the results from both to > > more-or-less match up). What you're proposing here sounds like we're > > throwing that away in favor of keeping all the same code to test the > > textual representation but then only checking for listed contents from > > pg_restore instead of checking that the textual representation is > > correct. That doesn't actually reduce the amount of code though.. > > Well, the current implementation compares a dozen of pg_dump output text > files, three hundred lines apiece, against a thousand of regexes (give > or take). Whenever there is a mismatch, what you get is "this regexp > failed to match " (or sometimes "matched when it > should have not"), so looking for the mismatch is quite annoying. Yeah, the big output isn't fun, I agree with that. > My proposal is that instead of looking at three hundred lines, you'd > look for 50 lines of `pg_restore -l` output -- is element XYZ in there > or not. Quite a bit simpler for the guy adding a new test. This tests > combinations of pg_dump switches: are we dumping the right set of > objects. 
That might be possible to do, though I'm not sure that it'd really be only 50 lines. > *Separately* test that each individual TOC entry type ("table data", > "index", "tablespace") is dumped as this or that SQL command, where you > create a separate dump file for each object type. So you match a single > TOC entry to a dozen lines of SQL, half of them comments -- pretty easy > to see where a mismatch is. Yikes. If you're thinking that we would call pg_restore independently for each object, I'd be worried about the runtime increasing quite a bit. There's also a few cases where we check dependencies between objects that we'd need to be mindful of, though those we might be able to take care of, if we can keep the runtime down. Certainly, part of the way the existing code works was specifically to try and minimize the runtime. Perhaps the approach taken of minimizing the invocations and building a bunch of regexps to run against the resulting output is more expensive than running pg_restore a bunch of times, but I'd want to see that to be sure, as some really simple timings locally on a 6.5k -Fc dump file seem to take 30-40ms just to produce the TOC. This also doesn't really help with the big perl hash, but these two ideas don't seem to conflict, so perhaps we could possibly still tackle the big perl hash with the approach I described up-thread and also find a way to implement this, if it doesn't cause the regression tests to be too much slower. Thanks! Stephen
Should we introduce a "really strict" volatility about return values?
Hi, Currently a good bit of the time spent evaluating more complex expressions goes into checking whether a strict [transition] function should be called or not. There's a good number of cases where we could optimize that away based on the underlying data - if e.g. a slot column refers to a NOT NULL column (cf [1]), we do not need to do the NOT NULL checks. A significant limit of that is that even for slightly more complex expressions, say a + b + c, that doesn't work for all the operator invocations, because even though functions like int8pl are strict, we do *not* guarantee that the return value is non-NULL. Thus I wonder if we shouldn't introduce a 'really strict' volatility level that signals, in addition to normal strict guarantees, that no other case returns NULL. In a lot of cases that should allow us to completely eliminate strictness checks from execution. I don't plan to work on this anytime soon, but I thought it's interesting enough to put out there and see what others think. Greetings, Andres Freund [1] http://archives.postgresql.org/message-id/20171206093717.vqdxe5icqttpxs3p%40alap3.anarazel.de
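A toy model of why the guarantee helps (Python, purely illustrative, not executor code): with plain strict functions every call in a chain like a + b + c must re-check its inputs for NULL, while a "really strict" chain needs only one up-front test:

```python
# Plain strict function: returns NULL (None) if any input is NULL, so the
# caller must test inputs before (or inside) every invocation.
def add_strict(a, b):
    if a is None or b is None:   # per-call strictness check
        return None
    return a + b

def eval_chain_naive(values):
    """Evaluate v0 + v1 + ... with a strictness check at every step."""
    acc = values[0]
    for v in values[1:]:
        acc = add_strict(acc, v)
    return acc

def eval_chain_really_strict(values):
    """If the function is 'really strict' (never returns NULL for non-NULL
    inputs), one up-front check suffices: no intermediate can be NULL, so
    the inner loop can call the function directly with no NULL tests."""
    if any(v is None for v in values):
        return None
    acc = values[0]
    for v in values[1:]:
        acc = acc + v            # direct call, strictness checks eliminated
    return acc
```

Both evaluators agree on every input; the second one just hoists all the NULL-ness checks out of the hot loop, which is the saving the proposal is after.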
Documentation of pgcrypto AES key sizes
Hi, I noticed that the documentation for encrypt()/decrypt() says "aes — AES (Rijndael-128)", but in fact 192 and 256 bit keys are also supported, whether you build --with-openssl or --without-openssl. Should that say "AES (Rijndael-128, -192 or -256)" instead? -- Thomas Munro http://www.enterprisedb.com
Re: Removing WITH clause support in CREATE FUNCTION, for isCachable and isStrict
Daniel Gustafsson writes: >> On 15 Jan 2018, at 03:27, Michael Paquier wrote: >> As noticed by Daniel here: >> https://www.postgresql.org/message-id/d5f34c9d-3ab7-4419-af2e-12f67581d...@yesql.se > In that thread I proposed a patch to fix this, but I certainly don’t object to > just removing it to make both syntax and code clearer. +1 I poked into the git log and confirmed Michael's statement that CREATE FUNCTION ... WITH has been documented as deprecated since 7.3 (commit 94bdc4855 to be exact). I think the original intention was that we'd keep it for PG-specific function attributes (ie those not found in the SQL spec), so as to avoid creating keywords for every attribute we thought of. That intention has been roundly ignored since then, though, since COST and ROWS and LEAKPROOF and PARALLEL (UN)SAFE have all been implemented as separate keywords not WITH options. I'm dubious that that was a good plan, but it's probably too late to change direction now. The only other argument I can see for keeping it is that pre-7.3 dump files could contain CREATE FUNCTION commands with the old spelling. But, even assuming somebody is still running <= 7.2 and wants to migrate to >= 11 in one jump, they could use some intermediate version of pg_dump to produce a dump with the modern spelling, or just whip out their trusty text editor. Maintaining backwards compatibility is good, but 15 years seems like enough. In short, I'm on board with removing the WITH clause. I've not reviewed the patch in detail, but will do so and push it if there's not objections pretty soon. regards, tom lane
Re: [HACKERS] GnuTLS support
> On 25 Jan 2018, at 15:07, Peter Eisentraut > wrote: > > On 1/19/18 13:43, Peter Eisentraut wrote: >> Comparing the existing {be,fe}-secure-openssl.c with the proposed >> {be,fe}-secure-gnutls.c, and with half an eye on the previously proposed >> Apple Secure Transport implementation, I have identified a few more >> areas of refactoring that should be done in order to avoid excessive >> copy-and-pasting in the new implementations: > > And here is another place that needs cleaning up, where the OpenSSL API > was used directly. +1 on these cleanups. Regarding this hunk: extern int be_tls_get_cipher_bits(Port *port); extern bool be_tls_get_compression(Port *port); -extern void be_tls_get_version(Port *port, char *ptr, size_t len); -extern void be_tls_get_cipher(Port *port, char *ptr, size_t len); +extern const char *be_tls_get_version(Port *port); +extern const char *be_tls_get_cipher(Port *port); extern void be_tls_get_peerdn_name(Port *port, char *ptr, size_t len); While only tangentially related to the issue this patch solves, converting be_tls_get_peerdn_name() to return const char * seems reasonable too to keep the API consistent. cheers ./daniel
Re: [HACKERS] Optional message to user when terminating/cancelling backend
> On 24 Jan 2018, at 16:45, Alvaro Herrera wrote: > > A quick suggestion from a passer-by -- would you put the new code in > src/backend/storage/ipc/backend_signal.c instead? Sounds like a better > place than utils/misc/; and "signal" is more general in nature than just > "cancel". A bunch of stuff from utils/adt/misc.c (???) can migrate to > the new file -- everything from line 201 to 362 at least, that is: > > pg_signal_backend > pg_cancel_backend > pg_terminate_backend > pg_reload_conf > pg_rotate_logfile +1, this makes a lot of sense. When writing this I didn’t find a good fit anywhere so I modelled it after utils/misc/backend_random.c, not because it was a good fit but it was the least bad one I could come up with. This seems a lot cleaner. > Maybe have two patches, 0001 creates the files moving the contents over, > then 0002 adds your new stuff on top. The two attached patches implements this. > /me wonders if there's anything that would suggest to make this > extensive to processes other than backends ... Perhaps, do you have an API in mind? cheers ./daniel 0001-Refactor-backend-signalling-code-v6.patch Description: Binary data 0002-Support-optional-message-in-backend-cancel-terminate-v6.patch Description: Binary data
Re: PATCH: pgbench - option to build using ppoll() for larger connection counts
Hello Doug, This time with the revised patch file: pgbench11-ppoll-v8.patch Patch applies cleanly. Compiles cleanly and runs fine in both ppoll & select cases. I'm okay with having a preferred ppoll implementation because of its improved capability. A few minor additional comments/suggestions: Cpp has an #elif that could be used to manage the ppoll/select alternative. It is already used elsewhere in the file. Or not. I must admit that I'm not fond of the alloc_socket_set trick with MAXCLIENTS, especially without any comment. I'd suggest to just have two distinct functions in their corresponding sections. I would add a comment that free_socket_set code is common to both versions, and maybe consider moving it afterwards. Or maybe just duplicate it in each section for homogeneity. It looks like error_on_socket and ignore_socket should return a boolean instead of an int. Also, maybe simplify the implementation of the latter by avoiding the ?: expression. ISTM that the error_on_socket function in the ppoll case deserves some comments, especially on the condition. [...] Replaced USE_PPOLL with HAVE_PPOLL as having both seems redundant. I'm okay with that. I'm wondering whether there should be a way to force using one or the other when both are available. Not sure. -- Fabien.
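For readers less familiar with the API family this patch builds on: Python does not expose ppoll() itself, but select.poll() follows the same registration model, and -- unlike select() -- has no FD_SETSIZE ceiling (typically 1024 descriptors), which is exactly why pgbench wants it for larger connection counts. A minimal sketch (the helper name is mine, not from the patch):

```python
import os
import select

def wait_readable_poll(fd, timeout_ms):
    """Wait for fd to become readable using the poll() family.

    Unlike select(), poll()/ppoll() impose no FD_SETSIZE ceiling,
    which is what lets a client handle larger connection counts.
    """
    p = select.poll()
    p.register(fd, select.POLLIN)
    events = p.poll(timeout_ms)  # list of (fd, eventmask), or [] on timeout
    return any(f == fd and (ev & select.POLLIN) for f, ev in events)

# Demonstrate on a pipe: not readable until something is written.
r, w = os.pipe()
idle = wait_readable_poll(r, 0)   # nothing written yet -> False
os.write(w, b"x")
ready = wait_readable_poll(r, 0)  # data pending -> True
os.close(r)
os.close(w)
```

The same readiness check written with select() would start failing once descriptors exceed FD_SETSIZE, which is the failure mode the ppoll path avoids.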
Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Alvaro Herrera writes: > Well, the current implementation compares a dozen of pg_dump output text > files, three hundred lines apiece, against a thousand of regexes (give > or take). Whenever there is a mismatch, what you get is "this regexp > failed to match " (or sometimes "matched when it > should have not"), so looking for the mismatch is quite annoying. Yeah, I've been getting my nose rubbed in that over the past couple days of pg_dump hacking: when you get a failure, the output is really unfriendly, even by the seriously low standards of the rest of our TAP tests. I'm not sure if Alvaro's idea is the best way to improve that, but it's something to think about. The thing that was annoying me though is that it seems like every test case interacts with every other test case, or at least way more of them than one could wish, due to the construction of that big hash constant. That's what I find unmaintainable. regards, tom lane
Re: MCV lists for highly skewed distributions
Dean Rasheed writes: > It occurs to me that maybe a better test to exclude a value from the > MCV list would be to demand that its relative standard error not be > too high. Such a test, in addition to the existing tests, might be > sufficient to solve the opposite problem of too many values in the MCV > list, because the real problem there is including a value after having > seen relatively few occurrences of it in the sample, and thus having a > wildly inaccurate estimate for it. Setting a bound on the relative > standard error would mean that we could have a reasonable degree of > confidence in estimates produced from the sample. This patch is marked Ready for Committer, but that seems wildly optimistic based on the state of the discussion. It doesn't look to me like we even have consensus on an algorithm, let alone code for it. Certainly, whatever we do needs to address the too-many-MCVs issue from [1] as well as Jeff's too-few-MCVs case. Even if we had a plausible patch, I'm not sure how we get to the point of having enough consensus to commit. In the previous thread, it seemed that some people would object to any change whatsoever :-( In any case, since it looks like the next step is for someone to come up with a new proposal, I'm going to set this to Waiting on Author. regards, tom lane [1] https://www.postgresql.org/message-id/flat/32261.1496611829%40sss.pgh.pa.us
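Dean's relative-standard-error idea can be sketched numerically. This is a hedged illustration only -- the thread has not settled on an estimator, and the finite-population correction used here is one plausible choice, not the patch's code:

```python
import math

def mcv_relative_stderr(count, sample_size, total_rows):
    """Relative standard error of a value's frequency estimated from a sample.

    Treats the sample as drawn without replacement, so the plain
    sqrt(p(1-p)/n) standard error is scaled by the finite-population
    correction sqrt((N-n)/(N-1)).
    """
    p = count / sample_size
    fpc = math.sqrt((total_rows - sample_size) / (total_rows - 1))
    se = math.sqrt(p * (1.0 - p) / sample_size) * fpc
    return se / p

def keep_in_mcv(count, sample_size, total_rows, max_rse=0.2):
    # Exclude values whose estimate we cannot trust: too few sightings
    # in the sample make the relative error blow up, which is the
    # "too many MCVs" failure mode.
    return mcv_relative_stderr(count, sample_size, total_rows) <= max_rse
```

With a 30000-row sample of a 10M-row table, a value seen 1000 times has a relative standard error of about 3% and would be kept, while a value seen 3 times has one near 58% and would be excluded.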
Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Stephen Frost wrote: > Alvaro, Tom, > > * Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote: > > Crazy idea: maybe a large fraction of that test could be replaced with > > comparisons of the "pg_restore -l" output file rather than pg_dump's > > text output (i.e. the TOC entry for each object, rather than the > > object's textual representation.) Sounds easier in code than current > > implementation. Separately, verify that textual representation for each > > TOC entry type is what we expect. > > I'm not sure how that's different..? We do check that the textual > representation is what we expect, both directly from pg_dump and from > pg_restore output, and using the exact same code to check both (which I > generally think is a good thing since we want the results from both to > more-or-less match up). What you're proposing here sounds like we're > throwing that away in favor of keeping all the same code to test the > textual representation but then only checking for listed contents from > pg_restore instead of checking that the textual representation is > correct. That doesn't actually reduce the amount of code though.. Well, the current implementation compares a dozen of pg_dump output text files, three hundred lines apiece, against a thousand of regexes (give or take). Whenever there is a mismatch, what you get is "this regexp failed to match " (or sometimes "matched when it should have not"), so looking for the mismatch is quite annoying. My proposal is that instead of looking at three hundred lines, you'd look for 50 lines of `pg_restore -l` output -- is element XYZ in there or not. Quite a bit simpler for the guy adding a new test. This tests combinations of pg_dump switches: are we dumping the right set of objects. *Separately* test that each individual TOC entry type ("table data", "index", "tablespace") is dumped as this or that SQL command, where you create a separate dump file for each object type. 
So you match a single TOC entry to a dozen lines of SQL, half of them comments -- pretty easy to see where a mismatch is. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
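Checking `pg_restore -l` output for expected TOC entries is indeed simple to express. A sketch of the comparison (the listing format -- "dumpId; catalogId objectOid TYPE schema name owner", with `;`-prefixed comment lines -- is reproduced from memory and should be checked against real pg_restore output):

```python
def toc_lines(listing):
    """Return the non-comment lines of a `pg_restore -l` style listing."""
    return [ln.strip() for ln in listing.splitlines()
            if ln.strip() and not ln.strip().startswith(";")]

def has_entry(listing, fragment):
    """True if some TOC line mentions the given "TYPE schema name" text."""
    return any(fragment in ln for ln in toc_lines(listing))

# A tiny stand-in for `pg_restore -l` output (format is an assumption).
sample = """\
;
; Archive created at 2018-01-26
;
215; 1259 16385 TABLE public test_table postgres
2153; 0 16385 TABLE DATA public test_table postgres
"""
```

A test then becomes "is element XYZ in there or not", which is what makes a mismatch easy to localize compared with a thousand regexes over full SQL text.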
Invalid result from hash_page_items function
Hi, While researching hash index, I found that hash_page_items could return an invalid result as follows.

postgres(1:1056)=# create table hash_test (c int);
CREATE TABLE
postgres(1:1056)=# insert into hash_test select generate_series(1,50) % 5;
INSERT 0 50
postgres(1:1056)=# create index hash_idx on hash_test using hash (c);
CREATE INDEX
postgres(1:1056)=# create extension pageinspect ;
CREATE EXTENSION
postgres(1:1056)=# select * from hash_page_items(get_raw_page('hash_idx', 4));
 itemoffset |        ctid        |    data
------------+--------------------+------------
          1 | (0,49)             | 3283889963
          2 | (2139062143,32639) | 2139062143
          3 | (2139062143,32639) | 2139062143
          4 | (2139062143,32639) | 2139062143
          5 | (2139062143,32639) | 2139062143
          6 | (2139062143,32639) | 2139062143
          7 | (2139062143,32639) | 2139062143
          8 | (2139062143,32639) | 2139062143
          9 | (2139062143,32639) | 2139062143
         10 | (2139062143,32639) | 2139062143
(10 rows)

This appears at PostgreSQL 10 and current HEAD. The cause seems to be that hash_page_items allocates the memory space for the page before switching memory context. AFAICS there is no similar problem elsewhere in the pageinspect contrib module. Attached patch fixes it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center fix_hash_page_items.patch Description: Binary data
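The garbage values are telling: 2139062143 is 0x7F7F7F7F and 32639 is 0x7F7F, i.e. the 0x7f byte pattern that CLOBBER_FREED_MEMORY writes over freed memory -- consistent with the page data being allocated in a context that goes away before the result is used. The hazard translates to any language; a Python sketch of why the copy must be made before the source memory is recycled:

```python
def read_page_buggy(buf):
    # Like handing back a pointer into a context that will be reset:
    # a live view into memory that may be reused underneath us.
    return memoryview(buf)

def read_page_fixed(buf):
    # Like allocating the copy in the caller's longer-lived context
    # *before* the short-lived memory goes away.
    return bytes(buf)

page = bytearray(b"\x01\x02\x03\x04")
stale = read_page_buggy(page)
safe = read_page_fixed(page)
page[:] = b"\x7f" * 4  # simulate CLOBBER_FREED_MEMORY's 0x7f fill

clobbered = int.from_bytes(stale, "little")  # reads the 0x7f garbage
intact = int.from_bytes(safe, "little")      # still the original bytes
```

Here `clobbered` comes out as 2139062143 -- exactly the value filling the broken hash_page_items output above -- while the copy taken up front survives.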
Re: After dropping the rule - Not able to insert / server crash (one time ONLY)
Can someone review this thread and determine if this patch is needed? Thanks. --- On Fri, Dec 22, 2017 at 04:58:47PM +0530, Dilip Kumar wrote: > On Mon, Dec 11, 2017 at 4:33 PM, Dilip Kumar wrote: > > On Mon, Dec 11, 2017 at 3:54 PM, tushar > wrote: > > Hi, > > While testing something , I found that even after rule has dropped > not > able to insert data and in an another scenario , there is a Crash/ > > Please refer this scenario's - > > 1) Rows not inserted after dropping the RULE > > postgres=# create table e(n int); > CREATE TABLE > postgres=# create rule e1 as on insert to e do instead nothing; > CREATE RULE > postgres=# create function e11(n int) returns int as $$ begin insert > into e values(10); return 1; end; $$ language 'plpgsql'; > CREATE FUNCTION > postgres=# select e11(1); > e11 > - > 1 > (1 row) > > postgres=# select e11(1); > e11 > - > 1 > (1 row) > > postgres=# select * from e; => Expected , due to the RULE enforced > n > --- > (0 rows) > > > postgres=# drop rule e1 on e; ==>Drop the rule > DROP RULE > > postgres=# select e11(1); =>again try to insert into table > e11 > - > 1 > (1 row) > > postgres=# select * from e; =>This time , value should be inserted > but > still showing 0 row. > n > --- > (0 rows) > > > I think this is becuase we are not invalidating the plan cache if rule is > dropped. You can reproduce same with prepared statements. 
> > > 2)Server crash (one time) > > postgres=# create table en(n int); > CREATE TABLE > postgres=# create function en1(n int) returns int as $$ begin insert > into en values(10); return 1; end; $$ language 'plpgsql'; > CREATE FUNCTION > postgres=# > postgres=# select en1(1); > en1 > - > 1 > (1 row) > > postgres=# select * from en; > n > > 10 > (1 row) > > postgres=# create rule en11 as on insert to en do instead nothing; > CREATE RULE > postgres=# select en1(1); > ostgres=# select en1(1); > TRAP: FailedAssertion("!(!stmt->mod_stmt)", File: "pl_exec.c", Line: > 3721) > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > > > I have looked into this crash, seems like below assert in > exec_stmt_execsql > is not correct > > case SPI_OK_REWRITTEN: > > Assert(!stmt->mod_stmt); > > In this issue first time execution of select en1(1); statement, stmt-> > mod_stmt is set to true for the insert query inside the function. Then > before next execution we have created the rule "create rule en11 as on > insert to en do instead nothing;". Because of this nothing will be > executed and output will be SPI_OK_REWRITTEN. But we are asserting that > stmt->mod_stmt should be false (which were set to true in first > execution). > > > IMHO if the query is rewritten, then this assert is not valid. I have attached > a patch which removes this assert. > > -- > Regards, > Dilip Kumar > EnterpriseDB: http://www.enterprisedb.com > diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c > index dd575e7..9b85df8 100644 > --- a/src/pl/plpgsql/src/pl_exec.c > +++ b/src/pl/plpgsql/src/pl_exec.c > @@ -3727,8 +3727,6 @@ exec_stmt_execsql(PLpgSQL_execstate *estate, > break; > > case SPI_OK_REWRITTEN: > - Assert(!stmt->mod_stmt); > - > /* >* The command was rewritten into another kind of > command. 
It's >* not clear what FOUND would mean in that case (and > SPI doesn't -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
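The first symptom above (inserts still skipped after DROP RULE) comes down to plan-cache invalidation. A toy sketch of the mechanism -- PostgreSQL actually uses per-relation invalidation messages rather than the single global counter used here, so this is only an illustration of the idea:

```python
class PlanCache:
    """Toy plan cache: entries are invalidated when any dependent object
    changes, tracked here with one schema generation counter."""

    def __init__(self):
        self.generation = 0
        self._cache = {}  # query text -> (generation, plan)

    def ddl_happened(self):
        # CREATE RULE / DROP RULE etc. must bump the counter so stale
        # plans are replanned instead of silently reused.
        self.generation += 1

    def get_plan(self, query):
        entry = self._cache.get(query)
        if entry is not None and entry[0] == self.generation:
            return entry[1], True           # cache hit
        plan = "plan(%s)@gen%d" % (query, self.generation)
        self._cache[query] = (self.generation, plan)
        return plan, False                  # (re)planned

pc = PlanCache()
p1, hit1 = pc.get_plan("insert into e values(10)")  # planned fresh
p2, hit2 = pc.get_plan("insert into e values(10)")  # cached
pc.ddl_happened()                                    # e.g. DROP RULE
p3, hit3 = pc.get_plan("insert into e values(10)")  # replanned
```

Without the `ddl_happened` bump, the third call would keep returning the plan built while the rule existed -- which is the reported behavior.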
Re: reducing isolation tests runtime
Tom Lane wrote: > Alvaro Herrera writes: > > Here's a concrete proposal. Runtime is 45.7 seconds on my laptop. It > > can be further reduced, but not by more than a second or two unless you > > get in the business of modifying other tests. (I only modified > > deadlock-soft-2 because it saves 5 seconds). > > Looks reasonable to me, but do we want to set any particular convention > about the max number of tests to run in parallel? If so there should > be at least a comment saying what. Hmm, I ran this with a limited number of connections and found that it fails with fewer than 27; and there's no MAX_CONNECTIONS like there is for pg_regress. So I'll put this back on the drawing board until I'm back from vacations ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Fri, Jan 26, 2018 at 9:38 AM, Claudio Freire wrote: > I had the tests running in a loop all day long, and I cannot reproduce > that variance. > > Can you share your steps to reproduce it, including configure flags? Here are two build logs where it failed: https://travis-ci.org/postgresql-cfbot/postgresql/builds/332968819 https://travis-ci.org/postgresql-cfbot/postgresql/builds/332592511 Here's one where it succeeded: https://travis-ci.org/postgresql-cfbot/postgresql/builds/333139855 The full build script used is: ./configure --enable-debug --enable-cassert --enable-coverage --enable-tap-tests --with-tcl --with-python --with-perl --with-ldap --with-icu && make -j4 all contrib docs && make -Otarget -j3 check-world This is a virtualised 4 core system. I wonder if "make -Otarget -j3 check-world" creates enough load on it to produce some weird timing effect that you don't see on your development system. -- Thomas Munro http://www.enterprisedb.com
Re: reducing isolation tests runtime
Alvaro Herrera writes: > Here's a concrete proposal. Runtime is 45.7 seconds on my laptop. It > can be further reduced, but not by more than a second or two unless you > get in the business of modifying other tests. (I only modified > deadlock-soft-2 because it saves 5 seconds). Looks reasonable to me, but do we want to set any particular convention about the max number of tests to run in parallel? If so there should be at least a comment saying what. > Admittedly the new isolation_schedule file is a bit ugly. Meh, seems fine. We won't know if this really works till it hits the buildfarm, I suppose. regards, tom lane
Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Alvaro, Tom, * Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote: > Tom Lane wrote: > > > The changes in t/002_pg_dump.pl largely failed to apply, which is > > partially due to the age of the patch but IMO speaks more to the > > unmaintainability of that TAP test. It still didn't run after I'd > > manually fixed the merge failures, so I gave up in disgust and > > did not push any of the test changes. If someone is excited enough > > about testing this, they can submit a followon patch for that, > > but I judge it not really worth either the human effort or the > > future testing cycles. > > > > (Am I right in thinking that 002_pg_dump.pl is, by design, roughly > > O(N^2) in the number of features tested? Ick.) It certainly tries to cover a lot of combinations, but I did try to write it to not completely cross everything against everything but instead to try and cover the different code paths. > Yeah, that 002 test is pretty nasty stuff. I think we only put up with > it because it's the only idea we've come up with; maybe there are better > ideas. I've discussed the 'better idea' for it previously, which is to essentially break out the various parts of the big perl hash into multiple independent files that would be a lot easier to work with and manage, and perhaps make them not be perl hashes but something else. In other words, make the core perl code similar to pg_regress and the input/output files likewise similar to how pg_regress works. Maybe we should rework the whole thing to work with pg_regress directly but we'd need to teach it about regexp's, I'd think.. > Crazy idea: maybe a large fraction of that test could be replaced with > comparisons of the "pg_restore -l" output file rather than pg_dump's > text output (i.e. the TOC entry for each object, rather than the > object's textual representation.) Sounds easier in code than current > implementation. Separately, verify that textual representation for each > TOC entry type is what we expect. 
I'm not sure how that's different..? We do check that the textual representation is what we expect, both directly from pg_dump and from pg_restore output, and using the exact same code to check both (which I generally think is a good thing since we want the results from both to more-or-less match up). What you're proposing here sounds like we're throwing that away in favor of keeping all the same code to test the textual representation but then only checking for listed contents from pg_restore instead of checking that the textual representation is correct. That doesn't actually reduce the amount of code though.. Thanks! Stephen signature.asc Description: PGP signature
Re: proposal: alternative psql commands quit and exit
On Mon, Jan 15, 2018 at 11:10:44AM -0500, Robert Haas wrote: > On Mon, Jan 15, 2018 at 10:57 AM, Tom Lane wrote: > > Robert Haas writes: > >> I've discovered one thing about this design that is not so good, which > >> is that if you open a single, double, or dollar quote, then the > >> instructions that are provided under that design do not work: > > > > I was kind of imagining that we could make the hint text vary depending > > on the parsing state. Maybe that's too hard to get to --- but if the > > prompt-creating code knows what it is, perhaps this can too. > > prompt_status does seem to be available in psql's MainLoop(), so I > think that could be done, but the problem is that I don't know exactly > what message would be useful to print when we're in a "in quotes" > state. If I had a good message text for that state, I might just > choose to use it always rather than multiplying the number of > messages. > > More broadly, I think what is needed here is less C-fu than > English-fu. If we come up with something good, we can make it print > that thing. I just read this thread and have some ideas. First, the reason 'help' was added so easily is because it pointed users at getting more information, and it was required to be the first command in the query buffer. I realize we are now considering allowing 'help', 'quit', and 'exit' to appear alone on a line with whitespace before it, and removing the requirement that it be the first thing in the query buffer. I think there are a few things to consider. First, allowing whitespace before the keyword --- do we really think that a bare 'exit' is more likely to have been typed by a user trying to exit than as part of an SQL query? I think requiring no space around the keyword would reduce the number of false hints. Second, I am thinking we can check prompt_status and report "Use \q" if we are not in a quoted string, and suggest "Use Control-D then \q" (or "Use Control-C then \q" on Windows) and be done with it. 
By printing the Control-D only when we are in a quoted string, we minimize the number of times that we are wrong due to stty changes. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
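The proposal -- pick the hint from prompt_status -- is easy to prototype. A sketch; the message texts below are purely illustrative (the actual wording is exactly what this thread is trying to settle), and the prompt characters mirror psql's continuation prompts:

```python
def exit_hint(word, prompt_status):
    """Pick a hint for a bare 'quit'/'exit'/'help' typed mid-buffer.

    prompt_status mirrors psql's prompts: '=' top level, '-' mid
    statement, and "'", '"', '$' inside a quote of that flavor.
    """
    cmd = r"\?" if word == "help" else r"\q"
    if prompt_status in ("'", '"', "$"):
        # Inside a quote, backslash commands are not recognized, so
        # the user must first cancel or close the quote.
        return "Use Control-C to cancel the input, then " + cmd
    return "Use " + cmd
```

This keeps one message for the unquoted states and only specializes the in-quote case, matching Robert's reluctance to multiply the number of messages.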
Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Tom Lane wrote: > The changes in t/002_pg_dump.pl largely failed to apply, which is > partially due to the age of the patch but IMO speaks more to the > unmaintainability of that TAP test. It still didn't run after I'd > manually fixed the merge failures, so I gave up in disgust and > did not push any of the test changes. If someone is excited enough > about testing this, they can submit a followon patch for that, > but I judge it not really worth either the human effort or the > future testing cycles. > > (Am I right in thinking that 002_pg_dump.pl is, by design, roughly > O(N^2) in the number of features tested? Ick.) Yeah, that 002 test is pretty nasty stuff. I think we only put up with it because it's the only idea we've come up with; maybe there are better ideas. Crazy idea: maybe a large fraction of that test could be replaced with comparisons of the "pg_restore -l" output file rather than pg_dump's text output (i.e. the TOC entry for each object, rather than the object's textual representation.) Sounds easier in code than current implementation. Separately, verify that textual representation for each TOC entry type is what we expect. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Thu, Jan 25, 2018 at 10:56 AM, Claudio Freire wrote: > On Thu, Jan 25, 2018 at 4:11 AM, Thomas Munro > wrote: >> On Thu, Jan 18, 2018 at 9:17 AM, Claudio Freire >> wrote: >>> Huh. That was simpler than I thought. >>> >>> Attached rebased versions. >> >> Hi Claudio, >> >> FYI the regression test seems to have some run-to-run variation. >> Though it usually succeeds, recently I have seen a couple of failures >> like this: >> >> = Contents of ./src/test/regress/regression.diffs >> *** >> /home/travis/build/postgresql-cfbot/postgresql/src/test/regress/expected/vacuum.out >> 2018-01-24 01:41:28.200454371 + >> --- >> /home/travis/build/postgresql-cfbot/postgresql/src/test/regress/results/vacuum.out >> 2018-01-24 01:51:07.970049937 + >> *** >> *** 128,134 >> SELECT pg_relation_size('vactst', 'main'); >>pg_relation_size >> -- >> ! 0 >> (1 row) >> >> SELECT count(*) FROM vactst; >> --- 128,134 >> SELECT pg_relation_size('vactst', 'main'); >>pg_relation_size >> -- >> ! 8192 >> (1 row) >> >> SELECT count(*) FROM vactst; >> == >> >> -- >> Thomas Munro >> http://www.enterprisedb.com > > I'll look into it I had the tests running in a loop all day long, and I cannot reproduce that variance. Can you share your steps to reproduce it, including configure flags?
Re: reducing isolation tests runtime
Tom Lane wrote: > Alvaro Herrera writes: > > I think we could solve this by putting in the same parallel group only > > slow tests that mostly sleeps, ie. nothing that would monopolize CPU for > > long enough to cause a problem. Concretely: > > test: timeouts tuplelock-update deadlock-hard deadlock-soft-2 > > OK, but there'd better be a comment there explaining the concern > very precisely, or somebody will break it. Here's a concrete proposal. Runtime is 45.7 seconds on my laptop. It can be further reduced, but not by more than a second or two unless you get in the business of modifying other tests. (I only modified deadlock-soft-2 because it saves 5 seconds). Admittedly the new isolation_schedule file is a bit ugly. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From f80a2fd9e3320d033017e17a648ac1ca13396ef0 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 25 Jan 2018 17:11:03 -0300 Subject: [PATCH] Parallelize isolation tests a bit It is possible to run isolation tests in parallel, which saves quite a bit of runtime (from 76 to 45 seconds in my machine). Test 'timeout' has strict timing requirements, which may fail in animals that are slow (either because of running under valgrind, doing cache clobbering or expensive memory checks, or hardware is just slow), so only put it in the same group with other tests that mostly sleep, to avoid disruption. 
Discussion: https://postgr.es/m/20180124231006.z7spaz5gkzbdvob5@alvherre.pgsql --- src/test/isolation/expected/deadlock-soft-2.out | 12 ++-- src/test/isolation/isolation_schedule | 84 + src/test/isolation/specs/deadlock-soft-2.spec | 22 +++ 3 files changed, 33 insertions(+), 85 deletions(-) diff --git a/src/test/isolation/expected/deadlock-soft-2.out b/src/test/isolation/expected/deadlock-soft-2.out index 14b0343ba4..5c33a5cdaa 100644 --- a/src/test/isolation/expected/deadlock-soft-2.out +++ b/src/test/isolation/expected/deadlock-soft-2.out @@ -1,12 +1,12 @@ Parsed test spec with 4 sessions starting permutation: s1a s2a s2b s3a s4a s1b s1c s2c s3c s4c -step s1a: LOCK TABLE a1 IN SHARE UPDATE EXCLUSIVE MODE; -step s2a: LOCK TABLE a2 IN ACCESS SHARE MODE; -step s2b: LOCK TABLE a1 IN SHARE UPDATE EXCLUSIVE MODE; -step s3a: LOCK TABLE a2 IN ACCESS EXCLUSIVE MODE; -step s4a: LOCK TABLE a2 IN ACCESS EXCLUSIVE MODE; -step s1b: LOCK TABLE a2 IN SHARE UPDATE EXCLUSIVE MODE; +step s1a: LOCK TABLE aa1 IN SHARE UPDATE EXCLUSIVE MODE; +step s2a: LOCK TABLE aa2 IN ACCESS SHARE MODE; +step s2b: LOCK TABLE aa1 IN SHARE UPDATE EXCLUSIVE MODE; +step s3a: LOCK TABLE aa2 IN ACCESS EXCLUSIVE MODE; +step s4a: LOCK TABLE aa2 IN ACCESS EXCLUSIVE MODE; +step s1b: LOCK TABLE aa2 IN SHARE UPDATE EXCLUSIVE MODE; step s1b: <... completed> step s1c: COMMIT; step s2b: <... 
completed> diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule index 74d7d59546..0df00c6a53 100644 --- a/src/test/isolation/isolation_schedule +++ b/src/test/isolation/isolation_schedule @@ -1,68 +1,16 @@ -test: read-only-anomaly -test: read-only-anomaly-2 -test: read-only-anomaly-3 -test: read-write-unique -test: read-write-unique-2 -test: read-write-unique-3 -test: read-write-unique-4 -test: simple-write-skew -test: receipt-report -test: temporal-range-integrity -test: project-manager -test: classroom-scheduling -test: total-cash -test: referential-integrity -test: ri-trigger -test: partial-index -test: two-ids -test: multiple-row-versions -test: index-only-scan -test: deadlock-simple -test: deadlock-hard -test: deadlock-soft -test: deadlock-soft-2 -test: fk-contention -test: fk-deadlock -test: fk-deadlock2 -test: eval-plan-qual -test: lock-update-delete -test: lock-update-traversal -test: insert-conflict-do-nothing -test: insert-conflict-do-nothing-2 -test: insert-conflict-do-update -test: insert-conflict-do-update-2 -test: insert-conflict-do-update-3 -test: insert-conflict-toast -test: delete-abort-savept -test: delete-abort-savept-2 -test: aborted-keyrevoke -test: multixact-no-deadlock -test: multixact-no-forget -test: lock-committed-update -test: lock-committed-keyupdate -test: update-locked-tuple -test: propagate-lock-delete -test: tuplelock-conflict -test: tuplelock-update -test: freeze-the-dead -test: nowait -test: nowait-2 -test: nowait-3 -test: nowait-4 -test: nowait-5 -test: skip-locked -test: skip-locked-2 -test: skip-locked-3 -test: skip-locked-4 -test: drop-index-concurrently-1 -test: multiple-cic -test: alter-table-1 -test: alter-table-2 -test: alter-table-3 -test: alter-table-4 -test: create-trigger -test: sequence-ddl -test: async-notify -test: vacuum-reltuples -test: timeouts -test: vacuum-concurrent-drop +# In the first group of parallel tests, test "timeout" might be susceptible to +# additional load 
placed on the machine, so avoid anything in this group that +# does much more other than sleeping.
Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Robins Tharakan writes: > Attached is an updated patch (v4) with basic tests for pg_dump / pg_dumpall. I've reviewed and pushed this, after making some changes so that the switch works in pg_restore as well, and minor cosmetic adjustments. The changes in t/002_pg_dump.pl largely failed to apply, which is partially due to the age of the patch but IMO speaks more to the unmaintainability of that TAP test. It still didn't run after I'd manually fixed the merge failures, so I gave up in disgust and did not push any of the test changes. If someone is excited enough about testing this, they can submit a followon patch for that, but I judge it not really worth either the human effort or the future testing cycles. (Am I right in thinking that 002_pg_dump.pl is, by design, roughly O(N^2) in the number of features tested? Ick.) regards, tom lane
Re: PATCH: Exclude unlogged tables from base backups
On 1/25/18 12:31 AM, Masahiko Sawada wrote: > On Thu, Jan 25, 2018 at 3:25 AM, David Steele wrote: >>> >>> Here is the first review comments. >>> >>> + unloggedDelim = strrchr(path, '/'); >>> >>> I think it doesn't work fine on windows. How about using >>> last_dir_separator() instead? >> >> I think this function is OK on Windows -- we use it quite a bit. >> However, last_dir_separator() is clearer so I have changed it. > > Thank you for updating this. I was concerned about a separator > character '/' might not work fine on windows. Ah yes, I see what you mean now. > On Thu, Jan 25, 2018 at 6:23 AM, David Steele wrote: >> On 1/24/18 4:02 PM, Adam Brightwell wrote: >> Actually, I was talking to Stephen about this it seems like #3 would be >> more practical if we just stat'd the init fork for each relation file >> found. I doubt the stat would add a lot of overhead and we can track >> each unlogged relation in a hash table to reduce overhead even more. >> > > Can the readdir handle files that are added during the loop? I think > that we still cannot exclude a new unlogged relation if the relation > is added after we execute readdir first time. To completely eliminate > it we need a sort of lock that prevents to create new unlogged > relation from current backends. Or we need to do readdir loop multiple > times to see if no new relations were added during sending files. As far as I know readdir() is platform-dependent in terms of how it scans the dir and if files created after the opendir() will appear. It shouldn't matter, though, since WAL replay will recreate those files. > If you're updating the patch to implement #3, this patch should be > marked as "Waiting on Author". After updated I'll review it again. Attached is a new patch that uses stat() to determine if the init fork for a relation file exists. I decided not to build a hash table as it could use considerable memory and I didn't think it would be much faster than a simple stat() call. 
The reinit.c refactor has been removed since it was no longer needed. I'll submit the tests I wrote for reinit.c as a separate patch for the next CF.

Thanks,
-- 
-David
da...@pgmasters.net

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 4c5ed1e6d6..5854ec1533 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2552,6 +2552,12 @@ The commands accepted in walsender mode are:
         with pgsql_tmp.
+
+
+        Unlogged relations, except for the init fork which is required to
+        recreate the (empty) unlogged relation on recovery.
+
+
         pg_wal, including subdirectories. If the backup is run
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index dd7ad64862..688790ad0d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -26,6 +26,7 @@
 #include "nodes/pg_list.h"
 #include "pgtar.h"
 #include "pgstat.h"
+#include "port.h"
 #include "postmaster/syslogger.h"
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
@@ -33,6 +34,7 @@
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/reinit.h"
 #include "utils/builtins.h"
 #include "utils/elog.h"
 #include "utils/ps_status.h"
@@ -959,12 +961,44 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 	char		pathbuf[MAXPGPATH * 2];
 	struct stat statbuf;
 	int64		size = 0;
+	const char *lastDir;		/* Split last dir from parent path. */
+	bool		isDbDir = false;	/* Does this directory contain relations? */
+
+	/*
+	 * Determine if the current path is a database directory that can
+	 * contain relations.
+	 *
+	 * Start by finding the location of the delimiter between the parent
+	 * path and the current path.
+	 */
+	lastDir = last_dir_separator(path);
+
+	/* Does this path look like a database path (i.e. all digits)? */
+	if (lastDir != NULL &&
+		strspn(lastDir + 1, "0123456789") == strlen(lastDir + 1))
+	{
+		/* Part of path that contains the parent directory. */
+		int			parentPathLen = lastDir - path;
+
+		/*
+		 * Mark path as a database directory if the parent path is either
+		 * $PGDATA/base or a tablespace version path.
+		 */
+		if (strncmp(path, "./base", parentPathLen) == 0 ||
+			(parentPathLen >= (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) &&
+			 strncmp(lastDir - (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1),
+					 TABLESPACE_VERSION_DIRECTORY,
+					 sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) == 0))
Re: [HACKERS] Proposal: Local indexes for partitioned table
Alvaro Herrera wrote: > I think this is the right fix for this problem. I wonder about > exploring other callers of RelationGetIndexList to see who else could be > confused ... CLUSTER was also affected (and ALTER TABLE .. CLUSTER ON). Patched both and added simple tests. Couldn't detect any other immediate problems. Maybe UNIQUE indexes will have a similar hazard, with replica identity. Case closed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: JIT compiling with LLVM v9.0
Hi, On 2018-01-25 18:40:53 +0300, Konstantin Knizhnik wrote: > As far as I understand generation of native code is now always done for all > supported expressions and individually by each backend. Mostly, yes. It's "always" done because there are cost-based checks for whether to do so or not. > I wonder it will be useful to do more efforts to understand when compilation > to native code should be done and when interpretation is better. > For example many JIT-able languages like Lua are using traces, i.e. query is > first interpreted and trace is generated. If the same trace is followed > more than N times, then native code is generated for it. Right. That's where I actually had started out, but my experimentation showed that that's not that interesting a path to pursue. Emitting code in much smaller increments (as you'd do for individual expressions) has considerable overhead. We also have a planner that allows us reasonable guesses about when to JIT and when not - something not available in many other languages. That said, nothing in the infrastructure would prevent you from pursuing that; it'd just be a wrapper function for the generated exprs that tracks invocations. > Another question is whether it is sensible to redundantly do expensive work > (llvm compilation) in all backends. Right now we kinda have to, but I really want to get rid of that. There are some pointers included as constants in the generated code. I plan to work on getting rid of that requirement, but after getting the basics in (i.e. realistically not this release). Even after that I'm personally much more interested in caching the generated code inside a backend, rather than across backends. Function addresses et al being different between backends would add some complications; those can be overcome, but I'm doubtful it's immediately worth it. > So before starting code generation, ExecReadyCompiledExpr can first > build signature and check if correspondent library is already present.
> Also it will be easier to control space used by compiled libraries in > this Right, I definitely think we want to do that at some point not too far away in the future. That makes the applicability of JITing much broader. More advanced forms of this are that you JIT in the background for frequently executed code (so as not to incur latency the first time somebody executes it), and/or that you emit unoptimized code the first time through, which is quite quick, and run the optimizer after the query has been executed a number of times. Greetings, Andres Freund
Re: JIT compiling with LLVM v9.0
Hi, On 2018-01-25 10:00:14 +0100, Pierre Ducroquet wrote: > I don't know when this would be released, August-October range. > but the minimal supported LLVM > version will have a strong influence on the availability of that feature. If > today this JIT compiling was released with only LLVM 5/6 support, it would be > unusable for most Debian users (llvm-5 is only available in sid). Even llvm 4 > is not available in latest stable. > I'm already trying to build with llvm-4 and I'm going to try further with > llvm > 3.9 (Debian Stretch doesn't have a more recent than this one, and I won't > have > something better to play with my data), I'll keep you informed. For sport, I > may also try llvm 3.5 (for Debian Jessie). I don't think it's unreasonable to not support super old llvm versions. This is a complex feature, and will take some time to mature. Supporting too many LLVM versions at the outset will have some cost. Versions before 3.8 would require supporting mcjit rather than orc, and I don't think that'd be worth doing. I think 3.9 might be a reasonable baseline... Greetings, Andres Freund
Re: Possible performance regression with pg_dump of a large number of relations
Luke, * Luke Cowell (lcow...@gmail.com) wrote: > > On Jan 24, 2018, at 2:56 PM, Stephen Frost wrote: > >>> ERROR: relation "pg_init_privs" does not exist > >>> LINE 139: LEFT JOIN pg_init_privs pip > > > > I certainly hope that works on 9.6, since that's when pg_init_privs was > > added.. > > My mistake. That error is from my 9.5 server. It does error on 9.6, but I get > the following error: No worries! > $ psql --version > psql (PostgreSQL) 9.6.6 > $ psql postgres < query.sql > ERROR: function pg_get_partkeydef(oid) does not exist > LINE 126: pg_get_partkeydef(c.oid) AS partkeydef, > ^ > HINT: No function matches the given name and argument types. You might need > to add explicit type casts. Ah, yes, that makes sense, that function wasn't included in 9.6. That's not really a problem though, pg_dump knows how to talk to different versions of PG, it just happens that the query I sent was the one for HEAD and not for older versions. > > Presuming I can make it work, the idea would be to back-port it to 9.6 > > and 10, since pg_init_privs and this code was added in 9.6. > > A 9.6 backport would be excellent. Yup, I'll make sure that the query that's run for 9.6 server systems works on, well, 9.6 systems. ;) Thanks! Stephen signature.asc Description: PGP signature
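(For readers following the thread: the per-version query dispatch Stephen describes can be sketched as below. This is a heavily simplified, hypothetical illustration, not pg_dump's actual code; the function name is invented. PostgreSQL encodes server versions as an integer, so v10 is 100000 and 9.6.x is 906xx.)

```c
#include <string.h>

/*
 * Hypothetical sketch: pick a SELECT-list item depending on the server
 * version, the way pg_dump varies its catalog queries per server release.
 */
static const char *
partkeydef_select_item(int server_version_num)
{
	/* pg_get_partkeydef() only exists on v10 (100000) and later servers */
	if (server_version_num >= 100000)
		return "pg_get_partkeydef(c.oid) AS partkeydef";

	/* older servers: emit a constant so the column list stays stable */
	return "NULL AS partkeydef";
}
```

With this shape, a single result-processing path works against every supported server version because the column list never changes, only how it is computed.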
Re: plpgsql function startup-time improvements
Hi 2018-01-25 0:16 GMT+01:00 Tom Lane : > Pavel Stehule writes: > > please, can you rebase all three patches necessary for patching? > > Done. These now need to be applied over > https://www.postgresql.org/message-id/833.1516834...@sss.pgh.pa.us Thank you. I checked it: 1. there are no problems with patching 2. all tests passed 3. I can confirm that the best-case speedup is about 15-20%. The worst case is hard to test - but these changes should not be slower than current master. 4. faster-plpgsql-datum-setup-2.patch is a simple patch with a few lines of new code 5. plpgsql-promises-2.patch is a little bit more complex, but still simple 6. the code is well formatted and well commented 7. there are no new tests, which is not a problem - these patches don't add any new feature I'll mark these patches as ready for committer Regards Pavel > > > regards, tom lane > >
Re: CONSTANT/NOT NULL/initializer properties for plpgsql record variables
> On 25 Jan 2018, at 16:34, Tom Lane wrote: > Daniel Gustafsson writes: >> One tiny thing: while not introduced in this patch, I wonder if it would be >> worth adding an errhint in the following hunk when applied to arrays, to >> clarify what CONSTANT in an array declaration mean. I have seen confusion >> around what y means in ‘x int[y]’, and I can see ‘x CONSTANT int[y]’ being >> misinterpreted as “length fixed to y”. > > Hmm. What would you imagine the hint saying? It's certainly true that > a lot of people don't understand that a declared array length doesn't > mean anything in Postgres, but that's not specific to this context. I was thinking about something along the lines of: errhint("In array declarations, CONSTANT refers to elements and not length."), .. when datatype->typisarray is true, but again it might just be waffling that doesn’t help anyone. cheers ./daniel
Re: Possible performance regression with pg_dump of a large number of relations
> On Jan 24, 2018, at 2:56 PM, Stephen Frost wrote: > > Hi there! > > >>> ERROR: relation "pg_init_privs" does not exist >>> LINE 139: LEFT JOIN pg_init_privs pip > > I certainly hope that works on 9.6, since that's when pg_init_privs was > added.. My mistake. That error is from my 9.5 server. It does error on 9.6, but I get the following error: $ psql --version psql (PostgreSQL) 9.6.6 $ psql postgres < query.sql ERROR: function pg_get_partkeydef(oid) does not exist LINE 126: pg_get_partkeydef(c.oid) AS partkeydef, ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. > Presuming I can make it work, the idea would be to back-port it to 9.6 > and 10, since pg_init_privs and this code was added in 9.6. A 9.6 backport would be excellent. Thanks again! Luke
Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist
In a message of 25 January 2018 11:29:34, Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> Well, maybe the right answer is to address that. It's clear to me > >> why that would happen if we store these things as reloptions on the > >> toast table, but can't they be stored on the parent table? > > > > Actually, Nikolay provided a possible solution: if you execute ALTER > > TABLE SET (toast.foobar = xyz), and a toast table doesn't exist, create > > one at that point. > > That adds a lot of overhead if you never actually need the toast table. I think this overhead is not that terrible if it is properly warned about ;-) > Still, maybe it's an appropriate amount of effort compared to the size > of the use-case for this. If you have come to a final conclusion, please close the commitfest item with a "Returned with feedback" resolution, and I'll write another patch... -- Do code for fun. signature.asc Description: This is a digitally signed message part.
Re: [HACKERS] [POC] Faster processing at Gather node
On Tue, Jan 9, 2018 at 7:09 PM, Andres Freund wrote: >> + * mq_sender and mq_bytes_written can only be changed by the sender. >> + * mq_receiver and mq_sender are protected by mq_mutex, although, >> importantly, >> + * they cannot change once set, and thus may be read without a lock once >> this >> + * is known to be the case. > > I don't recall our conversation around this anymore, and haven't read > down far enough to see the relevant code. Lest I forget: Such construct > often need careful use of barriers. I think the only thing the code assumes here is that if we previously read the value with the spinlock and didn't get NULL, we can later read the value without the spinlock and count on seeing the same value we saw previously. I think that's safe enough. > s/should/is/ or similar? I prefer it the way that I have it. > Perhaps a short benchmark for 32bit systems using shm_mq wouldn't hurt? > I suspect there won't be much of a performance impact, but it's probably > worth checking. I don't think I understand your concern here. If this is used on a system where we're emulating atomics and barriers in painful ways, it might hurt performance, but I think we have a policy of not caring. Also, I don't know where I'd find a 32-bit system to test. >> - * Importantly, mq_ring can be safely read and written without a lock. Were >> - * this not the case, we'd have to hold the spinlock for much longer >> - * intervals, and performance might suffer. Fortunately, that's not >> - * necessary. At any given time, the difference between mq_bytes_read and >> + * At any given time, the difference between mq_bytes_read and > > Hm, why did you remove the first part about mq_ring itself? Bad editing. Restored. >> @@ -848,18 +868,19 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, >> const void *data, >> >> while (sent < nbytes) >> { >> - booldetached; >> uint64 rb; >> + uint64 wb; >> >> /* Compute number of ring buffer bytes used and available. 
*/ >> - rb = shm_mq_get_bytes_read(mq, &detached); >> - Assert(mq->mq_bytes_written >= rb); >> - used = mq->mq_bytes_written - rb; >> + rb = pg_atomic_read_u64(&mq->mq_bytes_read); >> + wb = pg_atomic_read_u64(&mq->mq_bytes_written); >> + Assert(wb >= rb); >> + used = wb - rb; > > Just to make sure my understanding is correct: No barriers needed here > because "bytes_written" is only written by the sending backend, and > "bytes_read" cannot lap it. Correct? We can't possibly read a stale value of mq_bytes_written because we are the only process that can write it. It's possible that the receiver has just increased mq_bytes_read and that the change isn't visible to us yet, but if so, the sender's also going to set our latch, or has done so already. So the worst thing that happens is that we decide to sleep because it looks like no space is available and almost immediately get woken up because there really is space. >> Assert(used <= ringsize); >> available = Min(ringsize - used, nbytes - sent); >> >> /* Bail out if the queue has been detached. */ >> - if (detached) >> + if (mq->mq_detached) > > Hm, do all paths here guarantee that mq->mq_detached won't be stored on > the stack / register in the first iteration, and then not reread any > further? I think it's fine because every branch of the if below ends up > in a syscall / barrier, but it might be worth noting on that here. Aargh. I hate compilers. I added a comment. Should I think about changing mq_detached to pg_atomic_uint32 instead? > Perhaps mention that this could lead to spuriously signalling the wrong > backend in case of detach, but that that is fine? I think that's a general risk of latches that doesn't need to be specifically recapitulated here. > I know the scheme isn't new, but I do find it not immediately obvious > that 'wb' is short for 'bytes_written'. Sorry. >> - /* Write as much data as we can via a single memcpy(). >> */ >> + /* >> + * Write as much data as we can via a single memcpy(). 
>> Make sure >> + * these writes happen after the read of >> mq_bytes_read, above. >> + * This barrier pairs with the one in >> shm_mq_inc_bytes_read. >> + */ > > s/above/above. Otherwise a newer mq_bytes_read could become visible > before the corresponding reads have fully finished./? I don't find that very clear. A newer mq_bytes_read could become visible whenever, and a barrier doesn't prevent that from happening. What it does is ensure (together with the one in shm_mq_inc_bytes_read) that we don't try to read bytes that aren't fully *written* yet. Generally, my mental model is that barriers make things happen in program order
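(The read/write-counter-with-barriers scheme under review in this exchange can be sketched with C11 atomics. This is an illustrative single-producer/single-consumer ring, not shm_mq's actual code; all names here are invented. The acquire/release pairs below play the role of the explicit barriers Robert and Andres discuss: the sender's release on `bytes_written` pairs with the receiver's acquire load, so payload bytes are never read before they are fully written, and symmetrically for `bytes_read` and buffer reuse.)

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8				/* power of two, tiny for illustration */

typedef struct
{
	_Atomic uint64_t bytes_written;	/* advanced only by the sender */
	_Atomic uint64_t bytes_read;	/* advanced only by the receiver */
	char		ring[RING_SIZE];
} rb_queue;

/* Sender side: returns number of bytes actually copied into the ring. */
static size_t
rb_send(rb_queue *q, const char *data, size_t n)
{
	/* Acquire pairs with the receiver's release in rb_recv: ring bytes the
	 * receiver has consumed are guaranteed free before we overwrite them. */
	uint64_t	rd = atomic_load_explicit(&q->bytes_read, memory_order_acquire);
	uint64_t	wr = atomic_load_explicit(&q->bytes_written, memory_order_relaxed);
	size_t		avail = RING_SIZE - (size_t) (wr - rd);
	size_t		i;

	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++)
		q->ring[(wr + i) % RING_SIZE] = data[i];

	/* Release makes the payload writes visible before the counter bump;
	 * pairs with the receiver's acquire load of bytes_written below. */
	atomic_store_explicit(&q->bytes_written, wr + n, memory_order_release);
	return n;
}

/* Receiver side: returns number of bytes actually copied out. */
static size_t
rb_recv(rb_queue *q, char *out, size_t n)
{
	uint64_t	wr = atomic_load_explicit(&q->bytes_written, memory_order_acquire);
	uint64_t	rd = atomic_load_explicit(&q->bytes_read, memory_order_relaxed);
	size_t		used = (size_t) (wr - rd);
	size_t		i;

	if (n > used)
		n = used;
	for (i = 0; i < n; i++)
		out[i] = q->ring[(rd + i) % RING_SIZE];

	atomic_store_explicit(&q->bytes_read, rd + n, memory_order_release);
	return n;
}
```

Note that, as in shm_mq, the counters only ever increase; wraparound of the buffer index is handled by the modulo, not by resetting the counters, which is what makes the `wr - rd` fullness computation safe.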
Re: Further cleanup of pg_dump/pg_restore item selection code
"David G. Johnston" writes: > Should pg_restore fail if asked to --create without a database entry in the > TOC? Yeah, I wondered about that too. This patch makes it a non-issue for archives created with v11 or later pg_dump, but there's still a hazard if you're restoring from an archive made by an older pg_dump. Is it worth putting in logic to catch that? Given the lack of field complaints, I felt fixing it going forward is sufficient, but there's room to argue differently. regards, tom lane
Re: [PROPOSAL] Shared Ispell dictionaries
Hello,

Thank you for your review! Good catches.

On Thu, Jan 25, 2018 at 03:26:46PM +0300, Ildus Kurbangaliev wrote:
> In 0001 there are few lines where is only indentation has changed.

Fixed.

> 0002:
> - TsearchShmemSize - calculating size using hash_estimate_size seems
> redundant since you use DSA hash now.

Fixed. True, there is no need for hash_estimate_size anymore.

> - ts_dict_shmem_release - LWLockAcquire in the beginning makes no
> sense, since dict_table couldn't change anyway.

Fixed. In an earlier version tsearch_ctl was used here, but I forgot to remove the LWLockAcquire.

> 0003:
> - ts_dict_shmem_location could return IspellDictData, it makes more
> sense.

I assume that ts_dict_shmem_location can be used by various types of dictionaries, not only by Ispell, so void * is more suitable here.

> 0006:
> It's very subjective, but I think it would nicer to call option as
> Shared (as property of dictionary) or UseSharedMemory, the boolean
> option called SharedMemory sounds weird.

Agree. In our offline conversation we came to Shareable, that is, a dictionary *can* be shared. It may be more appropriate because setting Shareable=true doesn't guarantee that a dictionary will be allocated in shared memory, due to the max_shared_dictionaries_size GUC.

Attached new version of the patch.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index b9fdd77e19..e071994523 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -78,6 +78,8 @@
 #define tmpalloc(sz)  MemoryContextAlloc(Conf->buildCxt, (sz))
 #define tmpalloc0(sz)  MemoryContextAllocZero(Conf->buildCxt, (sz))
+#define tmpstrdup(str)  MemoryContextStrdup(Conf->buildCxt, (str))
+
 /*
  * Prepare for constructing an ISpell dictionary.
  *
@@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
 	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
 	strcpy(Conf->Spell[Conf->nspell]->word, word);
 	Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-		? cpstrdup(Conf, flag) : VoidString;
+		? tmpstrdup(flag) : VoidString;
 	Conf->nspell++;
 }
@@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry,
 		entry->flag.i = i;
 	}
 	else
-		entry->flag.s = cpstrdup(Conf, s);
+		entry->flag.s = tmpstrdup(s);
 	entry->flagMode = Conf->flagMode;
 	entry->value = val;
@@ -1536,6 +1538,9 @@ nextline:
 	return;
 isnewformat:
+	pfree(recoded);
+	pfree(pstr);
+
 	if (oldformat)
 		ereport(ERROR,
 				(errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 45b2af14eb..46617df852 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1364,6 +1364,35 @@ include_dir 'conf.d'
+      max_shared_dictionaries_size (integer)
+      max_shared_dictionaries_size configuration parameter
+
+       Sets the maximum size of all text search dictionaries loaded into shared
+       memory. The default is 100 megabytes (100MB). This
+       parameter can only be set at server start.
+
+       Currently controls only loading of Ispell
+       dictionaries (see ).
+       After compiling the dictionary it will be copied into shared memory.
+       Other backends, on first use of the dictionary, will use it from shared
+       memory, so they don't need to compile the dictionary a second time.
+
+       If the total size of simultaneously loaded dictionaries reaches the
+       maximum allowed size then a new dictionary will be loaded into local
+       memory of a backend.
+
       huge_pages (enum)
diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c
index bdf3857ce4..42be77d045 100644
--- a/src/backend/commands/tsearchcmds.c
+++ b/src/backend/commands/tsearchcmds.c
@@ -39,6 +39,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "tsearch/ts_cache.h"
+#include "tsearch/ts_shared.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -396,7 +397,8 @@ verify_dictoptions(Oid tmplId, List *dictoptions)
 	 * Call the init method and see if it complains. We don't worry about
 	 * it leaking memory, since our command will soon be over anyway.
 	 */
-	(void) OidFunctionCall1(initmethod, PointerGetDatum(dictoptions));
+	(void) OidFunctionCall2(initmethod, PointerGetDatum(dictoptions),
+							ObjectIdGetDatum(InvalidOid));
 }
 Releas
Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist
Alvaro Herrera writes: > Tom Lane wrote: >> Well, maybe the right answer is to address that. It's clear to me >> why that would happen if we store these things as reloptions on the >> toast table, but can't they be stored on the parent table? > Actually, Nikolay provided a possible solution: if you execute ALTER > TABLE SET (toast.foobar = xyz), and a toast table doesn't exist, create > one at that point. That adds a lot of overhead if you never actually need the toast table. Still, maybe it's an appropriate amount of effort compared to the size of the use-case for this. regards, tom lane
Re: Doc tweak for huge_pages?
On 1/22/18 01:10, Thomas Munro wrote: > Sorry, right, that was 100% wrong. It would probably be correct to > remove the "not", but let's just remove that bit. New version > attached. Committed that. I reordered some of the existing material because it seemed to have gotten a bit out of order with repeated patching. I also softened the advice against THP just a bit, since that is apparently still changing all the time. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: unique indexes on partitioned tables
Hi Alvaro, On 01/22/2018 05:55 PM, Alvaro Herrera wrote: Alvaro Herrera wrote: Version 4 of this patch, rebased on today's master. Passes make check-world. Maybe add a test case to indexing.sql that highlights that hash indexes don't support UNIQUE; although that's not unique to partitioned indexes. Thanks for working on this! Best regards, Jesper
CONSTANT/NOT NULL/initializer properties for plpgsql record variables
On Thursday, January 25, 2018, Tom Lane wrote: > > The documentation currently says > > The CONSTANT option prevents the variable from being assigned to > after initialization, so that its value will remain constant for > the duration of the block. > We don't really have the concept of mutable types that other languages do. Maybe just saying "from being assigned or mutated after initialization" would suffice. I don't see a desirable way to reinforce that the y in arr[y] is unenforced - in the code or docs - beyond what is already done. That distinction doesn't usually come up, and for those cases where it does, the docs will probably be an after-the-fact confirmation of observed behavior. David J.
Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist
Tom Lane wrote: > Peter Eisentraut writes: > > On 1/23/18 13:39, Robert Haas wrote: > >> I don't understand. AAUI, it is currently the case that if you set > >> the options before the TOAST table exists, they are lost. > > > Well, that's also weird. > > Well, maybe the right answer is to address that. It's clear to me > why that would happen if we store these things as reloptions on the > toast table, but can't they be stored on the parent table? Actually, Nikolay provided a possible solution: if you execute ALTER TABLE SET (toast.foobar = xyz), and a toast table doesn't exist, create one at that point. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: JIT compiling with LLVM v9.0
On 24.01.2018 10:20, Andres Freund wrote:

Hi,

I've spent the last weeks working on my LLVM compilation patchset. In the course of that I *heavily* revised it. While still a good bit away from committable, it's IMO definitely not a prototype anymore. There's too many small changes, so I'm only going to list the major things.

A good bit of that is new. The actual LLVM IR emission itself hasn't changed that drastically. Since I've not described them in detail before I'll describe from scratch in a few cases, even if things haven't fully changed.

== JIT Interface ==

To avoid emitting code in very small increments (increases mmap/mremap rw vs exec remapping, compile/optimization time), code generation doesn't happen for every single expression individually, but in batches.

The basic object to emit code via is a jit context created with:

  extern LLVMJitContext *llvm_create_context(bool optimize);

which in the case of expressions is stored on-demand in the EState. For other use cases that might not be the right location.

To emit LLVM IR (i.e. the portable code that LLVM then optimizes and generates native code for), one gets a module from that with:

  extern LLVMModuleRef llvm_mutable_module(LLVMJitContext *context);

to which "arbitrary" numbers of functions can be added. In the case of expression evaluation, we get the module once for every expression, and emit one function for the expression itself, and one for every applicable/referenced deform function.

As explained above, we do not want to emit code immediately from within ExecInitExpr()/ExecReadyExpr(). To facilitate that, readying a JITed expression sets the function to a callback, which gets the actual native function on the first actual call. That allows batching together the generation of all native functions that are defined before the first expression is evaluated - in a lot of queries that'll be all.
Said callback then calls

  extern void *llvm_get_function(LLVMJitContext *context, const char *funcname);

which'll emit code for the "in progress" mutable module if necessary, and then searches all generated functions for the name. The names are currently "evalexpr" and "deform" with a generation and counter suffix.

Currently expressions which do not have access to an EState, basically all "parent"-less expressions, aren't JIT compiled. That could be changed, but I so far do not see a huge need.

Hi,

As far as I understand, generation of native code is now always done for all supported expressions and individually by each backend. I wonder whether it would be worth more effort to understand when compilation to native code should be done and when interpretation is better. For example, many JIT-able languages like Lua use traces, i.e. a query is first interpreted and a trace is generated. If the same trace is followed more than N times, then native code is generated for it.

In the context of a DBMS executor it is obvious that only frequently executed or expensive queries should be compiled. So we can use the estimated plan cost and the number of query executions as simple criteria for JIT-ing the query. Maybe compilation of simple queries (with small cost) should be done only for prepared statements...

Another question is whether it is sensible to redundantly do expensive work (llvm compilation) in all backends. This question refers to a shared prepared statement cache. But even without such a cache, it seems possible to use some signature of the compiled expression as the library name and allow these libraries to be shared between backends. So before starting code generation, ExecReadyCompiledExpr can first build the signature and check if the corresponding library is already present. Also it will be easier to control the space used by compiled libraries in this case.
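(The cost-plus-execution-count gate suggested above can be sketched as follows. All names, types, and thresholds here are invented for illustration - they are not PostgreSQL GUCs or structs - but the decision logic is exactly the one described: expensive plans are compiled immediately, cheap but frequently re-executed prepared statements are compiled after a warm-up, and cheap one-shot queries stay interpreted.)

```c
#include <stdbool.h>

/* Hypothetical thresholds; real systems would make these configurable. */
#define JIT_COST_THRESHOLD	100000.0	/* plans cheaper than this interpret */
#define JIT_MIN_EXECUTIONS	5			/* prepared stmts: JIT after N runs */

typedef struct PlanInfo
{
	double		total_cost;		/* planner's estimated total cost */
	int			nexecutions;	/* times this (cached) plan has run */
	bool		is_prepared;	/* cached plan that may be re-executed */
} PlanInfo;

/*
 * Decide whether compiling this plan to native code is worth paying the
 * LLVM optimization latency up front.
 */
static bool
jit_worthwhile(const PlanInfo *p)
{
	if (p->total_cost >= JIT_COST_THRESHOLD)
		return true;			/* expensive query: compile immediately */
	if (p->is_prepared && p->nexecutions >= JIT_MIN_EXECUTIONS)
		return true;			/* cheap but hot: compile after warm-up */
	return false;				/* cheap one-shot query: just interpret */
}
```

This is essentially the trace-JIT heuristic specialized to SQL: the planner's cost estimate substitutes for profiling on the first execution, and the execution counter recovers the "compile after N hits" behavior for plans the cost model underrates.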
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: CONSTANT/NOT NULL/initializer properties for plpgsql record variables
Daniel Gustafsson writes: > I’ve reviewed this patch (disclaimer: I did not review the patches listed > above > which it is based on) and the functionality introduced. The code is straight- > forward, there are ample tests and I can’t make it break however many weird > combinations thrown at it. Regarding the functionality it’s clearly a +1 on > getting this in. Thanks for reviewing! > One tiny thing: while not introduced in this patch, I wonder if it would be > worth adding an errhint in the following hunk when applied to arrays, to > clarify what CONSTANT in an array declaration mean. I have seen confusion > around what y means in ‘x int[y]’, and I can see ‘x CONSTANT int[y]’ being > misinterpreted as “length fixed to y”. Hmm. What would you imagine the hint saying? It's certainly true that a lot of people don't understand that a declared array length doesn't mean anything in Postgres, but that's not specific to this context. > That might not be a common enough misunderstanding to warrant it, but it was > the only thing that stood out to me (and perhaps it would be better in the > docs > if done at all). The documentation currently says The CONSTANT option prevents the variable from being assigned to after initialization, so that its value will remain constant for the duration of the block. I don't mind changing that if we can think of clearer wording, but to me it seems pretty clear already. regards, tom lane
Re: [HACKERS] SERIALIZABLE with parallel query
On Wed, Jan 24, 2018 at 7:39 PM, Thomas Munro wrote: > This started crashing some time yesterday with an assertion failure in > the isolation tests after commit 2badb5af landed. Reordering of code > in parallel.c confused patch's fuzz heuristics leading > SetSerializableXact() to be called too soon. Here's a fix for that. I took a look at this today and thought it might be OK to commit, modulo a few minor issues: (1) you didn't document the new tranche and (2) I prefer to avoid if (blah) { Assert(thing) } in favor of Assert(!blah || thing). But then I got a little bit concerned about ReleasePredicateLocks(). Suppose that in the middle of a read-only transaction, SXACT_FLAG_RO_SAFE becomes true. The next call to SerializationNeededForRead in each process will call ReleasePredicateLocks. In the workers, this won't do anything, so we'll just keep coming back there. But in the leader, we'll go ahead do all that stuff, including MySerializableXact = InvalidSerializableXact. But in the workers, it's still set. Maybe that's OK, but I'm not sure that it's OK... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: PATCH: pgbench - option to build using ppoll() for larger connection counts
ISTM that the v7 patch version you sent is identical to v6. -- Fabien.
Re: reducing isolation tests runtime
Alvaro Herrera writes: > Tom Lane wrote: >> Alvaro Herrera writes: >>> On the subject of test total time, we could paralelize isolation tests. >> BTW, one small issue there is that the reason the timeouts test is so >> slow is that we have to use multi-second timeouts to be sure slower >> buildfarm critters (eg valgrind animals) will get the expected results. >> So I'm worried that if the machine isn't otherwise idle, we will get >> random failures. > I think we could solve this by putting in the same parallel group only > slow tests that mostly sleeps, ie. nothing that would monopolize CPU for > long enough to cause a problem. Concretely: > test: timeouts tuplelock-update deadlock-hard deadlock-soft-2 OK, but there'd better be a comment there explaining the concern very precisely, or somebody will break it. regards, tom lane
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
Claudio Freire wrote: > On Thu, Jan 25, 2018 at 4:11 AM, Thomas Munro wrote: > > *** 128,134 > > SELECT pg_relation_size('vactst', 'main'); > >pg_relation_size > > -- > > ! 0 > > (1 row) > > > > SELECT count(*) FROM vactst; > > --- 128,134 > > SELECT pg_relation_size('vactst', 'main'); > >pg_relation_size > > -- > > ! 8192 > > (1 row) > However, shouldn't an empty relation have an initial page anyway? In > that case shouldn't the correct value be 8192? No, it's legal for an empty table to have size 0 on disk. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist
Peter Eisentraut writes: > On 1/23/18 13:39, Robert Haas wrote: >> I don't understand. AAUI, it is currently the case that if you set >> the options before the TOAST table exists, they are lost. > Well, that's also weird. Well, maybe the right answer is to address that. It's clear to me why that would happen if we store these things as reloptions on the toast table, but can't they be stored on the parent table? Backward compatibility might be an issue, but I doubt that there are enough people using these options that we couldn't get away with breaking things, if we're unable to find a decent automatic conversion method. regards, tom lane